2023-09-29 02:42:21

by Tony Luck

Subject: [PATCH v6 0/8] Add support for Sub-NUMA cluster (SNC) systems

The Sub-NUMA cluster feature on some Intel processors partitions
the CPUs that share an L3 cache into two or more sets. This plays
havoc with the Resource Director Technology (RDT) monitoring features.
Prior to this patch Intel has advised that SNC and RDT are incompatible.

Some of these CPUs support an MSR that can partition the RMID
counters in the same way. This allows the monitoring features
to be used, with the caveat that memory accesses between different
SNC NUMA nodes may still not be counted accurately.

Note that this patch series improves resctrl reporting considerably
on systems with SNC enabled, but there will still be some anomalies
for processes accessing memory from other sub-NUMA nodes.

Signed-off-by: Tony Luck <[email protected]>

---

Summary of changes since v5 - see each patch commit for more specifics

Rebased to v6.6-rc3

0001 Define "scope" enum with values 2, 3 for caches to simplify some
code (but sanity check before each such usage).
Better warning messages when scope lookup fails

0002 New patch so that some code can be shared between looking up
control and monitor domains

0003 Spell "mondomains" as "mon_domains" and be consistent with all
the other "mon" identifiers also having similar "_".
Don't leave control stuff with old names, change those too
so now have ctrl_scope, ctrl_domains, etc.

0004 Use infrastructure from 0002 to have a common rdt_find_domain()
function for both types of domain structure.
0003 was using same "rdt_domain" structure for both control
and monitor domains. Divide it into rdt_ctrl_domain and
rdt_mon_domain structures with just the fields they need.
Ditto for rdt_hw_domain. Also split and rename many support
functions and macros.
Lots of "fir tree local declaration order" changes because
lengths of typenames changed.

0005 Better commit description

0006 Better commit and code comments

0007 More explanations in commit and code comments.
Use consistent naming for "snc_*()" functions.

Patch to update selftests dropped from this series. Someone else
has taken over that work.

Tony Luck (8):
x86/resctrl: Prepare for new domain scope
x86/resctrl: Prepare to split rdt_domain structure
x86/resctrl: Prepare for different scope for control/monitor
operations
x86/resctrl: Split the rdt_domain and rdt_hw_domain structures
x86/resctrl: Add node-scope to the options for feature scope
x86/resctrl: Introduce snc_nodes_per_l3_cache
x86/resctrl: Sub NUMA Cluster detection and enable
x86/resctrl: Update documentation with Sub-NUMA cluster changes

Documentation/arch/x86/resctrl.rst | 34 +-
include/linux/resctrl.h | 78 +++--
arch/x86/include/asm/msr-index.h | 1 +
arch/x86/kernel/cpu/resctrl/internal.h | 66 ++--
arch/x86/kernel/cpu/resctrl/core.c | 380 +++++++++++++++++-----
arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 52 +--
arch/x86/kernel/cpu/resctrl/monitor.c | 58 ++--
arch/x86/kernel/cpu/resctrl/pseudo_lock.c | 14 +-
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 131 ++++----
9 files changed, 567 insertions(+), 247 deletions(-)


base-commit: 6465e260f48790807eef06b583b38ca9789b6072
--
2.41.0


2023-09-29 03:17:38

by Tony Luck

Subject: [PATCH v6 3/8] x86/resctrl: Prepare for different scope for control/monitor operations

Resctrl assumes that control and monitor operations on a resource are
performed at the same scope.

Prepare for systems that use different scope (specifically L3 scope for
cache control and NODE scope for cache occupancy and memory bandwidth
monitoring).

Create separate domain lists for control and monitor operations.

Note that errors during initialization of either control or monitor
functions on a domain would previously result in that domain being
excluded from both control and monitor operations. Now that the domains
are allocated independently, it is no longer necessary to disable both
control and monitor operations if either fails.

Signed-off-by: Tony Luck <[email protected]>

---

Changes since v5:

Commit comment: s/Existing resctrl assumes/Resctrl assumes/

Many new names. Put an underscore in "mon_domains" for consistency
with "mon_scope". Do same with all the other "mon" changes.

Also rename "scope" to "ctrl_scope", "domains" to "ctrl_domains"
and all the associated functions and macros.
---
include/linux/resctrl.h | 18 +-
arch/x86/kernel/cpu/resctrl/internal.h | 4 +-
arch/x86/kernel/cpu/resctrl/core.c | 198 ++++++++++++++++------
arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 12 +-
arch/x86/kernel/cpu/resctrl/monitor.c | 2 +-
arch/x86/kernel/cpu/resctrl/pseudo_lock.c | 4 +-
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 54 +++---
7 files changed, 200 insertions(+), 92 deletions(-)

diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index a583fa88ea5a..0af5c5aa5a6f 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -163,10 +163,12 @@ enum resctrl_scope {
* @alloc_capable: Is allocation available on this machine
* @mon_capable: Is monitor feature available on this machine
* @num_rmid: Number of RMIDs available
- * @scope: Scope of this resource
+ * @ctrl_scope: Scope of this resource for control functions
+ * @mon_scope: Scope of this resource for monitor functions
* @cache: Cache allocation related data
* @membw: If the component has bandwidth controls, their properties.
- * @domains: All domains for this resource
+ * @ctrl_domains: Control domains for this resource
+ * @mon_domains: Monitor domains for this resource
* @name: Name to use in "schemata" file.
* @data_width: Character width of data when displaying
* @default_ctrl: Specifies default cache cbm or memory B/W percent.
@@ -181,10 +183,12 @@ struct rdt_resource {
bool alloc_capable;
bool mon_capable;
int num_rmid;
- enum resctrl_scope scope;
+ enum resctrl_scope ctrl_scope;
+ enum resctrl_scope mon_scope;
struct resctrl_cache cache;
struct resctrl_membw membw;
- struct list_head domains;
+ struct list_head ctrl_domains;
+ struct list_head mon_domains;
char *name;
int data_width;
u32 default_ctrl;
@@ -230,8 +234,10 @@ int resctrl_arch_update_one(struct rdt_resource *r, struct rdt_domain *d,

u32 resctrl_arch_get_config(struct rdt_resource *r, struct rdt_domain *d,
u32 closid, enum resctrl_conf_type type);
-int resctrl_online_domain(struct rdt_resource *r, struct rdt_domain *d);
-void resctrl_offline_domain(struct rdt_resource *r, struct rdt_domain *d);
+int resctrl_online_ctrl_domain(struct rdt_resource *r, struct rdt_domain *d);
+int resctrl_online_mon_domain(struct rdt_resource *r, struct rdt_domain *d);
+void resctrl_offline_ctrl_domain(struct rdt_resource *r, struct rdt_domain *d);
+void resctrl_offline_mon_domain(struct rdt_resource *r, struct rdt_domain *d);

/**
* resctrl_arch_rmid_read() - Read the eventid counter corresponding to rmid
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 85ceaf9a31ac..e9a2a8993d14 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -511,8 +511,8 @@ void rdtgroup_kn_unlock(struct kernfs_node *kn);
int rdtgroup_kn_mode_restrict(struct rdtgroup *r, const char *name);
int rdtgroup_kn_mode_restore(struct rdtgroup *r, const char *name,
umode_t mask);
-struct rdt_domain *rdt_find_domain(struct rdt_resource *r, int id,
- struct list_head **pos);
+struct rdt_domain_hdr *rdt_find_domain(struct list_head *h, int id,
+ struct list_head **pos);
ssize_t rdtgroup_schemata_write(struct kernfs_open_file *of,
char *buf, size_t nbytes, loff_t off);
int rdtgroup_schemata_show(struct kernfs_open_file *of,
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 05369add4578..7ef178fb7c77 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -57,7 +57,8 @@ static void
mba_wrmsr_amd(struct rdt_domain *d, struct msr_param *m,
struct rdt_resource *r);

-#define domain_init(id) LIST_HEAD_INIT(rdt_resources_all[id].r_resctrl.domains)
+#define ctrl_domain_init(id) LIST_HEAD_INIT(rdt_resources_all[id].r_resctrl.ctrl_domains)
+#define mon_domain_init(id) LIST_HEAD_INIT(rdt_resources_all[id].r_resctrl.mon_domains)

struct rdt_hw_resource rdt_resources_all[] = {
[RDT_RESOURCE_L3] =
@@ -65,8 +66,10 @@ struct rdt_hw_resource rdt_resources_all[] = {
.r_resctrl = {
.rid = RDT_RESOURCE_L3,
.name = "L3",
- .scope = RESCTRL_L3_CACHE,
- .domains = domain_init(RDT_RESOURCE_L3),
+ .ctrl_scope = RESCTRL_L3_CACHE,
+ .mon_scope = RESCTRL_L3_CACHE,
+ .ctrl_domains = ctrl_domain_init(RDT_RESOURCE_L3),
+ .mon_domains = mon_domain_init(RDT_RESOURCE_L3),
.parse_ctrlval = parse_cbm,
.format_str = "%d=%0*x",
.fflags = RFTYPE_RES_CACHE,
@@ -79,8 +82,8 @@ struct rdt_hw_resource rdt_resources_all[] = {
.r_resctrl = {
.rid = RDT_RESOURCE_L2,
.name = "L2",
- .scope = RESCTRL_L2_CACHE,
- .domains = domain_init(RDT_RESOURCE_L2),
+ .ctrl_scope = RESCTRL_L2_CACHE,
+ .ctrl_domains = ctrl_domain_init(RDT_RESOURCE_L2),
.parse_ctrlval = parse_cbm,
.format_str = "%d=%0*x",
.fflags = RFTYPE_RES_CACHE,
@@ -93,8 +96,8 @@ struct rdt_hw_resource rdt_resources_all[] = {
.r_resctrl = {
.rid = RDT_RESOURCE_MBA,
.name = "MB",
- .scope = RESCTRL_L3_CACHE,
- .domains = domain_init(RDT_RESOURCE_MBA),
+ .ctrl_scope = RESCTRL_L3_CACHE,
+ .ctrl_domains = ctrl_domain_init(RDT_RESOURCE_MBA),
.parse_ctrlval = parse_bw,
.format_str = "%d=%*u",
.fflags = RFTYPE_RES_MB,
@@ -105,8 +108,8 @@ struct rdt_hw_resource rdt_resources_all[] = {
.r_resctrl = {
.rid = RDT_RESOURCE_SMBA,
.name = "SMBA",
- .scope = RESCTRL_L3_CACHE,
- .domains = domain_init(RDT_RESOURCE_SMBA),
+ .ctrl_scope = RESCTRL_L3_CACHE,
+ .ctrl_domains = ctrl_domain_init(RDT_RESOURCE_SMBA),
.parse_ctrlval = parse_bw,
.format_str = "%d=%*u",
.fflags = RFTYPE_RES_MB,
@@ -352,7 +355,7 @@ struct rdt_domain *get_domain_from_cpu(int cpu, struct rdt_resource *r)
{
struct rdt_domain *d;

- list_for_each_entry(d, &r->domains, hdr.list) {
+ list_for_each_entry(d, &r->ctrl_domains, hdr.list) {
/* Find the domain that contains this CPU */
if (cpumask_test_cpu(cpu, &d->cpu_mask))
return d;
@@ -384,29 +387,39 @@ void rdt_ctrl_update(void *arg)
}

/*
- * rdt_find_domain - Find a domain in a resource that matches input resource id
+ * rdt_find_domain - Find a domain in one of a resource domain lists.
*
- * Search resource r's domain list to find the resource id. If the resource
- * id is found in a domain, return the domain. Otherwise, if requested by
- * caller, return the first domain whose id is bigger than the input id.
+ * Search the list to find the resource id. If the resource id is found
+ * in a domain, return the domain. Otherwise, if requested by caller,
+ * return the first domain whose id is bigger than the input id.
* The domain list is sorted by id in ascending order.
+ *
+ * If an existing domain in the resource r's domain list matches the cpu's
+ * resource id, add the cpu in the domain.
+ *
+ * Otherwise, caller will allocate a new domain and insert into the right position
+ * in the domain list sorted by id in ascending order.
+ *
+ * The order in the domain list is visible to users when we print entries
+ * in the schemata file and schemata input is validated to have the same order
+ * as this list.
*/
-struct rdt_domain *rdt_find_domain(struct rdt_resource *r, int id,
- struct list_head **pos)
+struct rdt_domain_hdr *rdt_find_domain(struct list_head *h, int id,
+ struct list_head **pos)
{
- struct rdt_domain *d;
+ struct rdt_domain_hdr *d;
struct list_head *l;

if (id < 0)
return ERR_PTR(-ENODEV);

- list_for_each(l, &r->domains) {
- d = list_entry(l, struct rdt_domain, hdr.list);
+ list_for_each(l, h) {
+ d = list_entry(l, struct rdt_domain_hdr, list);
/* When id is found, return its domain. */
- if (id == d->hdr.id)
+ if (id == d->id)
return d;
/* Stop searching when finding id's position in sorted list. */
- if (id < d->hdr.id)
+ if (id < d->id)
break;
}

@@ -500,37 +513,27 @@ static int get_domain_id_from_scope(int cpu, enum resctrl_scope scope)
return -EINVAL;
}

-/*
- * domain_add_cpu - Add a cpu to a resource's domain list.
- *
- * If an existing domain in the resource r's domain list matches the cpu's
- * resource id, add the cpu in the domain.
- *
- * Otherwise, a new domain is allocated and inserted into the right position
- * in the domain list sorted by id in ascending order.
- *
- * The order in the domain list is visible to users when we print entries
- * in the schemata file and schemata input is validated to have the same order
- * as this list.
- */
-static void domain_add_cpu(int cpu, struct rdt_resource *r)
+static void domain_add_cpu_ctrl(int cpu, struct rdt_resource *r)
{
- int id = get_domain_id_from_scope(cpu, r->scope);
+ int id = get_domain_id_from_scope(cpu, r->ctrl_scope);
struct list_head *add_pos = NULL;
struct rdt_hw_domain *hw_dom;
+ struct rdt_domain_hdr *hdr;
struct rdt_domain *d;
int err;

if (id < 0) {
- pr_warn_once("Can't find domain id for CPU:%d scope:%d for resource %s\n",
- cpu, r->scope, r->name);
+ pr_warn_once("Can't find control domain id for CPU:%d scope:%d for resource %s\n",
+ cpu, r->ctrl_scope, r->name);
return;
}
- d = rdt_find_domain(r, id, &add_pos);
- if (IS_ERR(d)) {
- pr_warn("Couldn't find cache id for CPU %d\n", cpu);
+
+ hdr = rdt_find_domain(&r->ctrl_domains, id, &add_pos);
+ if (IS_ERR(hdr)) {
+ pr_warn("Couldn't find control scope id=%d for CPU %d\n", id, cpu);
return;
}
+ d = container_of(hdr, struct rdt_domain, hdr);

if (d) {
cpumask_set_cpu(cpu, &d->cpu_mask);
@@ -549,44 +552,101 @@ static void domain_add_cpu(int cpu, struct rdt_resource *r)

rdt_domain_reconfigure_cdp(r);

- if (r->alloc_capable && domain_setup_ctrlval(r, d)) {
+ if (domain_setup_ctrlval(r, d)) {
domain_free(hw_dom);
return;
}

- if (r->mon_capable && arch_domain_mbm_alloc(r->num_rmid, hw_dom)) {
+ list_add_tail(&d->hdr.list, add_pos);
+
+ err = resctrl_online_ctrl_domain(r, d);
+ if (err) {
+ list_del(&d->hdr.list);
domain_free(hw_dom);
+ }
+}
+
+static void domain_add_cpu_mon(int cpu, struct rdt_resource *r)
+{
+ int id = get_domain_id_from_scope(cpu, r->mon_scope);
+ struct list_head *add_pos = NULL;
+ struct rdt_hw_domain *hw_mondom;
+ struct rdt_domain_hdr *hdr;
+ struct rdt_domain *d;
+ int err;
+
+ if (id < 0) {
+ pr_warn_once("Can't find monitor domain id for CPU:%d scope:%d for resource %s\n",
+ cpu, r->mon_scope, r->name);
+ return;
+ }
+
+ hdr = rdt_find_domain(&r->mon_domains, id, &add_pos);
+ if (IS_ERR(hdr)) {
+ pr_warn("Couldn't find monitor scope id=%d for CPU %d\n", id, cpu);
+ return;
+ }
+ d = container_of(hdr, struct rdt_domain, hdr);
+
+ if (d) {
+ cpumask_set_cpu(cpu, &d->cpu_mask);
+ return;
+ }
+
+ hw_mondom = kzalloc_node(sizeof(*hw_mondom), GFP_KERNEL, cpu_to_node(cpu));
+ if (!hw_mondom)
+ return;
+
+ d = &hw_mondom->d_resctrl;
+ d->hdr.id = id;
+ cpumask_set_cpu(cpu, &d->cpu_mask);
+
+ if (arch_domain_mbm_alloc(r->num_rmid, hw_mondom)) {
+ domain_free(hw_mondom);
return;
}

list_add_tail(&d->hdr.list, add_pos);

- err = resctrl_online_domain(r, d);
+ err = resctrl_online_mon_domain(r, d);
if (err) {
list_del(&d->hdr.list);
- domain_free(hw_dom);
+ domain_free(hw_mondom);
}
}

-static void domain_remove_cpu(int cpu, struct rdt_resource *r)
+/*
+ * domain_add_cpu - Add a cpu to either/both resource's domain lists.
+ */
+static void domain_add_cpu(int cpu, struct rdt_resource *r)
+{
+ if (r->alloc_capable)
+ domain_add_cpu_ctrl(cpu, r);
+ if (r->mon_capable)
+ domain_add_cpu_mon(cpu, r);
+}
+
+static void domain_remove_cpu_ctrl(int cpu, struct rdt_resource *r)
{
- int id = get_domain_id_from_scope(cpu, r->scope);
+ int id = get_domain_id_from_scope(cpu, r->ctrl_scope);
struct rdt_hw_domain *hw_dom;
+ struct rdt_domain_hdr *hdr;
struct rdt_domain *d;

if (id < 0)
return;

- d = rdt_find_domain(r, id, NULL);
- if (IS_ERR_OR_NULL(d)) {
- pr_warn("Couldn't find cache id for CPU %d\n", cpu);
+ hdr = rdt_find_domain(&r->ctrl_domains, id, NULL);
+ if (IS_ERR_OR_NULL(hdr)) {
+ pr_warn("Couldn't find control scope id=%d for CPU %d\n", id, cpu);
return;
}
+ d = container_of(hdr, struct rdt_domain, hdr);
hw_dom = resctrl_to_arch_dom(d);

cpumask_clear_cpu(cpu, &d->cpu_mask);
if (cpumask_empty(&d->cpu_mask)) {
- resctrl_offline_domain(r, d);
+ resctrl_offline_ctrl_domain(r, d);
list_del(&d->hdr.list);

/*
@@ -599,6 +659,34 @@ static void domain_remove_cpu(int cpu, struct rdt_resource *r)

return;
}
+}
+
+static void domain_remove_cpu_mon(int cpu, struct rdt_resource *r)
+{
+ int id = get_domain_id_from_scope(cpu, r->mon_scope);
+ struct rdt_hw_domain *hw_mondom;
+ struct rdt_domain_hdr *hdr;
+ struct rdt_domain *d;
+
+ if (id < 0)
+ return;
+
+ hdr = rdt_find_domain(&r->mon_domains, id, NULL);
+ if (IS_ERR_OR_NULL(hdr)) {
+ pr_warn("Couldn't find scope id=%d for CPU %d\n", id, cpu);
+ return;
+ }
+ d = container_of(hdr, struct rdt_domain, hdr);
+ hw_mondom = resctrl_to_arch_dom(d);
+
+ cpumask_clear_cpu(cpu, &d->cpu_mask);
+ if (cpumask_empty(&d->cpu_mask)) {
+ resctrl_offline_mon_domain(r, d);
+ list_del(&d->hdr.list);
+ domain_free(hw_mondom);
+
+ return;
+ }

if (r == &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl) {
if (is_mbm_enabled() && cpu == d->mbm_work_cpu) {
@@ -613,6 +701,14 @@ static void domain_remove_cpu(int cpu, struct rdt_resource *r)
}
}

+static void domain_remove_cpu(int cpu, struct rdt_resource *r)
+{
+ if (r->alloc_capable)
+ domain_remove_cpu_ctrl(cpu, r);
+ if (r->mon_capable)
+ domain_remove_cpu_mon(cpu, r);
+}
+
static void clear_closid_rmid(int cpu)
{
struct resctrl_pqr_state *state = this_cpu_ptr(&pqr_state);
diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
index 8bce591a1018..a6261e177cc1 100644
--- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
+++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
@@ -224,7 +224,7 @@ static int parse_line(char *line, struct resctrl_schema *s,
return -EINVAL;
}
dom = strim(dom);
- list_for_each_entry(d, &r->domains, hdr.list) {
+ list_for_each_entry(d, &r->ctrl_domains, hdr.list) {
if (d->hdr.id == dom_id) {
data.buf = dom;
data.rdtgrp = rdtgrp;
@@ -316,7 +316,7 @@ int resctrl_arch_update_domains(struct rdt_resource *r, u32 closid)
return -ENOMEM;

msr_param.res = NULL;
- list_for_each_entry(d, &r->domains, hdr.list) {
+ list_for_each_entry(d, &r->ctrl_domains, hdr.list) {
hw_dom = resctrl_to_arch_dom(d);
for (t = 0; t < CDP_NUM_TYPES; t++) {
cfg = &hw_dom->d_resctrl.staged_config[t];
@@ -464,7 +464,7 @@ static void show_doms(struct seq_file *s, struct resctrl_schema *schema, int clo
u32 ctrl_val;

seq_printf(s, "%*s:", max_name_width, schema->name);
- list_for_each_entry(dom, &r->domains, hdr.list) {
+ list_for_each_entry(dom, &r->ctrl_domains, hdr.list) {
if (sep)
seq_puts(s, ";");

@@ -540,6 +540,7 @@ void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
int rdtgroup_mondata_show(struct seq_file *m, void *arg)
{
struct kernfs_open_file *of = m->private;
+ struct rdt_domain_hdr *hdr;
u32 resid, evtid, domid;
struct rdtgroup *rdtgrp;
struct rdt_resource *r;
@@ -560,11 +561,12 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)
evtid = md.u.evtid;

r = &rdt_resources_all[resid].r_resctrl;
- d = rdt_find_domain(r, domid, NULL);
- if (IS_ERR_OR_NULL(d)) {
+ hdr = rdt_find_domain(&r->mon_domains, domid, NULL);
+ if (IS_ERR_OR_NULL(hdr)) {
ret = -ENOENT;
goto out;
}
+ d = container_of(hdr, struct rdt_domain, hdr);

mon_event_read(&rr, r, d, rdtgrp, evtid, false);

diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 27cda5988d7f..3265b8499e2a 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -340,7 +340,7 @@ static void add_rmid_to_limbo(struct rmid_entry *entry)

entry->busy = 0;
cpu = get_cpu();
- list_for_each_entry(d, &r->domains, hdr.list) {
+ list_for_each_entry(d, &r->mon_domains, hdr.list) {
if (cpumask_test_cpu(cpu, &d->cpu_mask)) {
err = resctrl_arch_rmid_read(r, d, entry->rmid,
QOS_L3_OCCUP_EVENT_ID,
diff --git a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
index 18b6183a1b48..bda32b4e1c1e 100644
--- a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
+++ b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
@@ -292,7 +292,7 @@ static void pseudo_lock_region_clear(struct pseudo_lock_region *plr)
*/
static int pseudo_lock_region_init(struct pseudo_lock_region *plr)
{
- int scope = plr->s->res->scope;
+ int scope = plr->s->res->ctrl_scope;
struct cpu_cacheinfo *ci;
int ret;
int i;
@@ -856,7 +856,7 @@ bool rdtgroup_pseudo_locked_in_hierarchy(struct rdt_domain *d)
* associated with them.
*/
for_each_alloc_capable_rdt_resource(r) {
- list_for_each_entry(d_i, &r->domains, hdr.list) {
+ list_for_each_entry(d_i, &r->ctrl_domains, hdr.list) {
if (d_i->plr)
cpumask_or(cpu_with_psl, cpu_with_psl,
&d_i->cpu_mask);
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 42adf17ea6fa..8132f81f31bb 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -86,7 +86,7 @@ void rdt_staged_configs_clear(void)
lockdep_assert_held(&rdtgroup_mutex);

for_each_alloc_capable_rdt_resource(r) {
- list_for_each_entry(dom, &r->domains, hdr.list)
+ list_for_each_entry(dom, &r->ctrl_domains, hdr.list)
memset(dom->staged_config, 0, sizeof(dom->staged_config));
}
}
@@ -928,7 +928,7 @@ static int rdt_bit_usage_show(struct kernfs_open_file *of,

mutex_lock(&rdtgroup_mutex);
hw_shareable = r->cache.shareable_bits;
- list_for_each_entry(dom, &r->domains, hdr.list) {
+ list_for_each_entry(dom, &r->ctrl_domains, hdr.list) {
if (sep)
seq_putc(seq, ';');
sw_shareable = 0;
@@ -1233,7 +1233,7 @@ static bool rdtgroup_mode_test_exclusive(struct rdtgroup *rdtgrp)
if (r->rid == RDT_RESOURCE_MBA || r->rid == RDT_RESOURCE_SMBA)
continue;
has_cache = true;
- list_for_each_entry(d, &r->domains, hdr.list) {
+ list_for_each_entry(d, &r->ctrl_domains, hdr.list) {
ctrl = resctrl_arch_get_config(r, d, closid,
s->conf_type);
if (rdtgroup_cbm_overlaps(s, d, ctrl, closid, false)) {
@@ -1345,13 +1345,13 @@ unsigned int rdtgroup_cbm_to_size(struct rdt_resource *r,
unsigned int size = 0;
int num_b, i;

- if (WARN_ON_ONCE(r->scope != RESCTRL_L2_CACHE && r->scope != RESCTRL_L3_CACHE))
+ if (WARN_ON_ONCE(r->ctrl_scope != RESCTRL_L2_CACHE && r->ctrl_scope != RESCTRL_L3_CACHE))
return -EINVAL;

num_b = bitmap_weight(&cbm, r->cache.cbm_len);
ci = get_cpu_cacheinfo(cpumask_any(&d->cpu_mask));
for (i = 0; i < ci->num_leaves; i++) {
- if (ci->info_list[i].level == r->scope) {
+ if (ci->info_list[i].level == r->ctrl_scope) {
size = ci->info_list[i].size / r->cache.cbm_len * num_b;
break;
}
@@ -1410,7 +1410,7 @@ static int rdtgroup_size_show(struct kernfs_open_file *of,
type = schema->conf_type;
sep = false;
seq_printf(s, "%*s:", max_name_width, schema->name);
- list_for_each_entry(d, &r->domains, hdr.list) {
+ list_for_each_entry(d, &r->ctrl_domains, hdr.list) {
if (sep)
seq_putc(s, ';');
if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKSETUP) {
@@ -1499,7 +1499,7 @@ static int mbm_config_show(struct seq_file *s, struct rdt_resource *r, u32 evtid

mutex_lock(&rdtgroup_mutex);

- list_for_each_entry(dom, &r->domains, hdr.list) {
+ list_for_each_entry(dom, &r->mon_domains, hdr.list) {
if (sep)
seq_puts(s, ";");

@@ -1622,7 +1622,7 @@ static int mon_config_write(struct rdt_resource *r, char *tok, u32 evtid)
return -EINVAL;
}

- list_for_each_entry(d, &r->domains, hdr.list) {
+ list_for_each_entry(d, &r->mon_domains, hdr.list) {
if (d->hdr.id == dom_id) {
ret = mbm_config_write_domain(r, d, evtid, val);
if (ret)
@@ -2141,7 +2141,7 @@ static int set_cache_qos_cfg(int level, bool enable)
return -ENOMEM;

r_l = &rdt_resources_all[level].r_resctrl;
- list_for_each_entry(d, &r_l->domains, hdr.list) {
+ list_for_each_entry(d, &r_l->ctrl_domains, hdr.list) {
if (r_l->cache.arch_has_per_cpu_cfg)
/* Pick all the CPUs in the domain instance */
for_each_cpu(cpu, &d->cpu_mask)
@@ -2226,7 +2226,7 @@ static int set_mba_sc(bool mba_sc)

r->membw.mba_sc = mba_sc;

- list_for_each_entry(d, &r->domains, hdr.list) {
+ list_for_each_entry(d, &r->ctrl_domains, hdr.list) {
for (i = 0; i < num_closid; i++)
d->mbps_val[i] = MBA_MAX_MBPS;
}
@@ -2528,7 +2528,7 @@ static int rdt_get_tree(struct fs_context *fc)

if (is_mbm_enabled()) {
r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
- list_for_each_entry(dom, &r->domains, hdr.list)
+ list_for_each_entry(dom, &r->mon_domains, hdr.list)
mbm_setup_overflow_handler(dom, MBM_OVERFLOW_INTERVAL);
}

@@ -2649,10 +2649,10 @@ static int reset_all_ctrls(struct rdt_resource *r)

/*
* Disable resource control for this resource by setting all
- * CBMs in all domains to the maximum mask value. Pick one CPU
+ * CBMs in all ctrl_domains to the maximum mask value. Pick one CPU
* from each domain to update the MSRs below.
*/
- list_for_each_entry(d, &r->domains, hdr.list) {
+ list_for_each_entry(d, &r->ctrl_domains, hdr.list) {
hw_dom = resctrl_to_arch_dom(d);
cpumask_set_cpu(cpumask_any(&d->cpu_mask), cpu_mask);

@@ -2922,7 +2922,7 @@ static int mkdir_mondata_subdir_alldom(struct kernfs_node *parent_kn,
struct rdt_domain *dom;
int ret;

- list_for_each_entry(dom, &r->domains, hdr.list) {
+ list_for_each_entry(dom, &r->mon_domains, hdr.list) {
ret = mkdir_mondata_subdir(parent_kn, dom, r, prgrp);
if (ret)
return ret;
@@ -3104,7 +3104,7 @@ static int rdtgroup_init_cat(struct resctrl_schema *s, u32 closid)
struct rdt_domain *d;
int ret;

- list_for_each_entry(d, &s->res->domains, hdr.list) {
+ list_for_each_entry(d, &s->res->ctrl_domains, hdr.list) {
ret = __init_one_rdt_domain(d, s, closid);
if (ret < 0)
return ret;
@@ -3119,7 +3119,7 @@ static void rdtgroup_init_mba(struct rdt_resource *r, u32 closid)
struct resctrl_staged_config *cfg;
struct rdt_domain *d;

- list_for_each_entry(d, &r->domains, hdr.list) {
+ list_for_each_entry(d, &r->ctrl_domains, hdr.list) {
if (is_mba_sc(r)) {
d->mbps_val[closid] = MBA_MAX_MBPS;
continue;
@@ -3711,16 +3711,16 @@ static void domain_destroy_mon_state(struct rdt_domain *d)
kfree(d->mbm_local);
}

-void resctrl_offline_domain(struct rdt_resource *r, struct rdt_domain *d)
+void resctrl_offline_ctrl_domain(struct rdt_resource *r, struct rdt_domain *d)
{
lockdep_assert_held(&rdtgroup_mutex);

if (supports_mba_mbps() && r->rid == RDT_RESOURCE_MBA)
mba_sc_domain_destroy(r, d);
+}

- if (!r->mon_capable)
- return;
-
+void resctrl_offline_mon_domain(struct rdt_resource *r, struct rdt_domain *d)
+{
/*
* If resctrl is mounted, remove all the
* per domain monitor data directories.
@@ -3776,18 +3776,22 @@ static int domain_setup_mon_state(struct rdt_resource *r, struct rdt_domain *d)
return 0;
}

-int resctrl_online_domain(struct rdt_resource *r, struct rdt_domain *d)
+int resctrl_online_ctrl_domain(struct rdt_resource *r, struct rdt_domain *d)
{
- int err;
-
lockdep_assert_held(&rdtgroup_mutex);

if (supports_mba_mbps() && r->rid == RDT_RESOURCE_MBA)
/* RDT_RESOURCE_MBA is never mon_capable */
return mba_sc_domain_allocate(r, d);

- if (!r->mon_capable)
- return 0;
+ return 0;
+}
+
+int resctrl_online_mon_domain(struct rdt_resource *r, struct rdt_domain *d)
+{
+ int err;
+
+ lockdep_assert_held(&rdtgroup_mutex);

err = domain_setup_mon_state(r, d);
if (err)
--
2.41.0

2023-09-29 19:11:57

by Peter Newman

Subject: Re: [PATCH v6 3/8] x86/resctrl: Prepare for different scope for control/monitor operations

Hi Tony,

On Thu, Sep 28, 2023 at 9:14 PM Tony Luck <[email protected]> wrote:
>
> @@ -352,7 +355,7 @@ struct rdt_domain *get_domain_from_cpu(int cpu, struct rdt_resource *r)
> {
> struct rdt_domain *d;
>
> - list_for_each_entry(d, &r->domains, hdr.list) {
> + list_for_each_entry(d, &r->ctrl_domains, hdr.list) {

If someone were to call get_domain_from_cpu() looking for a
mon_domain, I don't think they'd be happy with the result.

This problem seems adequately addressed by the next patch where a type
mismatch on the return value would result.

In any case, perhaps the name could be updated to set expectations better.


> @@ -549,44 +552,101 @@ static void domain_add_cpu(int cpu, struct rdt_resource *r)
>
> rdt_domain_reconfigure_cdp(r);
>
> - if (r->alloc_capable && domain_setup_ctrlval(r, d)) {
> + if (domain_setup_ctrlval(r, d)) {
> domain_free(hw_dom);
> return;
> }
>
> - if (r->mon_capable && arch_domain_mbm_alloc(r->num_rmid, hw_dom)) {
> + list_add_tail(&d->hdr.list, add_pos);
> +
> + err = resctrl_online_ctrl_domain(r, d);
> + if (err) {
> + list_del(&d->hdr.list);
> domain_free(hw_dom);
> + }
> +}
> +
> +static void domain_add_cpu_mon(int cpu, struct rdt_resource *r)
> +{
> + int id = get_domain_id_from_scope(cpu, r->mon_scope);
> + struct list_head *add_pos = NULL;
> + struct rdt_hw_domain *hw_mondom;

It's still hw_dom in domain_add_cpu_ctrl(), so why hw_mondom here?


> @@ -3711,16 +3711,16 @@ static void domain_destroy_mon_state(struct rdt_domain *d)
> kfree(d->mbm_local);
> }
>
> -void resctrl_offline_domain(struct rdt_resource *r, struct rdt_domain *d)
> +void resctrl_offline_ctrl_domain(struct rdt_resource *r, struct rdt_domain *d)
> {
> lockdep_assert_held(&rdtgroup_mutex);
>
> if (supports_mba_mbps() && r->rid == RDT_RESOURCE_MBA)
> mba_sc_domain_destroy(r, d);
> +}
>
> - if (!r->mon_capable)
> - return;
> -
> +void resctrl_offline_mon_domain(struct rdt_resource *r, struct rdt_domain *d)
> +{
> /*
> * If resctrl is mounted, remove all the
> * per domain monitor data directories.

We did a lockdep_assert_held(&rdtgroup_mutex) for both types before.
Should we continue to do so here?


> --
> 2.41.0
>

In the resctrl2 prototype I complained that resctrl_resource was
awkwardly disjoint in its support for control and monitoring
groups[1]. In this patch, you seem to have already done most of the
hard work in separating the control and monitoring functionality, so
taking the next step and using a different structure to represent
control and monitoring resources would further improve the code by
statically typing monitoring and control resources, which would be
less error-prone than run-time checks on the alloc_capable and
mon_capable fields, which seem easy to forget.

I don't think this is necessary to complete SNC support, but it would
give me confidence that there isn't a misplaced {alloc,mon}_capable
check resulting in the wrong domain list being traversed. I will
probably do this myself later if you don't.
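
A minimal userspace sketch of that idea, for illustration only. None of
these type or function names come from the series; they are hypothetical
stand-ins, and plain singly linked lists replace the kernel's list_head.
The point is that once control and monitor resources are distinct types,
handing a monitor resource to a control-only helper draws a compiler
diagnostic instead of depending on a run-time {alloc,mon}_capable check.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical domain types; each list can only hold its own kind. */
struct ctrl_domain {
	int id;
	struct ctrl_domain *next;
};

struct mon_domain {
	int id;
	struct mon_domain *next;
};

/* Hypothetical split of a single resource into two typed halves. */
struct ctrl_resource {
	const char *name;
	struct ctrl_domain *domains;	/* e.g. one per L3 cache */
};

struct mon_resource {
	const char *name;
	struct mon_domain *domains;	/* e.g. one per SNC node */
};

/* Walks control domains only; the parameter type enforces that. */
static int count_ctrl_domains(const struct ctrl_resource *r)
{
	const struct ctrl_domain *d;
	int n = 0;

	for (d = r->domains; d; d = d->next)
		n++;
	return n;
}

/* Walks monitor domains only. */
static int count_mon_domains(const struct mon_resource *r)
{
	const struct mon_domain *d;
	int n = 0;

	for (d = r->domains; d; d = d->next)
		n++;
	return n;
}
```

With these definitions, count_ctrl_domains(&some_mon_resource) is an
incompatible pointer type that the compiler flags at build time.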

Thanks!
-Peter

[1] https://lore.kernel.org/all/CALPaoCj_oa=nATvOO_uysYvu+PdTQtd0pvssvm9_M1+fP-Z8JA@mail.gmail.com/

2023-09-29 23:47:11

by Tony Luck

Subject: Re: [PATCH v6 3/8] x86/resctrl: Prepare for different scope for control/monitor operations

On Fri, Sep 29, 2023 at 03:10:18PM +0200, Peter Newman wrote:
> Hi Tony,
>
> On Thu, Sep 28, 2023 at 9:14 PM Tony Luck <[email protected]> wrote:
> >
> > @@ -352,7 +355,7 @@ struct rdt_domain *get_domain_from_cpu(int cpu, struct rdt_resource *r)
> > {
> > struct rdt_domain *d;
> >
> > - list_for_each_entry(d, &r->domains, hdr.list) {
> > + list_for_each_entry(d, &r->ctrl_domains, hdr.list) {
>
> If someone were to call get_domain_from_cpu() looking for a
> mon_domain, I don't think they'd be happy with the result.

Indeed not. Type checking in "C" doesn't seem adequate to address this
(when using "container_of()" which blindly trusts the user provided the
right type/fieldname). I'm using the rdt_domain_hdr.type field to
provide necessary checks.
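
A rough userspace sketch of that kind of check, not the code from the
next version of the series. The enum values, structure layouts and helper
names below are assumptions made for illustration; only the idea of a
"type" field in the common header comes from this thread. Each helper
verifies the tag before applying container_of(), so a header taken from
the wrong list yields NULL rather than a mistyped pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical tag values for the header's type field. */
enum domain_type {
	CTRL_DOMAIN,
	MON_DOMAIN,
};

/* Common header embedded in both kinds of domain. */
struct domain_hdr {
	enum domain_type type;
	int id;
};

struct ctrl_domain {
	struct domain_hdr hdr;
	unsigned int ctrl_val;
};

struct mon_domain {
	struct domain_hdr hdr;
	unsigned long mbm_total;
};

/* Return the enclosing control domain, or NULL if hdr is NULL or mistyped. */
static struct ctrl_domain *hdr_to_ctrl_domain(struct domain_hdr *hdr)
{
	if (!hdr || hdr->type != CTRL_DOMAIN)
		return NULL;
	return container_of(hdr, struct ctrl_domain, hdr);
}

/* Return the enclosing monitor domain, or NULL if hdr is NULL or mistyped. */
static struct mon_domain *hdr_to_mon_domain(struct domain_hdr *hdr)
{
	if (!hdr || hdr->type != MON_DOMAIN)
		return NULL;
	return container_of(hdr, struct mon_domain, hdr);
}
```

Checking for NULL before the conversion also avoids relying on the header
being the first member of the enclosing structure.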

>
> This problem seems adequately addressed by the next patch where a type
> mismatch on the return value would result.
>
> In any case, perhaps the name could be updated to set expectations better.
>
>
> > @@ -549,44 +552,101 @@ static void domain_add_cpu(int cpu, struct rdt_resource *r)
> >
> > rdt_domain_reconfigure_cdp(r);
> >
> > - if (r->alloc_capable && domain_setup_ctrlval(r, d)) {
> > + if (domain_setup_ctrlval(r, d)) {
> > domain_free(hw_dom);
> > return;
> > }
> >
> > - if (r->mon_capable && arch_domain_mbm_alloc(r->num_rmid, hw_dom)) {
> > + list_add_tail(&d->hdr.list, add_pos);
> > +
> > + err = resctrl_online_ctrl_domain(r, d);
> > + if (err) {
> > + list_del(&d->hdr.list);
> > domain_free(hw_dom);
> > + }
> > +}
> > +
> > +static void domain_add_cpu_mon(int cpu, struct rdt_resource *r)
> > +{
> > + int id = get_domain_id_from_scope(cpu, r->mon_scope);
> > + struct list_head *add_pos = NULL;
> > + struct rdt_hw_domain *hw_mondom;
>
> It's still hw_dom in domain_add_cpu_ctrl(), so why hw_mondom here?

No good reason. I'll change it to "hw_dom".

>
>
> > @@ -3711,16 +3711,16 @@ static void domain_destroy_mon_state(struct rdt_domain *d)
> > kfree(d->mbm_local);
> > }
> >
> > -void resctrl_offline_domain(struct rdt_resource *r, struct rdt_domain *d)
> > +void resctrl_offline_ctrl_domain(struct rdt_resource *r, struct rdt_domain *d)
> > {
> > lockdep_assert_held(&rdtgroup_mutex);
> >
> > if (supports_mba_mbps() && r->rid == RDT_RESOURCE_MBA)
> > mba_sc_domain_destroy(r, d);
> > +}
> >
> > - if (!r->mon_capable)
> > - return;
> > -
> > +void resctrl_offline_mon_domain(struct rdt_resource *r, struct rdt_domain *d)
> > +{
> > /*
> > * If resctrl is mounted, remove all the
> > * per domain monitor data directories.
>
> We did a lockdep_assert_held(&rdtgroup_mutex) for both types before.
> Should we continue to do so here?

Yes. Added it.

>
>
> > --
> > 2.41.0
> >
>
> In the resctrl2 prototype I complained that resctrl_resource was
> awkwardly disjoint in its support for control and monitoring
> groups[1]. In this patch, you seem to have already done most of the
> hard work in separating the control and monitoring functionality, so
> taking the next step and using a different structure to represent
> control and monitoring resources would further improve the code by
> statically typing monitoring and control resources, which would be
> less error-prone than run-time checks on the alloc_capable and
> mon_capable fields, which seem easy to forget.
>
> I don't think this is necessary to complete SNC support, but it would
> give me confidence that there isn't a misplaced {alloc,mon}_capable
> check resulting in the wrong domain list being traversed. I will
> probably do this myself later if you don't.

Simple change. It's split between previous patch to add the field
and current patch to initialize and check.

>
> Thanks!
> -Peter
>
> [1] https://lore.kernel.org/all/CALPaoCj_oa=nATvOO_uysYvu+PdTQtd0pvssvm9_M1+fP-Z8JA@mail.gmail.com/

Thanks

-Tony

2023-09-30 02:06:21

by Tony Luck

Subject: Re: [PATCH v6 0/8] Add support for Sub-NUMA cluster (SNC) systems

On Fri, Sep 29, 2023 at 04:33:17PM +0200, Peter Newman wrote:
> Hi Tony,
>
> On Thu, Sep 28, 2023 at 9:14 PM Tony Luck <[email protected]> wrote:
> >
> > The Sub-NUMA cluster feature on some Intel processors partitions
> > the CPUs that share an L3 cache into two or more sets. This plays
> > havoc with the Resource Director Technology (RDT) monitoring features.
> > Prior to this patch Intel has advised that SNC and RDT are incompatible.
> >
> > Some of these CPUs support an MSR that can partition the RMID
> > counters in the same way. This allows for monitoring features
> > to be used (with the caveat that memory accesses between different
> > SNC NUMA nodes may still not be counted accurately).
>
> Is an "SNC NUMA node" a "sub-NUMA node", or a NUMA node on which SNC
> has been enabled?

It would be architecturally possible to enable SNC mode on
a subset of CPU sockets. But there isn't a BIOS setup option
to do that. You either have SNC everywhere, or nowhere.

I prefer "SNC NUMA node" == "sub-NUMA node".

This version "NUMA node on which SNC has been enabled"
makes it sound like there is a control on a NUMA node
that can be switched. The control is on the CPU socket.
That's often equivalent to a NUMA node, but Intel has
had CPUs in the past where this isn't the case (e.g.
Cascade Lake-AP and Cooper Lake).
>
> Thanks!
> -Peter


Thanks for the review of the series. I've applied changes
to my local tree. Will post v7 of the series early next
week if no other reviews come in.

-Tony

2023-10-03 16:08:33

by Tony Luck

Subject: [PATCH v7 0/8] Add support for Sub-NUMA cluster (SNC) systems

The Sub-NUMA cluster feature on some Intel processors partitions
the CPUs that share an L3 cache into two or more sets. This plays
havoc with the Resource Director Technology (RDT) monitoring features.
Prior to this patch Intel has advised that SNC and RDT are incompatible.

Some of these CPUs support an MSR that can partition the RMID
counters in the same way. This allows for monitoring features
to be used (with the caveat that memory accesses between different
SNC NUMA nodes may still not be counted accurately).

Note that this patch series improves resctrl reporting considerably
on systems with SNC enabled, but there will still be some anomalies
for processes accessing memory from other sub-NUMA nodes.

Signed-off-by: Tony Luck <[email protected]>

Tony Luck (8):
x86/resctrl: Prepare for new domain scope
x86/resctrl: Prepare to split rdt_domain structure
x86/resctrl: Prepare for different scope for control/monitor
operations
x86/resctrl: Split the rdt_domain and rdt_hw_domain structures
x86/resctrl: Add node-scope to the options for feature scope
x86/resctrl: Introduce snc_nodes_per_l3_cache
x86/resctrl: Sub NUMA Cluster detection and enable
x86/resctrl: Update documentation with Sub-NUMA cluster changes

Documentation/arch/x86/resctrl.rst | 23 +-
include/linux/resctrl.h | 85 +++--
arch/x86/include/asm/msr-index.h | 1 +
arch/x86/kernel/cpu/resctrl/internal.h | 66 ++--
arch/x86/kernel/cpu/resctrl/core.c | 400 +++++++++++++++++-----
arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 58 ++--
arch/x86/kernel/cpu/resctrl/monitor.c | 58 ++--
arch/x86/kernel/cpu/resctrl/pseudo_lock.c | 14 +-
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 132 +++----
9 files changed, 591 insertions(+), 246 deletions(-)


base-commit: 6465e260f48790807eef06b583b38ca9789b6072
--
2.41.0

2023-10-03 16:08:39

by Tony Luck

Subject: [PATCH v7 3/8] x86/resctrl: Prepare for different scope for control/monitor operations

Resctrl assumes that control and monitor operations on a resource are
performed at the same scope.

Prepare for systems that use different scope (specifically L3 scope for
cache control and NODE scope for cache occupancy and memory bandwidth
monitoring).

Create separate domain lists for control and monitor operations.

Note that errors during initialization of either control or monitor
functions on a domain would previously result in that domain being
excluded from both control and monitor operations. Now that the domains
are allocated independently, it is no longer necessary to disable both
control and monitor operations if either fails.
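
The independence described above can be sketched in plain C as follows.
This is only an illustration with hypothetical names (struct resource,
add_ctrl_domain(), add_mon_domain(), add_cpu()) and injected failures;
the real setup paths are in the diff below:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a resource that tracks its two domain lists. */
struct resource {
	bool alloc_capable;
	bool mon_capable;
	int  nr_ctrl_domains;
	int  nr_mon_domains;
};

/* Set up a control domain; the failure is injected for the demonstration. */
static int add_ctrl_domain(struct resource *r, bool inject_failure)
{
	if (inject_failure)
		return -1;
	r->nr_ctrl_domains++;
	return 0;
}

/* Set up a monitor domain; the failure is injected for the demonstration. */
static int add_mon_domain(struct resource *r, bool inject_failure)
{
	if (inject_failure)
		return -1;
	r->nr_mon_domains++;
	return 0;
}

/* Each kind of domain is attempted on its own; one failing does not undo the other. */
static void add_cpu(struct resource *r, bool ctrl_fails, bool mon_fails)
{
	if (r->alloc_capable)
		(void)add_ctrl_domain(r, ctrl_fails);
	if (r->mon_capable)
		(void)add_mon_domain(r, mon_fails);
}
```

With both capabilities set and only the monitor setup failing, the
control domain is still counted.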

Signed-off-by: Tony Luck <[email protected]>

---

Changes since last version:

Initialize the "type" in rdt_domain_hdr when creating domains.
Check type has expected value before using container_of() to
get to the surrounding structure.

Rename "hw_mondom" to "hw_dom" in domain_add_cpu_mon() and
in domain_remove_cpu_mon().

Add lockdep_assert_held(&rdtgroup_mutex) to resctrl_offline_mon_domain()

include/linux/resctrl.h | 18 +-
arch/x86/kernel/cpu/resctrl/internal.h | 4 +-
arch/x86/kernel/cpu/resctrl/core.c | 214 +++++++++++++++++-----
arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 18 +-
arch/x86/kernel/cpu/resctrl/monitor.c | 2 +-
arch/x86/kernel/cpu/resctrl/pseudo_lock.c | 4 +-
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 54 +++---
7 files changed, 224 insertions(+), 90 deletions(-)

diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 4c8870345ff8..f47b01ae28ca 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -170,10 +170,12 @@ enum resctrl_scope {
* @alloc_capable: Is allocation available on this machine
* @mon_capable: Is monitor feature available on this machine
* @num_rmid: Number of RMIDs available
- * @scope: Scope of this resource
+ * @ctrl_scope: Scope of this resource for control functions
+ * @mon_scope: Scope of this resource for monitor functions
* @cache: Cache allocation related data
* @membw: If the component has bandwidth controls, their properties.
- * @domains: All domains for this resource
+ * @ctrl_domains: Control domains for this resource
+ * @mon_domains: Monitor domains for this resource
* @name: Name to use in "schemata" file.
* @data_width: Character width of data when displaying
* @default_ctrl: Specifies default cache cbm or memory B/W percent.
@@ -188,10 +190,12 @@ struct rdt_resource {
bool alloc_capable;
bool mon_capable;
int num_rmid;
- enum resctrl_scope scope;
+ enum resctrl_scope ctrl_scope;
+ enum resctrl_scope mon_scope;
struct resctrl_cache cache;
struct resctrl_membw membw;
- struct list_head domains;
+ struct list_head ctrl_domains;
+ struct list_head mon_domains;
char *name;
int data_width;
u32 default_ctrl;
@@ -237,8 +241,10 @@ int resctrl_arch_update_one(struct rdt_resource *r, struct rdt_domain *d,

u32 resctrl_arch_get_config(struct rdt_resource *r, struct rdt_domain *d,
u32 closid, enum resctrl_conf_type type);
-int resctrl_online_domain(struct rdt_resource *r, struct rdt_domain *d);
-void resctrl_offline_domain(struct rdt_resource *r, struct rdt_domain *d);
+int resctrl_online_ctrl_domain(struct rdt_resource *r, struct rdt_domain *d);
+int resctrl_online_mon_domain(struct rdt_resource *r, struct rdt_domain *d);
+void resctrl_offline_ctrl_domain(struct rdt_resource *r, struct rdt_domain *d);
+void resctrl_offline_mon_domain(struct rdt_resource *r, struct rdt_domain *d);

/**
* resctrl_arch_rmid_read() - Read the eventid counter corresponding to rmid
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 85ceaf9a31ac..e9a2a8993d14 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -511,8 +511,8 @@ void rdtgroup_kn_unlock(struct kernfs_node *kn);
int rdtgroup_kn_mode_restrict(struct rdtgroup *r, const char *name);
int rdtgroup_kn_mode_restore(struct rdtgroup *r, const char *name,
umode_t mask);
-struct rdt_domain *rdt_find_domain(struct rdt_resource *r, int id,
- struct list_head **pos);
+struct rdt_domain_hdr *rdt_find_domain(struct list_head *h, int id,
+ struct list_head **pos);
ssize_t rdtgroup_schemata_write(struct kernfs_open_file *of,
char *buf, size_t nbytes, loff_t off);
int rdtgroup_schemata_show(struct kernfs_open_file *of,
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 05369add4578..ae2b66d38afc 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -57,7 +57,8 @@ static void
mba_wrmsr_amd(struct rdt_domain *d, struct msr_param *m,
struct rdt_resource *r);

-#define domain_init(id) LIST_HEAD_INIT(rdt_resources_all[id].r_resctrl.domains)
+#define ctrl_domain_init(id) LIST_HEAD_INIT(rdt_resources_all[id].r_resctrl.ctrl_domains)
+#define mon_domain_init(id) LIST_HEAD_INIT(rdt_resources_all[id].r_resctrl.mon_domains)

struct rdt_hw_resource rdt_resources_all[] = {
[RDT_RESOURCE_L3] =
@@ -65,8 +66,10 @@ struct rdt_hw_resource rdt_resources_all[] = {
.r_resctrl = {
.rid = RDT_RESOURCE_L3,
.name = "L3",
- .scope = RESCTRL_L3_CACHE,
- .domains = domain_init(RDT_RESOURCE_L3),
+ .ctrl_scope = RESCTRL_L3_CACHE,
+ .mon_scope = RESCTRL_L3_CACHE,
+ .ctrl_domains = ctrl_domain_init(RDT_RESOURCE_L3),
+ .mon_domains = mon_domain_init(RDT_RESOURCE_L3),
.parse_ctrlval = parse_cbm,
.format_str = "%d=%0*x",
.fflags = RFTYPE_RES_CACHE,
@@ -79,8 +82,8 @@ struct rdt_hw_resource rdt_resources_all[] = {
.r_resctrl = {
.rid = RDT_RESOURCE_L2,
.name = "L2",
- .scope = RESCTRL_L2_CACHE,
- .domains = domain_init(RDT_RESOURCE_L2),
+ .ctrl_scope = RESCTRL_L2_CACHE,
+ .ctrl_domains = ctrl_domain_init(RDT_RESOURCE_L2),
.parse_ctrlval = parse_cbm,
.format_str = "%d=%0*x",
.fflags = RFTYPE_RES_CACHE,
@@ -93,8 +96,8 @@ struct rdt_hw_resource rdt_resources_all[] = {
.r_resctrl = {
.rid = RDT_RESOURCE_MBA,
.name = "MB",
- .scope = RESCTRL_L3_CACHE,
- .domains = domain_init(RDT_RESOURCE_MBA),
+ .ctrl_scope = RESCTRL_L3_CACHE,
+ .ctrl_domains = ctrl_domain_init(RDT_RESOURCE_MBA),
.parse_ctrlval = parse_bw,
.format_str = "%d=%*u",
.fflags = RFTYPE_RES_MB,
@@ -105,8 +108,8 @@ struct rdt_hw_resource rdt_resources_all[] = {
.r_resctrl = {
.rid = RDT_RESOURCE_SMBA,
.name = "SMBA",
- .scope = RESCTRL_L3_CACHE,
- .domains = domain_init(RDT_RESOURCE_SMBA),
+ .ctrl_scope = RESCTRL_L3_CACHE,
+ .ctrl_domains = ctrl_domain_init(RDT_RESOURCE_SMBA),
.parse_ctrlval = parse_bw,
.format_str = "%d=%*u",
.fflags = RFTYPE_RES_MB,
@@ -352,7 +355,7 @@ struct rdt_domain *get_domain_from_cpu(int cpu, struct rdt_resource *r)
{
struct rdt_domain *d;

- list_for_each_entry(d, &r->domains, hdr.list) {
+ list_for_each_entry(d, &r->ctrl_domains, hdr.list) {
/* Find the domain that contains this CPU */
if (cpumask_test_cpu(cpu, &d->cpu_mask))
return d;
@@ -384,29 +387,39 @@ void rdt_ctrl_update(void *arg)
}

/*
- * rdt_find_domain - Find a domain in a resource that matches input resource id
+ * rdt_find_domain - Find a domain in one of a resource's domain lists.
*
- * Search resource r's domain list to find the resource id. If the resource
- * id is found in a domain, return the domain. Otherwise, if requested by
- * caller, return the first domain whose id is bigger than the input id.
+ * Search the list to find the resource id. If the resource id is found
+ * in a domain, return the domain. Otherwise, if requested by caller,
+ * return the first domain whose id is bigger than the input id.
* The domain list is sorted by id in ascending order.
+ *
+ * If an existing domain in the resource r's domain list matches the cpu's
+ * resource id, add the cpu in the domain.
+ *
+ * Otherwise, caller will allocate a new domain and insert into the right position
+ * in the domain list sorted by id in ascending order.
+ *
+ * The order in the domain list is visible to users when we print entries
+ * in the schemata file and schemata input is validated to have the same order
+ * as this list.
*/
-struct rdt_domain *rdt_find_domain(struct rdt_resource *r, int id,
- struct list_head **pos)
+struct rdt_domain_hdr *rdt_find_domain(struct list_head *h, int id,
+ struct list_head **pos)
{
- struct rdt_domain *d;
+ struct rdt_domain_hdr *d;
struct list_head *l;

if (id < 0)
return ERR_PTR(-ENODEV);

- list_for_each(l, &r->domains) {
- d = list_entry(l, struct rdt_domain, hdr.list);
+ list_for_each(l, h) {
+ d = list_entry(l, struct rdt_domain_hdr, list);
/* When id is found, return its domain. */
- if (id == d->hdr.id)
+ if (id == d->id)
return d;
/* Stop searching when finding id's position in sorted list. */
- if (id < d->hdr.id)
+ if (id < d->id)
break;
}

@@ -500,38 +513,32 @@ static int get_domain_id_from_scope(int cpu, enum resctrl_scope scope)
return -EINVAL;
}

-/*
- * domain_add_cpu - Add a cpu to a resource's domain list.
- *
- * If an existing domain in the resource r's domain list matches the cpu's
- * resource id, add the cpu in the domain.
- *
- * Otherwise, a new domain is allocated and inserted into the right position
- * in the domain list sorted by id in ascending order.
- *
- * The order in the domain list is visible to users when we print entries
- * in the schemata file and schemata input is validated to have the same order
- * as this list.
- */
-static void domain_add_cpu(int cpu, struct rdt_resource *r)
+static void domain_add_cpu_ctrl(int cpu, struct rdt_resource *r)
{
- int id = get_domain_id_from_scope(cpu, r->scope);
+ int id = get_domain_id_from_scope(cpu, r->ctrl_scope);
struct list_head *add_pos = NULL;
struct rdt_hw_domain *hw_dom;
+ struct rdt_domain_hdr *hdr;
struct rdt_domain *d;
int err;

if (id < 0) {
- pr_warn_once("Can't find domain id for CPU:%d scope:%d for resource %s\n",
- cpu, r->scope, r->name);
+ pr_warn_once("Can't find control domain id for CPU:%d scope:%d for resource %s\n",
+ cpu, r->ctrl_scope, r->name);
return;
}
- d = rdt_find_domain(r, id, &add_pos);
- if (IS_ERR(d)) {
- pr_warn("Couldn't find cache id for CPU %d\n", cpu);
+
+ hdr = rdt_find_domain(&r->ctrl_domains, id, &add_pos);
+ if (IS_ERR(hdr)) {
+ pr_warn("Couldn't find control scope id=%d for CPU %d\n", id, cpu);
return;
}

+ if (WARN_ON_ONCE(hdr->type != RESCTRL_CTRL_DOMAIN))
+ return;
+
+ d = container_of(hdr, struct rdt_domain, hdr);
+
if (d) {
cpumask_set_cpu(cpu, &d->cpu_mask);
if (r->cache.arch_has_per_cpu_cfg)
@@ -545,48 +552,115 @@ static void domain_add_cpu(int cpu, struct rdt_resource *r)

d = &hw_dom->d_resctrl;
d->hdr.id = id;
+ d->hdr.type = RESCTRL_CTRL_DOMAIN;
cpumask_set_cpu(cpu, &d->cpu_mask);

rdt_domain_reconfigure_cdp(r);

- if (r->alloc_capable && domain_setup_ctrlval(r, d)) {
+ if (domain_setup_ctrlval(r, d)) {
domain_free(hw_dom);
return;
}

- if (r->mon_capable && arch_domain_mbm_alloc(r->num_rmid, hw_dom)) {
+ list_add_tail(&d->hdr.list, add_pos);
+
+ err = resctrl_online_ctrl_domain(r, d);
+ if (err) {
+ list_del(&d->hdr.list);
+ domain_free(hw_dom);
+ }
+}
+
+static void domain_add_cpu_mon(int cpu, struct rdt_resource *r)
+{
+ int id = get_domain_id_from_scope(cpu, r->mon_scope);
+ struct list_head *add_pos = NULL;
+ struct rdt_hw_domain *hw_dom;
+ struct rdt_domain_hdr *hdr;
+ struct rdt_domain *d;
+ int err;
+
+ if (id < 0) {
+ pr_warn_once("Can't find monitor domain id for CPU:%d scope:%d for resource %s\n",
+ cpu, r->mon_scope, r->name);
+ return;
+ }
+
+ hdr = rdt_find_domain(&r->mon_domains, id, &add_pos);
+ if (IS_ERR(hdr)) {
+ pr_warn("Couldn't find monitor scope id=%d for CPU %d\n", id, cpu);
+ return;
+ }
+
+ if (WARN_ON_ONCE(hdr->type != RESCTRL_MON_DOMAIN))
+ return;
+
+ d = container_of(hdr, struct rdt_domain, hdr);
+
+ if (d) {
+ cpumask_set_cpu(cpu, &d->cpu_mask);
+ return;
+ }
+
+ hw_dom = kzalloc_node(sizeof(*hw_dom), GFP_KERNEL, cpu_to_node(cpu));
+ if (!hw_dom)
+ return;
+
+ d = &hw_dom->d_resctrl;
+ d->hdr.id = id;
+ d->hdr.type = RESCTRL_MON_DOMAIN;
+ cpumask_set_cpu(cpu, &d->cpu_mask);
+
+ if (arch_domain_mbm_alloc(r->num_rmid, hw_dom)) {
domain_free(hw_dom);
return;
}

list_add_tail(&d->hdr.list, add_pos);

- err = resctrl_online_domain(r, d);
+ err = resctrl_online_mon_domain(r, d);
if (err) {
list_del(&d->hdr.list);
domain_free(hw_dom);
}
}

-static void domain_remove_cpu(int cpu, struct rdt_resource *r)
+/*
+ * domain_add_cpu - Add a cpu to either or both of a resource's domain lists.
+ */
+static void domain_add_cpu(int cpu, struct rdt_resource *r)
+{
+ if (r->alloc_capable)
+ domain_add_cpu_ctrl(cpu, r);
+ if (r->mon_capable)
+ domain_add_cpu_mon(cpu, r);
+}
+
+static void domain_remove_cpu_ctrl(int cpu, struct rdt_resource *r)
{
- int id = get_domain_id_from_scope(cpu, r->scope);
+ int id = get_domain_id_from_scope(cpu, r->ctrl_scope);
struct rdt_hw_domain *hw_dom;
+ struct rdt_domain_hdr *hdr;
struct rdt_domain *d;

if (id < 0)
return;

- d = rdt_find_domain(r, id, NULL);
- if (IS_ERR_OR_NULL(d)) {
- pr_warn("Couldn't find cache id for CPU %d\n", cpu);
+ hdr = rdt_find_domain(&r->ctrl_domains, id, NULL);
+ if (IS_ERR_OR_NULL(hdr)) {
+ pr_warn("Couldn't find control scope id=%d for CPU %d\n", id, cpu);
return;
}
+
+ if (WARN_ON_ONCE(hdr->type != RESCTRL_CTRL_DOMAIN))
+ return;
+
+ d = container_of(hdr, struct rdt_domain, hdr);
hw_dom = resctrl_to_arch_dom(d);

cpumask_clear_cpu(cpu, &d->cpu_mask);
if (cpumask_empty(&d->cpu_mask)) {
- resctrl_offline_domain(r, d);
+ resctrl_offline_ctrl_domain(r, d);
list_del(&d->hdr.list);

/*
@@ -599,6 +673,38 @@ static void domain_remove_cpu(int cpu, struct rdt_resource *r)

return;
}
+}
+
+static void domain_remove_cpu_mon(int cpu, struct rdt_resource *r)
+{
+ int id = get_domain_id_from_scope(cpu, r->mon_scope);
+ struct rdt_hw_domain *hw_dom;
+ struct rdt_domain_hdr *hdr;
+ struct rdt_domain *d;
+
+ if (id < 0)
+ return;
+
+ hdr = rdt_find_domain(&r->mon_domains, id, NULL);
+ if (IS_ERR_OR_NULL(hdr)) {
+ pr_warn("Couldn't find scope id=%d for CPU %d\n", id, cpu);
+ return;
+ }
+
+ if (WARN_ON_ONCE(hdr->type != RESCTRL_MON_DOMAIN))
+ return;
+
+ d = container_of(hdr, struct rdt_domain, hdr);
+ hw_dom = resctrl_to_arch_dom(d);
+
+ cpumask_clear_cpu(cpu, &d->cpu_mask);
+ if (cpumask_empty(&d->cpu_mask)) {
+ resctrl_offline_mon_domain(r, d);
+ list_del(&d->hdr.list);
+ domain_free(hw_dom);
+
+ return;
+ }

if (r == &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl) {
if (is_mbm_enabled() && cpu == d->mbm_work_cpu) {
@@ -613,6 +719,14 @@ static void domain_remove_cpu(int cpu, struct rdt_resource *r)
}
}

+static void domain_remove_cpu(int cpu, struct rdt_resource *r)
+{
+ if (r->alloc_capable)
+ domain_remove_cpu_ctrl(cpu, r);
+ if (r->mon_capable)
+ domain_remove_cpu_mon(cpu, r);
+}
+
static void clear_closid_rmid(int cpu)
{
struct resctrl_pqr_state *state = this_cpu_ptr(&pqr_state);
diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
index 8bce591a1018..33ff4d00a08c 100644
--- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
+++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
@@ -224,7 +224,7 @@ static int parse_line(char *line, struct resctrl_schema *s,
return -EINVAL;
}
dom = strim(dom);
- list_for_each_entry(d, &r->domains, hdr.list) {
+ list_for_each_entry(d, &r->ctrl_domains, hdr.list) {
if (d->hdr.id == dom_id) {
data.buf = dom;
data.rdtgrp = rdtgrp;
@@ -316,7 +316,7 @@ int resctrl_arch_update_domains(struct rdt_resource *r, u32 closid)
return -ENOMEM;

msr_param.res = NULL;
- list_for_each_entry(d, &r->domains, hdr.list) {
+ list_for_each_entry(d, &r->ctrl_domains, hdr.list) {
hw_dom = resctrl_to_arch_dom(d);
for (t = 0; t < CDP_NUM_TYPES; t++) {
cfg = &hw_dom->d_resctrl.staged_config[t];
@@ -464,7 +464,7 @@ static void show_doms(struct seq_file *s, struct resctrl_schema *schema, int clo
u32 ctrl_val;

seq_printf(s, "%*s:", max_name_width, schema->name);
- list_for_each_entry(dom, &r->domains, hdr.list) {
+ list_for_each_entry(dom, &r->ctrl_domains, hdr.list) {
if (sep)
seq_puts(s, ";");

@@ -540,6 +540,7 @@ void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
int rdtgroup_mondata_show(struct seq_file *m, void *arg)
{
struct kernfs_open_file *of = m->private;
+ struct rdt_domain_hdr *hdr;
u32 resid, evtid, domid;
struct rdtgroup *rdtgrp;
struct rdt_resource *r;
@@ -560,12 +561,19 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)
evtid = md.u.evtid;

r = &rdt_resources_all[resid].r_resctrl;
- d = rdt_find_domain(r, domid, NULL);
- if (IS_ERR_OR_NULL(d)) {
+ hdr = rdt_find_domain(&r->mon_domains, domid, NULL);
+ if (IS_ERR_OR_NULL(hdr)) {
ret = -ENOENT;
goto out;
}

+ if (WARN_ON_ONCE(hdr->type != RESCTRL_MON_DOMAIN)) {
+ ret = -EINVAL;
+ goto out;
+ }
+
+ d = container_of(hdr, struct rdt_domain, hdr);
+
mon_event_read(&rr, r, d, rdtgrp, evtid, false);

if (rr.err == -EIO)
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 27cda5988d7f..3265b8499e2a 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -340,7 +340,7 @@ static void add_rmid_to_limbo(struct rmid_entry *entry)

entry->busy = 0;
cpu = get_cpu();
- list_for_each_entry(d, &r->domains, hdr.list) {
+ list_for_each_entry(d, &r->mon_domains, hdr.list) {
if (cpumask_test_cpu(cpu, &d->cpu_mask)) {
err = resctrl_arch_rmid_read(r, d, entry->rmid,
QOS_L3_OCCUP_EVENT_ID,
diff --git a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
index 18b6183a1b48..bda32b4e1c1e 100644
--- a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
+++ b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
@@ -292,7 +292,7 @@ static void pseudo_lock_region_clear(struct pseudo_lock_region *plr)
*/
static int pseudo_lock_region_init(struct pseudo_lock_region *plr)
{
- int scope = plr->s->res->scope;
+ int scope = plr->s->res->ctrl_scope;
struct cpu_cacheinfo *ci;
int ret;
int i;
@@ -856,7 +856,7 @@ bool rdtgroup_pseudo_locked_in_hierarchy(struct rdt_domain *d)
* associated with them.
*/
for_each_alloc_capable_rdt_resource(r) {
- list_for_each_entry(d_i, &r->domains, hdr.list) {
+ list_for_each_entry(d_i, &r->ctrl_domains, hdr.list) {
if (d_i->plr)
cpumask_or(cpu_with_psl, cpu_with_psl,
&d_i->cpu_mask);
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 739e56cfb31f..90565bb44d0e 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -86,7 +86,7 @@ void rdt_staged_configs_clear(void)
lockdep_assert_held(&rdtgroup_mutex);

for_each_alloc_capable_rdt_resource(r) {
- list_for_each_entry(dom, &r->domains, hdr.list)
+ list_for_each_entry(dom, &r->ctrl_domains, hdr.list)
memset(dom->staged_config, 0, sizeof(dom->staged_config));
}
}
@@ -928,7 +928,7 @@ static int rdt_bit_usage_show(struct kernfs_open_file *of,

mutex_lock(&rdtgroup_mutex);
hw_shareable = r->cache.shareable_bits;
- list_for_each_entry(dom, &r->domains, hdr.list) {
+ list_for_each_entry(dom, &r->ctrl_domains, hdr.list) {
if (sep)
seq_putc(seq, ';');
sw_shareable = 0;
@@ -1233,7 +1233,7 @@ static bool rdtgroup_mode_test_exclusive(struct rdtgroup *rdtgrp)
if (r->rid == RDT_RESOURCE_MBA || r->rid == RDT_RESOURCE_SMBA)
continue;
has_cache = true;
- list_for_each_entry(d, &r->domains, hdr.list) {
+ list_for_each_entry(d, &r->ctrl_domains, hdr.list) {
ctrl = resctrl_arch_get_config(r, d, closid,
s->conf_type);
if (rdtgroup_cbm_overlaps(s, d, ctrl, closid, false)) {
@@ -1345,13 +1345,13 @@ unsigned int rdtgroup_cbm_to_size(struct rdt_resource *r,
unsigned int size = 0;
int num_b, i;

- if (WARN_ON_ONCE(r->scope != RESCTRL_L2_CACHE && r->scope != RESCTRL_L3_CACHE))
+ if (WARN_ON_ONCE(r->ctrl_scope != RESCTRL_L2_CACHE && r->ctrl_scope != RESCTRL_L3_CACHE))
return size;

num_b = bitmap_weight(&cbm, r->cache.cbm_len);
ci = get_cpu_cacheinfo(cpumask_any(&d->cpu_mask));
for (i = 0; i < ci->num_leaves; i++) {
- if (ci->info_list[i].level == r->scope) {
+ if (ci->info_list[i].level == r->ctrl_scope) {
size = ci->info_list[i].size / r->cache.cbm_len * num_b;
break;
}
@@ -1410,7 +1410,7 @@ static int rdtgroup_size_show(struct kernfs_open_file *of,
type = schema->conf_type;
sep = false;
seq_printf(s, "%*s:", max_name_width, schema->name);
- list_for_each_entry(d, &r->domains, hdr.list) {
+ list_for_each_entry(d, &r->ctrl_domains, hdr.list) {
if (sep)
seq_putc(s, ';');
if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKSETUP) {
@@ -1499,7 +1499,7 @@ static int mbm_config_show(struct seq_file *s, struct rdt_resource *r, u32 evtid

mutex_lock(&rdtgroup_mutex);

- list_for_each_entry(dom, &r->domains, hdr.list) {
+ list_for_each_entry(dom, &r->mon_domains, hdr.list) {
if (sep)
seq_puts(s, ";");

@@ -1622,7 +1622,7 @@ static int mon_config_write(struct rdt_resource *r, char *tok, u32 evtid)
return -EINVAL;
}

- list_for_each_entry(d, &r->domains, hdr.list) {
+ list_for_each_entry(d, &r->mon_domains, hdr.list) {
if (d->hdr.id == dom_id) {
ret = mbm_config_write_domain(r, d, evtid, val);
if (ret)
@@ -2141,7 +2141,7 @@ static int set_cache_qos_cfg(int level, bool enable)
return -ENOMEM;

r_l = &rdt_resources_all[level].r_resctrl;
- list_for_each_entry(d, &r_l->domains, hdr.list) {
+ list_for_each_entry(d, &r_l->ctrl_domains, hdr.list) {
if (r_l->cache.arch_has_per_cpu_cfg)
/* Pick all the CPUs in the domain instance */
for_each_cpu(cpu, &d->cpu_mask)
@@ -2226,7 +2226,7 @@ static int set_mba_sc(bool mba_sc)

r->membw.mba_sc = mba_sc;

- list_for_each_entry(d, &r->domains, hdr.list) {
+ list_for_each_entry(d, &r->ctrl_domains, hdr.list) {
for (i = 0; i < num_closid; i++)
d->mbps_val[i] = MBA_MAX_MBPS;
}
@@ -2528,7 +2528,7 @@ static int rdt_get_tree(struct fs_context *fc)

if (is_mbm_enabled()) {
r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
- list_for_each_entry(dom, &r->domains, hdr.list)
+ list_for_each_entry(dom, &r->mon_domains, hdr.list)
mbm_setup_overflow_handler(dom, MBM_OVERFLOW_INTERVAL);
}

@@ -2649,10 +2649,10 @@ static int reset_all_ctrls(struct rdt_resource *r)

/*
* Disable resource control for this resource by setting all
- * CBMs in all domains to the maximum mask value. Pick one CPU
+ * CBMs in all ctrl_domains to the maximum mask value. Pick one CPU
* from each domain to update the MSRs below.
*/
- list_for_each_entry(d, &r->domains, hdr.list) {
+ list_for_each_entry(d, &r->ctrl_domains, hdr.list) {
hw_dom = resctrl_to_arch_dom(d);
cpumask_set_cpu(cpumask_any(&d->cpu_mask), cpu_mask);

@@ -2922,7 +2922,7 @@ static int mkdir_mondata_subdir_alldom(struct kernfs_node *parent_kn,
struct rdt_domain *dom;
int ret;

- list_for_each_entry(dom, &r->domains, hdr.list) {
+ list_for_each_entry(dom, &r->mon_domains, hdr.list) {
ret = mkdir_mondata_subdir(parent_kn, dom, r, prgrp);
if (ret)
return ret;
@@ -3104,7 +3104,7 @@ static int rdtgroup_init_cat(struct resctrl_schema *s, u32 closid)
struct rdt_domain *d;
int ret;

- list_for_each_entry(d, &s->res->domains, hdr.list) {
+ list_for_each_entry(d, &s->res->ctrl_domains, hdr.list) {
ret = __init_one_rdt_domain(d, s, closid);
if (ret < 0)
return ret;
@@ -3119,7 +3119,7 @@ static void rdtgroup_init_mba(struct rdt_resource *r, u32 closid)
struct resctrl_staged_config *cfg;
struct rdt_domain *d;

- list_for_each_entry(d, &r->domains, hdr.list) {
+ list_for_each_entry(d, &r->ctrl_domains, hdr.list) {
if (is_mba_sc(r)) {
d->mbps_val[closid] = MBA_MAX_MBPS;
continue;
@@ -3711,15 +3711,17 @@ static void domain_destroy_mon_state(struct rdt_domain *d)
kfree(d->mbm_local);
}

-void resctrl_offline_domain(struct rdt_resource *r, struct rdt_domain *d)
+void resctrl_offline_ctrl_domain(struct rdt_resource *r, struct rdt_domain *d)
{
lockdep_assert_held(&rdtgroup_mutex);

if (supports_mba_mbps() && r->rid == RDT_RESOURCE_MBA)
mba_sc_domain_destroy(r, d);
+}

- if (!r->mon_capable)
- return;
+void resctrl_offline_mon_domain(struct rdt_resource *r, struct rdt_domain *d)
+{
+ lockdep_assert_held(&rdtgroup_mutex);

/*
* If resctrl is mounted, remove all the
@@ -3776,18 +3778,22 @@ static int domain_setup_mon_state(struct rdt_resource *r, struct rdt_domain *d)
return 0;
}

-int resctrl_online_domain(struct rdt_resource *r, struct rdt_domain *d)
+int resctrl_online_ctrl_domain(struct rdt_resource *r, struct rdt_domain *d)
{
- int err;
-
lockdep_assert_held(&rdtgroup_mutex);

if (supports_mba_mbps() && r->rid == RDT_RESOURCE_MBA)
/* RDT_RESOURCE_MBA is never mon_capable */
return mba_sc_domain_allocate(r, d);

- if (!r->mon_capable)
- return 0;
+ return 0;
+}
+
+int resctrl_online_mon_domain(struct rdt_resource *r, struct rdt_domain *d)
+{
+ int err;
+
+ lockdep_assert_held(&rdtgroup_mutex);

err = domain_setup_mon_state(r, d);
if (err)
--
2.41.0

2023-10-03 16:08:43

by Tony Luck

Subject: [PATCH v7 6/8] x86/resctrl: Introduce snc_nodes_per_l3_cache

Intel Sub-NUMA Cluster (SNC) is a feature that subdivides the CPU cores
and memory controllers on a socket into two or more groups. These are
presented to the operating system as NUMA nodes.

This may enable some workloads to have slightly lower latency to memory
as the memory controller(s) in an SNC node are electrically closer to the
CPU cores on that SNC node. This benefit may be offset by lower bandwidth
since the memory accesses for each core can only be interleaved between
the memory controllers on the same SNC node.

Resctrl monitoring on Intel systems depends upon attaching RMIDs to tasks
to track L3 cache occupancy and memory bandwidth. There is an MSR that
controls how the RMIDs are shared between SNC nodes.

The default mode divides them numerically. E.g. when there are two SNC
nodes on a socket the lower number half of the RMIDs are given to the
first node, the remainder to the second node. This would be difficult
to use with the Linux resctrl interface as specific RMID values assigned
to resctrl groups are not visible to users.

The other mode divides the RMIDs and renumbers the ones on the second
SNC node to start from zero.

Even with this renumbering, SNC mode requires several changes in resctrl
behavior for correct operation.

Add a global integer "snc_nodes_per_l3_cache" that will show how many
SNC nodes share each L3 cache. When this is "1", SNC mode is either
not implemented, or not enabled.

A later patch will detect SNC mode and set snc_nodes_per_l3_cache to
the appropriate value. For now it remains at the default "1" to
indicate SNC mode is not active.

Code needs to take action in the following places when SNC is enabled:
1) The number of logical RMIDs per L3 cache available for use is the
number of physical RMIDs divided by the number of SNC nodes.
2) Likewise the "mon_scale" value must be adjusted for the number
of SNC nodes.
3) The RMID renumbering operates when using the value from the
IA32_PQR_ASSOC MSR to count accesses by a task. When reading an RMID
counter, code must adjust from the logical RMID used to the physical
RMID value for the SNC node that it wishes to read and load the
adjusted value into the IA32_QM_EVTSEL MSR.
4) The L3 cache is divided between the SNC nodes. So the value
reported in the resctrl "size" file is adjusted.
5) The "-o mba_MBps" mount option must be disabled in SNC mode
because the monitoring is being done per SNC node, while the
bandwidth allocation is still done at the L3 cache scope.
Trying to use this feedback loop might result in contradictory
changes to the throttling level coming from each of the SNC
node bandwidth measurements.
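
The arithmetic behind items 1, 3, 4 and 5 can be sketched in plain
userspace C as below. This is only an illustration under stated
assumptions: struct snc_mon_params and all four helper names are
hypothetical and do not exist in the kernel. Only the formulas mirror
the changes in this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-L3 parameters; not a kernel structure. */
struct snc_mon_params {
	uint32_t phys_rmids;	/* physical RMIDs per L3 cache */
	int snc_nodes;		/* models snc_nodes_per_l3_cache, 1 when SNC is off */
};

/* Item 1: logical RMIDs available to each SNC node. */
static uint32_t logical_num_rmid(const struct snc_mon_params *p)
{
	assert(p->snc_nodes >= 1);
	return p->phys_rmids / p->snc_nodes;
}

/* Item 3: physical RMID to select when reading a counter for "node". */
static uint32_t physical_rmid(const struct snc_mon_params *p, int node,
			      uint32_t logical_rmid)
{
	uint32_t num_rmid = logical_num_rmid(p);

	assert(logical_rmid < num_rmid);
	if (p->snc_nodes == 1)
		return logical_rmid;

	/* Each SNC node owns one contiguous block of num_rmid physical RMIDs. */
	return logical_rmid + (uint32_t)(node % p->snc_nodes) * num_rmid;
}

/* Item 4: cache size reported for one SNC node's share of the L3. */
static unsigned int size_per_node(const struct snc_mon_params *p,
				  unsigned int l3_size)
{
	return l3_size / p->snc_nodes;
}

/* Item 5: the MBps feedback loop is only usable when SNC is off. */
static bool mba_mbps_allowed(const struct snc_mon_params *p,
			     bool other_requirements_met)
{
	return other_requirements_met && p->snc_nodes == 1;
}
```

With 256 physical RMIDs and two SNC nodes per L3 cache, each node gets
128 logical RMIDs, and logical RMID 5 read on node 1 selects physical
RMID 133.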

Signed-off-by: Tony Luck <[email protected]>
Reviewed-by: Peter Newman <[email protected]>

---

Changes since last version:

In commit comment s/redumbering/renumbering/

Move check that SNC is not enabled into supports_mba_mbps().

arch/x86/kernel/cpu/resctrl/internal.h | 2 ++
arch/x86/kernel/cpu/resctrl/core.c | 6 ++++++
arch/x86/kernel/cpu/resctrl/monitor.c | 16 +++++++++++++---
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 5 +++--
4 files changed, 24 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 3aed8e7b8487..3fddda401b83 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -446,6 +446,8 @@ DECLARE_STATIC_KEY_FALSE(rdt_alloc_enable_key);

extern struct dentry *debugfs_resctrl;

+extern int snc_nodes_per_l3_cache;
+
enum resctrl_res_level {
RDT_RESOURCE_L3,
RDT_RESOURCE_L2,
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 6b937da36e4c..cd189b7ca6ea 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -48,6 +48,12 @@ int max_name_width, max_data_width;
*/
bool rdt_alloc_capable;

+/*
+ * Number of SNC nodes that share each L3 cache. Default is 1 for
+ * systems that do not support SNC, or have SNC disabled.
+ */
+int snc_nodes_per_l3_cache = 1;
+
static void
mba_wrmsr_intel(struct rdt_ctrl_domain *d, struct msr_param *m,
struct rdt_resource *r);
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 97d2ed829f5d..e6e566921a60 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -148,8 +148,18 @@ static inline struct rmid_entry *__rmid_entry(u32 rmid)

static int __rmid_read(u32 rmid, enum resctrl_event_id eventid, u64 *val)
{
+ struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
+ int cpu = smp_processor_id();
+ int rmid_offset = 0;
u64 msr_val;

+ /*
+ * When SNC mode is on, need to compute the offset to read the
+ * physical RMID counter for the node to which this CPU belongs.
+ */
+ if (snc_nodes_per_l3_cache > 1)
+ rmid_offset = (cpu_to_node(cpu) % snc_nodes_per_l3_cache) * r->num_rmid;
+
/*
* As per the SDM, when IA32_QM_EVTSEL.EvtID (bits 7:0) is configured
* with a valid event code for supported resource type and the bits
@@ -158,7 +168,7 @@ static int __rmid_read(u32 rmid, enum resctrl_event_id eventid, u64 *val)
* IA32_QM_CTR.Error (bit 63) and IA32_QM_CTR.Unavailable (bit 62)
* are error bits.
*/
- wrmsr(MSR_IA32_QM_EVTSEL, eventid, rmid);
+ wrmsr(MSR_IA32_QM_EVTSEL, eventid, rmid + rmid_offset);
rdmsrl(MSR_IA32_QM_CTR, msr_val);

if (msr_val & RMID_VAL_ERROR)
@@ -783,8 +793,8 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
int ret;

resctrl_rmid_realloc_limit = boot_cpu_data.x86_cache_size * 1024;
- hw_res->mon_scale = boot_cpu_data.x86_cache_occ_scale;
- r->num_rmid = boot_cpu_data.x86_cache_max_rmid + 1;
+ hw_res->mon_scale = boot_cpu_data.x86_cache_occ_scale / snc_nodes_per_l3_cache;
+ r->num_rmid = (boot_cpu_data.x86_cache_max_rmid + 1) / snc_nodes_per_l3_cache;
hw_res->mbm_width = MBM_CNTR_WIDTH_BASE;

if (mbm_offset > 0 && mbm_offset <= MBM_CNTR_WIDTH_OFFSET_MAX)
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index afa7a8dca48d..def203c40d70 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1357,7 +1357,7 @@ unsigned int rdtgroup_cbm_to_size(struct rdt_resource *r,
}
}

- return size;
+ return size / snc_nodes_per_l3_cache;
}

/**
@@ -2207,7 +2207,8 @@ static bool supports_mba_mbps(void)
struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_MBA].r_resctrl;

return (is_mbm_local_enabled() &&
- r->alloc_capable && is_mba_linear());
+ r->alloc_capable && is_mba_linear() &&
+ snc_nodes_per_l3_cache == 1);
}

/*
--
2.41.0

2023-10-03 16:08:49

by Tony Luck

Subject: [PATCH v7 4/8] x86/resctrl: Split the rdt_domain and rdt_hw_domain structures

The same rdt_domain structure is used for both control and monitor
functions. But this results in wasted memory as some of the fields are
only used by control functions, while most are only used for monitor
functions.

Split into separate rdt_ctrl_domain and rdt_mon_domain structures with
just the fields required for control and monitoring respectively.

Similarly, split the rdt_hw_domain structure into rdt_hw_ctrl_domain
and rdt_hw_mon_domain.
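
A minimal userspace sketch of the pattern follows: two domain types
that embed a common header and are recovered from it with
container_of(). The structures here are heavily simplified stand-ins,
the "id" field and the hdr_to_ctrl()/hdr_to_mon() helpers are
assumptions made for illustration, and the macro is a userspace
approximation of the kernel's container_of().

```c
#include <assert.h>
#include <stddef.h>

/* Userspace approximation of the kernel's container_of() macro. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Simplified header shared by both domain types ("id" is assumed). */
struct domain_hdr {
	int id;
};

/* Control domains carry only control state. */
struct ctrl_domain {
	struct domain_hdr hdr;
	unsigned int *mbps_val;
};

/* Monitor domains carry only monitoring state. */
struct mon_domain {
	struct domain_hdr hdr;
	int mbm_work_cpu;
	int cqm_work_cpu;
};

/* Recover the enclosing control domain from its embedded header. */
static struct ctrl_domain *hdr_to_ctrl(struct domain_hdr *hdr)
{
	return container_of(hdr, struct ctrl_domain, hdr);
}

/* Recover the enclosing monitor domain from its embedded header. */
static struct mon_domain *hdr_to_mon(struct domain_hdr *hdr)
{
	return container_of(hdr, struct mon_domain, hdr);
}
```

The caller must know which list a header came from, since nothing in
this sketch records the domain type in the header itself.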

Signed-off-by: Tony Luck <[email protected]>
Reviewed-by: Peter Newman <[email protected]>

---

Changes since last version:

"wrapped line not quite aligned anymore" * 2. Both fixed.

include/linux/resctrl.h | 50 +++++++------
arch/x86/kernel/cpu/resctrl/internal.h | 60 ++++++++++------
arch/x86/kernel/cpu/resctrl/core.c | 87 ++++++++++++-----------
arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 32 ++++-----
arch/x86/kernel/cpu/resctrl/monitor.c | 40 +++++------
arch/x86/kernel/cpu/resctrl/pseudo_lock.c | 6 +-
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 62 ++++++++--------
7 files changed, 184 insertions(+), 153 deletions(-)

diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 0af5c5aa5a6f..1c925e3db2ea 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -63,7 +63,25 @@ struct rdt_domain_hdr {
};

/**
- * struct rdt_domain - group of CPUs sharing a resctrl resource
+ * struct rdt_ctrl_domain - group of CPUs sharing a resctrl control resource
+ * @hdr: common header for different domain types
+ * @cpu_mask: which CPUs share this resource
+ * @plr: pseudo-locked region (if any) associated with domain
+ * @staged_config: parsed configuration to be applied
+ * @mbps_val: When mba_sc is enabled, this holds the array of user
+ * specified control values for mba_sc in MBps, indexed
+ * by closid
+ */
+struct rdt_ctrl_domain {
+ struct rdt_domain_hdr hdr;
+ struct cpumask cpu_mask;
+ struct pseudo_lock_region *plr;
+ struct resctrl_staged_config staged_config[CDP_NUM_TYPES];
+ u32 *mbps_val;
+};
+
+/**
+ * struct rdt_mon_domain - group of CPUs sharing a resctrl monitor resource
* @hdr: common header for different domain types
* @cpu_mask: which CPUs share this resource
* @rmid_busy_llc: bitmap of which limbo RMIDs are above threshold
@@ -73,13 +91,8 @@ struct rdt_domain_hdr {
* @cqm_limbo: worker to periodically read CQM h/w counters
* @mbm_work_cpu: worker CPU for MBM h/w counters
* @cqm_work_cpu: worker CPU for CQM h/w counters
- * @plr: pseudo-locked region (if any) associated with domain
- * @staged_config: parsed configuration to be applied
- * @mbps_val: When mba_sc is enabled, this holds the array of user
- * specified control values for mba_sc in MBps, indexed
- * by closid
*/
-struct rdt_domain {
+struct rdt_mon_domain {
struct rdt_domain_hdr hdr;
struct cpumask cpu_mask;
unsigned long *rmid_busy_llc;
@@ -89,9 +102,6 @@ struct rdt_domain {
struct delayed_work cqm_limbo;
int mbm_work_cpu;
int cqm_work_cpu;
- struct pseudo_lock_region *plr;
- struct resctrl_staged_config staged_config[CDP_NUM_TYPES];
- u32 *mbps_val;
};

/**
@@ -195,7 +205,7 @@ struct rdt_resource {
const char *format_str;
int (*parse_ctrlval)(struct rdt_parse_data *data,
struct resctrl_schema *s,
- struct rdt_domain *d);
+ struct rdt_ctrl_domain *d);
struct list_head evt_list;
unsigned long fflags;
bool cdp_capable;
@@ -229,15 +239,15 @@ int resctrl_arch_update_domains(struct rdt_resource *r, u32 closid);
* Update the ctrl_val and apply this config right now.
* Must be called on one of the domain's CPUs.
*/
-int resctrl_arch_update_one(struct rdt_resource *r, struct rdt_domain *d,
+int resctrl_arch_update_one(struct rdt_resource *r, struct rdt_ctrl_domain *d,
u32 closid, enum resctrl_conf_type t, u32 cfg_val);

-u32 resctrl_arch_get_config(struct rdt_resource *r, struct rdt_domain *d,
+u32 resctrl_arch_get_config(struct rdt_resource *r, struct rdt_ctrl_domain *d,
u32 closid, enum resctrl_conf_type type);
-int resctrl_online_ctrl_domain(struct rdt_resource *r, struct rdt_domain *d);
-int resctrl_online_mon_domain(struct rdt_resource *r, struct rdt_domain *d);
-void resctrl_offline_ctrl_domain(struct rdt_resource *r, struct rdt_domain *d);
-void resctrl_offline_mon_domain(struct rdt_resource *r, struct rdt_domain *d);
+int resctrl_online_ctrl_domain(struct rdt_resource *r, struct rdt_ctrl_domain *d);
+int resctrl_online_mon_domain(struct rdt_resource *r, struct rdt_mon_domain *d);
+void resctrl_offline_ctrl_domain(struct rdt_resource *r, struct rdt_ctrl_domain *d);
+void resctrl_offline_mon_domain(struct rdt_resource *r, struct rdt_mon_domain *d);

/**
* resctrl_arch_rmid_read() - Read the eventid counter corresponding to rmid
@@ -253,7 +263,7 @@ void resctrl_offline_mon_domain(struct rdt_resource *r, struct rdt_domain *d);
* Return:
* 0 on success, or -EIO, -EINVAL etc on error.
*/
-int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain *d,
+int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_mon_domain *d,
u32 rmid, enum resctrl_event_id eventid, u64 *val);

/**
@@ -266,7 +276,7 @@ int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain *d,
*
* This can be called from any CPU.
*/
-void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_domain *d,
+void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_mon_domain *d,
u32 rmid, enum resctrl_event_id eventid);

/**
@@ -278,7 +288,7 @@ void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_domain *d,
*
* This can be called from any CPU.
*/
-void resctrl_arch_reset_rmid_all(struct rdt_resource *r, struct rdt_domain *d);
+void resctrl_arch_reset_rmid_all(struct rdt_resource *r, struct rdt_mon_domain *d);

extern unsigned int resctrl_rmid_realloc_threshold;
extern unsigned int resctrl_rmid_realloc_limit;
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index e9a2a8993d14..ee38249c6f1d 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -106,7 +106,7 @@ union mon_data_bits {
struct rmid_read {
struct rdtgroup *rgrp;
struct rdt_resource *r;
- struct rdt_domain *d;
+ struct rdt_mon_domain *d;
enum resctrl_event_id evtid;
bool first;
int err;
@@ -191,7 +191,7 @@ struct mongroup {
*/
struct pseudo_lock_region {
struct resctrl_schema *s;
- struct rdt_domain *d;
+ struct rdt_ctrl_domain *d;
u32 cbm;
wait_queue_head_t lock_thread_wq;
int thread_done;
@@ -319,25 +319,41 @@ struct arch_mbm_state {
};

/**
- * struct rdt_hw_domain - Arch private attributes of a set of CPUs that share
- * a resource
+ * struct rdt_hw_ctrl_domain - Arch private attributes of a set of CPUs that share
+ * a resource for a control function
* @d_resctrl: Properties exposed to the resctrl file system
* @ctrl_val: array of cache or mem ctrl values (indexed by CLOSID)
+ *
+ * Members of this structure are accessed via helpers that provide abstraction.
+ */
+struct rdt_hw_ctrl_domain {
+ struct rdt_ctrl_domain d_resctrl;
+ u32 *ctrl_val;
+};
+
+/**
+ * struct rdt_hw_mon_domain - Arch private attributes of a set of CPUs that share
+ * a resource for a monitor function
+ * @d_resctrl: Properties exposed to the resctrl file system
* @arch_mbm_total: arch private state for MBM total bandwidth
* @arch_mbm_local: arch private state for MBM local bandwidth
*
* Members of this structure are accessed via helpers that provide abstraction.
*/
-struct rdt_hw_domain {
- struct rdt_domain d_resctrl;
- u32 *ctrl_val;
+struct rdt_hw_mon_domain {
+ struct rdt_mon_domain d_resctrl;
struct arch_mbm_state *arch_mbm_total;
struct arch_mbm_state *arch_mbm_local;
};

-static inline struct rdt_hw_domain *resctrl_to_arch_dom(struct rdt_domain *r)
+static inline struct rdt_hw_ctrl_domain *resctrl_to_arch_ctrl_dom(struct rdt_ctrl_domain *r)
{
- return container_of(r, struct rdt_hw_domain, d_resctrl);
+ return container_of(r, struct rdt_hw_ctrl_domain, d_resctrl);
+}
+
+static inline struct rdt_hw_mon_domain *resctrl_to_arch_mon_dom(struct rdt_mon_domain *r)
+{
+ return container_of(r, struct rdt_hw_mon_domain, d_resctrl);
}

/**
@@ -405,7 +421,7 @@ struct rdt_hw_resource {
struct rdt_resource r_resctrl;
u32 num_closid;
unsigned int msr_base;
- void (*msr_update) (struct rdt_domain *d, struct msr_param *m,
+ void (*msr_update) (struct rdt_ctrl_domain *d, struct msr_param *m,
struct rdt_resource *r);
unsigned int mon_scale;
unsigned int mbm_width;
@@ -418,9 +434,9 @@ static inline struct rdt_hw_resource *resctrl_to_arch_res(struct rdt_resource *r
}

int parse_cbm(struct rdt_parse_data *data, struct resctrl_schema *s,
- struct rdt_domain *d);
+ struct rdt_ctrl_domain *d);
int parse_bw(struct rdt_parse_data *data, struct resctrl_schema *s,
- struct rdt_domain *d);
+ struct rdt_ctrl_domain *d);

extern struct mutex rdtgroup_mutex;

@@ -517,21 +533,21 @@ ssize_t rdtgroup_schemata_write(struct kernfs_open_file *of,
char *buf, size_t nbytes, loff_t off);
int rdtgroup_schemata_show(struct kernfs_open_file *of,
struct seq_file *s, void *v);
-bool rdtgroup_cbm_overlaps(struct resctrl_schema *s, struct rdt_domain *d,
+bool rdtgroup_cbm_overlaps(struct resctrl_schema *s, struct rdt_ctrl_domain *d,
unsigned long cbm, int closid, bool exclusive);
-unsigned int rdtgroup_cbm_to_size(struct rdt_resource *r, struct rdt_domain *d,
+unsigned int rdtgroup_cbm_to_size(struct rdt_resource *r, struct rdt_ctrl_domain *d,
unsigned long cbm);
enum rdtgrp_mode rdtgroup_mode_by_closid(int closid);
int rdtgroup_tasks_assigned(struct rdtgroup *r);
int rdtgroup_locksetup_enter(struct rdtgroup *rdtgrp);
int rdtgroup_locksetup_exit(struct rdtgroup *rdtgrp);
-bool rdtgroup_cbm_overlaps_pseudo_locked(struct rdt_domain *d, unsigned long cbm);
-bool rdtgroup_pseudo_locked_in_hierarchy(struct rdt_domain *d);
+bool rdtgroup_cbm_overlaps_pseudo_locked(struct rdt_ctrl_domain *d, unsigned long cbm);
+bool rdtgroup_pseudo_locked_in_hierarchy(struct rdt_ctrl_domain *d);
int rdt_pseudo_lock_init(void);
void rdt_pseudo_lock_release(void);
int rdtgroup_pseudo_lock_create(struct rdtgroup *rdtgrp);
void rdtgroup_pseudo_lock_remove(struct rdtgroup *rdtgrp);
-struct rdt_domain *get_domain_from_cpu(int cpu, struct rdt_resource *r);
+struct rdt_ctrl_domain *get_domain_from_cpu(int cpu, struct rdt_resource *r);
int closids_supported(void);
void closid_free(int closid);
int alloc_rmid(void);
@@ -541,17 +557,17 @@ bool __init rdt_cpu_has(int flag);
void mon_event_count(void *info);
int rdtgroup_mondata_show(struct seq_file *m, void *arg);
void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
- struct rdt_domain *d, struct rdtgroup *rdtgrp,
+ struct rdt_mon_domain *d, struct rdtgroup *rdtgrp,
int evtid, int first);
-void mbm_setup_overflow_handler(struct rdt_domain *dom,
+void mbm_setup_overflow_handler(struct rdt_mon_domain *dom,
unsigned long delay_ms);
void mbm_handle_overflow(struct work_struct *work);
void __init intel_rdt_mbm_apply_quirk(void);
bool is_mba_sc(struct rdt_resource *r);
-void cqm_setup_limbo_handler(struct rdt_domain *dom, unsigned long delay_ms);
+void cqm_setup_limbo_handler(struct rdt_mon_domain *dom, unsigned long delay_ms);
void cqm_handle_limbo(struct work_struct *work);
-bool has_busy_rmid(struct rdt_resource *r, struct rdt_domain *d);
-void __check_limbo(struct rdt_domain *d, bool force_free);
+bool has_busy_rmid(struct rdt_resource *r, struct rdt_mon_domain *d);
+void __check_limbo(struct rdt_mon_domain *d, bool force_free);
void rdt_domain_reconfigure_cdp(struct rdt_resource *r);
void __init thread_throttle_mode_init(void);
void __init mbm_config_rftype_init(const char *config);
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 7ef178fb7c77..726f00c01079 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -49,12 +49,12 @@ int max_name_width, max_data_width;
bool rdt_alloc_capable;

static void
-mba_wrmsr_intel(struct rdt_domain *d, struct msr_param *m,
+mba_wrmsr_intel(struct rdt_ctrl_domain *d, struct msr_param *m,
struct rdt_resource *r);
static void
-cat_wrmsr(struct rdt_domain *d, struct msr_param *m, struct rdt_resource *r);
+cat_wrmsr(struct rdt_ctrl_domain *d, struct msr_param *m, struct rdt_resource *r);
static void
-mba_wrmsr_amd(struct rdt_domain *d, struct msr_param *m,
+mba_wrmsr_amd(struct rdt_ctrl_domain *d, struct msr_param *m,
struct rdt_resource *r);

#define ctrl_domain_init(id) LIST_HEAD_INIT(rdt_resources_all[id].r_resctrl.ctrl_domains)
@@ -303,11 +303,11 @@ static void rdt_get_cdp_l2_config(void)
}

static void
-mba_wrmsr_amd(struct rdt_domain *d, struct msr_param *m, struct rdt_resource *r)
+mba_wrmsr_amd(struct rdt_ctrl_domain *d, struct msr_param *m, struct rdt_resource *r)
{
- unsigned int i;
- struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(d);
+ struct rdt_hw_ctrl_domain *hw_dom = resctrl_to_arch_ctrl_dom(d);
struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
+ unsigned int i;

for (i = m->low; i < m->high; i++)
wrmsrl(hw_res->msr_base + i, hw_dom->ctrl_val[i]);
@@ -328,12 +328,12 @@ static u32 delay_bw_map(unsigned long bw, struct rdt_resource *r)
}

static void
-mba_wrmsr_intel(struct rdt_domain *d, struct msr_param *m,
+mba_wrmsr_intel(struct rdt_ctrl_domain *d, struct msr_param *m,
struct rdt_resource *r)
{
- unsigned int i;
- struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(d);
+ struct rdt_hw_ctrl_domain *hw_dom = resctrl_to_arch_ctrl_dom(d);
struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
+ unsigned int i;

/* Write the delay values for mba. */
for (i = m->low; i < m->high; i++)
@@ -341,19 +341,19 @@ mba_wrmsr_intel(struct rdt_domain *d, struct msr_param *m,
}

static void
-cat_wrmsr(struct rdt_domain *d, struct msr_param *m, struct rdt_resource *r)
+cat_wrmsr(struct rdt_ctrl_domain *d, struct msr_param *m, struct rdt_resource *r)
{
- unsigned int i;
- struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(d);
+ struct rdt_hw_ctrl_domain *hw_dom = resctrl_to_arch_ctrl_dom(d);
struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
+ unsigned int i;

for (i = m->low; i < m->high; i++)
wrmsrl(hw_res->msr_base + i, hw_dom->ctrl_val[i]);
}

-struct rdt_domain *get_domain_from_cpu(int cpu, struct rdt_resource *r)
+struct rdt_ctrl_domain *get_domain_from_cpu(int cpu, struct rdt_resource *r)
{
- struct rdt_domain *d;
+ struct rdt_ctrl_domain *d;

list_for_each_entry(d, &r->ctrl_domains, hdr.list) {
/* Find the domain that contains this CPU */
@@ -375,7 +375,7 @@ void rdt_ctrl_update(void *arg)
struct rdt_hw_resource *hw_res = resctrl_to_arch_res(m->res);
struct rdt_resource *r = m->res;
int cpu = smp_processor_id();
- struct rdt_domain *d;
+ struct rdt_ctrl_domain *d;

d = get_domain_from_cpu(cpu, r);
if (d) {
@@ -443,18 +443,23 @@ static void setup_default_ctrlval(struct rdt_resource *r, u32 *dc)
*dc = r->default_ctrl;
}

-static void domain_free(struct rdt_hw_domain *hw_dom)
+static void ctrl_domain_free(struct rdt_hw_ctrl_domain *hw_dom)
+{
+ kfree(hw_dom->ctrl_val);
+ kfree(hw_dom);
+}
+
+static void mon_domain_free(struct rdt_hw_mon_domain *hw_dom)
{
kfree(hw_dom->arch_mbm_total);
kfree(hw_dom->arch_mbm_local);
- kfree(hw_dom->ctrl_val);
kfree(hw_dom);
}

-static int domain_setup_ctrlval(struct rdt_resource *r, struct rdt_domain *d)
+static int domain_setup_ctrlval(struct rdt_resource *r, struct rdt_ctrl_domain *d)
{
+ struct rdt_hw_ctrl_domain *hw_dom = resctrl_to_arch_ctrl_dom(d);
struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
- struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(d);
struct msr_param m;
u32 *dc;

@@ -477,7 +482,7 @@ static int domain_setup_ctrlval(struct rdt_resource *r, struct rdt_domain *d)
* @num_rmid: The size of the MBM counter array
* @hw_dom: The domain that owns the allocated arrays
*/
-static int arch_domain_mbm_alloc(u32 num_rmid, struct rdt_hw_domain *hw_dom)
+static int arch_domain_mbm_alloc(u32 num_rmid, struct rdt_hw_mon_domain *hw_dom)
{
size_t tsize;

@@ -516,10 +521,10 @@ static int get_domain_id_from_scope(int cpu, enum resctrl_scope scope)
static void domain_add_cpu_ctrl(int cpu, struct rdt_resource *r)
{
int id = get_domain_id_from_scope(cpu, r->ctrl_scope);
+ struct rdt_hw_ctrl_domain *hw_dom;
struct list_head *add_pos = NULL;
- struct rdt_hw_domain *hw_dom;
struct rdt_domain_hdr *hdr;
- struct rdt_domain *d;
+ struct rdt_ctrl_domain *d;
int err;

if (id < 0) {
@@ -533,7 +538,7 @@ static void domain_add_cpu_ctrl(int cpu, struct rdt_resource *r)
pr_warn("Couldn't find control scope id=%d for CPU %d\n", id, cpu);
return;
}
- d = container_of(hdr, struct rdt_domain, hdr);
+ d = container_of(hdr, struct rdt_ctrl_domain, hdr);

if (d) {
cpumask_set_cpu(cpu, &d->cpu_mask);
@@ -553,7 +558,7 @@ static void domain_add_cpu_ctrl(int cpu, struct rdt_resource *r)
rdt_domain_reconfigure_cdp(r);

if (domain_setup_ctrlval(r, d)) {
- domain_free(hw_dom);
+ ctrl_domain_free(hw_dom);
return;
}

@@ -562,17 +567,17 @@ static void domain_add_cpu_ctrl(int cpu, struct rdt_resource *r)
err = resctrl_online_ctrl_domain(r, d);
if (err) {
list_del(&d->hdr.list);
- domain_free(hw_dom);
+ ctrl_domain_free(hw_dom);
}
}

static void domain_add_cpu_mon(int cpu, struct rdt_resource *r)
{
int id = get_domain_id_from_scope(cpu, r->mon_scope);
+ struct rdt_hw_mon_domain *hw_mondom;
struct list_head *add_pos = NULL;
- struct rdt_hw_domain *hw_mondom;
struct rdt_domain_hdr *hdr;
- struct rdt_domain *d;
+ struct rdt_mon_domain *d;
int err;

if (id < 0) {
@@ -586,7 +591,7 @@ static void domain_add_cpu_mon(int cpu, struct rdt_resource *r)
pr_warn("Couldn't find monitor scope id=%d for CPU %d\n", id, cpu);
return;
}
- d = container_of(hdr, struct rdt_domain, hdr);
+ d = container_of(hdr, struct rdt_mon_domain, hdr);

if (d) {
cpumask_set_cpu(cpu, &d->cpu_mask);
@@ -602,7 +607,7 @@ static void domain_add_cpu_mon(int cpu, struct rdt_resource *r)
cpumask_set_cpu(cpu, &d->cpu_mask);

if (arch_domain_mbm_alloc(r->num_rmid, hw_mondom)) {
- domain_free(hw_mondom);
+ mon_domain_free(hw_mondom);
return;
}

@@ -611,7 +616,7 @@ static void domain_add_cpu_mon(int cpu, struct rdt_resource *r)
err = resctrl_online_mon_domain(r, d);
if (err) {
list_del(&d->hdr.list);
- domain_free(hw_mondom);
+ mon_domain_free(hw_mondom);
}
}

@@ -629,9 +634,9 @@ static void domain_add_cpu(int cpu, struct rdt_resource *r)
static void domain_remove_cpu_ctrl(int cpu, struct rdt_resource *r)
{
int id = get_domain_id_from_scope(cpu, r->ctrl_scope);
- struct rdt_hw_domain *hw_dom;
+ struct rdt_hw_ctrl_domain *hw_dom;
struct rdt_domain_hdr *hdr;
- struct rdt_domain *d;
+ struct rdt_ctrl_domain *d;

if (id < 0)
return;
@@ -641,8 +646,8 @@ static void domain_remove_cpu_ctrl(int cpu, struct rdt_resource *r)
pr_warn("Couldn't find control scope id=%d for CPU %d\n", id, cpu);
return;
}
- d = container_of(hdr, struct rdt_domain, hdr);
- hw_dom = resctrl_to_arch_dom(d);
+ d = container_of(hdr, struct rdt_ctrl_domain, hdr);
+ hw_dom = resctrl_to_arch_ctrl_dom(d);

cpumask_clear_cpu(cpu, &d->cpu_mask);
if (cpumask_empty(&d->cpu_mask)) {
@@ -650,12 +655,12 @@ static void domain_remove_cpu_ctrl(int cpu, struct rdt_resource *r)
list_del(&d->hdr.list);

/*
- * rdt_domain "d" is going to be freed below, so clear
+ * rdt_ctrl_domain "d" is going to be freed below, so clear
* its pointer from pseudo_lock_region struct.
*/
if (d->plr)
d->plr->d = NULL;
- domain_free(hw_dom);
+ ctrl_domain_free(hw_dom);

return;
}
@@ -664,9 +669,9 @@ static void domain_remove_cpu_ctrl(int cpu, struct rdt_resource *r)
static void domain_remove_cpu_mon(int cpu, struct rdt_resource *r)
{
int id = get_domain_id_from_scope(cpu, r->mon_scope);
- struct rdt_hw_domain *hw_mondom;
+ struct rdt_hw_mon_domain *hw_mondom;
struct rdt_domain_hdr *hdr;
- struct rdt_domain *d;
+ struct rdt_mon_domain *d;

if (id < 0)
return;
@@ -676,14 +681,14 @@ static void domain_remove_cpu_mon(int cpu, struct rdt_resource *r)
pr_warn("Couldn't find scope id=%d for CPU %d\n", id, cpu);
return;
}
- d = container_of(hdr, struct rdt_domain, hdr);
- hw_mondom = resctrl_to_arch_dom(d);
+ d = container_of(hdr, struct rdt_mon_domain, hdr);
+ hw_mondom = resctrl_to_arch_mon_dom(d);

cpumask_clear_cpu(cpu, &d->cpu_mask);
if (cpumask_empty(&d->cpu_mask)) {
resctrl_offline_mon_domain(r, d);
list_del(&d->hdr.list);
- domain_free(hw_mondom);
+ mon_domain_free(hw_mondom);

return;
}
diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
index a6261e177cc1..7513eba9feaf 100644
--- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
+++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
@@ -58,7 +58,7 @@ static bool bw_validate(char *buf, unsigned long *data, struct rdt_resource *r)
}

int parse_bw(struct rdt_parse_data *data, struct resctrl_schema *s,
- struct rdt_domain *d)
+ struct rdt_ctrl_domain *d)
{
struct resctrl_staged_config *cfg;
u32 closid = data->rdtgrp->closid;
@@ -135,7 +135,7 @@ static bool cbm_validate(char *buf, u32 *data, struct rdt_resource *r)
* resource type.
*/
int parse_cbm(struct rdt_parse_data *data, struct resctrl_schema *s,
- struct rdt_domain *d)
+ struct rdt_ctrl_domain *d)
{
struct rdtgroup *rdtgrp = data->rdtgrp;
struct resctrl_staged_config *cfg;
@@ -205,7 +205,7 @@ static int parse_line(char *line, struct resctrl_schema *s,
struct rdt_resource *r = s->res;
struct rdt_parse_data data;
char *dom = NULL, *id;
- struct rdt_domain *d;
+ struct rdt_ctrl_domain *d;
unsigned long dom_id;

if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKSETUP &&
@@ -265,11 +265,11 @@ static u32 get_config_index(u32 closid, enum resctrl_conf_type type)
}
}

-static bool apply_config(struct rdt_hw_domain *hw_dom,
+static bool apply_config(struct rdt_hw_ctrl_domain *hw_dom,
struct resctrl_staged_config *cfg, u32 idx,
cpumask_var_t cpu_mask)
{
- struct rdt_domain *dom = &hw_dom->d_resctrl;
+ struct rdt_ctrl_domain *dom = &hw_dom->d_resctrl;

if (cfg->new_ctrl != hw_dom->ctrl_val[idx]) {
cpumask_set_cpu(cpumask_any(&dom->cpu_mask), cpu_mask);
@@ -281,11 +281,11 @@ static bool apply_config(struct rdt_hw_domain *hw_dom,
return false;
}

-int resctrl_arch_update_one(struct rdt_resource *r, struct rdt_domain *d,
+int resctrl_arch_update_one(struct rdt_resource *r, struct rdt_ctrl_domain *d,
u32 closid, enum resctrl_conf_type t, u32 cfg_val)
{
+ struct rdt_hw_ctrl_domain *hw_dom = resctrl_to_arch_ctrl_dom(d);
struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
- struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(d);
u32 idx = get_config_index(closid, t);
struct msr_param msr_param;

@@ -305,11 +305,11 @@ int resctrl_arch_update_one(struct rdt_resource *r, struct rdt_domain *d,
int resctrl_arch_update_domains(struct rdt_resource *r, u32 closid)
{
struct resctrl_staged_config *cfg;
- struct rdt_hw_domain *hw_dom;
+ struct rdt_hw_ctrl_domain *hw_dom;
struct msr_param msr_param;
enum resctrl_conf_type t;
+ struct rdt_ctrl_domain *d;
cpumask_var_t cpu_mask;
- struct rdt_domain *d;
u32 idx;

if (!zalloc_cpumask_var(&cpu_mask, GFP_KERNEL))
@@ -317,7 +317,7 @@ int resctrl_arch_update_domains(struct rdt_resource *r, u32 closid)

msr_param.res = NULL;
list_for_each_entry(d, &r->ctrl_domains, hdr.list) {
- hw_dom = resctrl_to_arch_dom(d);
+ hw_dom = resctrl_to_arch_ctrl_dom(d);
for (t = 0; t < CDP_NUM_TYPES; t++) {
cfg = &hw_dom->d_resctrl.staged_config[t];
if (!cfg->have_new_ctrl)
@@ -447,10 +447,10 @@ ssize_t rdtgroup_schemata_write(struct kernfs_open_file *of,
return ret ?: nbytes;
}

-u32 resctrl_arch_get_config(struct rdt_resource *r, struct rdt_domain *d,
+u32 resctrl_arch_get_config(struct rdt_resource *r, struct rdt_ctrl_domain *d,
u32 closid, enum resctrl_conf_type type)
{
- struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(d);
+ struct rdt_hw_ctrl_domain *hw_dom = resctrl_to_arch_ctrl_dom(d);
u32 idx = get_config_index(closid, type);

return hw_dom->ctrl_val[idx];
@@ -459,7 +459,7 @@ u32 resctrl_arch_get_config(struct rdt_resource *r, struct rdt_domain *d,
static void show_doms(struct seq_file *s, struct resctrl_schema *schema, int closid)
{
struct rdt_resource *r = schema->res;
- struct rdt_domain *dom;
+ struct rdt_ctrl_domain *dom;
bool sep = false;
u32 ctrl_val;

@@ -521,7 +521,7 @@ int rdtgroup_schemata_show(struct kernfs_open_file *of,
}

void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
- struct rdt_domain *d, struct rdtgroup *rdtgrp,
+ struct rdt_mon_domain *d, struct rdtgroup *rdtgrp,
int evtid, int first)
{
/*
@@ -541,11 +541,11 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)
{
struct kernfs_open_file *of = m->private;
struct rdt_domain_hdr *hdr;
+ struct rdt_mon_domain *d;
u32 resid, evtid, domid;
struct rdtgroup *rdtgrp;
struct rdt_resource *r;
union mon_data_bits md;
- struct rdt_domain *d;
struct rmid_read rr;
int ret = 0;

@@ -566,7 +566,7 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)
ret = -ENOENT;
goto out;
}
- d = container_of(hdr, struct rdt_domain, hdr);
+ d = container_of(hdr, struct rdt_mon_domain, hdr);

mon_event_read(&rr, r, d, rdtgrp, evtid, false);

diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 3265b8499e2a..97d2ed829f5d 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -170,7 +170,7 @@ static int __rmid_read(u32 rmid, enum resctrl_event_id eventid, u64 *val)
return 0;
}

-static struct arch_mbm_state *get_arch_mbm_state(struct rdt_hw_domain *hw_dom,
+static struct arch_mbm_state *get_arch_mbm_state(struct rdt_hw_mon_domain *hw_dom,
u32 rmid,
enum resctrl_event_id eventid)
{
@@ -189,10 +189,10 @@ static struct arch_mbm_state *get_arch_mbm_state(struct rdt_hw_domain *hw_dom,
return NULL;
}

-void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_domain *d,
+void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_mon_domain *d,
u32 rmid, enum resctrl_event_id eventid)
{
- struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(d);
+ struct rdt_hw_mon_domain *hw_dom = resctrl_to_arch_mon_dom(d);
struct arch_mbm_state *am;

am = get_arch_mbm_state(hw_dom, rmid, eventid);
@@ -208,9 +208,9 @@ void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_domain *d,
* Assumes that hardware counters are also reset and thus that there is
* no need to record initial non-zero counts.
*/
-void resctrl_arch_reset_rmid_all(struct rdt_resource *r, struct rdt_domain *d)
+void resctrl_arch_reset_rmid_all(struct rdt_resource *r, struct rdt_mon_domain *d)
{
- struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(d);
+ struct rdt_hw_mon_domain *hw_dom = resctrl_to_arch_mon_dom(d);

if (is_mbm_total_enabled())
memset(hw_dom->arch_mbm_total, 0,
@@ -229,11 +229,11 @@ static u64 mbm_overflow_count(u64 prev_msr, u64 cur_msr, unsigned int width)
return chunks >> shift;
}

-int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain *d,
+int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_mon_domain *d,
u32 rmid, enum resctrl_event_id eventid, u64 *val)
{
+ struct rdt_hw_mon_domain *hw_dom = resctrl_to_arch_mon_dom(d);
struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
- struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(d);
struct arch_mbm_state *am;
u64 msr_val, chunks;
int ret;
@@ -266,7 +266,7 @@ int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain *d,
* decrement the count. If the busy count gets to zero on an RMID, we
* free the RMID
*/
-void __check_limbo(struct rdt_domain *d, bool force_free)
+void __check_limbo(struct rdt_mon_domain *d, bool force_free)
{
struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
struct rmid_entry *entry;
@@ -305,7 +305,7 @@ void __check_limbo(struct rdt_domain *d, bool force_free)
}
}

-bool has_busy_rmid(struct rdt_resource *r, struct rdt_domain *d)
+bool has_busy_rmid(struct rdt_resource *r, struct rdt_mon_domain *d)
{
return find_first_bit(d->rmid_busy_llc, r->num_rmid) != r->num_rmid;
}
@@ -334,7 +334,7 @@ int alloc_rmid(void)
static void add_rmid_to_limbo(struct rmid_entry *entry)
{
struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
- struct rdt_domain *d;
+ struct rdt_mon_domain *d;
int cpu, err;
u64 val = 0;

@@ -383,7 +383,7 @@ void free_rmid(u32 rmid)
list_add_tail(&entry->list, &rmid_free_lru);
}

-static struct mbm_state *get_mbm_state(struct rdt_domain *d, u32 rmid,
+static struct mbm_state *get_mbm_state(struct rdt_mon_domain *d, u32 rmid,
enum resctrl_event_id evtid)
{
switch (evtid) {
@@ -516,13 +516,13 @@ void mon_event_count(void *info)
* throttle MSRs already have low percentage values. To avoid
* unnecessarily restricting such rdtgroups, we also increase the bandwidth.
*/
-static void update_mba_bw(struct rdtgroup *rgrp, struct rdt_domain *dom_mbm)
+static void update_mba_bw(struct rdtgroup *rgrp, struct rdt_mon_domain *dom_mbm)
{
u32 closid, rmid, cur_msr_val, new_msr_val;
struct mbm_state *pmbm_data, *cmbm_data;
+ struct rdt_ctrl_domain *dom_mba;
u32 cur_bw, delta_bw, user_bw;
struct rdt_resource *r_mba;
- struct rdt_domain *dom_mba;
struct list_head *head;
struct rdtgroup *entry;

@@ -600,7 +600,7 @@ static void update_mba_bw(struct rdtgroup *rgrp, struct rdt_domain *dom_mbm)
}
}

-static void mbm_update(struct rdt_resource *r, struct rdt_domain *d, int rmid)
+static void mbm_update(struct rdt_resource *r, struct rdt_mon_domain *d, int rmid)
{
struct rmid_read rr;

@@ -640,13 +640,13 @@ void cqm_handle_limbo(struct work_struct *work)
{
unsigned long delay = msecs_to_jiffies(CQM_LIMBOCHECK_INTERVAL);
int cpu = smp_processor_id();
+ struct rdt_mon_domain *d;
struct rdt_resource *r;
- struct rdt_domain *d;

mutex_lock(&rdtgroup_mutex);

r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
- d = container_of(work, struct rdt_domain, cqm_limbo.work);
+ d = container_of(work, struct rdt_mon_domain, cqm_limbo.work);

__check_limbo(d, false);

@@ -656,7 +656,7 @@ void cqm_handle_limbo(struct work_struct *work)
mutex_unlock(&rdtgroup_mutex);
}

-void cqm_setup_limbo_handler(struct rdt_domain *dom, unsigned long delay_ms)
+void cqm_setup_limbo_handler(struct rdt_mon_domain *dom, unsigned long delay_ms)
{
unsigned long delay = msecs_to_jiffies(delay_ms);
int cpu;
@@ -672,9 +672,9 @@ void mbm_handle_overflow(struct work_struct *work)
unsigned long delay = msecs_to_jiffies(MBM_OVERFLOW_INTERVAL);
struct rdtgroup *prgrp, *crgrp;
int cpu = smp_processor_id();
+ struct rdt_mon_domain *d;
struct list_head *head;
struct rdt_resource *r;
- struct rdt_domain *d;

mutex_lock(&rdtgroup_mutex);

@@ -682,7 +682,7 @@ void mbm_handle_overflow(struct work_struct *work)
goto out_unlock;

r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
- d = container_of(work, struct rdt_domain, mbm_over.work);
+ d = container_of(work, struct rdt_mon_domain, mbm_over.work);

list_for_each_entry(prgrp, &rdt_all_groups, rdtgroup_list) {
mbm_update(r, d, prgrp->mon.rmid);
@@ -701,7 +701,7 @@ void mbm_handle_overflow(struct work_struct *work)
mutex_unlock(&rdtgroup_mutex);
}

-void mbm_setup_overflow_handler(struct rdt_domain *dom, unsigned long delay_ms)
+void mbm_setup_overflow_handler(struct rdt_mon_domain *dom, unsigned long delay_ms)
{
unsigned long delay = msecs_to_jiffies(delay_ms);
int cpu;
diff --git a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
index bda32b4e1c1e..675e9e47af54 100644
--- a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
+++ b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
@@ -814,7 +814,7 @@ int rdtgroup_locksetup_exit(struct rdtgroup *rdtgrp)
* Return: true if @cbm overlaps with pseudo-locked region on @d, false
* otherwise.
*/
-bool rdtgroup_cbm_overlaps_pseudo_locked(struct rdt_domain *d, unsigned long cbm)
+bool rdtgroup_cbm_overlaps_pseudo_locked(struct rdt_ctrl_domain *d, unsigned long cbm)
{
unsigned int cbm_len;
unsigned long cbm_b;
@@ -841,11 +841,11 @@ bool rdtgroup_cbm_overlaps_pseudo_locked(struct rdt_domain *d, unsigned long cbm
* if it is not possible to test due to memory allocation issue,
* false otherwise.
*/
-bool rdtgroup_pseudo_locked_in_hierarchy(struct rdt_domain *d)
+bool rdtgroup_pseudo_locked_in_hierarchy(struct rdt_ctrl_domain *d)
{
+ struct rdt_ctrl_domain *d_i;
cpumask_var_t cpu_with_psl;
struct rdt_resource *r;
- struct rdt_domain *d_i;
bool ret = false;

if (!zalloc_cpumask_var(&cpu_with_psl, GFP_KERNEL))
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 8132f81f31bb..b0901fb95aa9 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -80,8 +80,8 @@ void rdt_last_cmd_printf(const char *fmt, ...)

void rdt_staged_configs_clear(void)
{
+ struct rdt_ctrl_domain *dom;
struct rdt_resource *r;
- struct rdt_domain *dom;

lockdep_assert_held(&rdtgroup_mutex);

@@ -920,7 +920,7 @@ static int rdt_bit_usage_show(struct kernfs_open_file *of,
unsigned long sw_shareable = 0, hw_shareable = 0;
unsigned long exclusive = 0, pseudo_locked = 0;
struct rdt_resource *r = s->res;
- struct rdt_domain *dom;
+ struct rdt_ctrl_domain *dom;
int i, hwb, swb, excl, psl;
enum rdtgrp_mode mode;
bool sep = false;
@@ -1137,7 +1137,7 @@ static enum resctrl_conf_type resctrl_peer_type(enum resctrl_conf_type my_type)
*
* Return: false if CBM does not overlap, true if it does.
*/
-static bool __rdtgroup_cbm_overlaps(struct rdt_resource *r, struct rdt_domain *d,
+static bool __rdtgroup_cbm_overlaps(struct rdt_resource *r, struct rdt_ctrl_domain *d,
unsigned long cbm, int closid,
enum resctrl_conf_type type, bool exclusive)
{
@@ -1192,7 +1192,7 @@ static bool __rdtgroup_cbm_overlaps(struct rdt_resource *r, struct rdt_domain *d
*
* Return: true if CBM overlap detected, false if there is no overlap
*/
-bool rdtgroup_cbm_overlaps(struct resctrl_schema *s, struct rdt_domain *d,
+bool rdtgroup_cbm_overlaps(struct resctrl_schema *s, struct rdt_ctrl_domain *d,
unsigned long cbm, int closid, bool exclusive)
{
enum resctrl_conf_type peer_type = resctrl_peer_type(s->conf_type);
@@ -1222,10 +1222,10 @@ bool rdtgroup_cbm_overlaps(struct resctrl_schema *s, struct rdt_domain *d,
static bool rdtgroup_mode_test_exclusive(struct rdtgroup *rdtgrp)
{
int closid = rdtgrp->closid;
+ struct rdt_ctrl_domain *d;
struct resctrl_schema *s;
struct rdt_resource *r;
bool has_cache = false;
- struct rdt_domain *d;
u32 ctrl;

list_for_each_entry(s, &resctrl_schema_all, list) {
@@ -1339,7 +1339,7 @@ static ssize_t rdtgroup_mode_write(struct kernfs_open_file *of,
* bitmap functions work correctly.
*/
unsigned int rdtgroup_cbm_to_size(struct rdt_resource *r,
- struct rdt_domain *d, unsigned long cbm)
+ struct rdt_ctrl_domain *d, unsigned long cbm)
{
struct cpu_cacheinfo *ci;
unsigned int size = 0;
@@ -1372,9 +1372,9 @@ static int rdtgroup_size_show(struct kernfs_open_file *of,
{
struct resctrl_schema *schema;
enum resctrl_conf_type type;
+ struct rdt_ctrl_domain *d;
struct rdtgroup *rdtgrp;
struct rdt_resource *r;
- struct rdt_domain *d;
unsigned int size;
int ret = 0;
u32 closid;
@@ -1486,7 +1486,7 @@ static void mon_event_config_read(void *info)
mon_info->mon_config = msrval & MAX_EVT_CONFIG_BITS;
}

-static void mondata_config_read(struct rdt_domain *d, struct mon_config_info *mon_info)
+static void mondata_config_read(struct rdt_mon_domain *d, struct mon_config_info *mon_info)
{
smp_call_function_any(&d->cpu_mask, mon_event_config_read, mon_info, 1);
}
@@ -1494,7 +1494,7 @@ static void mondata_config_read(struct rdt_domain *d, struct mon_config_info *mo
static int mbm_config_show(struct seq_file *s, struct rdt_resource *r, u32 evtid)
{
struct mon_config_info mon_info = {0};
- struct rdt_domain *dom;
+ struct rdt_mon_domain *dom;
bool sep = false;

mutex_lock(&rdtgroup_mutex);
@@ -1551,7 +1551,7 @@ static void mon_event_config_write(void *info)
}

static int mbm_config_write_domain(struct rdt_resource *r,
- struct rdt_domain *d, u32 evtid, u32 val)
+ struct rdt_mon_domain *d, u32 evtid, u32 val)
{
struct mon_config_info mon_info = {0};
int ret = 0;
@@ -1601,7 +1601,7 @@ static int mon_config_write(struct rdt_resource *r, char *tok, u32 evtid)
{
char *dom_str = NULL, *id_str;
unsigned long dom_id, val;
- struct rdt_domain *d;
+ struct rdt_mon_domain *d;
int ret = 0;

next:
@@ -2125,9 +2125,9 @@ static inline bool is_mba_linear(void)
static int set_cache_qos_cfg(int level, bool enable)
{
void (*update)(void *arg);
+ struct rdt_ctrl_domain *d;
struct rdt_resource *r_l;
cpumask_var_t cpu_mask;
- struct rdt_domain *d;
int cpu;

if (level == RDT_RESOURCE_L3)
@@ -2174,7 +2174,7 @@ void rdt_domain_reconfigure_cdp(struct rdt_resource *r)
l3_qos_cfg_update(&hw_res->cdp_enabled);
}

-static int mba_sc_domain_allocate(struct rdt_resource *r, struct rdt_domain *d)
+static int mba_sc_domain_allocate(struct rdt_resource *r, struct rdt_ctrl_domain *d)
{
u32 num_closid = resctrl_arch_get_num_closid(r);
int cpu = cpumask_any(&d->cpu_mask);
@@ -2192,7 +2192,7 @@ static int mba_sc_domain_allocate(struct rdt_resource *r, struct rdt_domain *d)
}

static void mba_sc_domain_destroy(struct rdt_resource *r,
- struct rdt_domain *d)
+ struct rdt_ctrl_domain *d)
{
kfree(d->mbps_val);
d->mbps_val = NULL;
@@ -2218,7 +2218,7 @@ static int set_mba_sc(bool mba_sc)
{
struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_MBA].r_resctrl;
u32 num_closid = resctrl_arch_get_num_closid(r);
- struct rdt_domain *d;
+ struct rdt_ctrl_domain *d;
int i;

if (!supports_mba_mbps() || mba_sc == is_mba_sc(r))
@@ -2466,7 +2466,7 @@ static void schemata_list_destroy(void)
static int rdt_get_tree(struct fs_context *fc)
{
struct rdt_fs_context *ctx = rdt_fc2context(fc);
- struct rdt_domain *dom;
+ struct rdt_mon_domain *dom;
struct rdt_resource *r;
int ret;

@@ -2634,10 +2634,10 @@ static int rdt_init_fs_context(struct fs_context *fc)
static int reset_all_ctrls(struct rdt_resource *r)
{
struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
- struct rdt_hw_domain *hw_dom;
+ struct rdt_hw_ctrl_domain *hw_dom;
struct msr_param msr_param;
+ struct rdt_ctrl_domain *d;
cpumask_var_t cpu_mask;
- struct rdt_domain *d;
int i;

if (!zalloc_cpumask_var(&cpu_mask, GFP_KERNEL))
@@ -2653,7 +2653,7 @@ static int reset_all_ctrls(struct rdt_resource *r)
* from each domain to update the MSRs below.
*/
list_for_each_entry(d, &r->ctrl_domains, hdr.list) {
- hw_dom = resctrl_to_arch_dom(d);
+ hw_dom = resctrl_to_arch_ctrl_dom(d);
cpumask_set_cpu(cpumask_any(&d->cpu_mask), cpu_mask);

for (i = 0; i < hw_res->num_closid; i++)
@@ -2848,7 +2848,7 @@ static void rmdir_mondata_subdir_allrdtgrp(struct rdt_resource *r,
}

static int mkdir_mondata_subdir(struct kernfs_node *parent_kn,
- struct rdt_domain *d,
+ struct rdt_mon_domain *d,
struct rdt_resource *r, struct rdtgroup *prgrp)
{
union mon_data_bits priv;
@@ -2897,7 +2897,7 @@ static int mkdir_mondata_subdir(struct kernfs_node *parent_kn,
* and "monitor" groups with given domain id.
*/
static void mkdir_mondata_subdir_allrdtgrp(struct rdt_resource *r,
- struct rdt_domain *d)
+ struct rdt_mon_domain *d)
{
struct kernfs_node *parent_kn;
struct rdtgroup *prgrp, *crgrp;
@@ -2919,7 +2919,7 @@ static int mkdir_mondata_subdir_alldom(struct kernfs_node *parent_kn,
struct rdt_resource *r,
struct rdtgroup *prgrp)
{
- struct rdt_domain *dom;
+ struct rdt_mon_domain *dom;
int ret;

list_for_each_entry(dom, &r->mon_domains, hdr.list) {
@@ -3021,7 +3021,7 @@ static u32 cbm_ensure_valid(u32 _val, struct rdt_resource *r)
* Set the RDT domain up to start off with all usable allocations. That is,
* all shareable and unused bits. All-zero CBM is invalid.
*/
-static int __init_one_rdt_domain(struct rdt_domain *d, struct resctrl_schema *s,
+static int __init_one_rdt_domain(struct rdt_ctrl_domain *d, struct resctrl_schema *s,
u32 closid)
{
enum resctrl_conf_type peer_type = resctrl_peer_type(s->conf_type);
@@ -3101,7 +3101,7 @@ static int __init_one_rdt_domain(struct rdt_domain *d, struct resctrl_schema *s,
*/
static int rdtgroup_init_cat(struct resctrl_schema *s, u32 closid)
{
- struct rdt_domain *d;
+ struct rdt_ctrl_domain *d;
int ret;

list_for_each_entry(d, &s->res->ctrl_domains, hdr.list) {
@@ -3117,7 +3117,7 @@ static int rdtgroup_init_cat(struct resctrl_schema *s, u32 closid)
static void rdtgroup_init_mba(struct rdt_resource *r, u32 closid)
{
struct resctrl_staged_config *cfg;
- struct rdt_domain *d;
+ struct rdt_ctrl_domain *d;

list_for_each_entry(d, &r->ctrl_domains, hdr.list) {
if (is_mba_sc(r)) {
@@ -3704,14 +3704,14 @@ static int __init rdtgroup_setup_root(void)
return ret;
}

-static void domain_destroy_mon_state(struct rdt_domain *d)
+static void domain_destroy_mon_state(struct rdt_mon_domain *d)
{
bitmap_free(d->rmid_busy_llc);
kfree(d->mbm_total);
kfree(d->mbm_local);
}

-void resctrl_offline_ctrl_domain(struct rdt_resource *r, struct rdt_domain *d)
+void resctrl_offline_ctrl_domain(struct rdt_resource *r, struct rdt_ctrl_domain *d)
{
lockdep_assert_held(&rdtgroup_mutex);

@@ -3719,7 +3719,7 @@ void resctrl_offline_ctrl_domain(struct rdt_resource *r, struct rdt_domain *d)
mba_sc_domain_destroy(r, d);
}

-void resctrl_offline_mon_domain(struct rdt_resource *r, struct rdt_domain *d)
+void resctrl_offline_mon_domain(struct rdt_resource *r, struct rdt_mon_domain *d)
{
/*
* If resctrl is mounted, remove all the
@@ -3746,7 +3746,7 @@ void resctrl_offline_mon_domain(struct rdt_resource *r, struct rdt_domain *d)
domain_destroy_mon_state(d);
}

-static int domain_setup_mon_state(struct rdt_resource *r, struct rdt_domain *d)
+static int domain_setup_mon_state(struct rdt_resource *r, struct rdt_mon_domain *d)
{
size_t tsize;

@@ -3776,7 +3776,7 @@ static int domain_setup_mon_state(struct rdt_resource *r, struct rdt_domain *d)
return 0;
}

-int resctrl_online_ctrl_domain(struct rdt_resource *r, struct rdt_domain *d)
+int resctrl_online_ctrl_domain(struct rdt_resource *r, struct rdt_ctrl_domain *d)
{
lockdep_assert_held(&rdtgroup_mutex);

@@ -3787,7 +3787,7 @@ int resctrl_online_ctrl_domain(struct rdt_resource *r, struct rdt_domain *d)
return 0;
}

-int resctrl_online_mon_domain(struct rdt_resource *r, struct rdt_domain *d)
+int resctrl_online_mon_domain(struct rdt_resource *r, struct rdt_mon_domain *d)
{
int err;

--
2.41.0
---
include/linux/resctrl.h | 50 +++++++------
arch/x86/kernel/cpu/resctrl/internal.h | 60 ++++++++++------
arch/x86/kernel/cpu/resctrl/core.c | 87 ++++++++++++-----------
arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 32 ++++-----
arch/x86/kernel/cpu/resctrl/monitor.c | 40 +++++------
arch/x86/kernel/cpu/resctrl/pseudo_lock.c | 6 +-
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 62 ++++++++--------
7 files changed, 184 insertions(+), 153 deletions(-)
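
Note for readers following the split: the userspace sketch below is not
kernel code. It illustrates the pattern this patch relies on, where the
control and monitor domain structures each embed a common header, and the
header's type field is checked before container_of() recovers the
enclosing structure. All names here (domain_hdr, ctrl_domain, mon_domain,
hdr_to_ctrl(), hdr_to_mon(), split_domain_demo()) are simplified,
hypothetical stand-ins for rdt_domain_hdr, rdt_ctrl_domain,
rdt_mon_domain and the resctrl_to_arch_*_dom() helpers, and the fields are
reduced to one per structure.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's container_of() macro. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

enum domain_type { CTRL_DOMAIN, MON_DOMAIN };

/* Hypothetical, reduced version of the common domain header. */
struct domain_hdr {
	int id;
	enum domain_type type;
};

/* Control domain: carries only control-side state. */
struct ctrl_domain {
	struct domain_hdr hdr;
	unsigned int *mbps_val;
};

/* Monitor domain: carries only monitor-side state. */
struct mon_domain {
	struct domain_hdr hdr;
	unsigned long *rmid_busy_llc;
};

/* Arch-private wrappers, one per domain type. */
struct hw_ctrl_domain {
	struct ctrl_domain d_resctrl;
	unsigned int *ctrl_val;
};

struct hw_mon_domain {
	struct mon_domain d_resctrl;
	unsigned long long *arch_mbm_total;
};

/* Check the header type before recovering the enclosing structure. */
static struct ctrl_domain *hdr_to_ctrl(struct domain_hdr *hdr)
{
	if (!hdr || hdr->type != CTRL_DOMAIN)
		return NULL;
	return container_of(hdr, struct ctrl_domain, hdr);
}

static struct mon_domain *hdr_to_mon(struct domain_hdr *hdr)
{
	if (!hdr || hdr->type != MON_DOMAIN)
		return NULL;
	return container_of(hdr, struct mon_domain, hdr);
}

static struct hw_ctrl_domain *to_arch_ctrl_dom(struct ctrl_domain *d)
{
	return container_of(d, struct hw_ctrl_domain, d_resctrl);
}

static struct hw_mon_domain *to_arch_mon_dom(struct mon_domain *d)
{
	return container_of(d, struct hw_mon_domain, d_resctrl);
}

/* Walk header -> domain -> arch domain for both types; returns 0. */
int split_domain_demo(void)
{
	struct hw_ctrl_domain hw_ctrl = {
		.d_resctrl.hdr = { .id = 0, .type = CTRL_DOMAIN },
	};
	struct hw_mon_domain hw_mon = {
		.d_resctrl.hdr = { .id = 0, .type = MON_DOMAIN },
	};

	/* A header of the wrong type is rejected instead of being miscast. */
	assert(hdr_to_mon(&hw_ctrl.d_resctrl.hdr) == NULL);
	assert(hdr_to_ctrl(&hw_mon.d_resctrl.hdr) == NULL);

	/* A header of the right type leads back to its arch wrapper. */
	assert(to_arch_ctrl_dom(hdr_to_ctrl(&hw_ctrl.d_resctrl.hdr)) == &hw_ctrl);
	assert(to_arch_mon_dom(hdr_to_mon(&hw_mon.d_resctrl.hdr)) == &hw_mon);
	return 0;
}
```

With separate types, passing a monitor domain to a function that expects a
control domain becomes a compile-time type mismatch, which is the main
benefit of splitting the old shared structure.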

diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index f47b01ae28ca..56fa04cedb50 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -70,7 +70,25 @@ struct rdt_domain_hdr {
};

/**
- * struct rdt_domain - group of CPUs sharing a resctrl resource
+ * struct rdt_ctrl_domain - group of CPUs sharing a resctrl control resource
+ * @hdr: common header for different domain types
+ * @cpu_mask: which CPUs share this resource
+ * @plr: pseudo-locked region (if any) associated with domain
+ * @staged_config: parsed configuration to be applied
+ * @mbps_val: When mba_sc is enabled, this holds the array of user
+ * specified control values for mba_sc in MBps, indexed
+ * by closid
+ */
+struct rdt_ctrl_domain {
+ struct rdt_domain_hdr hdr;
+ struct cpumask cpu_mask;
+ struct pseudo_lock_region *plr;
+ struct resctrl_staged_config staged_config[CDP_NUM_TYPES];
+ u32 *mbps_val;
+};
+
+/**
+ * struct rdt_mon_domain - group of CPUs sharing a resctrl monitor resource
* @hdr: common header for different domain types
* @cpu_mask: which CPUs share this resource
* @rmid_busy_llc: bitmap of which limbo RMIDs are above threshold
@@ -80,13 +98,8 @@ struct rdt_domain_hdr {
* @cqm_limbo: worker to periodically read CQM h/w counters
* @mbm_work_cpu: worker CPU for MBM h/w counters
* @cqm_work_cpu: worker CPU for CQM h/w counters
- * @plr: pseudo-locked region (if any) associated with domain
- * @staged_config: parsed configuration to be applied
- * @mbps_val: When mba_sc is enabled, this holds the array of user
- * specified control values for mba_sc in MBps, indexed
- * by closid
*/
-struct rdt_domain {
+struct rdt_mon_domain {
struct rdt_domain_hdr hdr;
struct cpumask cpu_mask;
unsigned long *rmid_busy_llc;
@@ -96,9 +109,6 @@ struct rdt_domain {
struct delayed_work cqm_limbo;
int mbm_work_cpu;
int cqm_work_cpu;
- struct pseudo_lock_region *plr;
- struct resctrl_staged_config staged_config[CDP_NUM_TYPES];
- u32 *mbps_val;
};

/**
@@ -202,7 +212,7 @@ struct rdt_resource {
const char *format_str;
int (*parse_ctrlval)(struct rdt_parse_data *data,
struct resctrl_schema *s,
- struct rdt_domain *d);
+ struct rdt_ctrl_domain *d);
struct list_head evt_list;
unsigned long fflags;
bool cdp_capable;
@@ -236,15 +246,15 @@ int resctrl_arch_update_domains(struct rdt_resource *r, u32 closid);
* Update the ctrl_val and apply this config right now.
* Must be called on one of the domain's CPUs.
*/
-int resctrl_arch_update_one(struct rdt_resource *r, struct rdt_domain *d,
+int resctrl_arch_update_one(struct rdt_resource *r, struct rdt_ctrl_domain *d,
u32 closid, enum resctrl_conf_type t, u32 cfg_val);

-u32 resctrl_arch_get_config(struct rdt_resource *r, struct rdt_domain *d,
+u32 resctrl_arch_get_config(struct rdt_resource *r, struct rdt_ctrl_domain *d,
u32 closid, enum resctrl_conf_type type);
-int resctrl_online_ctrl_domain(struct rdt_resource *r, struct rdt_domain *d);
-int resctrl_online_mon_domain(struct rdt_resource *r, struct rdt_domain *d);
-void resctrl_offline_ctrl_domain(struct rdt_resource *r, struct rdt_domain *d);
-void resctrl_offline_mon_domain(struct rdt_resource *r, struct rdt_domain *d);
+int resctrl_online_ctrl_domain(struct rdt_resource *r, struct rdt_ctrl_domain *d);
+int resctrl_online_mon_domain(struct rdt_resource *r, struct rdt_mon_domain *d);
+void resctrl_offline_ctrl_domain(struct rdt_resource *r, struct rdt_ctrl_domain *d);
+void resctrl_offline_mon_domain(struct rdt_resource *r, struct rdt_mon_domain *d);

/**
* resctrl_arch_rmid_read() - Read the eventid counter corresponding to rmid
@@ -260,7 +270,7 @@ void resctrl_offline_mon_domain(struct rdt_resource *r, struct rdt_domain *d);
* Return:
* 0 on success, or -EIO, -EINVAL etc on error.
*/
-int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain *d,
+int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_mon_domain *d,
u32 rmid, enum resctrl_event_id eventid, u64 *val);

/**
@@ -273,7 +283,7 @@ int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain *d,
*
* This can be called from any CPU.
*/
-void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_domain *d,
+void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_mon_domain *d,
u32 rmid, enum resctrl_event_id eventid);

/**
@@ -285,7 +295,7 @@ void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_domain *d,
*
* This can be called from any CPU.
*/
-void resctrl_arch_reset_rmid_all(struct rdt_resource *r, struct rdt_domain *d);
+void resctrl_arch_reset_rmid_all(struct rdt_resource *r, struct rdt_mon_domain *d);

extern unsigned int resctrl_rmid_realloc_threshold;
extern unsigned int resctrl_rmid_realloc_limit;
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index e9a2a8993d14..3aed8e7b8487 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -106,7 +106,7 @@ union mon_data_bits {
struct rmid_read {
struct rdtgroup *rgrp;
struct rdt_resource *r;
- struct rdt_domain *d;
+ struct rdt_mon_domain *d;
enum resctrl_event_id evtid;
bool first;
int err;
@@ -191,7 +191,7 @@ struct mongroup {
*/
struct pseudo_lock_region {
struct resctrl_schema *s;
- struct rdt_domain *d;
+ struct rdt_ctrl_domain *d;
u32 cbm;
wait_queue_head_t lock_thread_wq;
int thread_done;
@@ -319,25 +319,41 @@ struct arch_mbm_state {
};

/**
- * struct rdt_hw_domain - Arch private attributes of a set of CPUs that share
- * a resource
+ * struct rdt_hw_ctrl_domain - Arch private attributes of a set of CPUs that share
+ * a resource for a control function
* @d_resctrl: Properties exposed to the resctrl file system
* @ctrl_val: array of cache or mem ctrl values (indexed by CLOSID)
+ *
+ * Members of this structure are accessed via helpers that provide abstraction.
+ */
+struct rdt_hw_ctrl_domain {
+ struct rdt_ctrl_domain d_resctrl;
+ u32 *ctrl_val;
+};
+
+/**
+ * struct rdt_hw_mon_domain - Arch private attributes of a set of CPUs that share
+ * a resource for a monitor function
+ * @d_resctrl: Properties exposed to the resctrl file system
* @arch_mbm_total: arch private state for MBM total bandwidth
* @arch_mbm_local: arch private state for MBM local bandwidth
*
* Members of this structure are accessed via helpers that provide abstraction.
*/
-struct rdt_hw_domain {
- struct rdt_domain d_resctrl;
- u32 *ctrl_val;
+struct rdt_hw_mon_domain {
+ struct rdt_mon_domain d_resctrl;
struct arch_mbm_state *arch_mbm_total;
struct arch_mbm_state *arch_mbm_local;
};

-static inline struct rdt_hw_domain *resctrl_to_arch_dom(struct rdt_domain *r)
+static inline struct rdt_hw_ctrl_domain *resctrl_to_arch_ctrl_dom(struct rdt_ctrl_domain *r)
{
- return container_of(r, struct rdt_hw_domain, d_resctrl);
+ return container_of(r, struct rdt_hw_ctrl_domain, d_resctrl);
+}
+
+static inline struct rdt_hw_mon_domain *resctrl_to_arch_mon_dom(struct rdt_mon_domain *r)
+{
+ return container_of(r, struct rdt_hw_mon_domain, d_resctrl);
}

/**
@@ -405,7 +421,7 @@ struct rdt_hw_resource {
struct rdt_resource r_resctrl;
u32 num_closid;
unsigned int msr_base;
- void (*msr_update) (struct rdt_domain *d, struct msr_param *m,
+ void (*msr_update) (struct rdt_ctrl_domain *d, struct msr_param *m,
struct rdt_resource *r);
unsigned int mon_scale;
unsigned int mbm_width;
@@ -418,9 +434,9 @@ static inline struct rdt_hw_resource *resctrl_to_arch_res(struct rdt_resource *r
}

int parse_cbm(struct rdt_parse_data *data, struct resctrl_schema *s,
- struct rdt_domain *d);
+ struct rdt_ctrl_domain *d);
int parse_bw(struct rdt_parse_data *data, struct resctrl_schema *s,
- struct rdt_domain *d);
+ struct rdt_ctrl_domain *d);

extern struct mutex rdtgroup_mutex;

@@ -517,21 +533,21 @@ ssize_t rdtgroup_schemata_write(struct kernfs_open_file *of,
char *buf, size_t nbytes, loff_t off);
int rdtgroup_schemata_show(struct kernfs_open_file *of,
struct seq_file *s, void *v);
-bool rdtgroup_cbm_overlaps(struct resctrl_schema *s, struct rdt_domain *d,
+bool rdtgroup_cbm_overlaps(struct resctrl_schema *s, struct rdt_ctrl_domain *d,
unsigned long cbm, int closid, bool exclusive);
-unsigned int rdtgroup_cbm_to_size(struct rdt_resource *r, struct rdt_domain *d,
+unsigned int rdtgroup_cbm_to_size(struct rdt_resource *r, struct rdt_ctrl_domain *d,
unsigned long cbm);
enum rdtgrp_mode rdtgroup_mode_by_closid(int closid);
int rdtgroup_tasks_assigned(struct rdtgroup *r);
int rdtgroup_locksetup_enter(struct rdtgroup *rdtgrp);
int rdtgroup_locksetup_exit(struct rdtgroup *rdtgrp);
-bool rdtgroup_cbm_overlaps_pseudo_locked(struct rdt_domain *d, unsigned long cbm);
-bool rdtgroup_pseudo_locked_in_hierarchy(struct rdt_domain *d);
+bool rdtgroup_cbm_overlaps_pseudo_locked(struct rdt_ctrl_domain *d, unsigned long cbm);
+bool rdtgroup_pseudo_locked_in_hierarchy(struct rdt_ctrl_domain *d);
int rdt_pseudo_lock_init(void);
void rdt_pseudo_lock_release(void);
int rdtgroup_pseudo_lock_create(struct rdtgroup *rdtgrp);
void rdtgroup_pseudo_lock_remove(struct rdtgroup *rdtgrp);
-struct rdt_domain *get_domain_from_cpu(int cpu, struct rdt_resource *r);
+struct rdt_ctrl_domain *get_domain_from_cpu(int cpu, struct rdt_resource *r);
int closids_supported(void);
void closid_free(int closid);
int alloc_rmid(void);
@@ -541,17 +557,17 @@ bool __init rdt_cpu_has(int flag);
void mon_event_count(void *info);
int rdtgroup_mondata_show(struct seq_file *m, void *arg);
void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
- struct rdt_domain *d, struct rdtgroup *rdtgrp,
+ struct rdt_mon_domain *d, struct rdtgroup *rdtgrp,
int evtid, int first);
-void mbm_setup_overflow_handler(struct rdt_domain *dom,
+void mbm_setup_overflow_handler(struct rdt_mon_domain *dom,
unsigned long delay_ms);
void mbm_handle_overflow(struct work_struct *work);
void __init intel_rdt_mbm_apply_quirk(void);
bool is_mba_sc(struct rdt_resource *r);
-void cqm_setup_limbo_handler(struct rdt_domain *dom, unsigned long delay_ms);
+void cqm_setup_limbo_handler(struct rdt_mon_domain *dom, unsigned long delay_ms);
void cqm_handle_limbo(struct work_struct *work);
-bool has_busy_rmid(struct rdt_resource *r, struct rdt_domain *d);
-void __check_limbo(struct rdt_domain *d, bool force_free);
+bool has_busy_rmid(struct rdt_resource *r, struct rdt_mon_domain *d);
+void __check_limbo(struct rdt_mon_domain *d, bool force_free);
void rdt_domain_reconfigure_cdp(struct rdt_resource *r);
void __init thread_throttle_mode_init(void);
void __init mbm_config_rftype_init(const char *config);
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index ae2b66d38afc..d92fdce4e44f 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -49,12 +49,12 @@ int max_name_width, max_data_width;
bool rdt_alloc_capable;

static void
-mba_wrmsr_intel(struct rdt_domain *d, struct msr_param *m,
+mba_wrmsr_intel(struct rdt_ctrl_domain *d, struct msr_param *m,
struct rdt_resource *r);
static void
-cat_wrmsr(struct rdt_domain *d, struct msr_param *m, struct rdt_resource *r);
+cat_wrmsr(struct rdt_ctrl_domain *d, struct msr_param *m, struct rdt_resource *r);
static void
-mba_wrmsr_amd(struct rdt_domain *d, struct msr_param *m,
+mba_wrmsr_amd(struct rdt_ctrl_domain *d, struct msr_param *m,
struct rdt_resource *r);

#define ctrl_domain_init(id) LIST_HEAD_INIT(rdt_resources_all[id].r_resctrl.ctrl_domains)
@@ -303,11 +303,11 @@ static void rdt_get_cdp_l2_config(void)
}

static void
-mba_wrmsr_amd(struct rdt_domain *d, struct msr_param *m, struct rdt_resource *r)
+mba_wrmsr_amd(struct rdt_ctrl_domain *d, struct msr_param *m, struct rdt_resource *r)
{
- unsigned int i;
- struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(d);
+ struct rdt_hw_ctrl_domain *hw_dom = resctrl_to_arch_ctrl_dom(d);
struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
+ unsigned int i;

for (i = m->low; i < m->high; i++)
wrmsrl(hw_res->msr_base + i, hw_dom->ctrl_val[i]);
@@ -328,12 +328,12 @@ static u32 delay_bw_map(unsigned long bw, struct rdt_resource *r)
}

static void
-mba_wrmsr_intel(struct rdt_domain *d, struct msr_param *m,
+mba_wrmsr_intel(struct rdt_ctrl_domain *d, struct msr_param *m,
struct rdt_resource *r)
{
- unsigned int i;
- struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(d);
+ struct rdt_hw_ctrl_domain *hw_dom = resctrl_to_arch_ctrl_dom(d);
struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
+ unsigned int i;

/* Write the delay values for mba. */
for (i = m->low; i < m->high; i++)
@@ -341,19 +341,19 @@ mba_wrmsr_intel(struct rdt_domain *d, struct msr_param *m,
}

static void
-cat_wrmsr(struct rdt_domain *d, struct msr_param *m, struct rdt_resource *r)
+cat_wrmsr(struct rdt_ctrl_domain *d, struct msr_param *m, struct rdt_resource *r)
{
- unsigned int i;
- struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(d);
+ struct rdt_hw_ctrl_domain *hw_dom = resctrl_to_arch_ctrl_dom(d);
struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
+ unsigned int i;

for (i = m->low; i < m->high; i++)
wrmsrl(hw_res->msr_base + i, hw_dom->ctrl_val[i]);
}

-struct rdt_domain *get_domain_from_cpu(int cpu, struct rdt_resource *r)
+struct rdt_ctrl_domain *get_domain_from_cpu(int cpu, struct rdt_resource *r)
{
- struct rdt_domain *d;
+ struct rdt_ctrl_domain *d;

list_for_each_entry(d, &r->ctrl_domains, hdr.list) {
/* Find the domain that contains this CPU */
@@ -375,7 +375,7 @@ void rdt_ctrl_update(void *arg)
struct rdt_hw_resource *hw_res = resctrl_to_arch_res(m->res);
struct rdt_resource *r = m->res;
int cpu = smp_processor_id();
- struct rdt_domain *d;
+ struct rdt_ctrl_domain *d;

d = get_domain_from_cpu(cpu, r);
if (d) {
@@ -443,18 +443,23 @@ static void setup_default_ctrlval(struct rdt_resource *r, u32 *dc)
*dc = r->default_ctrl;
}

-static void domain_free(struct rdt_hw_domain *hw_dom)
+static void ctrl_domain_free(struct rdt_hw_ctrl_domain *hw_dom)
+{
+ kfree(hw_dom->ctrl_val);
+ kfree(hw_dom);
+}
+
+static void mon_domain_free(struct rdt_hw_mon_domain *hw_dom)
{
kfree(hw_dom->arch_mbm_total);
kfree(hw_dom->arch_mbm_local);
- kfree(hw_dom->ctrl_val);
kfree(hw_dom);
}

-static int domain_setup_ctrlval(struct rdt_resource *r, struct rdt_domain *d)
+static int domain_setup_ctrlval(struct rdt_resource *r, struct rdt_ctrl_domain *d)
{
+ struct rdt_hw_ctrl_domain *hw_dom = resctrl_to_arch_ctrl_dom(d);
struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
- struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(d);
struct msr_param m;
u32 *dc;

@@ -477,7 +482,7 @@ static int domain_setup_ctrlval(struct rdt_resource *r, struct rdt_domain *d)
* @num_rmid: The size of the MBM counter array
* @hw_dom: The domain that owns the allocated arrays
*/
-static int arch_domain_mbm_alloc(u32 num_rmid, struct rdt_hw_domain *hw_dom)
+static int arch_domain_mbm_alloc(u32 num_rmid, struct rdt_hw_mon_domain *hw_dom)
{
size_t tsize;

@@ -516,10 +521,10 @@ static int get_domain_id_from_scope(int cpu, enum resctrl_scope scope)
static void domain_add_cpu_ctrl(int cpu, struct rdt_resource *r)
{
int id = get_domain_id_from_scope(cpu, r->ctrl_scope);
+ struct rdt_hw_ctrl_domain *hw_dom;
struct list_head *add_pos = NULL;
- struct rdt_hw_domain *hw_dom;
struct rdt_domain_hdr *hdr;
- struct rdt_domain *d;
+ struct rdt_ctrl_domain *d;
int err;

if (id < 0) {
@@ -537,7 +542,7 @@ static void domain_add_cpu_ctrl(int cpu, struct rdt_resource *r)
if (WARN_ON_ONCE(hdr->type != RESCTRL_CTRL_DOMAIN))
return;

- d = container_of(hdr, struct rdt_domain, hdr);
+ d = container_of(hdr, struct rdt_ctrl_domain, hdr);

if (d) {
cpumask_set_cpu(cpu, &d->cpu_mask);
@@ -558,7 +563,7 @@ static void domain_add_cpu_ctrl(int cpu, struct rdt_resource *r)
rdt_domain_reconfigure_cdp(r);

if (domain_setup_ctrlval(r, d)) {
- domain_free(hw_dom);
+ ctrl_domain_free(hw_dom);
return;
}

@@ -567,17 +572,17 @@ static void domain_add_cpu_ctrl(int cpu, struct rdt_resource *r)
err = resctrl_online_ctrl_domain(r, d);
if (err) {
list_del(&d->hdr.list);
- domain_free(hw_dom);
+ ctrl_domain_free(hw_dom);
}
}

static void domain_add_cpu_mon(int cpu, struct rdt_resource *r)
{
int id = get_domain_id_from_scope(cpu, r->mon_scope);
+ struct rdt_hw_mon_domain *hw_dom;
struct list_head *add_pos = NULL;
- struct rdt_hw_domain *hw_dom;
struct rdt_domain_hdr *hdr;
- struct rdt_domain *d;
+ struct rdt_mon_domain *d;
int err;

if (id < 0) {
@@ -595,7 +600,7 @@ static void domain_add_cpu_mon(int cpu, struct rdt_resource *r)
if (WARN_ON_ONCE(hdr->type != RESCTRL_MON_DOMAIN))
return;

- d = container_of(hdr, struct rdt_domain, hdr);
+ d = container_of(hdr, struct rdt_mon_domain, hdr);

if (d) {
cpumask_set_cpu(cpu, &d->cpu_mask);
@@ -612,7 +617,7 @@ static void domain_add_cpu_mon(int cpu, struct rdt_resource *r)
cpumask_set_cpu(cpu, &d->cpu_mask);

if (arch_domain_mbm_alloc(r->num_rmid, hw_dom)) {
- domain_free(hw_dom);
+ mon_domain_free(hw_dom);
return;
}

@@ -621,7 +626,7 @@ static void domain_add_cpu_mon(int cpu, struct rdt_resource *r)
err = resctrl_online_mon_domain(r, d);
if (err) {
list_del(&d->hdr.list);
- domain_free(hw_dom);
+ mon_domain_free(hw_dom);
}
}

@@ -639,9 +644,9 @@ static void domain_add_cpu(int cpu, struct rdt_resource *r)
static void domain_remove_cpu_ctrl(int cpu, struct rdt_resource *r)
{
int id = get_domain_id_from_scope(cpu, r->ctrl_scope);
- struct rdt_hw_domain *hw_dom;
+ struct rdt_hw_ctrl_domain *hw_dom;
struct rdt_domain_hdr *hdr;
- struct rdt_domain *d;
+ struct rdt_ctrl_domain *d;

if (id < 0)
return;
@@ -655,8 +660,8 @@ static void domain_remove_cpu_ctrl(int cpu, struct rdt_resource *r)
if (WARN_ON_ONCE(hdr->type != RESCTRL_CTRL_DOMAIN))
return;

- d = container_of(hdr, struct rdt_domain, hdr);
- hw_dom = resctrl_to_arch_dom(d);
+ d = container_of(hdr, struct rdt_ctrl_domain, hdr);
+ hw_dom = resctrl_to_arch_ctrl_dom(d);

cpumask_clear_cpu(cpu, &d->cpu_mask);
if (cpumask_empty(&d->cpu_mask)) {
@@ -664,12 +669,12 @@ static void domain_remove_cpu_ctrl(int cpu, struct rdt_resource *r)
list_del(&d->hdr.list);

/*
- * rdt_domain "d" is going to be freed below, so clear
+ * rdt_ctrl_domain "d" is going to be freed below, so clear
* its pointer from pseudo_lock_region struct.
*/
if (d->plr)
d->plr->d = NULL;
- domain_free(hw_dom);
+ ctrl_domain_free(hw_dom);

return;
}
@@ -678,9 +683,9 @@ static void domain_remove_cpu_ctrl(int cpu, struct rdt_resource *r)
static void domain_remove_cpu_mon(int cpu, struct rdt_resource *r)
{
int id = get_domain_id_from_scope(cpu, r->mon_scope);
- struct rdt_hw_domain *hw_dom;
+ struct rdt_hw_mon_domain *hw_dom;
struct rdt_domain_hdr *hdr;
- struct rdt_domain *d;
+ struct rdt_mon_domain *d;

if (id < 0)
return;
@@ -694,14 +699,14 @@ static void domain_remove_cpu_mon(int cpu, struct rdt_resource *r)
if (WARN_ON_ONCE(hdr->type != RESCTRL_MON_DOMAIN))
return;

- d = container_of(hdr, struct rdt_domain, hdr);
- hw_dom = resctrl_to_arch_dom(d);
+ d = container_of(hdr, struct rdt_mon_domain, hdr);
+ hw_dom = resctrl_to_arch_mon_dom(d);

cpumask_clear_cpu(cpu, &d->cpu_mask);
if (cpumask_empty(&d->cpu_mask)) {
resctrl_offline_mon_domain(r, d);
list_del(&d->hdr.list);
- domain_free(hw_dom);
+ mon_domain_free(hw_dom);

return;
}
diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
index 33ff4d00a08c..b0b868baaaa3 100644
--- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
+++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
@@ -58,7 +58,7 @@ static bool bw_validate(char *buf, unsigned long *data, struct rdt_resource *r)
}

int parse_bw(struct rdt_parse_data *data, struct resctrl_schema *s,
- struct rdt_domain *d)
+ struct rdt_ctrl_domain *d)
{
struct resctrl_staged_config *cfg;
u32 closid = data->rdtgrp->closid;
@@ -135,7 +135,7 @@ static bool cbm_validate(char *buf, u32 *data, struct rdt_resource *r)
* resource type.
*/
int parse_cbm(struct rdt_parse_data *data, struct resctrl_schema *s,
- struct rdt_domain *d)
+ struct rdt_ctrl_domain *d)
{
struct rdtgroup *rdtgrp = data->rdtgrp;
struct resctrl_staged_config *cfg;
@@ -205,7 +205,7 @@ static int parse_line(char *line, struct resctrl_schema *s,
struct rdt_resource *r = s->res;
struct rdt_parse_data data;
char *dom = NULL, *id;
- struct rdt_domain *d;
+ struct rdt_ctrl_domain *d;
unsigned long dom_id;

if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKSETUP &&
@@ -265,11 +265,11 @@ static u32 get_config_index(u32 closid, enum resctrl_conf_type type)
}
}

-static bool apply_config(struct rdt_hw_domain *hw_dom,
+static bool apply_config(struct rdt_hw_ctrl_domain *hw_dom,
struct resctrl_staged_config *cfg, u32 idx,
cpumask_var_t cpu_mask)
{
- struct rdt_domain *dom = &hw_dom->d_resctrl;
+ struct rdt_ctrl_domain *dom = &hw_dom->d_resctrl;

if (cfg->new_ctrl != hw_dom->ctrl_val[idx]) {
cpumask_set_cpu(cpumask_any(&dom->cpu_mask), cpu_mask);
@@ -281,11 +281,11 @@ static bool apply_config(struct rdt_hw_domain *hw_dom,
return false;
}

-int resctrl_arch_update_one(struct rdt_resource *r, struct rdt_domain *d,
+int resctrl_arch_update_one(struct rdt_resource *r, struct rdt_ctrl_domain *d,
u32 closid, enum resctrl_conf_type t, u32 cfg_val)
{
+ struct rdt_hw_ctrl_domain *hw_dom = resctrl_to_arch_ctrl_dom(d);
struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
- struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(d);
u32 idx = get_config_index(closid, t);
struct msr_param msr_param;

@@ -305,11 +305,11 @@ int resctrl_arch_update_one(struct rdt_resource *r, struct rdt_domain *d,
int resctrl_arch_update_domains(struct rdt_resource *r, u32 closid)
{
struct resctrl_staged_config *cfg;
- struct rdt_hw_domain *hw_dom;
+ struct rdt_hw_ctrl_domain *hw_dom;
struct msr_param msr_param;
enum resctrl_conf_type t;
+ struct rdt_ctrl_domain *d;
cpumask_var_t cpu_mask;
- struct rdt_domain *d;
u32 idx;

if (!zalloc_cpumask_var(&cpu_mask, GFP_KERNEL))
@@ -317,7 +317,7 @@ int resctrl_arch_update_domains(struct rdt_resource *r, u32 closid)

msr_param.res = NULL;
list_for_each_entry(d, &r->ctrl_domains, hdr.list) {
- hw_dom = resctrl_to_arch_dom(d);
+ hw_dom = resctrl_to_arch_ctrl_dom(d);
for (t = 0; t < CDP_NUM_TYPES; t++) {
cfg = &hw_dom->d_resctrl.staged_config[t];
if (!cfg->have_new_ctrl)
@@ -447,10 +447,10 @@ ssize_t rdtgroup_schemata_write(struct kernfs_open_file *of,
return ret ?: nbytes;
}

-u32 resctrl_arch_get_config(struct rdt_resource *r, struct rdt_domain *d,
+u32 resctrl_arch_get_config(struct rdt_resource *r, struct rdt_ctrl_domain *d,
u32 closid, enum resctrl_conf_type type)
{
- struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(d);
+ struct rdt_hw_ctrl_domain *hw_dom = resctrl_to_arch_ctrl_dom(d);
u32 idx = get_config_index(closid, type);

return hw_dom->ctrl_val[idx];
@@ -459,7 +459,7 @@ u32 resctrl_arch_get_config(struct rdt_resource *r, struct rdt_domain *d,
static void show_doms(struct seq_file *s, struct resctrl_schema *schema, int closid)
{
struct rdt_resource *r = schema->res;
- struct rdt_domain *dom;
+ struct rdt_ctrl_domain *dom;
bool sep = false;
u32 ctrl_val;

@@ -521,7 +521,7 @@ int rdtgroup_schemata_show(struct kernfs_open_file *of,
}

void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
- struct rdt_domain *d, struct rdtgroup *rdtgrp,
+ struct rdt_mon_domain *d, struct rdtgroup *rdtgrp,
int evtid, int first)
{
/*
@@ -541,11 +541,11 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)
{
struct kernfs_open_file *of = m->private;
struct rdt_domain_hdr *hdr;
+ struct rdt_mon_domain *d;
u32 resid, evtid, domid;
struct rdtgroup *rdtgrp;
struct rdt_resource *r;
union mon_data_bits md;
- struct rdt_domain *d;
struct rmid_read rr;
int ret = 0;

@@ -572,7 +572,7 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)
goto out;
}

- d = container_of(hdr, struct rdt_domain, hdr);
+ d = container_of(hdr, struct rdt_mon_domain, hdr);

mon_event_read(&rr, r, d, rdtgrp, evtid, false);

diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 3265b8499e2a..97d2ed829f5d 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -170,7 +170,7 @@ static int __rmid_read(u32 rmid, enum resctrl_event_id eventid, u64 *val)
return 0;
}

-static struct arch_mbm_state *get_arch_mbm_state(struct rdt_hw_domain *hw_dom,
+static struct arch_mbm_state *get_arch_mbm_state(struct rdt_hw_mon_domain *hw_dom,
u32 rmid,
enum resctrl_event_id eventid)
{
@@ -189,10 +189,10 @@ static struct arch_mbm_state *get_arch_mbm_state(struct rdt_hw_domain *hw_dom,
return NULL;
}

-void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_domain *d,
+void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_mon_domain *d,
u32 rmid, enum resctrl_event_id eventid)
{
- struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(d);
+ struct rdt_hw_mon_domain *hw_dom = resctrl_to_arch_mon_dom(d);
struct arch_mbm_state *am;

am = get_arch_mbm_state(hw_dom, rmid, eventid);
@@ -208,9 +208,9 @@ void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_domain *d,
* Assumes that hardware counters are also reset and thus that there is
* no need to record initial non-zero counts.
*/
-void resctrl_arch_reset_rmid_all(struct rdt_resource *r, struct rdt_domain *d)
+void resctrl_arch_reset_rmid_all(struct rdt_resource *r, struct rdt_mon_domain *d)
{
- struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(d);
+ struct rdt_hw_mon_domain *hw_dom = resctrl_to_arch_mon_dom(d);

if (is_mbm_total_enabled())
memset(hw_dom->arch_mbm_total, 0,
@@ -229,11 +229,11 @@ static u64 mbm_overflow_count(u64 prev_msr, u64 cur_msr, unsigned int width)
return chunks >> shift;
}

-int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain *d,
+int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_mon_domain *d,
u32 rmid, enum resctrl_event_id eventid, u64 *val)
{
+ struct rdt_hw_mon_domain *hw_dom = resctrl_to_arch_mon_dom(d);
struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
- struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(d);
struct arch_mbm_state *am;
u64 msr_val, chunks;
int ret;
@@ -266,7 +266,7 @@ int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain *d,
* decrement the count. If the busy count gets to zero on an RMID, we
* free the RMID
*/
-void __check_limbo(struct rdt_domain *d, bool force_free)
+void __check_limbo(struct rdt_mon_domain *d, bool force_free)
{
struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
struct rmid_entry *entry;
@@ -305,7 +305,7 @@ void __check_limbo(struct rdt_domain *d, bool force_free)
}
}

-bool has_busy_rmid(struct rdt_resource *r, struct rdt_domain *d)
+bool has_busy_rmid(struct rdt_resource *r, struct rdt_mon_domain *d)
{
return find_first_bit(d->rmid_busy_llc, r->num_rmid) != r->num_rmid;
}
@@ -334,7 +334,7 @@ int alloc_rmid(void)
static void add_rmid_to_limbo(struct rmid_entry *entry)
{
struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
- struct rdt_domain *d;
+ struct rdt_mon_domain *d;
int cpu, err;
u64 val = 0;

@@ -383,7 +383,7 @@ void free_rmid(u32 rmid)
list_add_tail(&entry->list, &rmid_free_lru);
}

-static struct mbm_state *get_mbm_state(struct rdt_domain *d, u32 rmid,
+static struct mbm_state *get_mbm_state(struct rdt_mon_domain *d, u32 rmid,
enum resctrl_event_id evtid)
{
switch (evtid) {
@@ -516,13 +516,13 @@ void mon_event_count(void *info)
* throttle MSRs already have low percentage values. To avoid
* unnecessarily restricting such rdtgroups, we also increase the bandwidth.
*/
-static void update_mba_bw(struct rdtgroup *rgrp, struct rdt_domain *dom_mbm)
+static void update_mba_bw(struct rdtgroup *rgrp, struct rdt_mon_domain *dom_mbm)
{
u32 closid, rmid, cur_msr_val, new_msr_val;
struct mbm_state *pmbm_data, *cmbm_data;
+ struct rdt_ctrl_domain *dom_mba;
u32 cur_bw, delta_bw, user_bw;
struct rdt_resource *r_mba;
- struct rdt_domain *dom_mba;
struct list_head *head;
struct rdtgroup *entry;

@@ -600,7 +600,7 @@ static void update_mba_bw(struct rdtgroup *rgrp, struct rdt_domain *dom_mbm)
}
}

-static void mbm_update(struct rdt_resource *r, struct rdt_domain *d, int rmid)
+static void mbm_update(struct rdt_resource *r, struct rdt_mon_domain *d, int rmid)
{
struct rmid_read rr;

@@ -640,13 +640,13 @@ void cqm_handle_limbo(struct work_struct *work)
{
unsigned long delay = msecs_to_jiffies(CQM_LIMBOCHECK_INTERVAL);
int cpu = smp_processor_id();
+ struct rdt_mon_domain *d;
struct rdt_resource *r;
- struct rdt_domain *d;

mutex_lock(&rdtgroup_mutex);

r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
- d = container_of(work, struct rdt_domain, cqm_limbo.work);
+ d = container_of(work, struct rdt_mon_domain, cqm_limbo.work);

__check_limbo(d, false);

@@ -656,7 +656,7 @@ void cqm_handle_limbo(struct work_struct *work)
mutex_unlock(&rdtgroup_mutex);
}

-void cqm_setup_limbo_handler(struct rdt_domain *dom, unsigned long delay_ms)
+void cqm_setup_limbo_handler(struct rdt_mon_domain *dom, unsigned long delay_ms)
{
unsigned long delay = msecs_to_jiffies(delay_ms);
int cpu;
@@ -672,9 +672,9 @@ void mbm_handle_overflow(struct work_struct *work)
unsigned long delay = msecs_to_jiffies(MBM_OVERFLOW_INTERVAL);
struct rdtgroup *prgrp, *crgrp;
int cpu = smp_processor_id();
+ struct rdt_mon_domain *d;
struct list_head *head;
struct rdt_resource *r;
- struct rdt_domain *d;

mutex_lock(&rdtgroup_mutex);

@@ -682,7 +682,7 @@ void mbm_handle_overflow(struct work_struct *work)
goto out_unlock;

r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
- d = container_of(work, struct rdt_domain, mbm_over.work);
+ d = container_of(work, struct rdt_mon_domain, mbm_over.work);

list_for_each_entry(prgrp, &rdt_all_groups, rdtgroup_list) {
mbm_update(r, d, prgrp->mon.rmid);
@@ -701,7 +701,7 @@ void mbm_handle_overflow(struct work_struct *work)
mutex_unlock(&rdtgroup_mutex);
}

-void mbm_setup_overflow_handler(struct rdt_domain *dom, unsigned long delay_ms)
+void mbm_setup_overflow_handler(struct rdt_mon_domain *dom, unsigned long delay_ms)
{
unsigned long delay = msecs_to_jiffies(delay_ms);
int cpu;
diff --git a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
index bda32b4e1c1e..675e9e47af54 100644
--- a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
+++ b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
@@ -814,7 +814,7 @@ int rdtgroup_locksetup_exit(struct rdtgroup *rdtgrp)
* Return: true if @cbm overlaps with pseudo-locked region on @d, false
* otherwise.
*/
-bool rdtgroup_cbm_overlaps_pseudo_locked(struct rdt_domain *d, unsigned long cbm)
+bool rdtgroup_cbm_overlaps_pseudo_locked(struct rdt_ctrl_domain *d, unsigned long cbm)
{
unsigned int cbm_len;
unsigned long cbm_b;
@@ -841,11 +841,11 @@ bool rdtgroup_cbm_overlaps_pseudo_locked(struct rdt_domain *d, unsigned long cbm
* if it is not possible to test due to memory allocation issue,
* false otherwise.
*/
-bool rdtgroup_pseudo_locked_in_hierarchy(struct rdt_domain *d)
+bool rdtgroup_pseudo_locked_in_hierarchy(struct rdt_ctrl_domain *d)
{
+ struct rdt_ctrl_domain *d_i;
cpumask_var_t cpu_with_psl;
struct rdt_resource *r;
- struct rdt_domain *d_i;
bool ret = false;

if (!zalloc_cpumask_var(&cpu_with_psl, GFP_KERNEL))
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 90565bb44d0e..afa7a8dca48d 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -80,8 +80,8 @@ void rdt_last_cmd_printf(const char *fmt, ...)

void rdt_staged_configs_clear(void)
{
+ struct rdt_ctrl_domain *dom;
struct rdt_resource *r;
- struct rdt_domain *dom;

lockdep_assert_held(&rdtgroup_mutex);

@@ -920,7 +920,7 @@ static int rdt_bit_usage_show(struct kernfs_open_file *of,
unsigned long sw_shareable = 0, hw_shareable = 0;
unsigned long exclusive = 0, pseudo_locked = 0;
struct rdt_resource *r = s->res;
- struct rdt_domain *dom;
+ struct rdt_ctrl_domain *dom;
int i, hwb, swb, excl, psl;
enum rdtgrp_mode mode;
bool sep = false;
@@ -1137,7 +1137,7 @@ static enum resctrl_conf_type resctrl_peer_type(enum resctrl_conf_type my_type)
*
* Return: false if CBM does not overlap, true if it does.
*/
-static bool __rdtgroup_cbm_overlaps(struct rdt_resource *r, struct rdt_domain *d,
+static bool __rdtgroup_cbm_overlaps(struct rdt_resource *r, struct rdt_ctrl_domain *d,
unsigned long cbm, int closid,
enum resctrl_conf_type type, bool exclusive)
{
@@ -1192,7 +1192,7 @@ static bool __rdtgroup_cbm_overlaps(struct rdt_resource *r, struct rdt_domain *d
*
* Return: true if CBM overlap detected, false if there is no overlap
*/
-bool rdtgroup_cbm_overlaps(struct resctrl_schema *s, struct rdt_domain *d,
+bool rdtgroup_cbm_overlaps(struct resctrl_schema *s, struct rdt_ctrl_domain *d,
unsigned long cbm, int closid, bool exclusive)
{
enum resctrl_conf_type peer_type = resctrl_peer_type(s->conf_type);
@@ -1222,10 +1222,10 @@ bool rdtgroup_cbm_overlaps(struct resctrl_schema *s, struct rdt_domain *d,
static bool rdtgroup_mode_test_exclusive(struct rdtgroup *rdtgrp)
{
int closid = rdtgrp->closid;
+ struct rdt_ctrl_domain *d;
struct resctrl_schema *s;
struct rdt_resource *r;
bool has_cache = false;
- struct rdt_domain *d;
u32 ctrl;

list_for_each_entry(s, &resctrl_schema_all, list) {
@@ -1339,7 +1339,7 @@ static ssize_t rdtgroup_mode_write(struct kernfs_open_file *of,
* bitmap functions work correctly.
*/
unsigned int rdtgroup_cbm_to_size(struct rdt_resource *r,
- struct rdt_domain *d, unsigned long cbm)
+ struct rdt_ctrl_domain *d, unsigned long cbm)
{
struct cpu_cacheinfo *ci;
unsigned int size = 0;
@@ -1372,9 +1372,9 @@ static int rdtgroup_size_show(struct kernfs_open_file *of,
{
struct resctrl_schema *schema;
enum resctrl_conf_type type;
+ struct rdt_ctrl_domain *d;
struct rdtgroup *rdtgrp;
struct rdt_resource *r;
- struct rdt_domain *d;
unsigned int size;
int ret = 0;
u32 closid;
@@ -1486,7 +1486,7 @@ static void mon_event_config_read(void *info)
mon_info->mon_config = msrval & MAX_EVT_CONFIG_BITS;
}

-static void mondata_config_read(struct rdt_domain *d, struct mon_config_info *mon_info)
+static void mondata_config_read(struct rdt_mon_domain *d, struct mon_config_info *mon_info)
{
smp_call_function_any(&d->cpu_mask, mon_event_config_read, mon_info, 1);
}
@@ -1494,7 +1494,7 @@ static void mondata_config_read(struct rdt_domain *d, struct mon_config_info *mo
static int mbm_config_show(struct seq_file *s, struct rdt_resource *r, u32 evtid)
{
struct mon_config_info mon_info = {0};
- struct rdt_domain *dom;
+ struct rdt_mon_domain *dom;
bool sep = false;

mutex_lock(&rdtgroup_mutex);
@@ -1551,7 +1551,7 @@ static void mon_event_config_write(void *info)
}

static int mbm_config_write_domain(struct rdt_resource *r,
- struct rdt_domain *d, u32 evtid, u32 val)
+ struct rdt_mon_domain *d, u32 evtid, u32 val)
{
struct mon_config_info mon_info = {0};
int ret = 0;
@@ -1601,7 +1601,7 @@ static int mon_config_write(struct rdt_resource *r, char *tok, u32 evtid)
{
char *dom_str = NULL, *id_str;
unsigned long dom_id, val;
- struct rdt_domain *d;
+ struct rdt_mon_domain *d;
int ret = 0;

next:
@@ -2125,9 +2125,9 @@ static inline bool is_mba_linear(void)
static int set_cache_qos_cfg(int level, bool enable)
{
void (*update)(void *arg);
+ struct rdt_ctrl_domain *d;
struct rdt_resource *r_l;
cpumask_var_t cpu_mask;
- struct rdt_domain *d;
int cpu;

if (level == RDT_RESOURCE_L3)
@@ -2174,7 +2174,7 @@ void rdt_domain_reconfigure_cdp(struct rdt_resource *r)
l3_qos_cfg_update(&hw_res->cdp_enabled);
}

-static int mba_sc_domain_allocate(struct rdt_resource *r, struct rdt_domain *d)
+static int mba_sc_domain_allocate(struct rdt_resource *r, struct rdt_ctrl_domain *d)
{
u32 num_closid = resctrl_arch_get_num_closid(r);
int cpu = cpumask_any(&d->cpu_mask);
@@ -2192,7 +2192,7 @@ static int mba_sc_domain_allocate(struct rdt_resource *r, struct rdt_domain *d)
}

static void mba_sc_domain_destroy(struct rdt_resource *r,
- struct rdt_domain *d)
+ struct rdt_ctrl_domain *d)
{
kfree(d->mbps_val);
d->mbps_val = NULL;
@@ -2218,7 +2218,7 @@ static int set_mba_sc(bool mba_sc)
{
struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_MBA].r_resctrl;
u32 num_closid = resctrl_arch_get_num_closid(r);
- struct rdt_domain *d;
+ struct rdt_ctrl_domain *d;
int i;

if (!supports_mba_mbps() || mba_sc == is_mba_sc(r))
@@ -2466,7 +2466,7 @@ static void schemata_list_destroy(void)
static int rdt_get_tree(struct fs_context *fc)
{
struct rdt_fs_context *ctx = rdt_fc2context(fc);
- struct rdt_domain *dom;
+ struct rdt_mon_domain *dom;
struct rdt_resource *r;
int ret;

@@ -2634,10 +2634,10 @@ static int rdt_init_fs_context(struct fs_context *fc)
static int reset_all_ctrls(struct rdt_resource *r)
{
struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
- struct rdt_hw_domain *hw_dom;
+ struct rdt_hw_ctrl_domain *hw_dom;
struct msr_param msr_param;
+ struct rdt_ctrl_domain *d;
cpumask_var_t cpu_mask;
- struct rdt_domain *d;
int i;

if (!zalloc_cpumask_var(&cpu_mask, GFP_KERNEL))
@@ -2653,7 +2653,7 @@ static int reset_all_ctrls(struct rdt_resource *r)
* from each domain to update the MSRs below.
*/
list_for_each_entry(d, &r->ctrl_domains, hdr.list) {
- hw_dom = resctrl_to_arch_dom(d);
+ hw_dom = resctrl_to_arch_ctrl_dom(d);
cpumask_set_cpu(cpumask_any(&d->cpu_mask), cpu_mask);

for (i = 0; i < hw_res->num_closid; i++)
@@ -2848,7 +2848,7 @@ static void rmdir_mondata_subdir_allrdtgrp(struct rdt_resource *r,
}

static int mkdir_mondata_subdir(struct kernfs_node *parent_kn,
- struct rdt_domain *d,
+ struct rdt_mon_domain *d,
struct rdt_resource *r, struct rdtgroup *prgrp)
{
union mon_data_bits priv;
@@ -2897,7 +2897,7 @@ static int mkdir_mondata_subdir(struct kernfs_node *parent_kn,
* and "monitor" groups with given domain id.
*/
static void mkdir_mondata_subdir_allrdtgrp(struct rdt_resource *r,
- struct rdt_domain *d)
+ struct rdt_mon_domain *d)
{
struct kernfs_node *parent_kn;
struct rdtgroup *prgrp, *crgrp;
@@ -2919,7 +2919,7 @@ static int mkdir_mondata_subdir_alldom(struct kernfs_node *parent_kn,
struct rdt_resource *r,
struct rdtgroup *prgrp)
{
- struct rdt_domain *dom;
+ struct rdt_mon_domain *dom;
int ret;

list_for_each_entry(dom, &r->mon_domains, hdr.list) {
@@ -3021,7 +3021,7 @@ static u32 cbm_ensure_valid(u32 _val, struct rdt_resource *r)
* Set the RDT domain up to start off with all usable allocations. That is,
* all shareable and unused bits. All-zero CBM is invalid.
*/
-static int __init_one_rdt_domain(struct rdt_domain *d, struct resctrl_schema *s,
+static int __init_one_rdt_domain(struct rdt_ctrl_domain *d, struct resctrl_schema *s,
u32 closid)
{
enum resctrl_conf_type peer_type = resctrl_peer_type(s->conf_type);
@@ -3101,7 +3101,7 @@ static int __init_one_rdt_domain(struct rdt_domain *d, struct resctrl_schema *s,
*/
static int rdtgroup_init_cat(struct resctrl_schema *s, u32 closid)
{
- struct rdt_domain *d;
+ struct rdt_ctrl_domain *d;
int ret;

list_for_each_entry(d, &s->res->ctrl_domains, hdr.list) {
@@ -3117,7 +3117,7 @@ static int rdtgroup_init_cat(struct resctrl_schema *s, u32 closid)
static void rdtgroup_init_mba(struct rdt_resource *r, u32 closid)
{
struct resctrl_staged_config *cfg;
- struct rdt_domain *d;
+ struct rdt_ctrl_domain *d;

list_for_each_entry(d, &r->ctrl_domains, hdr.list) {
if (is_mba_sc(r)) {
@@ -3704,14 +3704,14 @@ static int __init rdtgroup_setup_root(void)
return ret;
}

-static void domain_destroy_mon_state(struct rdt_domain *d)
+static void domain_destroy_mon_state(struct rdt_mon_domain *d)
{
bitmap_free(d->rmid_busy_llc);
kfree(d->mbm_total);
kfree(d->mbm_local);
}

-void resctrl_offline_ctrl_domain(struct rdt_resource *r, struct rdt_domain *d)
+void resctrl_offline_ctrl_domain(struct rdt_resource *r, struct rdt_ctrl_domain *d)
{
lockdep_assert_held(&rdtgroup_mutex);

@@ -3719,7 +3719,7 @@ void resctrl_offline_ctrl_domain(struct rdt_resource *r, struct rdt_domain *d)
mba_sc_domain_destroy(r, d);
}

-void resctrl_offline_mon_domain(struct rdt_resource *r, struct rdt_domain *d)
+void resctrl_offline_mon_domain(struct rdt_resource *r, struct rdt_mon_domain *d)
{
lockdep_assert_held(&rdtgroup_mutex);

@@ -3748,7 +3748,7 @@ void resctrl_offline_mon_domain(struct rdt_resource *r, struct rdt_domain *d)
domain_destroy_mon_state(d);
}

-static int domain_setup_mon_state(struct rdt_resource *r, struct rdt_domain *d)
+static int domain_setup_mon_state(struct rdt_resource *r, struct rdt_mon_domain *d)
{
size_t tsize;

@@ -3778,7 +3778,7 @@ static int domain_setup_mon_state(struct rdt_resource *r, struct rdt_domain *d)
return 0;
}

-int resctrl_online_ctrl_domain(struct rdt_resource *r, struct rdt_domain *d)
+int resctrl_online_ctrl_domain(struct rdt_resource *r, struct rdt_ctrl_domain *d)
{
lockdep_assert_held(&rdtgroup_mutex);

@@ -3789,7 +3789,7 @@ int resctrl_online_ctrl_domain(struct rdt_resource *r, struct rdt_domain *d)
return 0;
}

-int resctrl_online_mon_domain(struct rdt_resource *r, struct rdt_domain *d)
+int resctrl_online_mon_domain(struct rdt_resource *r, struct rdt_mon_domain *d)
{
int err;

--
2.41.0

2023-10-03 16:08:50

by Tony Luck

Subject: [PATCH v7 8/8] x86/resctrl: Update documentation with Sub-NUMA cluster changes

With Sub-NUMA Cluster mode enabled the scope of monitoring resources is
per-NODE instead of per-L3 cache. Suffixes of directories with "L3" in
their name refer to Sub-NUMA nodes instead of L3 cache ids.

Users should be aware that SNC mode also affects the amount of L3 cache
available for allocation within each SNC node.
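
As a rough illustration of that effect, the sketch below computes the
cache size covered by a capacity bitmask when the L3 is split between
SNC nodes. It is a standalone userspace model, not the kernel's code:
the function names and parameters are hypothetical, and it assumes the
cache is divided evenly between the SNC nodes.

```c
#include <stdint.h>

/* Count the bits set in a capacity bitmask (portable, no builtins). */
static unsigned int popcount32(uint32_t v)
{
	unsigned int n = 0;

	while (v) {
		v &= v - 1;	/* clear the lowest set bit */
		n++;
	}
	return n;
}

/*
 * Hypothetical helper: bytes of L3 covered by @cbm on one SNC node.
 * @l3_bytes:  total size of the L3 cache instance
 * @cbm_len:   number of bits in a full capacity bitmask
 * @snc_nodes: SNC nodes sharing this L3 (1 when SNC is disabled)
 * Assumes an even split of the cache between SNC nodes.
 */
static uint64_t cbm_to_size(uint64_t l3_bytes, unsigned int cbm_len,
			    uint32_t cbm, unsigned int snc_nodes)
{
	if (!cbm_len || !snc_nodes)
		return 0;

	return l3_bytes / snc_nodes / cbm_len * popcount32(cbm);
}
```

With a 40 MiB L3 and a 20-bit mask, the mask 0x1f covers 10 MiB when
SNC is disabled and 5 MiB per node when two SNC nodes share the cache.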

Signed-off-by: Tony Luck <[email protected]>

---

Changes since v5:

Added additional details about the challenges of tracking tasks when SNC
mode is enabled.

Documentation/arch/x86/resctrl.rst | 23 ++++++++++++++++++++---
1 file changed, 20 insertions(+), 3 deletions(-)

diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
index cb05d90111b4..222c507089a5 100644
--- a/Documentation/arch/x86/resctrl.rst
+++ b/Documentation/arch/x86/resctrl.rst
@@ -345,9 +345,9 @@ When control is enabled all CTRL_MON groups will also contain:
When monitoring is enabled all MON groups will also contain:

"mon_data":
- This contains a set of files organized by L3 domain and by
- RDT event. E.g. on a system with two L3 domains there will
- be subdirectories "mon_L3_00" and "mon_L3_01". Each of these
+ This contains a set of files organized by L3 domain or by NUMA
+ node (depending on whether Sub-NUMA Cluster (SNC) mode is disabled
+ or enabled respectively) and by RDT event. Each of these
directories have one file per event (e.g. "llc_occupancy",
"mbm_total_bytes", and "mbm_local_bytes"). In a MON group these
files provide a read out of the current value of the event for
@@ -452,6 +452,23 @@ and 0xA are not. On a system with a 20-bit mask each bit represents 5%
of the capacity of the cache. You could partition the cache into four
equal parts with masks: 0x1f, 0x3e0, 0x7c00, 0xf8000.

+Notes on Sub-NUMA Cluster mode
+==============================
+When SNC mode is enabled Linux may load balance tasks between Sub-NUMA
+nodes much more readily than between regular NUMA nodes since the CPUs
+on Sub-NUMA nodes share the same L3 cache and the system may report
+the NUMA distance between Sub-NUMA nodes with a lower value than used
+for regular NUMA nodes. Users who do not bind tasks to the CPUs of a
+specific Sub-NUMA node must read the "llc_occupancy", "mbm_total_bytes",
+and "mbm_local_bytes" for all Sub-NUMA nodes where the tasks may execute
+to get the full view of traffic for which the tasks were the source.
+
+The cache allocation feature still provides the same number of
+bits in a mask to control allocation into the L3 cache. But each
+of those ways has its capacity reduced because the cache is divided
+between the SNC nodes. The values reported in the resctrl
+"size" files are adjusted accordingly.
+
Memory bandwidth Allocation and monitoring
==========================================

--
2.41.0

2023-10-03 16:09:21

by Tony Luck

Subject: [PATCH v7 5/8] x86/resctrl: Add node-scope to the options for feature scope

All currently supported resctrl features are domain scoped, with each
domain matching the scope of an L2 or L3 cache.

Add RESCTRL_NODE as a new option for features that are scoped at the
same granularity as NUMA nodes. This is needed for Intel's Sub-NUMA
Cluster (SNC) feature where monitoring features are node scoped.
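
The effect of the new scope can be sketched with a small userspace
model. Everything here is hypothetical (the enum, the struct, and the
function are stand-ins for illustration, not the kernel's definitions);
it only shows how one CPU can map to different domain ids depending on
the scope chosen.

```c
/* Hypothetical stand-ins for the kernel's scope values. */
enum scope {
	SCOPE_L2_CACHE = 2,
	SCOPE_L3_CACHE = 3,
	SCOPE_NODE,
};

/* Hypothetical per-CPU topology record. */
struct cpu_topo {
	int l2_id;	/* id of the L2 cache this CPU uses */
	int l3_id;	/* id of the L3 cache this CPU uses */
	int node;	/* NUMA node this CPU belongs to */
};

/* Return the domain id for @t at the given scope, or -1 if unknown. */
static int domain_id_from_scope(const struct cpu_topo *t, enum scope scope)
{
	switch (scope) {
	case SCOPE_L2_CACHE:
		return t->l2_id;
	case SCOPE_L3_CACHE:
		return t->l3_id;
	case SCOPE_NODE:
		return t->node;
	default:
		return -1;
	}
}
```

Two CPUs that share an L3 cache but sit on different SNC nodes get the
same id at L3 scope and different ids at node scope, which is the
distinction monitoring needs on SNC systems.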

Signed-off-by: Tony Luck <[email protected]>
Reviewed-by: Peter Newman <[email protected]>
---

No changes since last version

include/linux/resctrl.h | 1 +
arch/x86/kernel/cpu/resctrl/core.c | 2 ++
2 files changed, 3 insertions(+)

diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 56fa04cedb50..15b98ac91272 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -172,6 +172,7 @@ struct resctrl_schema;
enum resctrl_scope {
RESCTRL_L2_CACHE = 2,
RESCTRL_L3_CACHE = 3,
+ RESCTRL_NODE,
};

/**
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index d92fdce4e44f..6b937da36e4c 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -511,6 +511,8 @@ static int get_domain_id_from_scope(int cpu, enum resctrl_scope scope)
case RESCTRL_L2_CACHE:
case RESCTRL_L3_CACHE:
return get_cpu_cacheinfo_id(cpu, scope);
+ case RESCTRL_NODE:
+ return cpu_to_node(cpu);
default:
break;
}
--
2.41.0

2023-10-03 16:09:54

by Tony Luck

Subject: [PATCH v7 7/8] x86/resctrl: Sub NUMA Cluster detection and enable

There isn't a simple h/w bit that indicates whether a CPU is
running in Sub NUMA Cluster (SNC) mode. Infer the state by comparing
the ratio of NUMA nodes to L3 cache instances.
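
The inference can be modelled in userspace as follows. This is a
minimal sketch under stated assumptions, not the kernel function: the
input array, the MAX_L3_IDS bound, and the function name are all
hypothetical. Nodes without CPUs (memory-only nodes) are left out of
the ratio.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_L3_IDS 64	/* illustrative upper bound on L3 cache ids */

/*
 * node_l3_id[i] holds the L3 cache id of the first CPU in NUMA node i,
 * or -1 if the node has no CPUs (memory-only node).
 * Returns the inferred number of SNC nodes per L3 cache; 1 means
 * SNC was not detected.
 */
static int snc_nodes_per_l3(const int *node_l3_id, size_t nr_nodes)
{
	bool seen[MAX_L3_IDS] = { false };
	int num_l3 = 0, cpu_nodes = 0;

	for (size_t i = 0; i < nr_nodes; i++) {
		int id = node_l3_id[i];

		if (id < 0 || id >= MAX_L3_IDS)
			continue;	/* memory-only node or id out of range */

		cpu_nodes++;
		if (!seen[id]) {
			seen[id] = true;
			num_l3++;
		}
	}

	if (!num_l3)
		return 1;

	return cpu_nodes / num_l3;
}
```

For example, four nodes mapping to L3 ids {0, 0, 1, 1} give a ratio of
2, while two nodes mapping to {0, 1} give 1.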

When SNC mode is detected, reconfigure the RMID counters by updating
the MSR_RMID_SNC_CONFIG MSR on each socket as CPUs are seen.

Clearing bit zero of the MSR divides the RMIDs and renumbers the ones
on the second SNC node to start from zero. An earlier commit includes
all the required changes in Linux to operate in this reconfigured mode.
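
A rough userspace sketch of the two pieces involved is given below. The
first helper mirrors the read-modify-write that clears bit zero. The
second shows one plausible way for counter-reading code to offset an
RMID by SNC node; that formula is an assumption for illustration, since
the actual adjustment is made in __rmid_read() by another patch in the
series and is not shown here. All names are hypothetical.

```c
#include <stdint.h>

/* Clear bit 0 of an MSR_RMID_SNC_CONFIG-style value, keeping other bits. */
static uint64_t snc_config_clear_bit0(uint64_t val)
{
	return val & ~UINT64_C(1);
}

/*
 * Assumed mapping from a per-node RMID to the counter index to read.
 * @rmid:           RMID as seen by software (starts from zero on each node)
 * @node:           NUMA node of the CPU doing the read
 * @nodes_per_l3:   SNC nodes sharing one L3 cache
 * @rmids_per_node: number of RMIDs available to each SNC node
 */
static uint32_t snc_counter_index(uint32_t rmid, int node, int nodes_per_l3,
				  uint32_t rmids_per_node)
{
	return rmid + (uint32_t)(node % nodes_per_l3) * rmids_per_node;
}
```

Under this assumption, with two SNC nodes per L3 and 128 RMIDs per
node, RMID 5 maps to index 5 on the first node and to index 133 on the
second.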

Signed-off-by: Tony Luck <[email protected]>
Reviewed-by: Peter Newman <[email protected]>

---

Changes since last version:

Moved kfree(node_caches); earlier, to the earliest point where it
is no longer needed.

Added Granite Rapids to list of CPU models that support SNC mode.

arch/x86/include/asm/msr-index.h | 1 +
arch/x86/kernel/cpu/resctrl/core.c | 92 ++++++++++++++++++++++++++++++
2 files changed, 93 insertions(+)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 1d111350197f..393d1b047617 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -1100,6 +1100,7 @@
#define MSR_IA32_QM_CTR 0xc8e
#define MSR_IA32_PQR_ASSOC 0xc8f
#define MSR_IA32_L3_CBM_BASE 0xc90
+#define MSR_RMID_SNC_CONFIG 0xca0
#define MSR_IA32_L2_CBM_BASE 0xd10
#define MSR_IA32_MBA_THRTL_BASE 0xd50

diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index cd189b7ca6ea..dff294529f27 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -16,11 +16,14 @@

#define pr_fmt(fmt) "resctrl: " fmt

+#include <linux/cpu.h>
#include <linux/slab.h>
#include <linux/err.h>
#include <linux/cacheinfo.h>
#include <linux/cpuhotplug.h>
+#include <linux/mod_devicetable.h>

+#include <asm/cpu_device_id.h>
#include <asm/intel-family.h>
#include <asm/resctrl.h>
#include "internal.h"
@@ -751,11 +754,42 @@ static void clear_closid_rmid(int cpu)
wrmsr(MSR_IA32_PQR_ASSOC, 0, 0);
}

+/*
+ * The power-on reset value of MSR_RMID_SNC_CONFIG is 0x1
+ * which indicates that RMIDs are configured in legacy mode.
+ * This mode is incompatible with Linux resctrl semantics
+ * as RMIDs are partitioned between SNC nodes, which requires
+ * a user to know which RMID is allocated to a task.
+ * Clearing bit 0 reconfigures the RMID counters for use
+ * in Sub NUMA Cluster mode. This mode is better for Linux.
+ * The RMID space is divided between all SNC nodes with the
+ * RMIDs renumbered to start from zero in each node when
+ * counting operations from tasks. Code to read the counters
+ * must adjust RMID counter numbers based on SNC node. See
+ * __rmid_read() for code that does this.
+ */
+static void snc_remap_rmids(int cpu)
+{
+ u64 val;
+
+ /* Only need to enable once per package. */
+ if (cpumask_first(topology_core_cpumask(cpu)) != cpu)
+ return;
+
+ rdmsrl(MSR_RMID_SNC_CONFIG, val);
+ val &= ~BIT_ULL(0);
+ wrmsrl(MSR_RMID_SNC_CONFIG, val);
+}
+
static int resctrl_online_cpu(unsigned int cpu)
{
struct rdt_resource *r;

mutex_lock(&rdtgroup_mutex);
+
+ if (snc_nodes_per_l3_cache > 1)
+ snc_remap_rmids(cpu);
+
for_each_capable_rdt_resource(r)
domain_add_cpu(cpu, r);
/* The cpu is set in default rdtgroup after online. */
@@ -1010,11 +1044,69 @@ static __init bool get_rdt_resources(void)
return (rdt_mon_capable || rdt_alloc_capable);
}

+/* CPU models that support MSR_RMID_SNC_CONFIG */
+static const struct x86_cpu_id snc_cpu_ids[] __initconst = {
+ X86_MATCH_INTEL_FAM6_MODEL(ICELAKE_X, 0),
+ X86_MATCH_INTEL_FAM6_MODEL(SAPPHIRERAPIDS_X, 0),
+ X86_MATCH_INTEL_FAM6_MODEL(EMERALDRAPIDS_X, 0),
+ X86_MATCH_INTEL_FAM6_MODEL(GRANITERAPIDS_X, 0),
+ {}
+};
+
+/*
+ * There isn't a simple h/w bit that indicates whether a CPU is running
+ * in Sub NUMA Cluster (SNC) mode. Infer the state by comparing the
+ * ratio of NUMA nodes to L3 cache instances.
+ * It is not possible to accurately determine SNC state if the system is
+ * booted with a maxcpus=N parameter. That distorts the ratio of SNC nodes
+ * to L3 caches. It will be OK if system is booted with hyperthreading
+ * disabled (since this doesn't affect the ratio).
+ */
+static __init int snc_get_config(void)
+{
+ unsigned long *node_caches;
+ int mem_only_nodes = 0;
+ int cpu, node, ret;
+ int num_l3_caches;
+
+ if (!x86_match_cpu(snc_cpu_ids))
+ return 1;
+
+ node_caches = bitmap_zalloc(nr_node_ids, GFP_KERNEL);
+ if (!node_caches)
+ return 1;
+
+ cpus_read_lock();
+ for_each_node(node) {
+ cpu = cpumask_first(cpumask_of_node(node));
+ if (cpu < nr_cpu_ids)
+ set_bit(get_cpu_cacheinfo_id(cpu, 3), node_caches);
+ else
+ mem_only_nodes++;
+ }
+ cpus_read_unlock();
+
+ num_l3_caches = bitmap_weight(node_caches, nr_node_ids);
+ kfree(node_caches);
+
+ if (!num_l3_caches)
+ return 1;
+
+ ret = (nr_node_ids - mem_only_nodes) / num_l3_caches;
+
+ if (ret > 1)
+ rdt_resources_all[RDT_RESOURCE_L3].r_resctrl.mon_scope = RESCTRL_NODE;
+
+ return ret;
+}
+
static __init void rdt_init_res_defs_intel(void)
{
struct rdt_hw_resource *hw_res;
struct rdt_resource *r;

+ snc_nodes_per_l3_cache = snc_get_config();
+
for_each_rdt_resource(r) {
hw_res = resctrl_to_arch_res(r);

--
2.41.0
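
The detection rule described in the snc_get_config() comment above can be modelled outside the kernel. What follows is only a minimal user-space sketch, not the patch's code: the array l3_id_of_node[], the NO_CPU marker and the function name snc_nodes_per_l3() are hypothetical stand-ins for cpumask_of_node() and get_cpu_cacheinfo_id(), and L3 ids are assumed to be smaller than 64 so they fit in one bitmask.

```c
#include <assert.h>

#define NO_CPU (-1)	/* hypothetical marker for a memory-only node */

/*
 * l3_id_of_node[n] is the L3 cache id seen by the first CPU of NUMA
 * node n, or NO_CPU when the node has no CPUs. Returns the number of
 * CPU-bearing NUMA nodes per L3 cache, or 1 when no L3 cache is found.
 */
static int snc_nodes_per_l3(const int *l3_id_of_node, int nr_nodes)
{
	unsigned long long seen = 0;	/* one bit per distinct L3 id */
	int mem_only_nodes = 0;
	int num_l3 = 0;
	int node;

	for (node = 0; node < nr_nodes; node++) {
		int id = l3_id_of_node[node];

		if (id == NO_CPU) {
			mem_only_nodes++;
			continue;
		}
		assert(id >= 0 && id < 64);
		if (!(seen & (1ULL << id))) {
			seen |= 1ULL << id;
			num_l3++;
		}
	}

	if (!num_l3)
		return 1;

	return (nr_nodes - mem_only_nodes) / num_l3;
}
```

With four CPU nodes mapped to L3 ids {0, 0, 1, 1} the function returns 2, which corresponds to the "ret > 1" case in the patch. A memory-only node is subtracted before the division, so it does not inflate the ratio.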

2023-10-03 16:17:47

by Tony Luck

Subject: RE: [PATCH v7 0/8] Add support for Sub-NUMA cluster (SNC) systems

> The Sub-NUMA cluster feature on some Intel processors partitions
> the CPUs that share an L3 cache into two or more sets. This plays
> havoc with the Resource Director Technology (RDT) monitoring features.
> Prior to this patch Intel has advised that SNC and RDT are incompatible.
>
> Some of these CPU support an MSR that can partition the RMID
> counters in the same way. This allows for monitoring features
> to be used (with the caveat that memory accesses between different
> SNC NUMA nodes may still not be counted accuratlely.
>
> Note that this patch series improves resctrl reporting considerably
> on systems with SNC enabled, but there will still be some anomalies
> for processes accessing memory from other sub-NUMA nodes.

Bother .. forgot to add the changes since last version summary
to the cover letter. I fixed all the issues called out by Peter Newman
in his review of v6 series. Specific details are included in each patch
(except for patch 0005 which is unchanged).

I added Peter's "Reviewed-by" to patches where he offered it AND
where I didn't make substantive changes (parts 4, 5, 6, 7)

-Tony

2023-10-03 16:34:35

by Reinette Chatre

Subject: Re: [PATCH v7 0/8] Add support for Sub-NUMA cluster (SNC) systems



On 10/3/2023 9:16 AM, Luck, Tony wrote:
>> The Sub-NUMA cluster feature on some Intel processors partitions
>> the CPUs that share an L3 cache into two or more sets. This plays
>> havoc with the Resource Director Technology (RDT) monitoring features.
>> Prior to this patch Intel has advised that SNC and RDT are incompatible.
>>
>> Some of these CPU support an MSR that can partition the RMID
>> counters in the same way. This allows for monitoring features
>> to be used (with the caveat that memory accesses between different
>> SNC NUMA nodes may still not be counted accuratlely.

The typo that I pointed out in V4 as well as V5 remains.
Not fixing something this fundamental reflects poorly on the rest
of this work.

Reinette

2023-10-03 16:46:24

by Tony Luck

Subject: RE: [PATCH v7 4/8] x86/resctrl: Split the rdt_domain and rdt_hw_domain structures

> The same rdt_domain structure is used for both control and monitor
> functions. But this results in wasted memory as some of the fields are
> only used by control functions, while most are only used for monitor
> functions.
>
> Split into separate rdt_ctrl_domain and rdt_mon_domain structures with
> just the fields required for control and monitoring respectively.
>
> Similar split of the rdt_hw_domain structure into rdt_hw_ctrl_domain
> and rdt_hw_mon_domain.
>
> Signed-off-by: Tony Luck <[email protected]>
> Reviewed-by: Peter Newman <[email protected]>

Bother^2 ... lkp complained it can't apply this series to any of the
usual upstream targets (linus, tip, linux-next). Digging into that it
seems that "git format-patch" may have generated something that
can't be applied using "git am".

Not sure if I've got something bad in my config, or my git tree.

Will keep digging.

-Tony

2023-10-03 21:32:27

by Tony Luck

Subject: [PATCH v8 0/8] Add support for Sub-NUMA cluster (SNC) systems

The Sub-NUMA cluster feature on some Intel processors partitions
the CPUs that share an L3 cache into two or more sets. This plays
havoc with the Resource Director Technology (RDT) monitoring features.
Prior to this patch Intel has advised that SNC and RDT are incompatible.

Some of these CPU support an MSR that can partition the RMID
counters in the same way. This allows for monitoring features
to be used (with the caveat that memory accesses between different
SNC NUMA nodes may still not be counted accurately.

Note that this patch series improves resctrl reporting considerably
on systems with SNC enabled, but there will still be some anomalies
for processes accessing memory from other sub-NUMA nodes.

Signed-off-by: Tony Luck <[email protected]>

---
Please ignore v7 posting. There was some glitch in how I created
the patches with "git format-patch" that meant part 0004 would not
apply.

Changes since v6:

* Fixed spelling of "accurately" in cover letter.

* Applied changes from Peter Newman's review
Link: https://lore.kernel.org/r/CALPaoChB5ryT96ZZBQb6+3=xO+A0uR-ToN0TWqUjLJ7bgi==Rg@mail.gmail.com
(and follow-on posts against other patches in the v6 series).

See comments in individual patches for specific details.

Added Peter's "Reviewed-by" to parts 4-7.

Tony Luck (8):
x86/resctrl: Prepare for new domain scope
x86/resctrl: Prepare to split rdt_domain structure
x86/resctrl: Prepare for different scope for control/monitor
operations
x86/resctrl: Split the rdt_domain and rdt_hw_domain structures
x86/resctrl: Add node-scope to the options for feature scope
x86/resctrl: Introduce snc_nodes_per_l3_cache
x86/resctrl: Sub NUMA Cluster detection and enable
x86/resctrl: Update documentation with Sub-NUMA cluster changes

Documentation/arch/x86/resctrl.rst | 23 +-
include/linux/resctrl.h | 85 +++--
arch/x86/include/asm/msr-index.h | 1 +
arch/x86/kernel/cpu/resctrl/internal.h | 66 ++--
arch/x86/kernel/cpu/resctrl/core.c | 400 +++++++++++++++++-----
arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 58 ++--
arch/x86/kernel/cpu/resctrl/monitor.c | 58 ++--
arch/x86/kernel/cpu/resctrl/pseudo_lock.c | 14 +-
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 132 +++----
9 files changed, 591 insertions(+), 246 deletions(-)


base-commit: 6465e260f48790807eef06b583b38ca9789b6072
--
2.41.0
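
For readers who want a concrete picture of "partition the RMID counters in the same way", here is a rough sketch. It assumes, and this is an assumption rather than something stated in this cover letter, that the hardware counters are split evenly between the SNC nodes sharing an L3 cache and that the reader adds a per-node offset to the task's RMID. The name snc_counter_index() and its parameters are hypothetical.

```c
#include <assert.h>

/*
 * Map a per-node (logical) RMID to the index of the counter to read,
 * assuming rmids_per_node counters are reserved for each SNC node
 * within one L3 cache. Illustrative only.
 */
static unsigned int snc_counter_index(unsigned int rmid, unsigned int node,
				      unsigned int snc_nodes_per_l3,
				      unsigned int rmids_per_node)
{
	assert(snc_nodes_per_l3 > 0);
	assert(rmid < rmids_per_node);

	/* Position of this node among the nodes sharing its L3 cache. */
	return rmid + (node % snc_nodes_per_l3) * rmids_per_node;
}
```

Under these assumptions, with two SNC nodes per L3 and 128 RMIDs per node, RMID 5 reads counter 5 on node 0 and counter 133 on node 1.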

2023-10-03 21:32:57

by Tony Luck

Subject: [PATCH v8 1/8] x86/resctrl: Prepare for new domain scope

Resctrl resources operate on subsets of CPUs in the system with the
defining attribute of each subset being an instance of a particular
level of cache. E.g. all CPUs sharing an L3 cache would be part of the
same domain.

In preparation for features that are scoped at the NUMA node level
change the code from explicit references to "cache_level" to a more
generic scope. At this point the only options for this scope are groups
of CPUs that share an L2 cache or L3 cache.

Provide a more detailed warning message if a domain id cannot be found
when adding a CPU. Just check and silently return if the domain id can't
be found when removing a CPU.

Signed-off-by: Tony Luck <[email protected]>

---
Changes since last version:

s/-EINVAL/0/ for return value of rdtgroup_cbm_to_size()

---
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 8334eeacfec5..618735e396cb 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -144,13 +144,18 @@ struct resctrl_membw {
struct rdt_parse_data;
struct resctrl_schema;

+enum resctrl_scope {
+ RESCTRL_L2_CACHE = 2,
+ RESCTRL_L3_CACHE = 3,
+};
+
/**
* struct rdt_resource - attributes of a resctrl resource
* @rid: The index of the resource
* @alloc_capable: Is allocation available on this machine
* @mon_capable: Is monitor feature available on this machine
* @num_rmid: Number of RMIDs available
- * @cache_level: Which cache level defines scope of this resource
+ * @scope: Scope of this resource
* @cache: Cache allocation related data
* @membw: If the component has bandwidth controls, their properties.
* @domains: All domains for this resource
@@ -168,7 +173,7 @@ struct rdt_resource {
bool alloc_capable;
bool mon_capable;
int num_rmid;
- int cache_level;
+ enum resctrl_scope scope;
struct resctrl_cache cache;
struct resctrl_membw membw;
struct list_head domains;
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 030d3b409768..3b1837e1fb6b 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -65,7 +65,7 @@ struct rdt_hw_resource rdt_resources_all[] = {
.r_resctrl = {
.rid = RDT_RESOURCE_L3,
.name = "L3",
- .cache_level = 3,
+ .scope = RESCTRL_L3_CACHE,
.domains = domain_init(RDT_RESOURCE_L3),
.parse_ctrlval = parse_cbm,
.format_str = "%d=%0*x",
@@ -79,7 +79,7 @@ struct rdt_hw_resource rdt_resources_all[] = {
.r_resctrl = {
.rid = RDT_RESOURCE_L2,
.name = "L2",
- .cache_level = 2,
+ .scope = RESCTRL_L2_CACHE,
.domains = domain_init(RDT_RESOURCE_L2),
.parse_ctrlval = parse_cbm,
.format_str = "%d=%0*x",
@@ -93,7 +93,7 @@ struct rdt_hw_resource rdt_resources_all[] = {
.r_resctrl = {
.rid = RDT_RESOURCE_MBA,
.name = "MB",
- .cache_level = 3,
+ .scope = RESCTRL_L3_CACHE,
.domains = domain_init(RDT_RESOURCE_MBA),
.parse_ctrlval = parse_bw,
.format_str = "%d=%*u",
@@ -105,7 +105,7 @@ struct rdt_hw_resource rdt_resources_all[] = {
.r_resctrl = {
.rid = RDT_RESOURCE_SMBA,
.name = "SMBA",
- .cache_level = 3,
+ .scope = RESCTRL_L3_CACHE,
.domains = domain_init(RDT_RESOURCE_SMBA),
.parse_ctrlval = parse_bw,
.format_str = "%d=%*u",
@@ -487,6 +487,19 @@ static int arch_domain_mbm_alloc(u32 num_rmid, struct rdt_hw_domain *hw_dom)
return 0;
}

+static int get_domain_id_from_scope(int cpu, enum resctrl_scope scope)
+{
+ switch (scope) {
+ case RESCTRL_L2_CACHE:
+ case RESCTRL_L3_CACHE:
+ return get_cpu_cacheinfo_id(cpu, scope);
+ default:
+ break;
+ }
+
+ return -EINVAL;
+}
+
/*
* domain_add_cpu - Add a cpu to a resource's domain list.
*
@@ -502,12 +515,17 @@ static int arch_domain_mbm_alloc(u32 num_rmid, struct rdt_hw_domain *hw_dom)
*/
static void domain_add_cpu(int cpu, struct rdt_resource *r)
{
- int id = get_cpu_cacheinfo_id(cpu, r->cache_level);
+ int id = get_domain_id_from_scope(cpu, r->scope);
struct list_head *add_pos = NULL;
struct rdt_hw_domain *hw_dom;
struct rdt_domain *d;
int err;

+ if (id < 0) {
+ pr_warn_once("Can't find domain id for CPU:%d scope:%d for resource %s\n",
+ cpu, r->scope, r->name);
+ return;
+ }
d = rdt_find_domain(r, id, &add_pos);
if (IS_ERR(d)) {
pr_warn("Couldn't find cache id for CPU %d\n", cpu);
@@ -552,10 +570,13 @@ static void domain_add_cpu(int cpu, struct rdt_resource *r)

static void domain_remove_cpu(int cpu, struct rdt_resource *r)
{
- int id = get_cpu_cacheinfo_id(cpu, r->cache_level);
+ int id = get_domain_id_from_scope(cpu, r->scope);
struct rdt_hw_domain *hw_dom;
struct rdt_domain *d;

+ if (id < 0)
+ return;
+
d = rdt_find_domain(r, id, NULL);
if (IS_ERR_OR_NULL(d)) {
pr_warn("Couldn't find cache id for CPU %d\n", cpu);
diff --git a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
index 8f559eeae08e..8c5f932bc00b 100644
--- a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
+++ b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
@@ -292,10 +292,14 @@ static void pseudo_lock_region_clear(struct pseudo_lock_region *plr)
*/
static int pseudo_lock_region_init(struct pseudo_lock_region *plr)
{
+ int scope = plr->s->res->scope;
struct cpu_cacheinfo *ci;
int ret;
int i;

+ if (WARN_ON_ONCE(scope != RESCTRL_L2_CACHE && scope != RESCTRL_L3_CACHE))
+ return -ENODEV;
+
/* Pick the first cpu we find that is associated with the cache. */
plr->cpu = cpumask_first(&plr->d->cpu_mask);

@@ -311,7 +315,7 @@ static int pseudo_lock_region_init(struct pseudo_lock_region *plr)
plr->size = rdtgroup_cbm_to_size(plr->s->res, plr->d, plr->cbm);

for (i = 0; i < ci->num_leaves; i++) {
- if (ci->info_list[i].level == plr->s->res->cache_level) {
+ if (ci->info_list[i].level == scope) {
plr->line_size = ci->info_list[i].coherency_line_size;
return 0;
}
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 725344048f85..04c164f6d39d 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1345,10 +1345,13 @@ unsigned int rdtgroup_cbm_to_size(struct rdt_resource *r,
unsigned int size = 0;
int num_b, i;

+ if (WARN_ON_ONCE(r->scope != RESCTRL_L2_CACHE && r->scope != RESCTRL_L3_CACHE))
+ return size;
+
num_b = bitmap_weight(&cbm, r->cache.cbm_len);
ci = get_cpu_cacheinfo(cpumask_any(&d->cpu_mask));
for (i = 0; i < ci->num_leaves; i++) {
- if (ci->info_list[i].level == r->cache_level) {
+ if (ci->info_list[i].level == r->scope) {
size = ci->info_list[i].size / r->cache.cbm_len * num_b;
break;
}
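
The size calculation guarded by the new WARN_ON_ONCE() in rdtgroup_cbm_to_size() is plain arithmetic: the cache size is divided by the number of bits in the capacity bitmask (CBM), then multiplied by the number of bits set. A stand-alone sketch follows; cbm_weight() and cbm_to_size() are hypothetical user-space stand-ins for bitmap_weight() and the kernel function, and the cache size is passed in directly instead of being looked up through cacheinfo.

```c
#include <assert.h>

/* Count the bits set in the low cbm_len bits of a capacity bitmask. */
static unsigned int cbm_weight(unsigned long cbm, unsigned int cbm_len)
{
	unsigned int n = 0;
	unsigned int i;

	for (i = 0; i < cbm_len; i++)
		if (cbm & (1UL << i))
			n++;
	return n;
}

/* Bytes of cache covered by cbm: size per CBM bit times bits set. */
static unsigned int cbm_to_size(unsigned int cache_size,
				unsigned int cbm_len, unsigned long cbm)
{
	assert(cbm_len > 0 && cbm_len <= 8 * sizeof(cbm));

	return cache_size / cbm_len * cbm_weight(cbm, cbm_len);
}
```

For a 32 MiB cache with a 16-bit CBM, the mask 0xff covers half of the cache, i.e. 16 MiB. An empty mask gives 0, the same value the patch returns when the scope is not a cache.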

2023-10-03 21:45:41

by Reinette Chatre

Subject: Re: [PATCH v8 0/8] Add support for Sub-NUMA cluster (SNC) systems

Tony,

On 10/3/2023 2:30 PM, Tony Luck wrote:
> The Sub-NUMA cluster feature on some Intel processors partitions
> the CPUs that share an L3 cache into two or more sets. This plays
> havoc with the Resource Director Technology (RDT) monitoring features.
> Prior to this patch Intel has advised that SNC and RDT are incompatible.
>
> Some of these CPU support an MSR that can partition the RMID
> counters in the same way. This allows for monitoring features
> to be used (with the caveat that memory accesses between different
> SNC NUMA nodes may still not be counted accurately.

Almost. For reference:
https://lore.kernel.org/lkml/[email protected]/

No need to send a new series just for this, but this series does find
itself at the back of my queue.

> Note that this patch series improves resctrl reporting considerably
> on systems with SNC enabled, but there will still be some anomalies
> for processes accessing memory from other sub-NUMA nodes.
>
> Signed-off-by: Tony Luck <[email protected]>

Reinette

2023-10-05 14:23:43

by Shaopeng Tan (Fujitsu)

Subject: RE: [PATCH v8 0/8] Add support for Sub-NUMA cluster (SNC) systems

Hi Tony,

I applied this patch series to kernel v6.5 and v6.6-rc4, but the kernel cannot be booted.
Could you tell me what kernel version this patch series is based on?

Best regards,
Shaopeng TAN

2023-10-05 16:07:53

by Peter Newman

Subject: Re: [PATCH v8 1/8] x86/resctrl: Prepare for new domain scope

On Tue, Oct 3, 2023 at 11:30 PM Tony Luck <[email protected]> wrote:
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index 725344048f85..04c164f6d39d 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -1345,10 +1345,13 @@ unsigned int rdtgroup_cbm_to_size(struct rdt_resource *r,
> unsigned int size = 0;
> int num_b, i;
>
> + if (WARN_ON_ONCE(r->scope != RESCTRL_L2_CACHE && r->scope != RESCTRL_L3_CACHE))
> + return size;

Thanks!

Reviewed-by: Peter Newman <[email protected]>

2023-10-05 16:08:54

by Tony Luck

Subject: RE: [PATCH v8 0/8] Add support for Sub-NUMA cluster (SNC) systems

> I applied this patch series to kernel v6.5 and v6.6-rc4, but the kernel cannot be booted.
> Could you tell me what kernel version this patch series is based on?

Hi Shaopeng,

Patches are based on v6.6-rc3 (see the "base-commit" line at the end of the cover letter).

Which CPU family/model/stepping & microcode are you testing on?

Are there error or panic messages when it does not boot?

-Tony

2023-10-05 17:18:32

by Peter Newman

Subject: Re: [PATCH v6 0/8] Add support for Sub-NUMA cluster (SNC) systems

Hi Tony,

On Thu, Sep 28, 2023 at 9:14 PM Tony Luck <[email protected]> wrote:
>
> The Sub-NUMA cluster feature on some Intel processors partitions
> the CPUs that share an L3 cache into two or more sets. This plays
> havoc with the Resource Director Technology (RDT) monitoring features.
> Prior to this patch Intel has advised that SNC and RDT are incompatible.
>
> Some of these CPU support an MSR that can partition the RMID
> counters in the same way. This allows for monitoring features
> to be used (with the caveat that memory accesses between different
> SNC NUMA nodes may still not be counted accuratlely.

Is an "SNC NUMA node" a "sub-NUMA node", or a NUMA node on which SNC
has been enabled?

Thanks!
-Peter

2023-10-06 20:27:36

by Reinette Chatre

Subject: Re: [PATCH v8 0/8] Add support for Sub-NUMA cluster (SNC) systems

Hi Tony,

On 10/5/2023 8:08 AM, Luck, Tony wrote:
>> I applied this patch series to kernel v6.5 and v6.6-rc4, but the kernel cannot be booted.
>> Could you tell me what kernel version this patch series is based on?
>
> Hi Shaopeng,
>
> Patches are based on v6.6-rc3 (see the "base-commit" line at the end of the cover letter).
>
> Which CPU family/model/stepping & microcode are you testing on?
>
> Are there error or panic messages when it does not boot?

There is no need to ask Shaopeng for more information. If you test
this series you will immediately learn that it is broken.
Reading the series I only made it to patch #3 where I realized that this
has not been tested before posting.

Reinette

2023-10-20 21:32:05

by Tony Luck

Subject: [PATCH v9 0/8] Add support for Sub-NUMA cluster (SNC) systems

The Sub-NUMA cluster feature on some Intel processors partitions the CPUs
that share an L3 cache into two or more sets. This plays havoc with the
Resource Director Technology (RDT) monitoring features. Prior to this
patch Intel has advised that SNC and RDT are incompatible.

Some of these CPUs support an MSR that can partition the RMID counters in
the same way. This allows monitoring features to be used, with the caveat
that users must be aware that Linux may migrate tasks more frequently
between SNC nodes than between "regular" NUMA nodes, so reading counters
from all SNC nodes may be needed to get a complete picture of activity
for tasks.

Cache and memory bandwidth allocation features continue to operate at
the scope of the L3 cache.

Signed-off-by: Tony Luck <[email protected]>

Changes since v6 (see individual patches for specifics):

v7 - had some git format-patch disaster and one of the patches couldn't
be applied.

v8 - Was rushed. Somehow I booted the wrong kernel while testing and
let escape a brown-paper-bag bug that crashed during boot.
Sincere apologies to all who wasted time reading this series,
or trying to boot it.

v9 - Tested (Really! I checked timestamps in dmesg, and all sorts of
other checks to make sure I was really looking at a kernel built
with these patches).

Rebased to tip/master October 20th since that has several other
resctrl changes staged ready for next merge window. No
significant collisions, just noise where "git am" would not
automatically apply. New base is:

3300447612b2 ("Merge branch into tip/master: 'x86/tdx'")

Fixed the brown-paper-bag bug from v8.

Added Peter's "Reviewed-by" where offered (except on patch 3
which had the aforementioned bug).

Tony Luck (8):
x86/resctrl: Prepare for new domain scope
x86/resctrl: Prepare to split rdt_domain structure
x86/resctrl: Prepare for different scope for control/monitor
operations
x86/resctrl: Split the rdt_domain and rdt_hw_domain structures
x86/resctrl: Add node-scope to the options for feature scope
x86/resctrl: Introduce snc_nodes_per_l3_cache
x86/resctrl: Sub NUMA Cluster detection and enable
x86/resctrl: Update documentation with Sub-NUMA cluster changes

Documentation/arch/x86/resctrl.rst | 23 +-
include/linux/resctrl.h | 85 +++--
arch/x86/include/asm/msr-index.h | 1 +
arch/x86/kernel/cpu/resctrl/internal.h | 66 ++--
arch/x86/kernel/cpu/resctrl/core.c | 402 +++++++++++++++++-----
arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 58 ++--
arch/x86/kernel/cpu/resctrl/monitor.c | 58 ++--
arch/x86/kernel/cpu/resctrl/pseudo_lock.c | 14 +-
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 132 +++----
9 files changed, 592 insertions(+), 247 deletions(-)


base-commit: 3300447612b2adbc05cbb90e5d1cb288f19c40c6
--
2.41.0
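
The caveat about reading counters from all SNC nodes amounts to a sum on the user side. A trivial sketch, assuming the caller has already read one value per SNC node for the same monitoring group and event; how those values are obtained is outside this example, and sum_snc_counts() is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Add up one event counter over every SNC node sharing an L3 cache. */
static uint64_t sum_snc_counts(const uint64_t *per_node_count, size_t nr_nodes)
{
	uint64_t total = 0;
	size_t i;

	assert(per_node_count || nr_nodes == 0);

	for (i = 0; i < nr_nodes; i++)
		total += per_node_count[i];

	return total;
}
```

A task that ran partly on each of two SNC nodes and accumulated 100 and 250 bytes there reports 350 in total. Reading only one node would under-report its activity.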

2023-10-20 21:32:06

by Tony Luck

Subject: [PATCH v9 1/8] x86/resctrl: Prepare for new domain scope

Resctrl resources operate on subsets of CPUs in the system with the
defining attribute of each subset being an instance of a particular
level of cache. E.g. all CPUs sharing an L3 cache would be part of the
same domain.

In preparation for features that are scoped at the NUMA node level
change the code from explicit references to "cache_level" to a more
generic scope. At this point the only options for this scope are groups
of CPUs that share an L2 cache or L3 cache.

Provide a more detailed warning message if a domain id cannot be found
when adding a CPU. Just check and silently return if the domain id can't
be found when removing a CPU.

Reviewed-by: Peter Newman <[email protected]>
Signed-off-by: Tony Luck <[email protected]>
---
Changes since v6:

s/-EINVAL/0/ for return value of rdtgroup_cbm_to_size()
Added Peter's review tag

include/linux/resctrl.h | 9 +++++--
arch/x86/kernel/cpu/resctrl/core.c | 33 ++++++++++++++++++-----
arch/x86/kernel/cpu/resctrl/pseudo_lock.c | 6 ++++-
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 5 +++-
4 files changed, 43 insertions(+), 10 deletions(-)

diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 66942d7fba7f..7d4eb7df611d 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -144,13 +144,18 @@ struct resctrl_membw {
struct rdt_parse_data;
struct resctrl_schema;

+enum resctrl_scope {
+ RESCTRL_L2_CACHE = 2,
+ RESCTRL_L3_CACHE = 3,
+};
+
/**
* struct rdt_resource - attributes of a resctrl resource
* @rid: The index of the resource
* @alloc_capable: Is allocation available on this machine
* @mon_capable: Is monitor feature available on this machine
* @num_rmid: Number of RMIDs available
- * @cache_level: Which cache level defines scope of this resource
+ * @scope: Scope of this resource
* @cache: Cache allocation related data
* @membw: If the component has bandwidth controls, their properties.
* @domains: All domains for this resource
@@ -168,7 +173,7 @@ struct rdt_resource {
bool alloc_capable;
bool mon_capable;
int num_rmid;
- int cache_level;
+ enum resctrl_scope scope;
struct resctrl_cache cache;
struct resctrl_membw membw;
struct list_head domains;
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 19e0681f0435..e1588bcd9bd2 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -65,7 +65,7 @@ struct rdt_hw_resource rdt_resources_all[] = {
.r_resctrl = {
.rid = RDT_RESOURCE_L3,
.name = "L3",
- .cache_level = 3,
+ .scope = RESCTRL_L3_CACHE,
.domains = domain_init(RDT_RESOURCE_L3),
.parse_ctrlval = parse_cbm,
.format_str = "%d=%0*x",
@@ -79,7 +79,7 @@ struct rdt_hw_resource rdt_resources_all[] = {
.r_resctrl = {
.rid = RDT_RESOURCE_L2,
.name = "L2",
- .cache_level = 2,
+ .scope = RESCTRL_L2_CACHE,
.domains = domain_init(RDT_RESOURCE_L2),
.parse_ctrlval = parse_cbm,
.format_str = "%d=%0*x",
@@ -93,7 +93,7 @@ struct rdt_hw_resource rdt_resources_all[] = {
.r_resctrl = {
.rid = RDT_RESOURCE_MBA,
.name = "MB",
- .cache_level = 3,
+ .scope = RESCTRL_L3_CACHE,
.domains = domain_init(RDT_RESOURCE_MBA),
.parse_ctrlval = parse_bw,
.format_str = "%d=%*u",
@@ -105,7 +105,7 @@ struct rdt_hw_resource rdt_resources_all[] = {
.r_resctrl = {
.rid = RDT_RESOURCE_SMBA,
.name = "SMBA",
- .cache_level = 3,
+ .scope = RESCTRL_L3_CACHE,
.domains = domain_init(RDT_RESOURCE_SMBA),
.parse_ctrlval = parse_bw,
.format_str = "%d=%*u",
@@ -491,6 +491,19 @@ static int arch_domain_mbm_alloc(u32 num_rmid, struct rdt_hw_domain *hw_dom)
return 0;
}

+static int get_domain_id_from_scope(int cpu, enum resctrl_scope scope)
+{
+ switch (scope) {
+ case RESCTRL_L2_CACHE:
+ case RESCTRL_L3_CACHE:
+ return get_cpu_cacheinfo_id(cpu, scope);
+ default:
+ break;
+ }
+
+ return -EINVAL;
+}
+
/*
* domain_add_cpu - Add a cpu to a resource's domain list.
*
@@ -506,12 +519,17 @@ static int arch_domain_mbm_alloc(u32 num_rmid, struct rdt_hw_domain *hw_dom)
*/
static void domain_add_cpu(int cpu, struct rdt_resource *r)
{
- int id = get_cpu_cacheinfo_id(cpu, r->cache_level);
+ int id = get_domain_id_from_scope(cpu, r->scope);
struct list_head *add_pos = NULL;
struct rdt_hw_domain *hw_dom;
struct rdt_domain *d;
int err;

+ if (id < 0) {
+ pr_warn_once("Can't find domain id for CPU:%d scope:%d for resource %s\n",
+ cpu, r->scope, r->name);
+ return;
+ }
d = rdt_find_domain(r, id, &add_pos);
if (IS_ERR(d)) {
pr_warn("Couldn't find cache id for CPU %d\n", cpu);
@@ -556,10 +574,13 @@ static void domain_add_cpu(int cpu, struct rdt_resource *r)

static void domain_remove_cpu(int cpu, struct rdt_resource *r)
{
- int id = get_cpu_cacheinfo_id(cpu, r->cache_level);
+ int id = get_domain_id_from_scope(cpu, r->scope);
struct rdt_hw_domain *hw_dom;
struct rdt_domain *d;

+ if (id < 0)
+ return;
+
d = rdt_find_domain(r, id, NULL);
if (IS_ERR_OR_NULL(d)) {
pr_warn("Couldn't find cache id for CPU %d\n", cpu);
diff --git a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
index 8f559eeae08e..8c5f932bc00b 100644
--- a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
+++ b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
@@ -292,10 +292,14 @@ static void pseudo_lock_region_clear(struct pseudo_lock_region *plr)
*/
static int pseudo_lock_region_init(struct pseudo_lock_region *plr)
{
+ int scope = plr->s->res->scope;
struct cpu_cacheinfo *ci;
int ret;
int i;

+ if (WARN_ON_ONCE(scope != RESCTRL_L2_CACHE && scope != RESCTRL_L3_CACHE))
+ return -ENODEV;
+
/* Pick the first cpu we find that is associated with the cache. */
plr->cpu = cpumask_first(&plr->d->cpu_mask);

@@ -311,7 +315,7 @@ static int pseudo_lock_region_init(struct pseudo_lock_region *plr)
plr->size = rdtgroup_cbm_to_size(plr->s->res, plr->d, plr->cbm);

for (i = 0; i < ci->num_leaves; i++) {
- if (ci->info_list[i].level == plr->s->res->cache_level) {
+ if (ci->info_list[i].level == scope) {
plr->line_size = ci->info_list[i].coherency_line_size;
return 0;
}
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 69a1de92384a..c44be64d65ec 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1413,10 +1413,13 @@ unsigned int rdtgroup_cbm_to_size(struct rdt_resource *r,
unsigned int size = 0;
int num_b, i;

+ if (WARN_ON_ONCE(r->scope != RESCTRL_L2_CACHE && r->scope != RESCTRL_L3_CACHE))
+ return size;
+
num_b = bitmap_weight(&cbm, r->cache.cbm_len);
ci = get_cpu_cacheinfo(cpumask_any(&d->cpu_mask));
for (i = 0; i < ci->num_leaves; i++) {
- if (ci->info_list[i].level == r->cache_level) {
+ if (ci->info_list[i].level == r->scope) {
size = ci->info_list[i].size / r->cache.cbm_len * num_b;
break;
}
--
2.41.0
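
The shape of get_domain_id_from_scope() can be tried in user space. The sketch below mirrors the switch in the patch but replaces get_cpu_cacheinfo_id() with a hypothetical per-CPU record, struct cpu_caches. The enum and function names are also illustrative and are not the kernel's.

```c
#include <assert.h>
#include <errno.h>

enum scope {
	SCOPE_L2_CACHE = 2,
	SCOPE_L3_CACHE = 3,
};

/* Hypothetical stand-in for the cacheinfo lookup of one CPU. */
struct cpu_caches {
	int l2_id;
	int l3_id;
};

/* Return the domain id for the given scope, or -EINVAL if unknown. */
static int domain_id_from_scope(const struct cpu_caches *c, int scope)
{
	assert(c);

	switch (scope) {
	case SCOPE_L2_CACHE:
		return c->l2_id;
	case SCOPE_L3_CACHE:
		return c->l3_id;
	default:
		break;
	}

	return -EINVAL;
}
```

A caller treats any negative return as "no domain", as domain_add_cpu() and domain_remove_cpu() do with "id < 0". Adding a new scope later means adding one case to the switch.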

2023-10-20 21:32:24

by Tony Luck

Subject: [PATCH v9 3/8] x86/resctrl: Prepare for different scope for control/monitor operations

Resctrl assumes that control and monitor operations on a resource are
performed at the same scope.

Prepare for systems that use different scope (specifically L3 scope for
cache control and NODE scope for cache occupancy and memory bandwidth
monitoring).

Create separate domain lists for control and monitor operations.

Note that errors during initialization of either control or monitor
functions on a domain would previously result in that domain being
excluded from both control and monitor operations. Now the domains are
allocated independently it is no longer required to disable both control
and monitor operations if either fail.

Signed-off-by: Tony Luck <[email protected]>
---
Changes since v6:

Initialize the "type" in rdt_domain_hdr when creating domains.
Check type has expected value before using container_of() to
get to the surrounding structure.

Rename "hw_mondom" to "hw_dom" in domain_add_cpu_mon() and
in domain_remove_cpu_mon().

Add lockdep_assert_held(&rdtgroup_mutex) to resctrl_offline_mon_domain()

Changes since v8:
Fix the brown-paper-bag bugs using NULL result from rdt_find_domain()


include/linux/resctrl.h | 18 +-
arch/x86/kernel/cpu/resctrl/internal.h | 4 +-
arch/x86/kernel/cpu/resctrl/core.c | 216 +++++++++++++++++-----
arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 18 +-
arch/x86/kernel/cpu/resctrl/monitor.c | 2 +-
arch/x86/kernel/cpu/resctrl/pseudo_lock.c | 4 +-
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 54 +++---
7 files changed, 225 insertions(+), 91 deletions(-)
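
The "check type before container_of()" item in the change notes follows a common pattern: a shared header carries a type tag, and the conversion to the enclosing structure is refused when the tag does not match. A self-contained sketch of that pattern, with hypothetical names (dom_hdr, mon_dom, hdr_to_mon_dom) and a local container_of() definition, since this is user-space code and not the structures from this series:

```c
#include <assert.h>
#include <stddef.h>

enum dom_type {
	DOM_CTRL,
	DOM_MON,
};

/* Common header placed inside each kind of domain structure. */
struct dom_hdr {
	int id;
	enum dom_type type;
};

/* Illustrative monitor domain; only the header matters here. */
struct mon_dom {
	struct dom_hdr hdr;
	unsigned long long total;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Convert a header to its monitor domain, or NULL on a type mismatch. */
static struct mon_dom *hdr_to_mon_dom(struct dom_hdr *hdr)
{
	if (!hdr || hdr->type != DOM_MON)
		return NULL;

	return container_of(hdr, struct mon_dom, hdr);
}
```

Callers must handle the NULL return, which is the kind of check the v8 notes above refer to for rdt_find_domain() results.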

diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 320febbb0a4e..98d917aff075 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -170,10 +170,12 @@ enum resctrl_scope {
* @alloc_capable: Is allocation available on this machine
* @mon_capable: Is monitor feature available on this machine
* @num_rmid: Number of RMIDs available
- * @scope: Scope of this resource
+ * @ctrl_scope: Scope of this resource for control functions
+ * @mon_scope: Scope of this resource for monitor functions
* @cache: Cache allocation related data
* @membw: If the component has bandwidth controls, their properties.
- * @domains: All domains for this resource
+ * @ctrl_domains: Control domains for this resource
+ * @mon_domains: Monitor domains for this resource
* @name: Name to use in "schemata" file.
* @data_width: Character width of data when displaying
* @default_ctrl: Specifies default cache cbm or memory B/W percent.
@@ -188,10 +190,12 @@ struct rdt_resource {
bool alloc_capable;
bool mon_capable;
int num_rmid;
- enum resctrl_scope scope;
+ enum resctrl_scope ctrl_scope;
+ enum resctrl_scope mon_scope;
struct resctrl_cache cache;
struct resctrl_membw membw;
- struct list_head domains;
+ struct list_head ctrl_domains;
+ struct list_head mon_domains;
char *name;
int data_width;
u32 default_ctrl;
@@ -237,8 +241,10 @@ int resctrl_arch_update_one(struct rdt_resource *r, struct rdt_domain *d,

u32 resctrl_arch_get_config(struct rdt_resource *r, struct rdt_domain *d,
u32 closid, enum resctrl_conf_type type);
-int resctrl_online_domain(struct rdt_resource *r, struct rdt_domain *d);
-void resctrl_offline_domain(struct rdt_resource *r, struct rdt_domain *d);
+int resctrl_online_ctrl_domain(struct rdt_resource *r, struct rdt_domain *d);
+int resctrl_online_mon_domain(struct rdt_resource *r, struct rdt_domain *d);
+void resctrl_offline_ctrl_domain(struct rdt_resource *r, struct rdt_domain *d);
+void resctrl_offline_mon_domain(struct rdt_resource *r, struct rdt_domain *d);

/**
* resctrl_arch_rmid_read() - Read the eventid counter corresponding to rmid
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index a4f1aa15f0a2..12f1ea3ba8a1 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -520,8 +520,8 @@ void rdtgroup_kn_unlock(struct kernfs_node *kn);
int rdtgroup_kn_mode_restrict(struct rdtgroup *r, const char *name);
int rdtgroup_kn_mode_restore(struct rdtgroup *r, const char *name,
umode_t mask);
-struct rdt_domain *rdt_find_domain(struct rdt_resource *r, int id,
- struct list_head **pos);
+struct rdt_domain_hdr *rdt_find_domain(struct list_head *h, int id,
+ struct list_head **pos);
ssize_t rdtgroup_schemata_write(struct kernfs_open_file *of,
char *buf, size_t nbytes, loff_t off);
int rdtgroup_schemata_show(struct kernfs_open_file *of,
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 7daa5b7e6cb0..b3b8f936bea5 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -57,7 +57,8 @@ static void
mba_wrmsr_amd(struct rdt_domain *d, struct msr_param *m,
struct rdt_resource *r);

-#define domain_init(id) LIST_HEAD_INIT(rdt_resources_all[id].r_resctrl.domains)
+#define ctrl_domain_init(id) LIST_HEAD_INIT(rdt_resources_all[id].r_resctrl.ctrl_domains)
+#define mon_domain_init(id) LIST_HEAD_INIT(rdt_resources_all[id].r_resctrl.mon_domains)

struct rdt_hw_resource rdt_resources_all[] = {
[RDT_RESOURCE_L3] =
@@ -65,8 +66,10 @@ struct rdt_hw_resource rdt_resources_all[] = {
.r_resctrl = {
.rid = RDT_RESOURCE_L3,
.name = "L3",
- .scope = RESCTRL_L3_CACHE,
- .domains = domain_init(RDT_RESOURCE_L3),
+ .ctrl_scope = RESCTRL_L3_CACHE,
+ .mon_scope = RESCTRL_L3_CACHE,
+ .ctrl_domains = ctrl_domain_init(RDT_RESOURCE_L3),
+ .mon_domains = mon_domain_init(RDT_RESOURCE_L3),
.parse_ctrlval = parse_cbm,
.format_str = "%d=%0*x",
.fflags = RFTYPE_RES_CACHE,
@@ -79,8 +82,8 @@ struct rdt_hw_resource rdt_resources_all[] = {
.r_resctrl = {
.rid = RDT_RESOURCE_L2,
.name = "L2",
- .scope = RESCTRL_L2_CACHE,
- .domains = domain_init(RDT_RESOURCE_L2),
+ .ctrl_scope = RESCTRL_L2_CACHE,
+ .ctrl_domains = ctrl_domain_init(RDT_RESOURCE_L2),
.parse_ctrlval = parse_cbm,
.format_str = "%d=%0*x",
.fflags = RFTYPE_RES_CACHE,
@@ -93,8 +96,8 @@ struct rdt_hw_resource rdt_resources_all[] = {
.r_resctrl = {
.rid = RDT_RESOURCE_MBA,
.name = "MB",
- .scope = RESCTRL_L3_CACHE,
- .domains = domain_init(RDT_RESOURCE_MBA),
+ .ctrl_scope = RESCTRL_L3_CACHE,
+ .ctrl_domains = ctrl_domain_init(RDT_RESOURCE_MBA),
.parse_ctrlval = parse_bw,
.format_str = "%d=%*u",
.fflags = RFTYPE_RES_MB,
@@ -105,8 +108,8 @@ struct rdt_hw_resource rdt_resources_all[] = {
.r_resctrl = {
.rid = RDT_RESOURCE_SMBA,
.name = "SMBA",
- .scope = RESCTRL_L3_CACHE,
- .domains = domain_init(RDT_RESOURCE_SMBA),
+ .ctrl_scope = RESCTRL_L3_CACHE,
+ .ctrl_domains = ctrl_domain_init(RDT_RESOURCE_SMBA),
.parse_ctrlval = parse_bw,
.format_str = "%d=%*u",
.fflags = RFTYPE_RES_MB,
@@ -356,7 +359,7 @@ struct rdt_domain *get_domain_from_cpu(int cpu, struct rdt_resource *r)
{
struct rdt_domain *d;

- list_for_each_entry(d, &r->domains, hdr.list) {
+ list_for_each_entry(d, &r->ctrl_domains, hdr.list) {
/* Find the domain that contains this CPU */
if (cpumask_test_cpu(cpu, &d->cpu_mask))
return d;
@@ -388,29 +391,39 @@ void rdt_ctrl_update(void *arg)
}

/*
- * rdt_find_domain - Find a domain in a resource that matches input resource id
+ * rdt_find_domain - Find a domain in one of a resource's domain lists.
*
- * Search resource r's domain list to find the resource id. If the resource
- * id is found in a domain, return the domain. Otherwise, if requested by
- * caller, return the first domain whose id is bigger than the input id.
+ * Search the list to find the resource id. If the resource id is found
+ * in a domain, return the domain. Otherwise, if requested by caller,
+ * return the first domain whose id is bigger than the input id.
* The domain list is sorted by id in ascending order.
+ *
+ * If an existing domain in the resource r's domain list matches the cpu's
+ * resource id, add the cpu in the domain.
+ *
+ * Otherwise, caller will allocate a new domain and insert into the right position
+ * in the domain list sorted by id in ascending order.
+ *
+ * The order in the domain list is visible to users when we print entries
+ * in the schemata file and schemata input is validated to have the same order
+ * as this list.
*/
-struct rdt_domain *rdt_find_domain(struct rdt_resource *r, int id,
- struct list_head **pos)
+struct rdt_domain_hdr *rdt_find_domain(struct list_head *h, int id,
+ struct list_head **pos)
{
- struct rdt_domain *d;
+ struct rdt_domain_hdr *d;
struct list_head *l;

if (id < 0)
return ERR_PTR(-ENODEV);

- list_for_each(l, &r->domains) {
- d = list_entry(l, struct rdt_domain, hdr.list);
+ list_for_each(l, h) {
+ d = list_entry(l, struct rdt_domain_hdr, list);
/* When id is found, return its domain. */
- if (id == d->hdr.id)
+ if (id == d->id)
return d;
/* Stop searching when finding id's position in sorted list. */
- if (id < d->hdr.id)
+ if (id < d->id)
break;
}

@@ -504,39 +517,33 @@ static int get_domain_id_from_scope(int cpu, enum resctrl_scope scope)
return -EINVAL;
}

-/*
- * domain_add_cpu - Add a cpu to a resource's domain list.
- *
- * If an existing domain in the resource r's domain list matches the cpu's
- * resource id, add the cpu in the domain.
- *
- * Otherwise, a new domain is allocated and inserted into the right position
- * in the domain list sorted by id in ascending order.
- *
- * The order in the domain list is visible to users when we print entries
- * in the schemata file and schemata input is validated to have the same order
- * as this list.
- */
-static void domain_add_cpu(int cpu, struct rdt_resource *r)
+static void domain_add_cpu_ctrl(int cpu, struct rdt_resource *r)
{
- int id = get_domain_id_from_scope(cpu, r->scope);
+ int id = get_domain_id_from_scope(cpu, r->ctrl_scope);
struct list_head *add_pos = NULL;
struct rdt_hw_domain *hw_dom;
+ struct rdt_domain_hdr *hdr;
struct rdt_domain *d;
int err;

if (id < 0) {
- pr_warn_once("Can't find domain id for CPU:%d scope:%d for resource %s\n",
- cpu, r->scope, r->name);
+ pr_warn_once("Can't find control domain id for CPU:%d scope:%d for resource %s\n",
+ cpu, r->ctrl_scope, r->name);
return;
}
- d = rdt_find_domain(r, id, &add_pos);
- if (IS_ERR(d)) {
- pr_warn("Couldn't find cache id for CPU %d\n", cpu);
+
+ hdr = rdt_find_domain(&r->ctrl_domains, id, &add_pos);
+ if (IS_ERR(hdr)) {
+ pr_warn("Couldn't find control scope id=%d for CPU %d\n", id, cpu);
return;
}

- if (d) {
+ if (hdr) {
+ if (WARN_ON_ONCE(hdr->type != RESCTRL_CTRL_DOMAIN))
+ return;
+
+ d = container_of(hdr, struct rdt_domain, hdr);
+
cpumask_set_cpu(cpu, &d->cpu_mask);
if (r->cache.arch_has_per_cpu_cfg)
rdt_domain_reconfigure_cdp(r);
@@ -549,48 +556,115 @@ static void domain_add_cpu(int cpu, struct rdt_resource *r)

d = &hw_dom->d_resctrl;
d->hdr.id = id;
+ d->hdr.type = RESCTRL_CTRL_DOMAIN;
cpumask_set_cpu(cpu, &d->cpu_mask);

rdt_domain_reconfigure_cdp(r);

- if (r->alloc_capable && domain_setup_ctrlval(r, d)) {
+ if (domain_setup_ctrlval(r, d)) {
+ domain_free(hw_dom);
+ return;
+ }
+
+ list_add_tail(&d->hdr.list, add_pos);
+
+ err = resctrl_online_ctrl_domain(r, d);
+ if (err) {
+ list_del(&d->hdr.list);
domain_free(hw_dom);
+ }
+}
+
+static void domain_add_cpu_mon(int cpu, struct rdt_resource *r)
+{
+ int id = get_domain_id_from_scope(cpu, r->mon_scope);
+ struct list_head *add_pos = NULL;
+ struct rdt_hw_domain *hw_dom;
+ struct rdt_domain_hdr *hdr;
+ struct rdt_domain *d;
+ int err;
+
+ if (id < 0) {
+ pr_warn_once("Can't find monitor domain id for CPU:%d scope:%d for resource %s\n",
+ cpu, r->mon_scope, r->name);
+ return;
+ }
+
+ hdr = rdt_find_domain(&r->mon_domains, id, &add_pos);
+ if (IS_ERR(hdr)) {
+ pr_warn("Couldn't find monitor scope id=%d for CPU %d\n", id, cpu);
+ return;
+ }
+
+ if (hdr) {
+ if (WARN_ON_ONCE(hdr->type != RESCTRL_MON_DOMAIN))
+ return;
+
+ d = container_of(hdr, struct rdt_domain, hdr);
+
+ cpumask_set_cpu(cpu, &d->cpu_mask);
return;
}

- if (r->mon_capable && arch_domain_mbm_alloc(r->num_rmid, hw_dom)) {
+ hw_dom = kzalloc_node(sizeof(*hw_dom), GFP_KERNEL, cpu_to_node(cpu));
+ if (!hw_dom)
+ return;
+
+ d = &hw_dom->d_resctrl;
+ d->hdr.id = id;
+ d->hdr.type = RESCTRL_MON_DOMAIN;
+ cpumask_set_cpu(cpu, &d->cpu_mask);
+
+ if (arch_domain_mbm_alloc(r->num_rmid, hw_dom)) {
domain_free(hw_dom);
return;
}

list_add_tail(&d->hdr.list, add_pos);

- err = resctrl_online_domain(r, d);
+ err = resctrl_online_mon_domain(r, d);
if (err) {
list_del(&d->hdr.list);
domain_free(hw_dom);
}
}

-static void domain_remove_cpu(int cpu, struct rdt_resource *r)
+/*
+ * domain_add_cpu - Add a cpu to either or both of a resource's domain lists.
+ */
+static void domain_add_cpu(int cpu, struct rdt_resource *r)
+{
+ if (r->alloc_capable)
+ domain_add_cpu_ctrl(cpu, r);
+ if (r->mon_capable)
+ domain_add_cpu_mon(cpu, r);
+}
+
+static void domain_remove_cpu_ctrl(int cpu, struct rdt_resource *r)
{
- int id = get_domain_id_from_scope(cpu, r->scope);
+ int id = get_domain_id_from_scope(cpu, r->ctrl_scope);
struct rdt_hw_domain *hw_dom;
+ struct rdt_domain_hdr *hdr;
struct rdt_domain *d;

if (id < 0)
return;

- d = rdt_find_domain(r, id, NULL);
- if (IS_ERR_OR_NULL(d)) {
- pr_warn("Couldn't find cache id for CPU %d\n", cpu);
+ hdr = rdt_find_domain(&r->ctrl_domains, id, NULL);
+ if (IS_ERR_OR_NULL(hdr)) {
+ pr_warn("Couldn't find control scope id=%d for CPU %d\n", id, cpu);
return;
}
+
+ if (WARN_ON_ONCE(hdr->type != RESCTRL_CTRL_DOMAIN))
+ return;
+
+ d = container_of(hdr, struct rdt_domain, hdr);
hw_dom = resctrl_to_arch_dom(d);

cpumask_clear_cpu(cpu, &d->cpu_mask);
if (cpumask_empty(&d->cpu_mask)) {
- resctrl_offline_domain(r, d);
+ resctrl_offline_ctrl_domain(r, d);
list_del(&d->hdr.list);

/*
@@ -603,6 +677,38 @@ static void domain_remove_cpu(int cpu, struct rdt_resource *r)

return;
}
+}
+
+static void domain_remove_cpu_mon(int cpu, struct rdt_resource *r)
+{
+ int id = get_domain_id_from_scope(cpu, r->mon_scope);
+ struct rdt_hw_domain *hw_dom;
+ struct rdt_domain_hdr *hdr;
+ struct rdt_domain *d;
+
+ if (id < 0)
+ return;
+
+ hdr = rdt_find_domain(&r->mon_domains, id, NULL);
+ if (IS_ERR_OR_NULL(hdr)) {
+ pr_warn("Couldn't find monitor scope id=%d for CPU %d\n", id, cpu);
+ return;
+ }
+
+ if (WARN_ON_ONCE(hdr->type != RESCTRL_MON_DOMAIN))
+ return;
+
+ d = container_of(hdr, struct rdt_domain, hdr);
+ hw_dom = resctrl_to_arch_dom(d);
+
+ cpumask_clear_cpu(cpu, &d->cpu_mask);
+ if (cpumask_empty(&d->cpu_mask)) {
+ resctrl_offline_mon_domain(r, d);
+ list_del(&d->hdr.list);
+ domain_free(hw_dom);
+
+ return;
+ }

if (r == &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl) {
if (is_mbm_enabled() && cpu == d->mbm_work_cpu) {
@@ -617,6 +723,14 @@ static void domain_remove_cpu(int cpu, struct rdt_resource *r)
}
}

+static void domain_remove_cpu(int cpu, struct rdt_resource *r)
+{
+ if (r->alloc_capable)
+ domain_remove_cpu_ctrl(cpu, r);
+ if (r->mon_capable)
+ domain_remove_cpu_mon(cpu, r);
+}
+
static void clear_closid_rmid(int cpu)
{
struct resctrl_pqr_state *state = this_cpu_ptr(&pqr_state);
diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
index 6f4152b21985..34b7eb26b06d 100644
--- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
+++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
@@ -226,7 +226,7 @@ static int parse_line(char *line, struct resctrl_schema *s,
return -EINVAL;
}
dom = strim(dom);
- list_for_each_entry(d, &r->domains, hdr.list) {
+ list_for_each_entry(d, &r->ctrl_domains, hdr.list) {
if (d->hdr.id == dom_id) {
data.buf = dom;
data.rdtgrp = rdtgrp;
@@ -318,7 +318,7 @@ int resctrl_arch_update_domains(struct rdt_resource *r, u32 closid)
return -ENOMEM;

msr_param.res = NULL;
- list_for_each_entry(d, &r->domains, hdr.list) {
+ list_for_each_entry(d, &r->ctrl_domains, hdr.list) {
hw_dom = resctrl_to_arch_dom(d);
for (t = 0; t < CDP_NUM_TYPES; t++) {
cfg = &hw_dom->d_resctrl.staged_config[t];
@@ -466,7 +466,7 @@ static void show_doms(struct seq_file *s, struct resctrl_schema *schema, int clo
u32 ctrl_val;

seq_printf(s, "%*s:", max_name_width, schema->name);
- list_for_each_entry(dom, &r->domains, hdr.list) {
+ list_for_each_entry(dom, &r->ctrl_domains, hdr.list) {
if (sep)
seq_puts(s, ";");

@@ -542,6 +542,7 @@ void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
int rdtgroup_mondata_show(struct seq_file *m, void *arg)
{
struct kernfs_open_file *of = m->private;
+ struct rdt_domain_hdr *hdr;
u32 resid, evtid, domid;
struct rdtgroup *rdtgrp;
struct rdt_resource *r;
@@ -562,12 +563,19 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)
evtid = md.u.evtid;

r = &rdt_resources_all[resid].r_resctrl;
- d = rdt_find_domain(r, domid, NULL);
- if (IS_ERR_OR_NULL(d)) {
+ hdr = rdt_find_domain(&r->mon_domains, domid, NULL);
+ if (IS_ERR_OR_NULL(hdr)) {
ret = -ENOENT;
goto out;
}

+ if (WARN_ON_ONCE(hdr->type != RESCTRL_MON_DOMAIN)) {
+ ret = -EINVAL;
+ goto out;
+ }
+
+ d = container_of(hdr, struct rdt_domain, hdr);
+
mon_event_read(&rr, r, d, rdtgrp, evtid, false);

if (rr.err == -EIO)
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index ef12f78392e9..6d5e2cbdefb5 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -340,7 +340,7 @@ static void add_rmid_to_limbo(struct rmid_entry *entry)

entry->busy = 0;
cpu = get_cpu();
- list_for_each_entry(d, &r->domains, hdr.list) {
+ list_for_each_entry(d, &r->mon_domains, hdr.list) {
if (cpumask_test_cpu(cpu, &d->cpu_mask)) {
err = resctrl_arch_rmid_read(r, d, entry->rmid,
QOS_L3_OCCUP_EVENT_ID,
diff --git a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
index 18b6183a1b48..bda32b4e1c1e 100644
--- a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
+++ b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
@@ -292,7 +292,7 @@ static void pseudo_lock_region_clear(struct pseudo_lock_region *plr)
*/
static int pseudo_lock_region_init(struct pseudo_lock_region *plr)
{
- int scope = plr->s->res->scope;
+ int scope = plr->s->res->ctrl_scope;
struct cpu_cacheinfo *ci;
int ret;
int i;
@@ -856,7 +856,7 @@ bool rdtgroup_pseudo_locked_in_hierarchy(struct rdt_domain *d)
* associated with them.
*/
for_each_alloc_capable_rdt_resource(r) {
- list_for_each_entry(d_i, &r->domains, hdr.list) {
+ list_for_each_entry(d_i, &r->ctrl_domains, hdr.list) {
if (d_i->plr)
cpumask_or(cpu_with_psl, cpu_with_psl,
&d_i->cpu_mask);
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 746ee56856a9..9df8f02ecb63 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -91,7 +91,7 @@ void rdt_staged_configs_clear(void)
lockdep_assert_held(&rdtgroup_mutex);

for_each_alloc_capable_rdt_resource(r) {
- list_for_each_entry(dom, &r->domains, hdr.list)
+ list_for_each_entry(dom, &r->ctrl_domains, hdr.list)
memset(dom->staged_config, 0, sizeof(dom->staged_config));
}
}
@@ -984,7 +984,7 @@ static int rdt_bit_usage_show(struct kernfs_open_file *of,

mutex_lock(&rdtgroup_mutex);
hw_shareable = r->cache.shareable_bits;
- list_for_each_entry(dom, &r->domains, hdr.list) {
+ list_for_each_entry(dom, &r->ctrl_domains, hdr.list) {
if (sep)
seq_putc(seq, ';');
sw_shareable = 0;
@@ -1302,7 +1302,7 @@ static bool rdtgroup_mode_test_exclusive(struct rdtgroup *rdtgrp)
if (r->rid == RDT_RESOURCE_MBA || r->rid == RDT_RESOURCE_SMBA)
continue;
has_cache = true;
- list_for_each_entry(d, &r->domains, hdr.list) {
+ list_for_each_entry(d, &r->ctrl_domains, hdr.list) {
ctrl = resctrl_arch_get_config(r, d, closid,
s->conf_type);
if (rdtgroup_cbm_overlaps(s, d, ctrl, closid, false)) {
@@ -1413,13 +1413,13 @@ unsigned int rdtgroup_cbm_to_size(struct rdt_resource *r,
unsigned int size = 0;
int num_b, i;

- if (WARN_ON_ONCE(r->scope != RESCTRL_L2_CACHE && r->scope != RESCTRL_L3_CACHE))
+ if (WARN_ON_ONCE(r->ctrl_scope != RESCTRL_L2_CACHE && r->ctrl_scope != RESCTRL_L3_CACHE))
return size;

num_b = bitmap_weight(&cbm, r->cache.cbm_len);
ci = get_cpu_cacheinfo(cpumask_any(&d->cpu_mask));
for (i = 0; i < ci->num_leaves; i++) {
- if (ci->info_list[i].level == r->scope) {
+ if (ci->info_list[i].level == r->ctrl_scope) {
size = ci->info_list[i].size / r->cache.cbm_len * num_b;
break;
}
@@ -1477,7 +1477,7 @@ static int rdtgroup_size_show(struct kernfs_open_file *of,
type = schema->conf_type;
sep = false;
seq_printf(s, "%*s:", max_name_width, schema->name);
- list_for_each_entry(d, &r->domains, hdr.list) {
+ list_for_each_entry(d, &r->ctrl_domains, hdr.list) {
if (sep)
seq_putc(s, ';');
if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKSETUP) {
@@ -1566,7 +1566,7 @@ static int mbm_config_show(struct seq_file *s, struct rdt_resource *r, u32 evtid

mutex_lock(&rdtgroup_mutex);

- list_for_each_entry(dom, &r->domains, hdr.list) {
+ list_for_each_entry(dom, &r->mon_domains, hdr.list) {
if (sep)
seq_puts(s, ";");

@@ -1689,7 +1689,7 @@ static int mon_config_write(struct rdt_resource *r, char *tok, u32 evtid)
return -EINVAL;
}

- list_for_each_entry(d, &r->domains, hdr.list) {
+ list_for_each_entry(d, &r->mon_domains, hdr.list) {
if (d->hdr.id == dom_id) {
ret = mbm_config_write_domain(r, d, evtid, val);
if (ret)
@@ -2232,7 +2232,7 @@ static int set_cache_qos_cfg(int level, bool enable)
return -ENOMEM;

r_l = &rdt_resources_all[level].r_resctrl;
- list_for_each_entry(d, &r_l->domains, hdr.list) {
+ list_for_each_entry(d, &r_l->ctrl_domains, hdr.list) {
if (r_l->cache.arch_has_per_cpu_cfg)
/* Pick all the CPUs in the domain instance */
for_each_cpu(cpu, &d->cpu_mask)
@@ -2317,7 +2317,7 @@ static int set_mba_sc(bool mba_sc)

r->membw.mba_sc = mba_sc;

- list_for_each_entry(d, &r->domains, hdr.list) {
+ list_for_each_entry(d, &r->ctrl_domains, hdr.list) {
for (i = 0; i < num_closid; i++)
d->mbps_val[i] = MBA_MAX_MBPS;
}
@@ -2653,7 +2653,7 @@ static int rdt_get_tree(struct fs_context *fc)

if (is_mbm_enabled()) {
r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
- list_for_each_entry(dom, &r->domains, hdr.list)
+ list_for_each_entry(dom, &r->mon_domains, hdr.list)
mbm_setup_overflow_handler(dom, MBM_OVERFLOW_INTERVAL);
}

@@ -2777,10 +2777,10 @@ static int reset_all_ctrls(struct rdt_resource *r)

/*
* Disable resource control for this resource by setting all
- * CBMs in all domains to the maximum mask value. Pick one CPU
+ * CBMs in all ctrl_domains to the maximum mask value. Pick one CPU
* from each domain to update the MSRs below.
*/
- list_for_each_entry(d, &r->domains, hdr.list) {
+ list_for_each_entry(d, &r->ctrl_domains, hdr.list) {
hw_dom = resctrl_to_arch_dom(d);
cpumask_set_cpu(cpumask_any(&d->cpu_mask), cpu_mask);

@@ -3050,7 +3050,7 @@ static int mkdir_mondata_subdir_alldom(struct kernfs_node *parent_kn,
struct rdt_domain *dom;
int ret;

- list_for_each_entry(dom, &r->domains, hdr.list) {
+ list_for_each_entry(dom, &r->mon_domains, hdr.list) {
ret = mkdir_mondata_subdir(parent_kn, dom, r, prgrp);
if (ret)
return ret;
@@ -3232,7 +3232,7 @@ static int rdtgroup_init_cat(struct resctrl_schema *s, u32 closid)
struct rdt_domain *d;
int ret;

- list_for_each_entry(d, &s->res->domains, hdr.list) {
+ list_for_each_entry(d, &s->res->ctrl_domains, hdr.list) {
ret = __init_one_rdt_domain(d, s, closid);
if (ret < 0)
return ret;
@@ -3247,7 +3247,7 @@ static void rdtgroup_init_mba(struct rdt_resource *r, u32 closid)
struct resctrl_staged_config *cfg;
struct rdt_domain *d;

- list_for_each_entry(d, &r->domains, hdr.list) {
+ list_for_each_entry(d, &r->ctrl_domains, hdr.list) {
if (is_mba_sc(r)) {
d->mbps_val[closid] = MBA_MAX_MBPS;
continue;
@@ -3849,15 +3849,17 @@ static void domain_destroy_mon_state(struct rdt_domain *d)
kfree(d->mbm_local);
}

-void resctrl_offline_domain(struct rdt_resource *r, struct rdt_domain *d)
+void resctrl_offline_ctrl_domain(struct rdt_resource *r, struct rdt_domain *d)
{
lockdep_assert_held(&rdtgroup_mutex);

if (supports_mba_mbps() && r->rid == RDT_RESOURCE_MBA)
mba_sc_domain_destroy(r, d);
+}

- if (!r->mon_capable)
- return;
+void resctrl_offline_mon_domain(struct rdt_resource *r, struct rdt_domain *d)
+{
+ lockdep_assert_held(&rdtgroup_mutex);

/*
* If resctrl is mounted, remove all the
@@ -3914,18 +3916,22 @@ static int domain_setup_mon_state(struct rdt_resource *r, struct rdt_domain *d)
return 0;
}

-int resctrl_online_domain(struct rdt_resource *r, struct rdt_domain *d)
+int resctrl_online_ctrl_domain(struct rdt_resource *r, struct rdt_domain *d)
{
- int err;
-
lockdep_assert_held(&rdtgroup_mutex);

if (supports_mba_mbps() && r->rid == RDT_RESOURCE_MBA)
/* RDT_RESOURCE_MBA is never mon_capable */
return mba_sc_domain_allocate(r, d);

- if (!r->mon_capable)
- return 0;
+ return 0;
+}
+
+int resctrl_online_mon_domain(struct rdt_resource *r, struct rdt_domain *d)
+{
+ int err;
+
+ lockdep_assert_held(&rdtgroup_mutex);

err = domain_setup_mon_state(r, d);
if (err)
--
2.41.0
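
The lookup used by rdt_find_domain() in the patch above (return the entry on an id match, otherwise report where a new entry belongs in the ascending list) can be illustrated outside the kernel. The following is a minimal user-space sketch, not the kernel code: struct node, list_init(), insert_before() and find_sorted() are hypothetical names, and a hand-rolled circular list stands in for the kernel's list_head helpers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a domain header linked on a circular list. */
struct node {
	struct node *prev, *next;
	int id;
};

/* An empty list is a head that points at itself. */
static void list_init(struct node *head)
{
	head->prev = head->next = head;
	head->id = -1;
}

/* Link new_node immediately before pos; with pos == head this appends. */
static void insert_before(struct node *new_node, struct node *pos)
{
	assert(pos != NULL);
	new_node->prev = pos->prev;
	new_node->next = pos;
	pos->prev->next = new_node;
	pos->prev = new_node;
}

/*
 * Return the node whose id matches, or NULL if there is none.
 * On a miss, and if the caller passed a non-NULL pos, *pos is set to
 * the first node with a larger id, or to the head when no such node
 * exists. Inserting before *pos keeps the list sorted by id.
 */
static struct node *find_sorted(struct node *head, int id, struct node **pos)
{
	struct node *n;

	for (n = head->next; n != head; n = n->next) {
		if (n->id == id)
			return n;
		if (n->id > id)
			break;
	}

	if (pos)
		*pos = n;

	return NULL;
}
```

A caller that misses on the lookup would allocate a new entry and pass the reported position to insert_before(), which mirrors how domain_add_cpu_ctrl() and domain_add_cpu_mon() hand add_pos to list_add_tail().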

2023-10-20 21:32:55

by Tony Luck

Subject: [PATCH v9 4/8] x86/resctrl: Split the rdt_domain and rdt_hw_domain structures

The same rdt_domain structure is used for both control and monitor
functions. But this results in wasted memory as some of the fields are
only used by control functions, while most are only used for monitor
functions.

Split into separate rdt_ctrl_domain and rdt_mon_domain structures with
just the fields required for control and monitoring respectively.

Split the rdt_hw_domain structure in the same way, into rdt_hw_ctrl_domain
and rdt_hw_mon_domain.
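
As an illustration only, the shape of the split can be sketched in user-space C: two domain types that share nothing but an embedded common header, with a type tag checked before the header is converted back to its containing structure. All names below (domain_hdr, ctrl_domain, mon_domain, hdr_to_ctrl(), hdr_to_mon(), and the example fields) are hypothetical stand-ins rather than the kernel definitions, and the container_of() macro is a simplified user-space version.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified user-space stand-in for the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical type tag, playing the role of the header's "type" field. */
enum domain_type { CTRL_DOMAIN, MON_DOMAIN };

/* Common header embedded at the start of both domain types. */
struct domain_hdr {
	int id;
	enum domain_type type;
};

/* A control domain carries only allocation state (example field). */
struct ctrl_domain {
	struct domain_hdr hdr;
	unsigned int staged_cbm;
};

/* A monitor domain carries only monitoring state (example field). */
struct mon_domain {
	struct domain_hdr hdr;
	unsigned long long mbm_total;
};

/* Convert a header to its control domain, or NULL on a type mismatch. */
static struct ctrl_domain *hdr_to_ctrl(struct domain_hdr *hdr)
{
	assert(hdr != NULL);
	if (hdr->type != CTRL_DOMAIN)
		return NULL;
	return container_of(hdr, struct ctrl_domain, hdr);
}

/* Convert a header to its monitor domain, or NULL on a type mismatch. */
static struct mon_domain *hdr_to_mon(struct domain_hdr *hdr)
{
	assert(hdr != NULL);
	if (hdr->type != MON_DOMAIN)
		return NULL;
	return container_of(hdr, struct mon_domain, hdr);
}
```

With this layout each structure holds only the fields its own code paths use, and a generic list lookup can work purely on the header while callers check the type tag before converting.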

Reviewed-by: Peter Newman <[email protected]>
Signed-off-by: Tony Luck <[email protected]>
---
Changes since v6:

Changes since last version:

Fixed two places where line wrapping inside comments was not
properly aligned.

Added Peter's review tag

include/linux/resctrl.h | 50 +++++++------
arch/x86/kernel/cpu/resctrl/internal.h | 60 ++++++++++------
arch/x86/kernel/cpu/resctrl/core.c | 87 ++++++++++++-----------
arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 32 ++++-----
arch/x86/kernel/cpu/resctrl/monitor.c | 40 +++++------
arch/x86/kernel/cpu/resctrl/pseudo_lock.c | 6 +-
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 62 ++++++++--------
7 files changed, 184 insertions(+), 153 deletions(-)

diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 98d917aff075..4778ef71c893 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -70,7 +70,25 @@ struct rdt_domain_hdr {
};

/**
- * struct rdt_domain - group of CPUs sharing a resctrl resource
+ * struct rdt_ctrl_domain - group of CPUs sharing a resctrl control resource
+ * @hdr: common header for different domain types
+ * @cpu_mask: which CPUs share this resource
+ * @plr: pseudo-locked region (if any) associated with domain
+ * @staged_config: parsed configuration to be applied
+ * @mbps_val: When mba_sc is enabled, this holds the array of user
+ * specified control values for mba_sc in MBps, indexed
+ * by closid
+ */
+struct rdt_ctrl_domain {
+ struct rdt_domain_hdr hdr;
+ struct cpumask cpu_mask;
+ struct pseudo_lock_region *plr;
+ struct resctrl_staged_config staged_config[CDP_NUM_TYPES];
+ u32 *mbps_val;
+};
+
+/**
+ * struct rdt_mon_domain - group of CPUs sharing a resctrl monitor resource
* @hdr: common header for different domain types
* @cpu_mask: which CPUs share this resource
* @rmid_busy_llc: bitmap of which limbo RMIDs are above threshold
@@ -80,13 +98,8 @@ struct rdt_domain_hdr {
* @cqm_limbo: worker to periodically read CQM h/w counters
* @mbm_work_cpu: worker CPU for MBM h/w counters
* @cqm_work_cpu: worker CPU for CQM h/w counters
- * @plr: pseudo-locked region (if any) associated with domain
- * @staged_config: parsed configuration to be applied
- * @mbps_val: When mba_sc is enabled, this holds the array of user
- * specified control values for mba_sc in MBps, indexed
- * by closid
*/
-struct rdt_domain {
+struct rdt_mon_domain {
struct rdt_domain_hdr hdr;
struct cpumask cpu_mask;
unsigned long *rmid_busy_llc;
@@ -96,9 +109,6 @@ struct rdt_domain {
struct delayed_work cqm_limbo;
int mbm_work_cpu;
int cqm_work_cpu;
- struct pseudo_lock_region *plr;
- struct resctrl_staged_config staged_config[CDP_NUM_TYPES];
- u32 *mbps_val;
};

/**
@@ -202,7 +212,7 @@ struct rdt_resource {
const char *format_str;
int (*parse_ctrlval)(struct rdt_parse_data *data,
struct resctrl_schema *s,
- struct rdt_domain *d);
+ struct rdt_ctrl_domain *d);
struct list_head evt_list;
unsigned long fflags;
bool cdp_capable;
@@ -236,15 +246,15 @@ int resctrl_arch_update_domains(struct rdt_resource *r, u32 closid);
* Update the ctrl_val and apply this config right now.
* Must be called on one of the domain's CPUs.
*/
-int resctrl_arch_update_one(struct rdt_resource *r, struct rdt_domain *d,
+int resctrl_arch_update_one(struct rdt_resource *r, struct rdt_ctrl_domain *d,
u32 closid, enum resctrl_conf_type t, u32 cfg_val);

-u32 resctrl_arch_get_config(struct rdt_resource *r, struct rdt_domain *d,
+u32 resctrl_arch_get_config(struct rdt_resource *r, struct rdt_ctrl_domain *d,
u32 closid, enum resctrl_conf_type type);
-int resctrl_online_ctrl_domain(struct rdt_resource *r, struct rdt_domain *d);
-int resctrl_online_mon_domain(struct rdt_resource *r, struct rdt_domain *d);
-void resctrl_offline_ctrl_domain(struct rdt_resource *r, struct rdt_domain *d);
-void resctrl_offline_mon_domain(struct rdt_resource *r, struct rdt_domain *d);
+int resctrl_online_ctrl_domain(struct rdt_resource *r, struct rdt_ctrl_domain *d);
+int resctrl_online_mon_domain(struct rdt_resource *r, struct rdt_mon_domain *d);
+void resctrl_offline_ctrl_domain(struct rdt_resource *r, struct rdt_ctrl_domain *d);
+void resctrl_offline_mon_domain(struct rdt_resource *r, struct rdt_mon_domain *d);

/**
* resctrl_arch_rmid_read() - Read the eventid counter corresponding to rmid
@@ -260,7 +270,7 @@ void resctrl_offline_mon_domain(struct rdt_resource *r, struct rdt_domain *d);
* Return:
* 0 on success, or -EIO, -EINVAL etc on error.
*/
-int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain *d,
+int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_mon_domain *d,
u32 rmid, enum resctrl_event_id eventid, u64 *val);

/**
@@ -273,7 +283,7 @@ int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain *d,
*
* This can be called from any CPU.
*/
-void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_domain *d,
+void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_mon_domain *d,
u32 rmid, enum resctrl_event_id eventid);

/**
@@ -285,7 +295,7 @@ void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_domain *d,
*
* This can be called from any CPU.
*/
-void resctrl_arch_reset_rmid_all(struct rdt_resource *r, struct rdt_domain *d);
+void resctrl_arch_reset_rmid_all(struct rdt_resource *r, struct rdt_mon_domain *d);

extern unsigned int resctrl_rmid_realloc_threshold;
extern unsigned int resctrl_rmid_realloc_limit;
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 12f1ea3ba8a1..41a23556f57d 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -107,7 +107,7 @@ union mon_data_bits {
struct rmid_read {
struct rdtgroup *rgrp;
struct rdt_resource *r;
- struct rdt_domain *d;
+ struct rdt_mon_domain *d;
enum resctrl_event_id evtid;
bool first;
int err;
@@ -192,7 +192,7 @@ struct mongroup {
*/
struct pseudo_lock_region {
struct resctrl_schema *s;
- struct rdt_domain *d;
+ struct rdt_ctrl_domain *d;
u32 cbm;
wait_queue_head_t lock_thread_wq;
int thread_done;
@@ -319,25 +319,41 @@ struct arch_mbm_state {
};

/**
- * struct rdt_hw_domain - Arch private attributes of a set of CPUs that share
- * a resource
+ * struct rdt_hw_ctrl_domain - Arch private attributes of a set of CPUs that share
+ * a resource for a control function
* @d_resctrl: Properties exposed to the resctrl file system
* @ctrl_val: array of cache or mem ctrl values (indexed by CLOSID)
+ *
+ * Members of this structure are accessed via helpers that provide abstraction.
+ */
+struct rdt_hw_ctrl_domain {
+ struct rdt_ctrl_domain d_resctrl;
+ u32 *ctrl_val;
+};
+
+/**
+ * struct rdt_hw_mon_domain - Arch private attributes of a set of CPUs that share
+ * a resource for a monitor function
+ * @d_resctrl: Properties exposed to the resctrl file system
* @arch_mbm_total: arch private state for MBM total bandwidth
* @arch_mbm_local: arch private state for MBM local bandwidth
*
* Members of this structure are accessed via helpers that provide abstraction.
*/
-struct rdt_hw_domain {
- struct rdt_domain d_resctrl;
- u32 *ctrl_val;
+struct rdt_hw_mon_domain {
+ struct rdt_mon_domain d_resctrl;
struct arch_mbm_state *arch_mbm_total;
struct arch_mbm_state *arch_mbm_local;
};

-static inline struct rdt_hw_domain *resctrl_to_arch_dom(struct rdt_domain *r)
+static inline struct rdt_hw_ctrl_domain *resctrl_to_arch_ctrl_dom(struct rdt_ctrl_domain *r)
+{
+ return container_of(r, struct rdt_hw_ctrl_domain, d_resctrl);
+}
+
+static inline struct rdt_hw_mon_domain *resctrl_to_arch_mon_dom(struct rdt_mon_domain *r)
{
- return container_of(r, struct rdt_hw_domain, d_resctrl);
+ return container_of(r, struct rdt_hw_mon_domain, d_resctrl);
}

/**
@@ -405,7 +421,7 @@ struct rdt_hw_resource {
struct rdt_resource r_resctrl;
u32 num_closid;
unsigned int msr_base;
- void (*msr_update) (struct rdt_domain *d, struct msr_param *m,
+ void (*msr_update) (struct rdt_ctrl_domain *d, struct msr_param *m,
struct rdt_resource *r);
unsigned int mon_scale;
unsigned int mbm_width;
@@ -418,9 +434,9 @@ static inline struct rdt_hw_resource *resctrl_to_arch_res(struct rdt_resource *r
}

int parse_cbm(struct rdt_parse_data *data, struct resctrl_schema *s,
- struct rdt_domain *d);
+ struct rdt_ctrl_domain *d);
int parse_bw(struct rdt_parse_data *data, struct resctrl_schema *s,
- struct rdt_domain *d);
+ struct rdt_ctrl_domain *d);

extern struct mutex rdtgroup_mutex;

@@ -526,21 +542,21 @@ ssize_t rdtgroup_schemata_write(struct kernfs_open_file *of,
char *buf, size_t nbytes, loff_t off);
int rdtgroup_schemata_show(struct kernfs_open_file *of,
struct seq_file *s, void *v);
-bool rdtgroup_cbm_overlaps(struct resctrl_schema *s, struct rdt_domain *d,
+bool rdtgroup_cbm_overlaps(struct resctrl_schema *s, struct rdt_ctrl_domain *d,
unsigned long cbm, int closid, bool exclusive);
-unsigned int rdtgroup_cbm_to_size(struct rdt_resource *r, struct rdt_domain *d,
+unsigned int rdtgroup_cbm_to_size(struct rdt_resource *r, struct rdt_ctrl_domain *d,
unsigned long cbm);
enum rdtgrp_mode rdtgroup_mode_by_closid(int closid);
int rdtgroup_tasks_assigned(struct rdtgroup *r);
int rdtgroup_locksetup_enter(struct rdtgroup *rdtgrp);
int rdtgroup_locksetup_exit(struct rdtgroup *rdtgrp);
-bool rdtgroup_cbm_overlaps_pseudo_locked(struct rdt_domain *d, unsigned long cbm);
-bool rdtgroup_pseudo_locked_in_hierarchy(struct rdt_domain *d);
+bool rdtgroup_cbm_overlaps_pseudo_locked(struct rdt_ctrl_domain *d, unsigned long cbm);
+bool rdtgroup_pseudo_locked_in_hierarchy(struct rdt_ctrl_domain *d);
int rdt_pseudo_lock_init(void);
void rdt_pseudo_lock_release(void);
int rdtgroup_pseudo_lock_create(struct rdtgroup *rdtgrp);
void rdtgroup_pseudo_lock_remove(struct rdtgroup *rdtgrp);
-struct rdt_domain *get_domain_from_cpu(int cpu, struct rdt_resource *r);
+struct rdt_ctrl_domain *get_domain_from_cpu(int cpu, struct rdt_resource *r);
int closids_supported(void);
void closid_free(int closid);
int alloc_rmid(void);
@@ -550,17 +566,17 @@ bool __init rdt_cpu_has(int flag);
void mon_event_count(void *info);
int rdtgroup_mondata_show(struct seq_file *m, void *arg);
void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
- struct rdt_domain *d, struct rdtgroup *rdtgrp,
+ struct rdt_mon_domain *d, struct rdtgroup *rdtgrp,
int evtid, int first);
-void mbm_setup_overflow_handler(struct rdt_domain *dom,
+void mbm_setup_overflow_handler(struct rdt_mon_domain *dom,
unsigned long delay_ms);
void mbm_handle_overflow(struct work_struct *work);
void __init intel_rdt_mbm_apply_quirk(void);
bool is_mba_sc(struct rdt_resource *r);
-void cqm_setup_limbo_handler(struct rdt_domain *dom, unsigned long delay_ms);
+void cqm_setup_limbo_handler(struct rdt_mon_domain *dom, unsigned long delay_ms);
void cqm_handle_limbo(struct work_struct *work);
-bool has_busy_rmid(struct rdt_resource *r, struct rdt_domain *d);
-void __check_limbo(struct rdt_domain *d, bool force_free);
+bool has_busy_rmid(struct rdt_resource *r, struct rdt_mon_domain *d);
+void __check_limbo(struct rdt_mon_domain *d, bool force_free);
void rdt_domain_reconfigure_cdp(struct rdt_resource *r);
void __init thread_throttle_mode_init(void);
void __init mbm_config_rftype_init(const char *config);
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index b3b8f936bea5..bcc4bd2e1930 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -49,12 +49,12 @@ int max_name_width, max_data_width;
bool rdt_alloc_capable;

static void
-mba_wrmsr_intel(struct rdt_domain *d, struct msr_param *m,
+mba_wrmsr_intel(struct rdt_ctrl_domain *d, struct msr_param *m,
struct rdt_resource *r);
static void
-cat_wrmsr(struct rdt_domain *d, struct msr_param *m, struct rdt_resource *r);
+cat_wrmsr(struct rdt_ctrl_domain *d, struct msr_param *m, struct rdt_resource *r);
static void
-mba_wrmsr_amd(struct rdt_domain *d, struct msr_param *m,
+mba_wrmsr_amd(struct rdt_ctrl_domain *d, struct msr_param *m,
struct rdt_resource *r);

#define ctrl_domain_init(id) LIST_HEAD_INIT(rdt_resources_all[id].r_resctrl.ctrl_domains)
@@ -307,11 +307,11 @@ static void rdt_get_cdp_l2_config(void)
}

static void
-mba_wrmsr_amd(struct rdt_domain *d, struct msr_param *m, struct rdt_resource *r)
+mba_wrmsr_amd(struct rdt_ctrl_domain *d, struct msr_param *m, struct rdt_resource *r)
{
- unsigned int i;
- struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(d);
+ struct rdt_hw_ctrl_domain *hw_dom = resctrl_to_arch_ctrl_dom(d);
struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
+ unsigned int i;

for (i = m->low; i < m->high; i++)
wrmsrl(hw_res->msr_base + i, hw_dom->ctrl_val[i]);
@@ -332,12 +332,12 @@ static u32 delay_bw_map(unsigned long bw, struct rdt_resource *r)
}

static void
-mba_wrmsr_intel(struct rdt_domain *d, struct msr_param *m,
+mba_wrmsr_intel(struct rdt_ctrl_domain *d, struct msr_param *m,
struct rdt_resource *r)
{
- unsigned int i;
- struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(d);
+ struct rdt_hw_ctrl_domain *hw_dom = resctrl_to_arch_ctrl_dom(d);
struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
+ unsigned int i;

/* Write the delay values for mba. */
for (i = m->low; i < m->high; i++)
@@ -345,19 +345,19 @@ mba_wrmsr_intel(struct rdt_domain *d, struct msr_param *m,
}

static void
-cat_wrmsr(struct rdt_domain *d, struct msr_param *m, struct rdt_resource *r)
+cat_wrmsr(struct rdt_ctrl_domain *d, struct msr_param *m, struct rdt_resource *r)
{
- unsigned int i;
- struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(d);
+ struct rdt_hw_ctrl_domain *hw_dom = resctrl_to_arch_ctrl_dom(d);
struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
+ unsigned int i;

for (i = m->low; i < m->high; i++)
wrmsrl(hw_res->msr_base + i, hw_dom->ctrl_val[i]);
}

-struct rdt_domain *get_domain_from_cpu(int cpu, struct rdt_resource *r)
+struct rdt_ctrl_domain *get_domain_from_cpu(int cpu, struct rdt_resource *r)
{
- struct rdt_domain *d;
+ struct rdt_ctrl_domain *d;

list_for_each_entry(d, &r->ctrl_domains, hdr.list) {
/* Find the domain that contains this CPU */
@@ -379,7 +379,7 @@ void rdt_ctrl_update(void *arg)
struct rdt_hw_resource *hw_res = resctrl_to_arch_res(m->res);
struct rdt_resource *r = m->res;
int cpu = smp_processor_id();
- struct rdt_domain *d;
+ struct rdt_ctrl_domain *d;

d = get_domain_from_cpu(cpu, r);
if (d) {
@@ -447,18 +447,23 @@ static void setup_default_ctrlval(struct rdt_resource *r, u32 *dc)
*dc = r->default_ctrl;
}

-static void domain_free(struct rdt_hw_domain *hw_dom)
+static void ctrl_domain_free(struct rdt_hw_ctrl_domain *hw_dom)
+{
+ kfree(hw_dom->ctrl_val);
+ kfree(hw_dom);
+}
+
+static void mon_domain_free(struct rdt_hw_mon_domain *hw_dom)
{
kfree(hw_dom->arch_mbm_total);
kfree(hw_dom->arch_mbm_local);
- kfree(hw_dom->ctrl_val);
kfree(hw_dom);
}

-static int domain_setup_ctrlval(struct rdt_resource *r, struct rdt_domain *d)
+static int domain_setup_ctrlval(struct rdt_resource *r, struct rdt_ctrl_domain *d)
{
+ struct rdt_hw_ctrl_domain *hw_dom = resctrl_to_arch_ctrl_dom(d);
struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
- struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(d);
struct msr_param m;
u32 *dc;

@@ -481,7 +486,7 @@ static int domain_setup_ctrlval(struct rdt_resource *r, struct rdt_domain *d)
* @num_rmid: The size of the MBM counter array
* @hw_dom: The domain that owns the allocated arrays
*/
-static int arch_domain_mbm_alloc(u32 num_rmid, struct rdt_hw_domain *hw_dom)
+static int arch_domain_mbm_alloc(u32 num_rmid, struct rdt_hw_mon_domain *hw_dom)
{
size_t tsize;

@@ -520,10 +525,10 @@ static int get_domain_id_from_scope(int cpu, enum resctrl_scope scope)
static void domain_add_cpu_ctrl(int cpu, struct rdt_resource *r)
{
int id = get_domain_id_from_scope(cpu, r->ctrl_scope);
+ struct rdt_hw_ctrl_domain *hw_dom;
struct list_head *add_pos = NULL;
- struct rdt_hw_domain *hw_dom;
struct rdt_domain_hdr *hdr;
- struct rdt_domain *d;
+ struct rdt_ctrl_domain *d;
int err;

if (id < 0) {
@@ -542,7 +547,7 @@ static void domain_add_cpu_ctrl(int cpu, struct rdt_resource *r)
if (WARN_ON_ONCE(hdr->type != RESCTRL_CTRL_DOMAIN))
return;

- d = container_of(hdr, struct rdt_domain, hdr);
+ d = container_of(hdr, struct rdt_ctrl_domain, hdr);

cpumask_set_cpu(cpu, &d->cpu_mask);
if (r->cache.arch_has_per_cpu_cfg)
@@ -562,7 +567,7 @@ static void domain_add_cpu_ctrl(int cpu, struct rdt_resource *r)
rdt_domain_reconfigure_cdp(r);

if (domain_setup_ctrlval(r, d)) {
- domain_free(hw_dom);
+ ctrl_domain_free(hw_dom);
return;
}

@@ -571,17 +576,17 @@ static void domain_add_cpu_ctrl(int cpu, struct rdt_resource *r)
err = resctrl_online_ctrl_domain(r, d);
if (err) {
list_del(&d->hdr.list);
- domain_free(hw_dom);
+ ctrl_domain_free(hw_dom);
}
}

static void domain_add_cpu_mon(int cpu, struct rdt_resource *r)
{
int id = get_domain_id_from_scope(cpu, r->mon_scope);
+ struct rdt_hw_mon_domain *hw_dom;
struct list_head *add_pos = NULL;
- struct rdt_hw_domain *hw_dom;
struct rdt_domain_hdr *hdr;
- struct rdt_domain *d;
+ struct rdt_mon_domain *d;
int err;

if (id < 0) {
@@ -600,7 +605,7 @@ static void domain_add_cpu_mon(int cpu, struct rdt_resource *r)
if (WARN_ON_ONCE(hdr->type != RESCTRL_MON_DOMAIN))
return;

- d = container_of(hdr, struct rdt_domain, hdr);
+ d = container_of(hdr, struct rdt_mon_domain, hdr);

cpumask_set_cpu(cpu, &d->cpu_mask);
return;
@@ -616,7 +621,7 @@ static void domain_add_cpu_mon(int cpu, struct rdt_resource *r)
cpumask_set_cpu(cpu, &d->cpu_mask);

if (arch_domain_mbm_alloc(r->num_rmid, hw_dom)) {
- domain_free(hw_dom);
+ mon_domain_free(hw_dom);
return;
}

@@ -625,7 +630,7 @@ static void domain_add_cpu_mon(int cpu, struct rdt_resource *r)
err = resctrl_online_mon_domain(r, d);
if (err) {
list_del(&d->hdr.list);
- domain_free(hw_dom);
+ mon_domain_free(hw_dom);
}
}

@@ -643,9 +648,9 @@ static void domain_add_cpu(int cpu, struct rdt_resource *r)
static void domain_remove_cpu_ctrl(int cpu, struct rdt_resource *r)
{
int id = get_domain_id_from_scope(cpu, r->ctrl_scope);
- struct rdt_hw_domain *hw_dom;
+ struct rdt_hw_ctrl_domain *hw_dom;
struct rdt_domain_hdr *hdr;
- struct rdt_domain *d;
+ struct rdt_ctrl_domain *d;

if (id < 0)
return;
@@ -659,8 +664,8 @@ static void domain_remove_cpu_ctrl(int cpu, struct rdt_resource *r)
if (WARN_ON_ONCE(hdr->type != RESCTRL_CTRL_DOMAIN))
return;

- d = container_of(hdr, struct rdt_domain, hdr);
- hw_dom = resctrl_to_arch_dom(d);
+ d = container_of(hdr, struct rdt_ctrl_domain, hdr);
+ hw_dom = resctrl_to_arch_ctrl_dom(d);

cpumask_clear_cpu(cpu, &d->cpu_mask);
if (cpumask_empty(&d->cpu_mask)) {
@@ -668,12 +673,12 @@ static void domain_remove_cpu_ctrl(int cpu, struct rdt_resource *r)
list_del(&d->hdr.list);

/*
- * rdt_domain "d" is going to be freed below, so clear
+ * rdt_ctrl_domain "d" is going to be freed below, so clear
* its pointer from pseudo_lock_region struct.
*/
if (d->plr)
d->plr->d = NULL;
- domain_free(hw_dom);
+ ctrl_domain_free(hw_dom);

return;
}
@@ -682,9 +687,9 @@ static void domain_remove_cpu_ctrl(int cpu, struct rdt_resource *r)
static void domain_remove_cpu_mon(int cpu, struct rdt_resource *r)
{
int id = get_domain_id_from_scope(cpu, r->mon_scope);
- struct rdt_hw_domain *hw_dom;
+ struct rdt_hw_mon_domain *hw_dom;
struct rdt_domain_hdr *hdr;
- struct rdt_domain *d;
+ struct rdt_mon_domain *d;

if (id < 0)
return;
@@ -698,14 +703,14 @@ static void domain_remove_cpu_mon(int cpu, struct rdt_resource *r)
if (WARN_ON_ONCE(hdr->type != RESCTRL_MON_DOMAIN))
return;

- d = container_of(hdr, struct rdt_domain, hdr);
- hw_dom = resctrl_to_arch_dom(d);
+ d = container_of(hdr, struct rdt_mon_domain, hdr);
+ hw_dom = resctrl_to_arch_mon_dom(d);

cpumask_clear_cpu(cpu, &d->cpu_mask);
if (cpumask_empty(&d->cpu_mask)) {
resctrl_offline_mon_domain(r, d);
list_del(&d->hdr.list);
- domain_free(hw_dom);
+ mon_domain_free(hw_dom);

return;
}
diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
index 34b7eb26b06d..1e68e1b91d31 100644
--- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
+++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
@@ -58,7 +58,7 @@ static bool bw_validate(char *buf, unsigned long *data, struct rdt_resource *r)
}

int parse_bw(struct rdt_parse_data *data, struct resctrl_schema *s,
- struct rdt_domain *d)
+ struct rdt_ctrl_domain *d)
{
struct resctrl_staged_config *cfg;
u32 closid = data->rdtgrp->closid;
@@ -137,7 +137,7 @@ static bool cbm_validate(char *buf, u32 *data, struct rdt_resource *r)
* resource type.
*/
int parse_cbm(struct rdt_parse_data *data, struct resctrl_schema *s,
- struct rdt_domain *d)
+ struct rdt_ctrl_domain *d)
{
struct rdtgroup *rdtgrp = data->rdtgrp;
struct resctrl_staged_config *cfg;
@@ -206,8 +206,8 @@ static int parse_line(char *line, struct resctrl_schema *s,
struct resctrl_staged_config *cfg;
struct rdt_resource *r = s->res;
struct rdt_parse_data data;
+ struct rdt_ctrl_domain *d;
char *dom = NULL, *id;
- struct rdt_domain *d;
unsigned long dom_id;

if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKSETUP &&
@@ -267,11 +267,11 @@ static u32 get_config_index(u32 closid, enum resctrl_conf_type type)
}
}

-static bool apply_config(struct rdt_hw_domain *hw_dom,
+static bool apply_config(struct rdt_hw_ctrl_domain *hw_dom,
struct resctrl_staged_config *cfg, u32 idx,
cpumask_var_t cpu_mask)
{
- struct rdt_domain *dom = &hw_dom->d_resctrl;
+ struct rdt_ctrl_domain *dom = &hw_dom->d_resctrl;

if (cfg->new_ctrl != hw_dom->ctrl_val[idx]) {
cpumask_set_cpu(cpumask_any(&dom->cpu_mask), cpu_mask);
@@ -283,11 +283,11 @@ static bool apply_config(struct rdt_hw_domain *hw_dom,
return false;
}

-int resctrl_arch_update_one(struct rdt_resource *r, struct rdt_domain *d,
+int resctrl_arch_update_one(struct rdt_resource *r, struct rdt_ctrl_domain *d,
u32 closid, enum resctrl_conf_type t, u32 cfg_val)
{
+ struct rdt_hw_ctrl_domain *hw_dom = resctrl_to_arch_ctrl_dom(d);
struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
- struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(d);
u32 idx = get_config_index(closid, t);
struct msr_param msr_param;

@@ -307,11 +307,11 @@ int resctrl_arch_update_one(struct rdt_resource *r, struct rdt_domain *d,
int resctrl_arch_update_domains(struct rdt_resource *r, u32 closid)
{
struct resctrl_staged_config *cfg;
- struct rdt_hw_domain *hw_dom;
+ struct rdt_hw_ctrl_domain *hw_dom;
struct msr_param msr_param;
+ struct rdt_ctrl_domain *d;
enum resctrl_conf_type t;
cpumask_var_t cpu_mask;
- struct rdt_domain *d;
u32 idx;

if (!zalloc_cpumask_var(&cpu_mask, GFP_KERNEL))
@@ -319,7 +319,7 @@ int resctrl_arch_update_domains(struct rdt_resource *r, u32 closid)

msr_param.res = NULL;
list_for_each_entry(d, &r->ctrl_domains, hdr.list) {
- hw_dom = resctrl_to_arch_dom(d);
+ hw_dom = resctrl_to_arch_ctrl_dom(d);
for (t = 0; t < CDP_NUM_TYPES; t++) {
cfg = &hw_dom->d_resctrl.staged_config[t];
if (!cfg->have_new_ctrl)
@@ -449,10 +449,10 @@ ssize_t rdtgroup_schemata_write(struct kernfs_open_file *of,
return ret ?: nbytes;
}

-u32 resctrl_arch_get_config(struct rdt_resource *r, struct rdt_domain *d,
+u32 resctrl_arch_get_config(struct rdt_resource *r, struct rdt_ctrl_domain *d,
u32 closid, enum resctrl_conf_type type)
{
- struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(d);
+ struct rdt_hw_ctrl_domain *hw_dom = resctrl_to_arch_ctrl_dom(d);
u32 idx = get_config_index(closid, type);

return hw_dom->ctrl_val[idx];
@@ -461,7 +461,7 @@ u32 resctrl_arch_get_config(struct rdt_resource *r, struct rdt_domain *d,
static void show_doms(struct seq_file *s, struct resctrl_schema *schema, int closid)
{
struct rdt_resource *r = schema->res;
- struct rdt_domain *dom;
+ struct rdt_ctrl_domain *dom;
bool sep = false;
u32 ctrl_val;

@@ -523,7 +523,7 @@ int rdtgroup_schemata_show(struct kernfs_open_file *of,
}

void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
- struct rdt_domain *d, struct rdtgroup *rdtgrp,
+ struct rdt_mon_domain *d, struct rdtgroup *rdtgrp,
int evtid, int first)
{
/*
@@ -543,11 +543,11 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)
{
struct kernfs_open_file *of = m->private;
struct rdt_domain_hdr *hdr;
+ struct rdt_mon_domain *d;
u32 resid, evtid, domid;
struct rdtgroup *rdtgrp;
struct rdt_resource *r;
union mon_data_bits md;
- struct rdt_domain *d;
struct rmid_read rr;
int ret = 0;

@@ -574,7 +574,7 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)
goto out;
}

- d = container_of(hdr, struct rdt_domain, hdr);
+ d = container_of(hdr, struct rdt_mon_domain, hdr);

mon_event_read(&rr, r, d, rdtgrp, evtid, false);

diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 6d5e2cbdefb5..7f06848fb828 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -170,7 +170,7 @@ static int __rmid_read(u32 rmid, enum resctrl_event_id eventid, u64 *val)
return 0;
}

-static struct arch_mbm_state *get_arch_mbm_state(struct rdt_hw_domain *hw_dom,
+static struct arch_mbm_state *get_arch_mbm_state(struct rdt_hw_mon_domain *hw_dom,
u32 rmid,
enum resctrl_event_id eventid)
{
@@ -189,10 +189,10 @@ static struct arch_mbm_state *get_arch_mbm_state(struct rdt_hw_domain *hw_dom,
return NULL;
}

-void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_domain *d,
+void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_mon_domain *d,
u32 rmid, enum resctrl_event_id eventid)
{
- struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(d);
+ struct rdt_hw_mon_domain *hw_dom = resctrl_to_arch_mon_dom(d);
struct arch_mbm_state *am;

am = get_arch_mbm_state(hw_dom, rmid, eventid);
@@ -208,9 +208,9 @@ void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_domain *d,
* Assumes that hardware counters are also reset and thus that there is
* no need to record initial non-zero counts.
*/
-void resctrl_arch_reset_rmid_all(struct rdt_resource *r, struct rdt_domain *d)
+void resctrl_arch_reset_rmid_all(struct rdt_resource *r, struct rdt_mon_domain *d)
{
- struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(d);
+ struct rdt_hw_mon_domain *hw_dom = resctrl_to_arch_mon_dom(d);

if (is_mbm_total_enabled())
memset(hw_dom->arch_mbm_total, 0,
@@ -229,11 +229,11 @@ static u64 mbm_overflow_count(u64 prev_msr, u64 cur_msr, unsigned int width)
return chunks >> shift;
}

-int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain *d,
+int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_mon_domain *d,
u32 rmid, enum resctrl_event_id eventid, u64 *val)
{
+ struct rdt_hw_mon_domain *hw_dom = resctrl_to_arch_mon_dom(d);
struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
- struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(d);
struct arch_mbm_state *am;
u64 msr_val, chunks;
int ret;
@@ -266,7 +266,7 @@ int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain *d,
* decrement the count. If the busy count gets to zero on an RMID, we
* free the RMID
*/
-void __check_limbo(struct rdt_domain *d, bool force_free)
+void __check_limbo(struct rdt_mon_domain *d, bool force_free)
{
struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
struct rmid_entry *entry;
@@ -305,7 +305,7 @@ void __check_limbo(struct rdt_domain *d, bool force_free)
}
}

-bool has_busy_rmid(struct rdt_resource *r, struct rdt_domain *d)
+bool has_busy_rmid(struct rdt_resource *r, struct rdt_mon_domain *d)
{
return find_first_bit(d->rmid_busy_llc, r->num_rmid) != r->num_rmid;
}
@@ -334,7 +334,7 @@ int alloc_rmid(void)
static void add_rmid_to_limbo(struct rmid_entry *entry)
{
struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
- struct rdt_domain *d;
+ struct rdt_mon_domain *d;
int cpu, err;
u64 val = 0;

@@ -383,7 +383,7 @@ void free_rmid(u32 rmid)
list_add_tail(&entry->list, &rmid_free_lru);
}

-static struct mbm_state *get_mbm_state(struct rdt_domain *d, u32 rmid,
+static struct mbm_state *get_mbm_state(struct rdt_mon_domain *d, u32 rmid,
enum resctrl_event_id evtid)
{
switch (evtid) {
@@ -516,13 +516,13 @@ void mon_event_count(void *info)
* throttle MSRs already have low percentage values. To avoid
* unnecessarily restricting such rdtgroups, we also increase the bandwidth.
*/
-static void update_mba_bw(struct rdtgroup *rgrp, struct rdt_domain *dom_mbm)
+static void update_mba_bw(struct rdtgroup *rgrp, struct rdt_mon_domain *dom_mbm)
{
u32 closid, rmid, cur_msr_val, new_msr_val;
struct mbm_state *pmbm_data, *cmbm_data;
+ struct rdt_ctrl_domain *dom_mba;
u32 cur_bw, delta_bw, user_bw;
struct rdt_resource *r_mba;
- struct rdt_domain *dom_mba;
struct list_head *head;
struct rdtgroup *entry;

@@ -600,7 +600,7 @@ static void update_mba_bw(struct rdtgroup *rgrp, struct rdt_domain *dom_mbm)
}
}

-static void mbm_update(struct rdt_resource *r, struct rdt_domain *d, int rmid)
+static void mbm_update(struct rdt_resource *r, struct rdt_mon_domain *d, int rmid)
{
struct rmid_read rr;

@@ -640,13 +640,13 @@ void cqm_handle_limbo(struct work_struct *work)
{
unsigned long delay = msecs_to_jiffies(CQM_LIMBOCHECK_INTERVAL);
int cpu = smp_processor_id();
+ struct rdt_mon_domain *d;
struct rdt_resource *r;
- struct rdt_domain *d;

mutex_lock(&rdtgroup_mutex);

r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
- d = container_of(work, struct rdt_domain, cqm_limbo.work);
+ d = container_of(work, struct rdt_mon_domain, cqm_limbo.work);

__check_limbo(d, false);

@@ -656,7 +656,7 @@ void cqm_handle_limbo(struct work_struct *work)
mutex_unlock(&rdtgroup_mutex);
}

-void cqm_setup_limbo_handler(struct rdt_domain *dom, unsigned long delay_ms)
+void cqm_setup_limbo_handler(struct rdt_mon_domain *dom, unsigned long delay_ms)
{
unsigned long delay = msecs_to_jiffies(delay_ms);
int cpu;
@@ -672,9 +672,9 @@ void mbm_handle_overflow(struct work_struct *work)
unsigned long delay = msecs_to_jiffies(MBM_OVERFLOW_INTERVAL);
struct rdtgroup *prgrp, *crgrp;
int cpu = smp_processor_id();
+ struct rdt_mon_domain *d;
struct list_head *head;
struct rdt_resource *r;
- struct rdt_domain *d;

mutex_lock(&rdtgroup_mutex);

@@ -682,7 +682,7 @@ void mbm_handle_overflow(struct work_struct *work)
goto out_unlock;

r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
- d = container_of(work, struct rdt_domain, mbm_over.work);
+ d = container_of(work, struct rdt_mon_domain, mbm_over.work);

list_for_each_entry(prgrp, &rdt_all_groups, rdtgroup_list) {
mbm_update(r, d, prgrp->mon.rmid);
@@ -701,7 +701,7 @@ void mbm_handle_overflow(struct work_struct *work)
mutex_unlock(&rdtgroup_mutex);
}

-void mbm_setup_overflow_handler(struct rdt_domain *dom, unsigned long delay_ms)
+void mbm_setup_overflow_handler(struct rdt_mon_domain *dom, unsigned long delay_ms)
{
unsigned long delay = msecs_to_jiffies(delay_ms);
int cpu;
diff --git a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
index bda32b4e1c1e..675e9e47af54 100644
--- a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
+++ b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
@@ -814,7 +814,7 @@ int rdtgroup_locksetup_exit(struct rdtgroup *rdtgrp)
* Return: true if @cbm overlaps with pseudo-locked region on @d, false
* otherwise.
*/
-bool rdtgroup_cbm_overlaps_pseudo_locked(struct rdt_domain *d, unsigned long cbm)
+bool rdtgroup_cbm_overlaps_pseudo_locked(struct rdt_ctrl_domain *d, unsigned long cbm)
{
unsigned int cbm_len;
unsigned long cbm_b;
@@ -841,11 +841,11 @@ bool rdtgroup_cbm_overlaps_pseudo_locked(struct rdt_domain *d, unsigned long cbm
* if it is not possible to test due to memory allocation issue,
* false otherwise.
*/
-bool rdtgroup_pseudo_locked_in_hierarchy(struct rdt_domain *d)
+bool rdtgroup_pseudo_locked_in_hierarchy(struct rdt_ctrl_domain *d)
{
+ struct rdt_ctrl_domain *d_i;
cpumask_var_t cpu_with_psl;
struct rdt_resource *r;
- struct rdt_domain *d_i;
bool ret = false;

if (!zalloc_cpumask_var(&cpu_with_psl, GFP_KERNEL))
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 9df8f02ecb63..46c6d6807bad 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -85,8 +85,8 @@ void rdt_last_cmd_printf(const char *fmt, ...)

void rdt_staged_configs_clear(void)
{
+ struct rdt_ctrl_domain *dom;
struct rdt_resource *r;
- struct rdt_domain *dom;

lockdep_assert_held(&rdtgroup_mutex);

@@ -976,7 +976,7 @@ static int rdt_bit_usage_show(struct kernfs_open_file *of,
unsigned long sw_shareable = 0, hw_shareable = 0;
unsigned long exclusive = 0, pseudo_locked = 0;
struct rdt_resource *r = s->res;
- struct rdt_domain *dom;
+ struct rdt_ctrl_domain *dom;
int i, hwb, swb, excl, psl;
enum rdtgrp_mode mode;
bool sep = false;
@@ -1205,7 +1205,7 @@ static int rdt_has_sparse_bitmasks_show(struct kernfs_open_file *of,
*
* Return: false if CBM does not overlap, true if it does.
*/
-static bool __rdtgroup_cbm_overlaps(struct rdt_resource *r, struct rdt_domain *d,
+static bool __rdtgroup_cbm_overlaps(struct rdt_resource *r, struct rdt_ctrl_domain *d,
unsigned long cbm, int closid,
enum resctrl_conf_type type, bool exclusive)
{
@@ -1260,7 +1260,7 @@ static bool __rdtgroup_cbm_overlaps(struct rdt_resource *r, struct rdt_domain *d
*
* Return: true if CBM overlap detected, false if there is no overlap
*/
-bool rdtgroup_cbm_overlaps(struct resctrl_schema *s, struct rdt_domain *d,
+bool rdtgroup_cbm_overlaps(struct resctrl_schema *s, struct rdt_ctrl_domain *d,
unsigned long cbm, int closid, bool exclusive)
{
enum resctrl_conf_type peer_type = resctrl_peer_type(s->conf_type);
@@ -1291,10 +1291,10 @@ bool rdtgroup_cbm_overlaps(struct resctrl_schema *s, struct rdt_domain *d,
static bool rdtgroup_mode_test_exclusive(struct rdtgroup *rdtgrp)
{
int closid = rdtgrp->closid;
+ struct rdt_ctrl_domain *d;
struct resctrl_schema *s;
struct rdt_resource *r;
bool has_cache = false;
- struct rdt_domain *d;
u32 ctrl;

list_for_each_entry(s, &resctrl_schema_all, list) {
@@ -1407,7 +1407,7 @@ static ssize_t rdtgroup_mode_write(struct kernfs_open_file *of,
* bitmap functions work correctly.
*/
unsigned int rdtgroup_cbm_to_size(struct rdt_resource *r,
- struct rdt_domain *d, unsigned long cbm)
+ struct rdt_ctrl_domain *d, unsigned long cbm)
{
struct cpu_cacheinfo *ci;
unsigned int size = 0;
@@ -1439,9 +1439,9 @@ static int rdtgroup_size_show(struct kernfs_open_file *of,
{
struct resctrl_schema *schema;
enum resctrl_conf_type type;
+ struct rdt_ctrl_domain *d;
struct rdtgroup *rdtgrp;
struct rdt_resource *r;
- struct rdt_domain *d;
unsigned int size;
int ret = 0;
u32 closid;
@@ -1553,7 +1553,7 @@ static void mon_event_config_read(void *info)
mon_info->mon_config = msrval & MAX_EVT_CONFIG_BITS;
}

-static void mondata_config_read(struct rdt_domain *d, struct mon_config_info *mon_info)
+static void mondata_config_read(struct rdt_mon_domain *d, struct mon_config_info *mon_info)
{
smp_call_function_any(&d->cpu_mask, mon_event_config_read, mon_info, 1);
}
@@ -1561,7 +1561,7 @@ static void mondata_config_read(struct rdt_domain *d, struct mon_config_info *mo
static int mbm_config_show(struct seq_file *s, struct rdt_resource *r, u32 evtid)
{
struct mon_config_info mon_info = {0};
- struct rdt_domain *dom;
+ struct rdt_mon_domain *dom;
bool sep = false;

mutex_lock(&rdtgroup_mutex);
@@ -1618,7 +1618,7 @@ static void mon_event_config_write(void *info)
}

static int mbm_config_write_domain(struct rdt_resource *r,
- struct rdt_domain *d, u32 evtid, u32 val)
+ struct rdt_mon_domain *d, u32 evtid, u32 val)
{
struct mon_config_info mon_info = {0};
int ret = 0;
@@ -1668,7 +1668,7 @@ static int mon_config_write(struct rdt_resource *r, char *tok, u32 evtid)
{
char *dom_str = NULL, *id_str;
unsigned long dom_id, val;
- struct rdt_domain *d;
+ struct rdt_mon_domain *d;
int ret = 0;

next:
@@ -2216,9 +2216,9 @@ static inline bool is_mba_linear(void)
static int set_cache_qos_cfg(int level, bool enable)
{
void (*update)(void *arg);
+ struct rdt_ctrl_domain *d;
struct rdt_resource *r_l;
cpumask_var_t cpu_mask;
- struct rdt_domain *d;
int cpu;

if (level == RDT_RESOURCE_L3)
@@ -2265,7 +2265,7 @@ void rdt_domain_reconfigure_cdp(struct rdt_resource *r)
l3_qos_cfg_update(&hw_res->cdp_enabled);
}

-static int mba_sc_domain_allocate(struct rdt_resource *r, struct rdt_domain *d)
+static int mba_sc_domain_allocate(struct rdt_resource *r, struct rdt_ctrl_domain *d)
{
u32 num_closid = resctrl_arch_get_num_closid(r);
int cpu = cpumask_any(&d->cpu_mask);
@@ -2283,7 +2283,7 @@ static int mba_sc_domain_allocate(struct rdt_resource *r, struct rdt_domain *d)
}

static void mba_sc_domain_destroy(struct rdt_resource *r,
- struct rdt_domain *d)
+ struct rdt_ctrl_domain *d)
{
kfree(d->mbps_val);
d->mbps_val = NULL;
@@ -2309,7 +2309,7 @@ static int set_mba_sc(bool mba_sc)
{
struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_MBA].r_resctrl;
u32 num_closid = resctrl_arch_get_num_closid(r);
- struct rdt_domain *d;
+ struct rdt_ctrl_domain *d;
int i;

if (!supports_mba_mbps() || mba_sc == is_mba_sc(r))
@@ -2578,7 +2578,7 @@ static int rdt_get_tree(struct fs_context *fc)
{
struct rdt_fs_context *ctx = rdt_fc2context(fc);
unsigned long flags = RFTYPE_CTRL_BASE;
- struct rdt_domain *dom;
+ struct rdt_mon_domain *dom;
struct rdt_resource *r;
int ret;

@@ -2762,10 +2762,10 @@ static int rdt_init_fs_context(struct fs_context *fc)
static int reset_all_ctrls(struct rdt_resource *r)
{
struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
- struct rdt_hw_domain *hw_dom;
+ struct rdt_hw_ctrl_domain *hw_dom;
struct msr_param msr_param;
+ struct rdt_ctrl_domain *d;
cpumask_var_t cpu_mask;
- struct rdt_domain *d;
int i;

if (!zalloc_cpumask_var(&cpu_mask, GFP_KERNEL))
@@ -2781,7 +2781,7 @@ static int reset_all_ctrls(struct rdt_resource *r)
* from each domain to update the MSRs below.
*/
list_for_each_entry(d, &r->ctrl_domains, hdr.list) {
- hw_dom = resctrl_to_arch_dom(d);
+ hw_dom = resctrl_to_arch_ctrl_dom(d);
cpumask_set_cpu(cpumask_any(&d->cpu_mask), cpu_mask);

for (i = 0; i < hw_res->num_closid; i++)
@@ -2976,7 +2976,7 @@ static void rmdir_mondata_subdir_allrdtgrp(struct rdt_resource *r,
}

static int mkdir_mondata_subdir(struct kernfs_node *parent_kn,
- struct rdt_domain *d,
+ struct rdt_mon_domain *d,
struct rdt_resource *r, struct rdtgroup *prgrp)
{
union mon_data_bits priv;
@@ -3025,7 +3025,7 @@ static int mkdir_mondata_subdir(struct kernfs_node *parent_kn,
* and "monitor" groups with given domain id.
*/
static void mkdir_mondata_subdir_allrdtgrp(struct rdt_resource *r,
- struct rdt_domain *d)
+ struct rdt_mon_domain *d)
{
struct kernfs_node *parent_kn;
struct rdtgroup *prgrp, *crgrp;
@@ -3047,7 +3047,7 @@ static int mkdir_mondata_subdir_alldom(struct kernfs_node *parent_kn,
struct rdt_resource *r,
struct rdtgroup *prgrp)
{
- struct rdt_domain *dom;
+ struct rdt_mon_domain *dom;
int ret;

list_for_each_entry(dom, &r->mon_domains, hdr.list) {
@@ -3149,7 +3149,7 @@ static u32 cbm_ensure_valid(u32 _val, struct rdt_resource *r)
* Set the RDT domain up to start off with all usable allocations. That is,
* all shareable and unused bits. All-zero CBM is invalid.
*/
-static int __init_one_rdt_domain(struct rdt_domain *d, struct resctrl_schema *s,
+static int __init_one_rdt_domain(struct rdt_ctrl_domain *d, struct resctrl_schema *s,
u32 closid)
{
enum resctrl_conf_type peer_type = resctrl_peer_type(s->conf_type);
@@ -3229,7 +3229,7 @@ static int __init_one_rdt_domain(struct rdt_domain *d, struct resctrl_schema *s,
*/
static int rdtgroup_init_cat(struct resctrl_schema *s, u32 closid)
{
- struct rdt_domain *d;
+ struct rdt_ctrl_domain *d;
int ret;

list_for_each_entry(d, &s->res->ctrl_domains, hdr.list) {
@@ -3245,7 +3245,7 @@ static int rdtgroup_init_cat(struct resctrl_schema *s, u32 closid)
static void rdtgroup_init_mba(struct rdt_resource *r, u32 closid)
{
struct resctrl_staged_config *cfg;
- struct rdt_domain *d;
+ struct rdt_ctrl_domain *d;

list_for_each_entry(d, &r->ctrl_domains, hdr.list) {
if (is_mba_sc(r)) {
@@ -3842,14 +3842,14 @@ static void __init rdtgroup_setup_default(void)
mutex_unlock(&rdtgroup_mutex);
}

-static void domain_destroy_mon_state(struct rdt_domain *d)
+static void domain_destroy_mon_state(struct rdt_mon_domain *d)
{
bitmap_free(d->rmid_busy_llc);
kfree(d->mbm_total);
kfree(d->mbm_local);
}

-void resctrl_offline_ctrl_domain(struct rdt_resource *r, struct rdt_domain *d)
+void resctrl_offline_ctrl_domain(struct rdt_resource *r, struct rdt_ctrl_domain *d)
{
lockdep_assert_held(&rdtgroup_mutex);

@@ -3857,7 +3857,7 @@ void resctrl_offline_ctrl_domain(struct rdt_resource *r, struct rdt_domain *d)
mba_sc_domain_destroy(r, d);
}

-void resctrl_offline_mon_domain(struct rdt_resource *r, struct rdt_domain *d)
+void resctrl_offline_mon_domain(struct rdt_resource *r, struct rdt_mon_domain *d)
{
lockdep_assert_held(&rdtgroup_mutex);

@@ -3886,7 +3886,7 @@ void resctrl_offline_mon_domain(struct rdt_resource *r, struct rdt_domain *d)
domain_destroy_mon_state(d);
}

-static int domain_setup_mon_state(struct rdt_resource *r, struct rdt_domain *d)
+static int domain_setup_mon_state(struct rdt_resource *r, struct rdt_mon_domain *d)
{
size_t tsize;

@@ -3916,7 +3916,7 @@ static int domain_setup_mon_state(struct rdt_resource *r, struct rdt_domain *d)
return 0;
}

-int resctrl_online_ctrl_domain(struct rdt_resource *r, struct rdt_domain *d)
+int resctrl_online_ctrl_domain(struct rdt_resource *r, struct rdt_ctrl_domain *d)
{
lockdep_assert_held(&rdtgroup_mutex);

@@ -3927,7 +3927,7 @@ int resctrl_online_ctrl_domain(struct rdt_resource *r, struct rdt_domain *d)
return 0;
}

-int resctrl_online_mon_domain(struct rdt_resource *r, struct rdt_domain *d)
+int resctrl_online_mon_domain(struct rdt_resource *r, struct rdt_mon_domain *d)
{
int err;

--
2.41.0

2023-10-20 21:33:10

by Tony Luck

Subject: [PATCH v9 7/8] x86/resctrl: Sub NUMA Cluster detection and enable

There isn't a simple h/w bit that indicates whether a CPU is
running in Sub NUMA Cluster (SNC) mode. Infer the state from the ratio
of NUMA nodes to L3 cache instances: more than one node per L3 cache
implies SNC mode.
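
As a rough illustration of the inference above, the ratio check might
look like the sketch below. The helper name snc_nodes_per_l3() and its
plain-integer parameters are hypothetical and not taken from the patch;
the real code has to derive both counts from topology data first.

```c
#include <assert.h>

/*
 * Hypothetical helper (not from the patch): infer the number of SNC
 * nodes sharing each L3 cache from two counts the caller already has.
 * Returns 1 when SNC appears to be disabled or the counts do not
 * divide evenly, so callers can treat 1 as "no SNC".
 */
static int snc_nodes_per_l3(int numa_nodes, int l3_caches)
{
	if (numa_nodes <= 0 || l3_caches <= 0)
		return 1;
	if (numa_nodes % l3_caches)
		return 1;
	return numa_nodes / l3_caches;
}

/* Illustrative checks of the helper on made-up topologies. */
static void snc_ratio_examples(void)
{
	assert(snc_nodes_per_l3(2, 2) == 1);	/* one node per L3: SNC off */
	assert(snc_nodes_per_l3(4, 2) == 2);	/* two nodes per L3 */
	assert(snc_nodes_per_l3(3, 2) == 1);	/* uneven: treat as no SNC */
}
```

Falling back to 1 on an uneven ratio is a design choice of this sketch:
it keeps the non-SNC behaviour whenever the topology is ambiguous.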

When SNC mode is detected, reconfigure the RMID counters by updating
the MSR_RMID_SNC_CONFIG MSR on each socket as CPUs are seen.

Clearing bit zero of the MSR divides the RMIDs and renumbers the ones
on the second SNC node to start from zero. An earlier commit includes
all the required changes in Linux to operate in this reconfigured mode.
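
To make the renumbering concrete, here is a minimal sketch of how a
per-node RMID could be mapped to a position in the undivided counter
space when a counter is read. The function name, the even split of the
RMID space and the offset formula are all assumptions made for
illustration; the adjustment actually used by this series is the one in
__rmid_read().

```c
#include <assert.h>

/*
 * Hypothetical mapping (an assumption, not the patch's code): with the
 * RMID space split evenly between the SNC nodes that share an L3 cache,
 * RMID "rmid" on SNC node "node_idx" is assumed to sit at offset
 * node_idx * rmids_per_node in the undivided counter space.
 */
static unsigned int snc_counter_index(unsigned int rmid, unsigned int node_idx,
				      unsigned int total_rmids,
				      unsigned int snc_nodes)
{
	unsigned int rmids_per_node;

	assert(snc_nodes > 0 && node_idx < snc_nodes);
	rmids_per_node = total_rmids / snc_nodes;
	assert(rmid < rmids_per_node);

	return node_idx * rmids_per_node + rmid;
}
```

With 256 RMIDs and two SNC nodes, each node numbers its RMIDs 0..127;
under the assumed layout, RMID 5 on the second node maps to index 133.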

Reviewed-by: Peter Newman <[email protected]>
Signed-off-by: Tony Luck <[email protected]>
---
Changes since v6:

Moved kfree(node_caches); earlier, to the earliest point where it
is no longer needed.

Added Granite Rapids to list of CPU models that support SNC mode.

Added Peter's review tag

arch/x86/include/asm/msr-index.h | 1 +
arch/x86/kernel/cpu/resctrl/core.c | 92 ++++++++++++++++++++++++++++++
2 files changed, 93 insertions(+)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index e3fa9cecd599..4285a5ee81fe 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -1109,6 +1109,7 @@
#define MSR_IA32_QM_CTR 0xc8e
#define MSR_IA32_PQR_ASSOC 0xc8f
#define MSR_IA32_L3_CBM_BASE 0xc90
+#define MSR_RMID_SNC_CONFIG 0xca0
#define MSR_IA32_L2_CBM_BASE 0xd10
#define MSR_IA32_MBA_THRTL_BASE 0xd50

diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 0e418dd14070..ac187eb0440f 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -16,11 +16,14 @@

#define pr_fmt(fmt) "resctrl: " fmt

+#include <linux/cpu.h>
#include <linux/slab.h>
#include <linux/err.h>
#include <linux/cacheinfo.h>
#include <linux/cpuhotplug.h>
+#include <linux/mod_devicetable.h>

+#include <asm/cpu_device_id.h>
#include <asm/intel-family.h>
#include <asm/resctrl.h>
#include "internal.h"
@@ -755,11 +758,42 @@ static void clear_closid_rmid(int cpu)
wrmsr(MSR_IA32_PQR_ASSOC, 0, 0);
}

+/*
+ * The power-on reset value of MSR_RMID_SNC_CONFIG is 0x1
+ * which indicates that RMIDs are configured in legacy mode.
+ * This mode is incompatible with Linux resctrl semantics
+ * as RMIDs are partitioned between SNC nodes, which requires
+ * a user to know which RMID is allocated to a task.
+ * Clearing bit 0 reconfigures the RMID counters for use
+ * in Sub NUMA Cluster mode. This mode is better for Linux.
+ * The RMID space is divided between all SNC nodes with the
+ * RMIDs renumbered to start from zero in each node when
+ * counting operations from tasks. Code to read the counters
+ * must adjust RMID counter numbers based on SNC node. See
+ * __rmid_read() for code that does this.
+ */
+static void snc_remap_rmids(int cpu)
+{
+ u64 val;
+
+ /* Only need to enable once per package. */
+ if (cpumask_first(topology_core_cpumask(cpu)) != cpu)
+ return;
+
+ rdmsrl(MSR_RMID_SNC_CONFIG, val);
+ val &= ~BIT_ULL(0);
+ wrmsrl(MSR_RMID_SNC_CONFIG, val);
+}
+
static int resctrl_online_cpu(unsigned int cpu)
{
struct rdt_resource *r;

mutex_lock(&rdtgroup_mutex);
+
+ if (snc_nodes_per_l3_cache > 1)
+ snc_remap_rmids(cpu);
+
for_each_capable_rdt_resource(r)
domain_add_cpu(cpu, r);
/* The cpu is set in default rdtgroup after online. */
@@ -1014,11 +1048,69 @@ static __init bool get_rdt_resources(void)
return (rdt_mon_capable || rdt_alloc_capable);
}

+/* CPU models that support MSR_RMID_SNC_CONFIG */
+static const struct x86_cpu_id snc_cpu_ids[] __initconst = {
+ X86_MATCH_INTEL_FAM6_MODEL(ICELAKE_X, 0),
+ X86_MATCH_INTEL_FAM6_MODEL(SAPPHIRERAPIDS_X, 0),
+ X86_MATCH_INTEL_FAM6_MODEL(EMERALDRAPIDS_X, 0),
+ X86_MATCH_INTEL_FAM6_MODEL(GRANITERAPIDS_X, 0),
+ {}
+};
+
+/*
+ * There isn't a simple h/w bit that indicates whether a CPU is running
+ * in Sub NUMA Cluster (SNC) mode. Infer the state by comparing the
+ * ratio of NUMA nodes to L3 cache instances.
+ * It is not possible to accurately determine SNC state if the system is
+ * booted with a maxcpus=N parameter. That distorts the ratio of SNC nodes
+ * to L3 caches. It will be OK if the system is booted with hyperthreading
+ * disabled (since this doesn't affect the ratio).
+ */
+static __init int snc_get_config(void)
+{
+ unsigned long *node_caches;
+ int mem_only_nodes = 0;
+ int cpu, node, ret;
+ int num_l3_caches;
+
+ if (!x86_match_cpu(snc_cpu_ids))
+ return 1;
+
+ node_caches = bitmap_zalloc(nr_node_ids, GFP_KERNEL);
+ if (!node_caches)
+ return 1;
+
+ cpus_read_lock();
+ for_each_node(node) {
+ cpu = cpumask_first(cpumask_of_node(node));
+ if (cpu < nr_cpu_ids)
+ set_bit(get_cpu_cacheinfo_id(cpu, 3), node_caches);
+ else
+ mem_only_nodes++;
+ }
+ cpus_read_unlock();
+
+ num_l3_caches = bitmap_weight(node_caches, nr_node_ids);
+ kfree(node_caches);
+
+ if (!num_l3_caches)
+ return 1;
+
+ ret = (nr_node_ids - mem_only_nodes) / num_l3_caches;
+
+ if (ret > 1)
+ rdt_resources_all[RDT_RESOURCE_L3].r_resctrl.mon_scope = RESCTRL_NODE;
+
+ return ret;
+}
+
static __init void rdt_init_res_defs_intel(void)
{
struct rdt_hw_resource *hw_res;
struct rdt_resource *r;

+ snc_nodes_per_l3_cache = snc_get_config();
+
for_each_rdt_resource(r) {
hw_res = resctrl_to_arch_res(r);

--
2.41.0
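
The ratio test in snc_get_config() can be modelled outside the kernel. The following is a minimal sketch, not the patch's code: the function name snc_nodes_per_l3(), the MAX_L3_IDS bound and the input array (the L3 cache id of the first CPU on each NUMA node, or -1 for a memory-only node) are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical upper bound on L3 cache ids, for this sketch only. */
#define MAX_L3_IDS 64

/*
 * Userspace model of the node-to-L3 ratio test.
 * l3_id_of_node[i] is the L3 cache id of the first CPU on NUMA node i,
 * or -1 when node i has no CPUs (memory-only node).
 * Returns the inferred number of SNC nodes per L3 cache; 1 means SNC
 * is off or could not be determined.
 */
static int snc_nodes_per_l3(const int *l3_id_of_node, size_t nr_nodes)
{
	bool seen[MAX_L3_IDS] = { false };
	int cpu_nodes = 0;
	int num_l3 = 0;

	for (size_t i = 0; i < nr_nodes; i++) {
		int id = l3_id_of_node[i];

		if (id < 0)
			continue;	/* memory-only node, excluded from the ratio */
		assert(id < MAX_L3_IDS);
		cpu_nodes++;
		if (!seen[id]) {
			seen[id] = true;
			num_l3++;
		}
	}

	if (!num_l3)
		return 1;

	return cpu_nodes / num_l3;
}
```

For a two-socket system with each socket split in two, the input { 0, 0, 1, 1 } describes four CPU nodes over two L3 caches, so the inferred value is 2. If a restricted boot leaves some nodes without online CPUs, they look memory-only here and the ratio shrinks, which is the distortion the patch comment warns about for maxcpus=N.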

2023-10-20 21:33:11

by Tony Luck

Subject: [PATCH v9 6/8] x86/resctrl: Introduce snc_nodes_per_l3_cache

Intel Sub-NUMA Cluster (SNC) is a feature that subdivides the CPU cores
and memory controllers on a socket into two or more groups. These are
presented to the operating system as NUMA nodes.

This may enable some workloads to have slightly lower latency to memory
as the memory controller(s) in an SNC node are electrically closer to the
CPU cores on that SNC node. This benefit may be offset by lower bandwidth
since the memory accesses for each core can only be interleaved between
the memory controllers on the same SNC node.

Resctrl monitoring on Intel system depends upon attaching RMIDs to tasks
to track L3 cache occupancy and memory bandwidth. There is an MSR that
controls how the RMIDs are shared between SNC nodes.

The default mode divides them numerically. E.g. when there are two SNC
nodes on a socket the lower number half of the RMIDs are given to the
first node, the remainder to the second node. This would be difficult
to use with the Linux resctrl interface as specific RMID values assigned
to resctrl groups are not visible to users.

The other mode divides the RMIDs and renumbers the ones on the second
SNC node to start from zero.

Even with this renumbering SNC mode requires several changes in resctrl
behavior for correct operation.

Add a global integer "snc_nodes_per_l3_cache" that will show how many
SNC nodes share each L3 cache. When this is "1", SNC mode is either
not implemented, or not enabled.

A later patch will detect SNC mode and set snc_nodes_per_l3_cache to
the appropriate value. For now it remains at the default "1" to
indicate SNC mode is not active.

Code that needs to take action when SNC is enabled is:
1) The number of logical RMIDs per L3 cache available for use is the
number of physical RMIDs divided by the number of SNC nodes.
2) Likewise the "mon_scale" value must be adjusted for the number
of SNC nodes.
3) The RMID renumbering operates when using the value from the
IA32_PQR_ASSOC MSR to count accesses by a task. When reading an RMID
counter, code must adjust from the logical RMID used to the physical
RMID value for the SNC node that it wishes to read and load the
adjusted value into the IA32_QM_EVTSEL MSR.
4) The L3 cache is divided between the SNC nodes. So the value
reported in the resctrl "size" file is adjusted.
5) The "-o mba_MBps" mount option must be disabled in SNC mode
because the monitoring is being done per SNC node, while the
bandwidth allocation is still done at the L3 cache scope.
Trying to use this feedback loop might result in contradictory
changes to the throttling level coming from each of the SNC
node bandwidth measurements.

Reviewed-by: Peter Newman <[email protected]>
Signed-off-by: Tony Luck <[email protected]>
---
Changes since v6:

In commit comment s/redumbering/renumbering/

Move check that SNC is not enabled into supports_mba_mbps().

Add Peter's review tag.

arch/x86/kernel/cpu/resctrl/internal.h | 2 ++
arch/x86/kernel/cpu/resctrl/core.c | 6 ++++++
arch/x86/kernel/cpu/resctrl/monitor.c | 16 +++++++++++++---
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 5 +++--
4 files changed, 24 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 41a23556f57d..563e6203321e 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -446,6 +446,8 @@ DECLARE_STATIC_KEY_FALSE(rdt_alloc_enable_key);

extern struct dentry *debugfs_resctrl;

+extern int snc_nodes_per_l3_cache;
+
enum resctrl_res_level {
RDT_RESOURCE_L3,
RDT_RESOURCE_L2,
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 2c3975c9c20c..0e418dd14070 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -48,6 +48,12 @@ int max_name_width, max_data_width;
*/
bool rdt_alloc_capable;

+/*
+ * Number of SNC nodes that share each L3 cache. Default is 1 for
+ * systems that do not support SNC, or have SNC disabled.
+ */
+int snc_nodes_per_l3_cache = 1;
+
static void
mba_wrmsr_intel(struct rdt_ctrl_domain *d, struct msr_param *m,
struct rdt_resource *r);
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 7f06848fb828..9122c9a725e2 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -148,8 +148,18 @@ static inline struct rmid_entry *__rmid_entry(u32 rmid)

static int __rmid_read(u32 rmid, enum resctrl_event_id eventid, u64 *val)
{
+ struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
+ int cpu = smp_processor_id();
+ int rmid_offset = 0;
u64 msr_val;

+ /*
+ * When SNC mode is on, need to compute the offset to read the
+ * physical RMID counter for the node to which this CPU belongs.
+ */
+ if (snc_nodes_per_l3_cache > 1)
+ rmid_offset = (cpu_to_node(cpu) % snc_nodes_per_l3_cache) * r->num_rmid;
+
/*
* As per the SDM, when IA32_QM_EVTSEL.EvtID (bits 7:0) is configured
* with a valid event code for supported resource type and the bits
@@ -158,7 +168,7 @@ static int __rmid_read(u32 rmid, enum resctrl_event_id eventid, u64 *val)
* IA32_QM_CTR.Error (bit 63) and IA32_QM_CTR.Unavailable (bit 62)
* are error bits.
*/
- wrmsr(MSR_IA32_QM_EVTSEL, eventid, rmid);
+ wrmsr(MSR_IA32_QM_EVTSEL, eventid, rmid + rmid_offset);
rdmsrl(MSR_IA32_QM_CTR, msr_val);

if (msr_val & RMID_VAL_ERROR)
@@ -783,8 +793,8 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
int ret;

resctrl_rmid_realloc_limit = boot_cpu_data.x86_cache_size * 1024;
- hw_res->mon_scale = boot_cpu_data.x86_cache_occ_scale;
- r->num_rmid = boot_cpu_data.x86_cache_max_rmid + 1;
+ hw_res->mon_scale = boot_cpu_data.x86_cache_occ_scale / snc_nodes_per_l3_cache;
+ r->num_rmid = (boot_cpu_data.x86_cache_max_rmid + 1) / snc_nodes_per_l3_cache;
hw_res->mbm_width = MBM_CNTR_WIDTH_BASE;

if (mbm_offset > 0 && mbm_offset <= MBM_CNTR_WIDTH_OFFSET_MAX)
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 46c6d6807bad..d2aae0ca3c40 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1425,7 +1425,7 @@ unsigned int rdtgroup_cbm_to_size(struct rdt_resource *r,
}
}

- return size;
+ return size / snc_nodes_per_l3_cache;
}

/*
@@ -2298,7 +2298,8 @@ static bool supports_mba_mbps(void)
struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_MBA].r_resctrl;

return (is_mbm_local_enabled() &&
- r->alloc_capable && is_mba_linear());
+ r->alloc_capable && is_mba_linear() &&
+ snc_nodes_per_l3_cache == 1);
}

/*
--
2.41.0
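
The RMID arithmetic in items 1) and 3) of the commit message can be checked with a small standalone sketch. It mirrors the expressions used in rdt_get_mon_l3_config() and __rmid_read() above, but the helper names snc_num_rmid() and snc_physical_rmid() are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Logical RMIDs usable per SNC node: the physical RMIDs (max_rmid + 1)
 * split evenly between the SNC nodes that share one L3 cache.
 */
static uint32_t snc_num_rmid(uint32_t max_rmid, unsigned int nodes_per_l3)
{
	assert(nodes_per_l3 > 0);
	return (max_rmid + 1) / nodes_per_l3;
}

/*
 * Physical RMID to load into IA32_QM_EVTSEL when reading the counter
 * for logical_rmid on the given NUMA node. With SNC off (one node per
 * L3 cache) the logical and physical RMIDs are the same.
 */
static uint32_t snc_physical_rmid(uint32_t logical_rmid, unsigned int node,
				  unsigned int nodes_per_l3, uint32_t num_rmid)
{
	if (nodes_per_l3 <= 1)
		return logical_rmid;

	return logical_rmid + (node % nodes_per_l3) * num_rmid;
}
```

With 256 physical RMIDs and two SNC nodes per L3 cache, each node gets 128 logical RMIDs. Logical RMID 5 is read as physical RMID 5 on nodes 0 and 2, and as physical RMID 133 on nodes 1 and 3.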

2023-10-20 21:33:17

by Tony Luck

Subject: [PATCH v9 5/8] x86/resctrl: Add node-scope to the options for feature scope

All currently supported resctrl features are domain scoped at the same
granularity as the L2 or L3 caches.

Add RESCTRL_NODE as a new option for features that are scoped at the
same granularity as NUMA nodes. This is needed for Intel's Sub-NUMA
Cluster (SNC) feature where monitoring features are node scoped.

Reviewed-by: Peter Newman <[email protected]>
Signed-off-by: Tony Luck <[email protected]>
---
No changes since v6 except to add Peter's review tag

include/linux/resctrl.h | 1 +
arch/x86/kernel/cpu/resctrl/core.c | 2 ++
2 files changed, 3 insertions(+)

diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 4778ef71c893..683706355810 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -172,6 +172,7 @@ struct resctrl_schema;
enum resctrl_scope {
RESCTRL_L2_CACHE = 2,
RESCTRL_L3_CACHE = 3,
+ RESCTRL_NODE,
};

/**
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index bcc4bd2e1930..2c3975c9c20c 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -515,6 +515,8 @@ static int get_domain_id_from_scope(int cpu, enum resctrl_scope scope)
case RESCTRL_L2_CACHE:
case RESCTRL_L3_CACHE:
return get_cpu_cacheinfo_id(cpu, scope);
+ case RESCTRL_NODE:
+ return cpu_to_node(cpu);
default:
break;
}
--
2.41.0
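
The effect of the new enum value on domain id lookup can be shown with a standalone sketch. The enum below copies the values from the patch; struct cpu_topo and domain_id_from_scope() are hypothetical stand-ins for the kernel's per-CPU topology lookups (get_cpu_cacheinfo_id() and cpu_to_node()), assumed here so the example compiles on its own.

```c
#include <assert.h>
#include <errno.h>

/* Same values as the patch: cache scopes equal the cache level. */
enum resctrl_scope {
	RESCTRL_L2_CACHE = 2,
	RESCTRL_L3_CACHE = 3,
	RESCTRL_NODE,
};

/* Hypothetical per-CPU topology record, for illustration only. */
struct cpu_topo {
	int l2_id;	/* id of the L2 cache this CPU uses */
	int l3_id;	/* id of the L3 cache this CPU uses */
	int node;	/* NUMA node this CPU belongs to */
};

/* Pick the domain id that matches the requested scope. */
static int domain_id_from_scope(const struct cpu_topo *t,
				enum resctrl_scope scope)
{
	switch (scope) {
	case RESCTRL_L2_CACHE:
		return t->l2_id;
	case RESCTRL_L3_CACHE:
		return t->l3_id;
	case RESCTRL_NODE:
		return t->node;
	default:
		break;
	}

	return -EINVAL;
}
```

On an SNC system a CPU with l3_id 0 and node 1 lands in control domain 0 (L3 scope) but monitor domain 1 (node scope), which is why the series needs separate domain lists for the two kinds of operation.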

2023-10-20 21:33:20

by Tony Luck

Subject: [PATCH v9 8/8] x86/resctrl: Update documentation with Sub-NUMA cluster changes

With Sub-NUMA Cluster mode enabled the scope of monitoring resources is
per-NODE instead of per-L3 cache. Suffixes of directories with "L3" in
their name refer to Sub-NUMA nodes instead of L3 cache ids.

Users should be aware that SNC mode also affects the amount of L3 cache
available for allocation within each SNC node.

Reviewed-by: Peter Newman <[email protected]>
Signed-off-by: Tony Luck <[email protected]>
---
Changes since v6:

Added Peter's review tag

Documentation/arch/x86/resctrl.rst | 23 ++++++++++++++++++++---
1 file changed, 20 insertions(+), 3 deletions(-)

diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
index a6279df64a9d..d1db200db5f9 100644
--- a/Documentation/arch/x86/resctrl.rst
+++ b/Documentation/arch/x86/resctrl.rst
@@ -366,9 +366,9 @@ When control is enabled all CTRL_MON groups will also contain:
When monitoring is enabled all MON groups will also contain:

"mon_data":
- This contains a set of files organized by L3 domain and by
- RDT event. E.g. on a system with two L3 domains there will
- be subdirectories "mon_L3_00" and "mon_L3_01". Each of these
+ This contains a set of files organized by L3 domain or by NUMA
+ node (depending on whether Sub-NUMA Cluster (SNC) mode is disabled
+ or enabled respectively) and by RDT event. Each of these
directories have one file per event (e.g. "llc_occupancy",
"mbm_total_bytes", and "mbm_local_bytes"). In a MON group these
files provide a read out of the current value of the event for
@@ -478,6 +478,23 @@ if non-contiguous 1s value is supported. On a system with a 20-bit mask
each bit represents 5% of the capacity of the cache. You could partition
the cache into four equal parts with masks: 0x1f, 0x3e0, 0x7c00, 0xf8000.

+Notes on Sub-NUMA Cluster mode
+==============================
+When SNC mode is enabled Linux may load balance tasks between Sub-NUMA
+nodes much more readily than between regular NUMA nodes since the CPUs
+on Sub-NUMA nodes share the same L3 cache and the system may report
+the NUMA distance between Sub-NUMA nodes with a lower value than used
+for regular NUMA nodes. Users who do not bind tasks to the CPUs of a
+specific Sub-NUMA node must read the "llc_occupancy", "mbm_total_bytes",
+and "mbm_local_bytes" for all Sub-NUMA nodes where the tasks may execute
+to get the full view of traffic for which the tasks were the source.
+
+The cache allocation feature still provides the same number of
+bits in a mask to control allocation into the L3 cache. But each
+of those ways has its capacity reduced because the cache is divided
+between the SNC nodes. The values reported in the resctrl
+"size" files are adjusted accordingly.
+
Memory bandwidth Allocation and monitoring
==========================================

--
2.41.0
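
The "size" adjustment described in the documentation can be worked through numerically. This is a sketch under stated assumptions: it takes the unadjusted size to be the L3 size divided by the mask length, multiplied by the number of set bits in the mask, and then divides by the number of SNC nodes as rdtgroup_cbm_to_size() does in this series. The names cbm_weight() and snc_cbm_to_size() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Count the set bits in a capacity bitmask. */
static unsigned int cbm_weight(uint32_t cbm)
{
	unsigned int n = 0;

	for (; cbm; cbm &= cbm - 1)
		n++;

	return n;
}

/*
 * Bytes of L3 cache covered by a capacity bitmask, as seen from one
 * SNC node. Each mask bit covers l3_bytes / cbm_len bytes of the whole
 * cache; that capacity is shared between nodes_per_l3 SNC nodes.
 */
static uint64_t snc_cbm_to_size(uint64_t l3_bytes, unsigned int cbm_len,
				uint32_t cbm, unsigned int nodes_per_l3)
{
	assert(cbm_len > 0 && nodes_per_l3 > 0);
	return l3_bytes / cbm_len * cbm_weight(cbm) / nodes_per_l3;
}
```

For a 40 MiB L3 cache with a 20-bit mask, the mask 0x1f covers 10 MiB with SNC disabled and 5 MiB per node with two SNC nodes per L3 cache. The number of mask bits is unchanged; only the capacity behind each bit is halved.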

2023-10-24 05:42:14

by Shaopeng Tan (Fujitsu)

Subject: RE: [PATCH v9 0/8] Add support for Sub-NUMA cluster (SNC) systems

Hi Tony,

> base-commit: 3300447612b2adbc05cbb90e5d1cb288f19c40c6

$ git checkout 3300447612b2adbc05cbb90e5d1cb288f19c40c6 -b patch_test
fatal: reference is not a tree: 3300447612b2adbc05cbb90e5d1cb288f19c40c6

Then I tried to apply this patch series to kernel v6.5 and v6.6-rc1~7,
but it failed with error message "Patch failed at 000x".

Could you tell me what kernel version this patch series is based on?

Best regards,
Shaopeng TAN

2023-10-24 15:36:26

by Tony Luck

Subject: RE: [PATCH v9 0/8] Add support for Sub-NUMA cluster (SNC) systems

> Could you tell me what kernel version this patch series is based on?

Shaopeng TAN,

Sorry. It's buried in all the other text in the change log in the cover letter.
I should have put this more prominently at the start of the cover letter.

Rebased to tip/master October 20th since that has several other
resctrl changes staged ready for next merge window. No
significant collisions, just noise where "git am" would not
automatically apply. New base is:

3300447612b2 ("Merge branch into tip/master: 'x86/tdx'")

-Tony

2023-10-30 21:18:53

by Reinette Chatre

Subject: Re: [PATCH v9 1/8] x86/resctrl: Prepare for new domain scope

Hi Tony,

On 10/20/2023 2:30 PM, Tony Luck wrote:

...

> @@ -506,12 +519,17 @@ static int arch_domain_mbm_alloc(u32 num_rmid, struct rdt_hw_domain *hw_dom)
> */
> static void domain_add_cpu(int cpu, struct rdt_resource *r)
> {
> - int id = get_cpu_cacheinfo_id(cpu, r->cache_level);
> + int id = get_domain_id_from_scope(cpu, r->scope);
> struct list_head *add_pos = NULL;
> struct rdt_hw_domain *hw_dom;
> struct rdt_domain *d;
> int err;
>
> + if (id < 0) {
> + pr_warn_once("Can't find domain id for CPU:%d scope:%d for resource %s\n",
> + cpu, r->scope, r->name);
> + return;
> + }
> d = rdt_find_domain(r, id, &add_pos);
> if (IS_ERR(d)) {
> pr_warn("Couldn't find cache id for CPU %d\n", cpu);

From what I can tell the original implementation relied on implementation of
rdt_find_domain() to do error checking of the id value, printing the above pr_warn()
if id was found to be invalid. In your change the error checking on id is moved
earlier yet this original behavior is maintained. How could rdt_find_domain()
possibly fail for this reason at this point?

> diff --git a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
> index 8f559eeae08e..8c5f932bc00b 100644
> --- a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
> +++ b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
> @@ -292,10 +292,14 @@ static void pseudo_lock_region_clear(struct pseudo_lock_region *plr)
> */
> static int pseudo_lock_region_init(struct pseudo_lock_region *plr)
> {
> + int scope = plr->s->res->scope;

enum resctrl_scope ?

> struct cpu_cacheinfo *ci;
> int ret;
> int i;




Reinette

2023-10-30 21:21:35

by Reinette Chatre

Subject: Re: [PATCH v9 3/8] x86/resctrl: Prepare for different scope for control/monitor operations

Hi Tony,

On 10/20/2023 2:30 PM, Tony Luck wrote:
> Resctrl assumes that control and monitor operations on a resource are
> performed at the same scope.
>
> Prepare for systems that use different scope (specifically L3 scope for
> cache control and NODE scope for cache occupancy and memory bandwidth
> monitoring).

The first paragraph is a generalization of all resources but then the
second paragraph only mentions L3. In preparation for readers seeing
that only the L3 resource's monitoring scope is initialized, it may help
to be specific here that resctrl only supports monitoring on L3.

> Create separate domain lists for control and monitor operations.

Please do note that an upcoming change changes the domain list locking.
I expect the transition to go smoothly with the locking and list type
translating to both lists so just sharing for your information in case you
are not aware:
https://lore.kernel.org/lkml/[email protected]/

>
> Note that errors during initialization of either control or monitor
> functions on a domain would previously result in that domain being
> excluded from both control and monitor operations. Now the domains are
> allocated independently it is no longer required to disable both control
> and monitor operations if either fail.
>
> Signed-off-by: Tony Luck <[email protected]>
> ---
...
> @@ -356,7 +359,7 @@ struct rdt_domain *get_domain_from_cpu(int cpu, struct rdt_resource *r)
> {
> struct rdt_domain *d;
>
> - list_for_each_entry(d, &r->domains, hdr.list) {
> + list_for_each_entry(d, &r->ctrl_domains, hdr.list) {
> /* Find the domain that contains this CPU */
> if (cpumask_test_cpu(cpu, &d->cpu_mask))
> return d;

This appears to silently turn a generic "get_domain_from_cpu()" into
code that assumes control domain. This works for now for the existing
users but can trip up future changes. I think this would be better
renamed to get_ctrl_domain_from_cpu() or something better.

> @@ -388,29 +391,39 @@ void rdt_ctrl_update(void *arg)
> }
>
> /*
> - * rdt_find_domain - Find a domain in a resource that matches input resource id
> + * rdt_find_domain - Find a domain in one of a resource domain lists.

The above does not sound right.
"one of a resource domain lists" -> "a resource domain list" or "one of the resource
domain lists" or ?

> *
> - * Search resource r's domain list to find the resource id. If the resource
> - * id is found in a domain, return the domain. Otherwise, if requested by
> - * caller, return the first domain whose id is bigger than the input id.
> + * Search the list to find the resource id. If the resource id is found
> + * in a domain, return the domain. Otherwise, if requested by caller,
> + * return the first domain whose id is bigger than the input id.

The above does not sound right. First there is "Search the list to find
the resource id." The resource id is not involved in this code, do you
mean "domain id"? Also later "If the resource id is found in a domain,"
what does that mean here?

> * The domain list is sorted by id in ascending order.
> + *
> + * If an existing domain in the resource r's domain list matches the cpu's
> + * resource id, add the cpu in the domain.

domain id?

> + *
> + * Otherwise, caller will allocate a new domain and insert into the right position
> + * in the domain list sorted by id in ascending order.
> + *
> + * The order in the domain list is visible to users when we print entries
> + * in the schemata file and schemata input is validated to have the same order
> + * as this list.

Please document what the caller does at the caller. Also, please always use "CPU",
not "cpu". Finally, please do not impersonate code (no "we"). I understand you
are copying original comments, no need to propagate these issues.

> */
> -struct rdt_domain *rdt_find_domain(struct rdt_resource *r, int id,
> - struct list_head **pos)
> +struct rdt_domain_hdr *rdt_find_domain(struct list_head *h, int id,
> + struct list_head **pos)
> {
> - struct rdt_domain *d;
> + struct rdt_domain_hdr *d;
> struct list_head *l;
>
> if (id < 0)
> return ERR_PTR(-ENODEV);
>
> - list_for_each(l, &r->domains) {
> - d = list_entry(l, struct rdt_domain, hdr.list);
> + list_for_each(l, h) {
> + d = list_entry(l, struct rdt_domain_hdr, list);
> /* When id is found, return its domain. */
> - if (id == d->hdr.id)
> + if (id == d->id)
> return d;
> /* Stop searching when finding id's position in sorted list. */
> - if (id < d->hdr.id)
> + if (id < d->id)
> break;
> }
>
> @@ -504,39 +517,33 @@ static int get_domain_id_from_scope(int cpu, enum resctrl_scope scope)
> return -EINVAL;
> }
>
> -/*
> - * domain_add_cpu - Add a cpu to a resource's domain list.
> - *
> - * If an existing domain in the resource r's domain list matches the cpu's
> - * resource id, add the cpu in the domain.
> - *
> - * Otherwise, a new domain is allocated and inserted into the right position
> - * in the domain list sorted by id in ascending order.
> - *
> - * The order in the domain list is visible to users when we print entries
> - * in the schemata file and schemata input is validated to have the same order
> - * as this list.
> - */
> -static void domain_add_cpu(int cpu, struct rdt_resource *r)
> +static void domain_add_cpu_ctrl(int cpu, struct rdt_resource *r)
> {
> - int id = get_domain_id_from_scope(cpu, r->scope);
> + int id = get_domain_id_from_scope(cpu, r->ctrl_scope);
> struct list_head *add_pos = NULL;
> struct rdt_hw_domain *hw_dom;
> + struct rdt_domain_hdr *hdr;
> struct rdt_domain *d;
> int err;
>
> if (id < 0) {
> - pr_warn_once("Can't find domain id for CPU:%d scope:%d for resource %s\n",
> - cpu, r->scope, r->name);
> + pr_warn_once("Can't find control domain id for CPU:%d scope:%d for resource %s\n",
> + cpu, r->ctrl_scope, r->name);
> return;
> }
> - d = rdt_find_domain(r, id, &add_pos);
> - if (IS_ERR(d)) {
> - pr_warn("Couldn't find cache id for CPU %d\n", cpu);
> +
> + hdr = rdt_find_domain(&r->ctrl_domains, id, &add_pos);
> + if (IS_ERR(hdr)) {
> + pr_warn("Couldn't find control scope id=%d for CPU %d\n", id, cpu);

How can the failure in the error message be encountered at this point?

> return;
> }
>
> - if (d) {
> + if (hdr) {
> + if (WARN_ON_ONCE(hdr->type != RESCTRL_CTRL_DOMAIN))
> + return;
> +
> + d = container_of(hdr, struct rdt_domain, hdr);
> +
> cpumask_set_cpu(cpu, &d->cpu_mask);
> if (r->cache.arch_has_per_cpu_cfg)
> rdt_domain_reconfigure_cdp(r);
> @@ -549,48 +556,115 @@ static void domain_add_cpu(int cpu, struct rdt_resource *r)
>
> d = &hw_dom->d_resctrl;
> d->hdr.id = id;
> + d->hdr.type = RESCTRL_CTRL_DOMAIN;
> cpumask_set_cpu(cpu, &d->cpu_mask);
>
> rdt_domain_reconfigure_cdp(r);
>
> - if (r->alloc_capable && domain_setup_ctrlval(r, d)) {
> + if (domain_setup_ctrlval(r, d)) {
> + domain_free(hw_dom);
> + return;
> + }
> +
> + list_add_tail(&d->hdr.list, add_pos);
> +
> + err = resctrl_online_ctrl_domain(r, d);
> + if (err) {
> + list_del(&d->hdr.list);
> domain_free(hw_dom);
> + }
> +}
> +
> +static void domain_add_cpu_mon(int cpu, struct rdt_resource *r)
> +{
> + int id = get_domain_id_from_scope(cpu, r->mon_scope);
> + struct list_head *add_pos = NULL;
> + struct rdt_hw_domain *hw_dom;
> + struct rdt_domain_hdr *hdr;
> + struct rdt_domain *d;
> + int err;
> +
> + if (id < 0) {
> + pr_warn_once("Can't find monitor domain id for CPU:%d scope:%d for resource %s\n",
> + cpu, r->mon_scope, r->name);
> + return;
> + }
> +
> + hdr = rdt_find_domain(&r->mon_domains, id, &add_pos);
> + if (IS_ERR(hdr)) {
> + pr_warn("Couldn't find monitor scope id=%d for CPU %d\n", id, cpu);
> + return;
> + }
> +
> + if (hdr) {
> + if (WARN_ON_ONCE(hdr->type != RESCTRL_MON_DOMAIN))
> + return;
> +
> + d = container_of(hdr, struct rdt_domain, hdr);
> +
> + cpumask_set_cpu(cpu, &d->cpu_mask);
> return;
> }
>
> - if (r->mon_capable && arch_domain_mbm_alloc(r->num_rmid, hw_dom)) {
> + hw_dom = kzalloc_node(sizeof(*hw_dom), GFP_KERNEL, cpu_to_node(cpu));
> + if (!hw_dom)
> + return;
> +
> + d = &hw_dom->d_resctrl;
> + d->hdr.id = id;
> + d->hdr.type = RESCTRL_MON_DOMAIN;
> + cpumask_set_cpu(cpu, &d->cpu_mask);
> +
> + if (arch_domain_mbm_alloc(r->num_rmid, hw_dom)) {
> domain_free(hw_dom);
> return;
> }
>
> list_add_tail(&d->hdr.list, add_pos);
>
> - err = resctrl_online_domain(r, d);
> + err = resctrl_online_mon_domain(r, d);
> if (err) {
> list_del(&d->hdr.list);
> domain_free(hw_dom);
> }
> }
>
> -static void domain_remove_cpu(int cpu, struct rdt_resource *r)
> +/*
> + * domain_add_cpu - Add a cpu to either/both resource's domain lists.

cpu -> CPU (please check all changelog and comments)

> + */
> +static void domain_add_cpu(int cpu, struct rdt_resource *r)
> +{
> + if (r->alloc_capable)
> + domain_add_cpu_ctrl(cpu, r);
> + if (r->mon_capable)
> + domain_add_cpu_mon(cpu, r);
> +}
> +

> @@ -3914,18 +3916,22 @@ static int domain_setup_mon_state(struct rdt_resource *r, struct rdt_domain *d)
> return 0;
> }
>
> -int resctrl_online_domain(struct rdt_resource *r, struct rdt_domain *d)
> +int resctrl_online_ctrl_domain(struct rdt_resource *r, struct rdt_domain *d)
> {
> - int err;
> -
> lockdep_assert_held(&rdtgroup_mutex);
>
> if (supports_mba_mbps() && r->rid == RDT_RESOURCE_MBA)
> /* RDT_RESOURCE_MBA is never mon_capable */

This comment was used to justify the early exit based on later
"if (!r->mon_capable)" test. With the test removed this comment
becomes unnecessary.


Reinette
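
The sorted-list search discussed in this review can be sketched in isolation. The version below uses a plain singly linked list rather than the kernel's struct list_head, and the names dom_hdr, find_domain() and insert_domain() are hypothetical. It assumes the caller has already rejected negative ids, so the search itself has no error return, which is the situation the review questions point at.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical common header: an id plus a link to the next domain. */
struct dom_hdr {
	int id;
	struct dom_hdr *next;
};

/*
 * Search a list kept sorted by ascending id.
 * Returns the matching domain, or NULL when there is none. On a miss,
 * and if pos is non-NULL, *pos is set to the link where a new domain
 * with this id must be inserted to keep the list sorted.
 */
static struct dom_hdr *find_domain(struct dom_hdr **head, int id,
				   struct dom_hdr ***pos)
{
	struct dom_hdr **link;

	assert(id >= 0);	/* the caller validates the id first */

	for (link = head; *link; link = &(*link)->next) {
		/* When id is found, return its domain. */
		if ((*link)->id == id)
			return *link;
		/* Stop at the first domain with a bigger id. */
		if (id < (*link)->id)
			break;
	}

	if (pos)
		*pos = link;

	return NULL;
}

/* Insert a new domain at the position reported by find_domain(). */
static void insert_domain(struct dom_hdr **pos, struct dom_hdr *d)
{
	d->next = *pos;
	*pos = d;
}
```

Inserting ids 0, 2 and 1 in that order through find_domain() followed by insert_domain() leaves the list ordered 0, 1, 2, and a second lookup of id 1 returns the existing entry instead of a new insert position.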

2023-10-30 21:21:52

by Reinette Chatre

Subject: Re: [PATCH v9 4/8] x86/resctrl: Split the rdt_domain and rdt_hw_domain structures

Hi Tony,

On 10/20/2023 2:30 PM, Tony Luck wrote:
> The same rdt_domain structure is used for both control and monitor
> functions. But this results in wasted memory as some of the fields are
> only used by control functions, while most are only used for monitor
> functions.
>
> Split into separate rdt_ctrl_domain and rdt_mon_domain structures with
> just the fields required for control and monitoring respectively.

Sounds like a motivation for the cpumask to form part of the
common header?

Reinette

2023-10-30 21:22:44

by Reinette Chatre

Subject: Re: [PATCH v9 6/8] x86/resctrl: Introduce snc_nodes_per_l3_cache

Hi Tony,

On 10/20/2023 2:30 PM, Tony Luck wrote:
> Intel Sub-NUMA Cluster (SNC) is a feature that subdivides the CPU cores
> and memory controllers on a socket into two or more groups. These are
> presented to the operating system as NUMA nodes.
>
> This may enable some workloads to have slightly lower latency to memory
> as the memory controller(s) in an SNC node are electrically closer to the
> CPU cores on that SNC node. This cost may be offset by lower bandwidth
> since the memory accesses for each core can only be interleaved between
> the memory controllers on the same SNC node.
>
> Resctrl monitoring on Intel system depends upon attaching RMIDs to tasks

"on Intel system depends" -> "on an Intel system depends" or "on Intel
systems depend" or ... ?

> to track L3 cache occupancy and memory bandwidth. There is an MSR that
> controls how the RMIDs are shared between SNC nodes.
>
> The default mode divides them numerically. E.g. when there are two SNC
> nodes on a socket the lower number half of the RMIDs are given to the
> first node, the remainder to the second node. This would be difficult
> to use with the Linux resctrl interface as specific RMID values assigned
> to resctrl groups are not visible to users.
>
> The other mode divides the RMIDs and renumbers the ones on the second
> SNC node to start from zero.
>
> Even with this renumbering SNC mode requires several changes in resctrl
> behavior for correct operation.
>
> Add a global integer "snc_nodes_per_l3_cache" that will show how many
> SNC nodes share each L3 cache. When this is "1", SNC mode is either
> not implemented, or not enabled.
>
> A later patch will detect SNC mode and set snc_nodes_per_l3_cache to

Please remove usages of "later patch" from this series. For reference:
https://lore.kernel.org/lkml/20231009171918.GPZSQ2Frs%2Fqp129wsP@fat_crate.local/
Please check whole series. For same reason I expect "earlier patch" to
need removal also.

> the appropriate value. For now it remains at the default "1" to
> indicate SNC mode is not active.
>
> Code that needs to take action when SNC is enabled is:
> 1) The number of logical RMIDs per L3 cache available for use is the
> number of physical RMIDs divided by the number of SNC nodes.
> 2) Likewise the "mon_scale" value must be adjusted for the number
> of SNC nodes.

Can this be expanded to indicate how the value needs to be adjusted?

> 3) The RMID renumbering operates when using the value from the
> IA32_PQR_ASSOC MSR to count accesses by a task. When reading an RMID
> counter, code must adjust from the logical RMID used to the physical
> RMID value for the SNC node that it wishes to read and load the
> adjusted value into the IA32_QM_EVTSEL MSR.
> 4) The L3 cache is divided between the SNC nodes. So the value
> reported in the resctrl "size" file is adjusted.

Can this be expanded to indicate how the value needs to be adjusted?

> 5) The "-o mba_MBps" mount option must be disabled in SNC mode
> because the monitoring is being done per SNC node, while the
> bandwidth allocation is still done at the L3 cache scope.
> Trying to use this feedback loop might result in contradictory
> changes to the throttling level coming from each of the SNC
> node bandwidth measurements.
>
> Reviewed-by: Peter Newman <[email protected]>
> Signed-off-by: Tony Luck <[email protected]>
> ---
> Changes since v6:
>
> In commit comment s/redumbering/renumbering/
>
> Move check that SNC is not enabled into supports_mba_mbps().
>
> Add Peter's review tag.
>
> arch/x86/kernel/cpu/resctrl/internal.h | 2 ++
> arch/x86/kernel/cpu/resctrl/core.c | 6 ++++++
> arch/x86/kernel/cpu/resctrl/monitor.c | 16 +++++++++++++---
> arch/x86/kernel/cpu/resctrl/rdtgroup.c | 5 +++--
> 4 files changed, 24 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
> index 41a23556f57d..563e6203321e 100644
> --- a/arch/x86/kernel/cpu/resctrl/internal.h
> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> @@ -446,6 +446,8 @@ DECLARE_STATIC_KEY_FALSE(rdt_alloc_enable_key);
>
> extern struct dentry *debugfs_resctrl;
>
> +extern int snc_nodes_per_l3_cache;
> +
> enum resctrl_res_level {
> RDT_RESOURCE_L3,
> RDT_RESOURCE_L2,
> diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
> index 2c3975c9c20c..0e418dd14070 100644
> --- a/arch/x86/kernel/cpu/resctrl/core.c
> +++ b/arch/x86/kernel/cpu/resctrl/core.c
> @@ -48,6 +48,12 @@ int max_name_width, max_data_width;
> */
> bool rdt_alloc_capable;
>
> +/*
> + * Number of SNC nodes that share each L3 cache. Default is 1 for
> + * systems that do not support SNC, or have SNC disabled.
> + */
> +int snc_nodes_per_l3_cache = 1;

Should this be an unsigned int?

> +
> static void
> mba_wrmsr_intel(struct rdt_ctrl_domain *d, struct msr_param *m,
> struct rdt_resource *r);

Reinette
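
For readers following items 1 and 3 of the quoted commit message, the RMID
arithmetic can be sketched in plain C. This is only an illustration, not the
code from the patch: the helper names and the node_index parameter are
hypothetical, and the sketch assumes each SNC node owns one contiguous block
of physical RMIDs, which the commit message does not state explicitly.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper names; they do not appear in the patch. */

/*
 * Item 1: the logical RMIDs usable per L3 cache are the physical RMIDs
 * divided by the number of SNC nodes sharing that cache.
 */
uint32_t snc_logical_rmids(uint32_t physical_rmids,
			   uint32_t snc_nodes_per_l3_cache)
{
	assert(snc_nodes_per_l3_cache >= 1);
	return physical_rmids / snc_nodes_per_l3_cache;
}

/*
 * Item 3: translate a logical RMID into the physical RMID that would be
 * loaded into IA32_QM_EVTSEL when reading a counter for one SNC node.
 * ASSUMPTION: node_index is the node's position within its L3 cache and
 * each node owns a contiguous block of physical RMIDs.
 */
uint32_t snc_physical_rmid(uint32_t logical_rmid, uint32_t node_index,
			   uint32_t physical_rmids,
			   uint32_t snc_nodes_per_l3_cache)
{
	uint32_t per_node = snc_logical_rmids(physical_rmids,
					      snc_nodes_per_l3_cache);

	assert(logical_rmid < per_node);
	assert(node_index < snc_nodes_per_l3_cache);
	return node_index * per_node + logical_rmid;
}
```

With snc_nodes_per_l3_cache left at its default of 1 the mapping is the
identity, which matches the "SNC not active" case in the commit message.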

2023-10-30 21:23:26

by Reinette Chatre

Subject: Re: [PATCH v9 7/8] x86/resctrl: Sub NUMA Cluster detection and enable

Hi Tony,

On 10/20/2023 2:30 PM, Tony Luck wrote:
> There isn't a simple h/w bit that indicates whether a CPU is

"h/w" -> hardware

> running in Sub NUMA Cluster (SNC) mode. Infer the state by comparing
> the ratio of NUMA nodes to L3 cache instances.
>
> When SNC mode is detected, reconfigure the RMID counters by updating
> the MSR_RMID_SNC_CONFIG MSR on each socket as CPUs are seen.
>
> Clearing bit zero of the MSR divides the RMIDs and renumbers the ones
> on the second SNC node to start from zero. An earlier commit includes

Please drop the "earlier commit" reference.

> all the required changes in Linux to operate in this reconfigured mode.
>
> Reviewed-by: Peter Newman <[email protected]>
> Signed-off-by: Tony Luck <[email protected]>
> ---
> Changes since v6:
>
> Moved kfree(node_caches); earlier, to the earliest point where it
> is no longer needed.
>
> Added Granite Rapids to list of CPU models that support SNC mode.
>
> Added Peter's review tag
>
> arch/x86/include/asm/msr-index.h | 1 +
> arch/x86/kernel/cpu/resctrl/core.c | 92 ++++++++++++++++++++++++++++++
> 2 files changed, 93 insertions(+)
>
> diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
> index e3fa9cecd599..4285a5ee81fe 100644
> --- a/arch/x86/include/asm/msr-index.h
> +++ b/arch/x86/include/asm/msr-index.h
> @@ -1109,6 +1109,7 @@
> #define MSR_IA32_QM_CTR 0xc8e
> #define MSR_IA32_PQR_ASSOC 0xc8f
> #define MSR_IA32_L3_CBM_BASE 0xc90
> +#define MSR_RMID_SNC_CONFIG 0xca0
> #define MSR_IA32_L2_CBM_BASE 0xd10
> #define MSR_IA32_MBA_THRTL_BASE 0xd50
>
> diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
> index 0e418dd14070..ac187eb0440f 100644
> --- a/arch/x86/kernel/cpu/resctrl/core.c
> +++ b/arch/x86/kernel/cpu/resctrl/core.c
> @@ -16,11 +16,14 @@
>
> #define pr_fmt(fmt) "resctrl: " fmt
>
> +#include <linux/cpu.h>
> #include <linux/slab.h>
> #include <linux/err.h>
> #include <linux/cacheinfo.h>
> #include <linux/cpuhotplug.h>
> +#include <linux/mod_devicetable.h>
>
> +#include <asm/cpu_device_id.h>
> #include <asm/intel-family.h>
> #include <asm/resctrl.h>
> #include "internal.h"
> @@ -755,11 +758,42 @@ static void clear_closid_rmid(int cpu)
> wrmsr(MSR_IA32_PQR_ASSOC, 0, 0);
> }
>
> +/*
> + * The power-on reset value of MSR_RMID_SNC_CONFIG is 0x1
> + * which indicates that RMIDs are configured in legacy mode.
> + * This mode is incompatible with Linux resctrl semantics
> + * as RMIDs are partitioned between SNC nodes, which requires
> + * a user to know which RMID is allocated to a task.
> + * Clearing bit 0 reconfigures the RMID counters for use
> + * in Sub NUMA Cluster mode. This mode is better for Linux.
> + * The RMID space is divided between all SNC nodes with the
> + * RMIDs renumbered to start from zero in each node when
> + * couning operations from tasks. Code to read the counters
> + * must adjust RMID counnter numbers based on SNC node. See

counnter -> counter

> + * __rmid_read() for code that does this.
> + */
> +static void snc_remap_rmids(int cpu)
> +{
> + u64 val;
> +
> + /* Only need to enable once per package. */
> + if (cpumask_first(topology_core_cpumask(cpu)) != cpu)
> + return;
> +
> + rdmsrl(MSR_RMID_SNC_CONFIG, val);
> + val &= ~BIT_ULL(0);
> + wrmsrl(MSR_RMID_SNC_CONFIG, val);
> +}
> +
> static int resctrl_online_cpu(unsigned int cpu)
> {
> struct rdt_resource *r;
>
> mutex_lock(&rdtgroup_mutex);
> +
> + if (snc_nodes_per_l3_cache > 1)
> + snc_remap_rmids(cpu);
> +
> for_each_capable_rdt_resource(r)
> domain_add_cpu(cpu, r);
> /* The cpu is set in default rdtgroup after online. */
> @@ -1014,11 +1048,69 @@ static __init bool get_rdt_resources(void)
> return (rdt_mon_capable || rdt_alloc_capable);
> }
>
> +/* CPU models that support MSR_RMID_SNC_CONFIG */
> +static const struct x86_cpu_id snc_cpu_ids[] __initconst = {
> + X86_MATCH_INTEL_FAM6_MODEL(ICELAKE_X, 0),
> + X86_MATCH_INTEL_FAM6_MODEL(SAPPHIRERAPIDS_X, 0),
> + X86_MATCH_INTEL_FAM6_MODEL(EMERALDRAPIDS_X, 0),
> + X86_MATCH_INTEL_FAM6_MODEL(GRANITERAPIDS_X, 0),
> + {}
> +};
> +
> +/*
> + * There isn't a simple h/w bit that indicates whether a CPU is running

h/w -> hardware

> + * in Sub NUMA Cluster (SNC) mode. Infer the state by comparing the
> + * ratio of NUMA nodes to L3 cache instances.
> + * It is not possible to accurately determine SNC state if the system is
> + * booted with a maxcpus=N parameter. That distorts the ratio of SNC nodes

Can the user be warned when SNC state cannot be determined accurately?

> + * to L3 caches. It will be OK if system is booted with hyperthreading
> + * disabled (since this doesn't affect the ratio).
> + */
> +static __init int snc_get_config(void)
> +{
> + unsigned long *node_caches;
> + int mem_only_nodes = 0;
> + int cpu, node, ret;
> + int num_l3_caches;
> +
> + if (!x86_match_cpu(snc_cpu_ids))
> + return 1;
> +
> + node_caches = bitmap_zalloc(nr_node_ids, GFP_KERNEL);
> + if (!node_caches)
> + return 1;
> +
> + cpus_read_lock();
> + for_each_node(node) {
> + cpu = cpumask_first(cpumask_of_node(node));
> + if (cpu < nr_cpu_ids)
> + set_bit(get_cpu_cacheinfo_id(cpu, 3), node_caches);
> + else
> + mem_only_nodes++;
> + }
> + cpus_read_unlock();
> +
> + num_l3_caches = bitmap_weight(node_caches, nr_node_ids);
> + kfree(node_caches);
> +
> + if (!num_l3_caches)
> + return 1;
> +
> + ret = (nr_node_ids - mem_only_nodes) / num_l3_caches;
> +
> + if (ret > 1)
> + rdt_resources_all[RDT_RESOURCE_L3].r_resctrl.mon_scope = RESCTRL_NODE;
> +
> + return ret;
> +}
> +
> static __init void rdt_init_res_defs_intel(void)
> {
> struct rdt_hw_resource *hw_res;
> struct rdt_resource *r;
>
> + snc_nodes_per_l3_cache = snc_get_config();
> +
> for_each_rdt_resource(r) {
> hw_res = resctrl_to_arch_res(r);
>

Reinette
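
The detection heuristic in the quoted snc_get_config() (count the distinct
L3 caches seen from the first CPU of each NUMA node, ignore memory-only
nodes, then divide) can be tried outside the kernel with a small user-space
sketch. The function name and the array-based input are hypothetical
stand-ins for the kernel's node and cacheinfo lookups; only the ratio
calculation mirrors the patch.

```c
#include <assert.h>
#include <stddef.h>

/*
 * node_l3_id[i] holds the L3 cache id of the first CPU on NUMA node i,
 * or -1 for a memory-only node (no CPUs). This array is a hypothetical
 * replacement for cpumask_of_node() and get_cpu_cacheinfo_id().
 *
 * Returns the inferred number of SNC nodes per L3 cache, or 1 when no
 * L3 cache was seen.
 */
int snc_nodes_per_l3(const int *node_l3_id, size_t nr_nodes)
{
	size_t cpu_nodes = 0, num_l3 = 0;

	assert(node_l3_id != NULL || nr_nodes == 0);

	for (size_t i = 0; i < nr_nodes; i++) {
		int seen = 0;

		if (node_l3_id[i] < 0)
			continue;	/* memory-only node */
		cpu_nodes++;

		/* Count each L3 cache id only the first time it appears. */
		for (size_t j = 0; j < i; j++) {
			if (node_l3_id[j] == node_l3_id[i]) {
				seen = 1;
				break;
			}
		}
		if (!seen)
			num_l3++;
	}

	if (num_l3 == 0)
		return 1;

	/* Same ratio as (nr_node_ids - mem_only_nodes) / num_l3_caches. */
	return (int)(cpu_nodes / num_l3);
}
```

As the quoted comment notes, anything that hides CPUs from this scan (for
example booting with maxcpus=N) distorts the ratio, which is why a warning
for that case is requested above.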

2023-10-31 21:17:38

by Tony Luck

Subject: [PATCH v10 0/8] Add support for Sub-NUMA cluster (SNC) systems

The Sub-NUMA cluster feature on some Intel processors partitions the CPUs
that share an L3 cache into two or more sets. This plays havoc with the
Resource Director Technology (RDT) monitoring features. Prior to this
patch Intel has advised that SNC and RDT are incompatible.

Some of these CPUs support an MSR that can partition the RMID counters in
the same way. This allows monitoring features to be used, with the caveat
that users must be aware that Linux may migrate tasks more frequently
between SNC nodes than between "regular" NUMA nodes, so reading counters
from all SNC nodes may be needed to get a complete picture of activity
for tasks.

Cache and memory bandwidth allocation features continue to operate at
the scope of the L3 cache.

Signed-off-by: Tony Luck <[email protected]>

---

Dropped Peter's "Reviewed-by" from all but parts 5 & 8 since there
have been many changes since he provided those.

Other changes since v9 (all from Reinette's comments)

global s/cpu/CPU/ in commit messages and code comments

#1
New test for invalid domain id before calling rdt_find_domain() means that
error handling in that function and at all call-sites can be simplified.
In pseudo_lock_region_init() use the new enum resctrl_scope for local variable.

#2
Include *all* common fields in the rdt_domain_hdr. Defer adding "type" until it is
used later in part #3.

#3
Fix commit to be specific that only the RDT_RESOURCE_L3 resource is going
to have different monitor and control scope.
Rename get_domain_from_cpu() -> get_ctrl_domain_from_cpu()
Rewrite comment for rdt_find_domains().
Add "type" field to rdt_domain_hdr structure.
Delete the /* RDT_RESOURCE_MBA is never mon_capable */ comment.

#4
Comment against patch 4, but now fixed in patch #2. cpu_mask
is included in common header.

#5
No comments. No changes.

#6
Fixed missing word s/monitoring on Intel/monitoring on an Intel/
Deleted "A later patch" paragraph.
Expanded description of how values are "adjusted" for mon_scale
and cache size.
Changed type of "snc_nodes_per_l3_cache" to "unsigned int".

#7
Expand h/w to hardware (commit and code comments)
Remove "earlier commit" reference
s/counnter/counter/
Check for offline CPUs and warn user SNC detection may be broken.

#8
No comments. No changes.

Tony Luck (8):
x86/resctrl: Prepare for new domain scope
x86/resctrl: Prepare to split rdt_domain structure
x86/resctrl: Prepare for different scope for control/monitor
operations
x86/resctrl: Split the rdt_domain and rdt_hw_domain structures
x86/resctrl: Add node-scope to the options for feature scope
x86/resctrl: Introduce snc_nodes_per_l3_cache
x86/resctrl: Sub NUMA Cluster detection and enable
x86/resctrl: Update documentation with Sub-NUMA cluster changes

Documentation/arch/x86/resctrl.rst | 23 +-
include/linux/resctrl.h | 87 +++--
arch/x86/include/asm/msr-index.h | 1 +
arch/x86/kernel/cpu/resctrl/internal.h | 66 ++--
arch/x86/kernel/cpu/resctrl/core.c | 411 +++++++++++++++++-----
arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 58 +--
arch/x86/kernel/cpu/resctrl/monitor.c | 68 ++--
arch/x86/kernel/cpu/resctrl/pseudo_lock.c | 26 +-
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 149 ++++----
9 files changed, 607 insertions(+), 282 deletions(-)


base-commit: 5a6a09e97199d6600d31383055f9d43fbbcbe86f
--
2.41.0

2023-10-31 21:17:48

by Tony Luck

Subject: [PATCH v10 3/8] x86/resctrl: Prepare for different scope for control/monitor operations

Resctrl assumes that control and monitor operations on a resource are
performed at the same scope.

Prepare for systems that use different scopes (specifically Intel needs
to split the RDT_RESOURCE_L3 resource to use L3 scope for cache control
and NODE scope for cache occupancy and memory bandwidth monitoring).

Create separate domain lists for control and monitor operations.

Note that errors during initialization of either control or monitor
functions on a domain would previously result in that domain being
excluded from both control and monitor operations. Now that the domains
are allocated independently, it is no longer required to disable both
control and monitor operations if either fails.

Signed-off-by: Tony Luck <[email protected]>
---
Changes since v9
Fix commit to be specific that only the RDT_RESOURCE_L3 resource is going
to have different monitor and control scope.
Rename get_domain_from_cpu() -> get_ctrl_domain_from_cpu()
Rewrite comment for rdt_find_domains().
Add "type" field to rdt_domain_hdr structure.
Delete the /* RDT_RESOURCE_MBA is never mon_capable */ comment.
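
Illustration only (not part of the patch): the pattern this change relies
on, a common header embedded in each domain structure, a search over a list
sorted by id, and a type check before converting the header back to its
container, can be sketched in user-space C. All names below are
hypothetical simplifications; the kernel code uses struct list_head and
WARN_ON_ONCE() rather than a singly linked list and a NULL return, and this
sketch omits the insertion-position argument of rdt_find_domain().

```c
#include <assert.h>
#include <stddef.h>

enum domain_type { CTRL_DOMAIN, MON_DOMAIN };

/* Common header, kept first in every domain structure. */
struct domain_hdr {
	struct domain_hdr *next;	/* list is sorted by ascending id */
	int id;
	enum domain_type type;
};

/* Hypothetical monitor domain carrying one payload field. */
struct mon_domain {
	struct domain_hdr hdr;
	unsigned long long mbm_total;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/*
 * Search a sorted list for a domain id. Stops early once an entry with
 * a larger id is reached, since the id cannot appear after that point.
 */
struct domain_hdr *find_domain(struct domain_hdr *head, int id)
{
	for (struct domain_hdr *d = head; d; d = d->next) {
		if (d->id == id)
			return d;
		if (d->id > id)
			break;
	}
	return NULL;
}

/* Convert a header to its monitor domain only if the type matches. */
struct mon_domain *to_mon_domain(struct domain_hdr *hdr)
{
	assert(hdr != NULL);
	if (hdr->type != MON_DOMAIN)
		return NULL;
	return container_of(hdr, struct mon_domain, hdr);
}
```

Because the search only touches the header, the same function can walk
either the control list or the monitor list; the type check is what guards
against converting a header from the wrong list.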

include/linux/resctrl.h | 25 ++-
arch/x86/kernel/cpu/resctrl/internal.h | 6 +-
arch/x86/kernel/cpu/resctrl/core.c | 206 ++++++++++++++++------
arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 12 +-
arch/x86/kernel/cpu/resctrl/monitor.c | 4 +-
arch/x86/kernel/cpu/resctrl/pseudo_lock.c | 4 +-
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 55 +++---
7 files changed, 218 insertions(+), 94 deletions(-)

diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index c4067150a6b7..35e700edc6e6 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -52,15 +52,22 @@ struct resctrl_staged_config {
bool have_new_ctrl;
};

+enum resctrl_domain_type {
+ RESCTRL_CTRL_DOMAIN,
+ RESCTRL_MON_DOMAIN,
+};
+
/**
* struct rdt_domain_hdr - common header for different domain types
* @list: all instances of this resource
* @id: unique id for this instance
+ * @type: type of this instance
* @cpu_mask: which CPUs share this resource
*/
struct rdt_domain_hdr {
struct list_head list;
int id;
+ enum resctrl_domain_type type;
struct cpumask cpu_mask;
};

@@ -163,10 +170,12 @@ enum resctrl_scope {
* @alloc_capable: Is allocation available on this machine
* @mon_capable: Is monitor feature available on this machine
* @num_rmid: Number of RMIDs available
- * @scope: Scope of this resource
+ * @ctrl_scope: Scope of this resource for control functions
+ * @mon_scope: Scope of this resource for monitor functions
* @cache: Cache allocation related data
* @membw: If the component has bandwidth controls, their properties.
- * @domains: All domains for this resource
+ * @ctrl_domains: Control domains for this resource
+ * @mon_domains: Monitor domains for this resource
* @name: Name to use in "schemata" file.
* @data_width: Character width of data when displaying
* @default_ctrl: Specifies default cache cbm or memory B/W percent.
@@ -181,10 +190,12 @@ struct rdt_resource {
bool alloc_capable;
bool mon_capable;
int num_rmid;
- enum resctrl_scope scope;
+ enum resctrl_scope ctrl_scope;
+ enum resctrl_scope mon_scope;
struct resctrl_cache cache;
struct resctrl_membw membw;
- struct list_head domains;
+ struct list_head ctrl_domains;
+ struct list_head mon_domains;
char *name;
int data_width;
u32 default_ctrl;
@@ -230,8 +241,10 @@ int resctrl_arch_update_one(struct rdt_resource *r, struct rdt_domain *d,

u32 resctrl_arch_get_config(struct rdt_resource *r, struct rdt_domain *d,
u32 closid, enum resctrl_conf_type type);
-int resctrl_online_domain(struct rdt_resource *r, struct rdt_domain *d);
-void resctrl_offline_domain(struct rdt_resource *r, struct rdt_domain *d);
+int resctrl_online_ctrl_domain(struct rdt_resource *r, struct rdt_domain *d);
+int resctrl_online_mon_domain(struct rdt_resource *r, struct rdt_domain *d);
+void resctrl_offline_ctrl_domain(struct rdt_resource *r, struct rdt_domain *d);
+void resctrl_offline_mon_domain(struct rdt_resource *r, struct rdt_domain *d);

/**
* resctrl_arch_rmid_read() - Read the eventid counter corresponding to rmid
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index a4f1aa15f0a2..24bf9d7989a9 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -520,8 +520,8 @@ void rdtgroup_kn_unlock(struct kernfs_node *kn);
int rdtgroup_kn_mode_restrict(struct rdtgroup *r, const char *name);
int rdtgroup_kn_mode_restore(struct rdtgroup *r, const char *name,
umode_t mask);
-struct rdt_domain *rdt_find_domain(struct rdt_resource *r, int id,
- struct list_head **pos);
+struct rdt_domain_hdr *rdt_find_domain(struct list_head *h, int id,
+ struct list_head **pos);
ssize_t rdtgroup_schemata_write(struct kernfs_open_file *of,
char *buf, size_t nbytes, loff_t off);
int rdtgroup_schemata_show(struct kernfs_open_file *of,
@@ -540,7 +540,7 @@ int rdt_pseudo_lock_init(void);
void rdt_pseudo_lock_release(void);
int rdtgroup_pseudo_lock_create(struct rdtgroup *rdtgrp);
void rdtgroup_pseudo_lock_remove(struct rdtgroup *rdtgrp);
-struct rdt_domain *get_domain_from_cpu(int cpu, struct rdt_resource *r);
+struct rdt_domain *get_ctrl_domain_from_cpu(int cpu, struct rdt_resource *r);
int closids_supported(void);
void closid_free(int closid);
int alloc_rmid(void);
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index c26ecb2e415f..8dc2cb49358e 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -57,7 +57,8 @@ static void
mba_wrmsr_amd(struct rdt_domain *d, struct msr_param *m,
struct rdt_resource *r);

-#define domain_init(id) LIST_HEAD_INIT(rdt_resources_all[id].r_resctrl.domains)
+#define ctrl_domain_init(id) LIST_HEAD_INIT(rdt_resources_all[id].r_resctrl.ctrl_domains)
+#define mon_domain_init(id) LIST_HEAD_INIT(rdt_resources_all[id].r_resctrl.mon_domains)

struct rdt_hw_resource rdt_resources_all[] = {
[RDT_RESOURCE_L3] =
@@ -65,8 +66,10 @@ struct rdt_hw_resource rdt_resources_all[] = {
.r_resctrl = {
.rid = RDT_RESOURCE_L3,
.name = "L3",
- .scope = RESCTRL_L3_CACHE,
- .domains = domain_init(RDT_RESOURCE_L3),
+ .ctrl_scope = RESCTRL_L3_CACHE,
+ .mon_scope = RESCTRL_L3_CACHE,
+ .ctrl_domains = ctrl_domain_init(RDT_RESOURCE_L3),
+ .mon_domains = mon_domain_init(RDT_RESOURCE_L3),
.parse_ctrlval = parse_cbm,
.format_str = "%d=%0*x",
.fflags = RFTYPE_RES_CACHE,
@@ -79,8 +82,8 @@ struct rdt_hw_resource rdt_resources_all[] = {
.r_resctrl = {
.rid = RDT_RESOURCE_L2,
.name = "L2",
- .scope = RESCTRL_L2_CACHE,
- .domains = domain_init(RDT_RESOURCE_L2),
+ .ctrl_scope = RESCTRL_L2_CACHE,
+ .ctrl_domains = ctrl_domain_init(RDT_RESOURCE_L2),
.parse_ctrlval = parse_cbm,
.format_str = "%d=%0*x",
.fflags = RFTYPE_RES_CACHE,
@@ -93,8 +96,8 @@ struct rdt_hw_resource rdt_resources_all[] = {
.r_resctrl = {
.rid = RDT_RESOURCE_MBA,
.name = "MB",
- .scope = RESCTRL_L3_CACHE,
- .domains = domain_init(RDT_RESOURCE_MBA),
+ .ctrl_scope = RESCTRL_L3_CACHE,
+ .ctrl_domains = ctrl_domain_init(RDT_RESOURCE_MBA),
.parse_ctrlval = parse_bw,
.format_str = "%d=%*u",
.fflags = RFTYPE_RES_MB,
@@ -105,8 +108,8 @@ struct rdt_hw_resource rdt_resources_all[] = {
.r_resctrl = {
.rid = RDT_RESOURCE_SMBA,
.name = "SMBA",
- .scope = RESCTRL_L3_CACHE,
- .domains = domain_init(RDT_RESOURCE_SMBA),
+ .ctrl_scope = RESCTRL_L3_CACHE,
+ .ctrl_domains = ctrl_domain_init(RDT_RESOURCE_SMBA),
.parse_ctrlval = parse_bw,
.format_str = "%d=%*u",
.fflags = RFTYPE_RES_MB,
@@ -352,11 +355,11 @@ cat_wrmsr(struct rdt_domain *d, struct msr_param *m, struct rdt_resource *r)
wrmsrl(hw_res->msr_base + i, hw_dom->ctrl_val[i]);
}

-struct rdt_domain *get_domain_from_cpu(int cpu, struct rdt_resource *r)
+struct rdt_domain *get_ctrl_domain_from_cpu(int cpu, struct rdt_resource *r)
{
struct rdt_domain *d;

- list_for_each_entry(d, &r->domains, hdr.list) {
+ list_for_each_entry(d, &r->ctrl_domains, hdr.list) {
/* Find the domain that contains this CPU */
if (cpumask_test_cpu(cpu, &d->hdr.cpu_mask))
return d;
@@ -378,7 +381,7 @@ void rdt_ctrl_update(void *arg)
int cpu = smp_processor_id();
struct rdt_domain *d;

- d = get_domain_from_cpu(cpu, r);
+ d = get_ctrl_domain_from_cpu(cpu, r);
if (d) {
hw_res->msr_update(d, m, r);
return;
@@ -388,26 +391,26 @@ void rdt_ctrl_update(void *arg)
}

/*
- * rdt_find_domain - Find a domain in a resource that matches input resource id
+ * rdt_find_domain - Search for a domain id in a resource domain list.
*
- * Search resource r's domain list to find the resource id. If the resource
- * id is found in a domain, return the domain. Otherwise, if requested by
- * caller, return the first domain whose id is bigger than the input id.
+ * Search the list to find the resource id. If the domain id is found
+ * in a domain, return the domain. Otherwise, if requested by caller,
+ * return the first domain whose id is bigger than the input id.
* The domain list is sorted by id in ascending order.
*/
-struct rdt_domain *rdt_find_domain(struct rdt_resource *r, int id,
- struct list_head **pos)
+struct rdt_domain_hdr *rdt_find_domain(struct list_head *h, int id,
+ struct list_head **pos)
{
- struct rdt_domain *d;
+ struct rdt_domain_hdr *d;
struct list_head *l;

- list_for_each(l, &r->domains) {
- d = list_entry(l, struct rdt_domain, hdr.list);
+ list_for_each(l, h) {
+ d = list_entry(l, struct rdt_domain_hdr, list);
/* When id is found, return its domain. */
- if (id == d->hdr.id)
+ if (id == d->id)
return d;
/* Stop searching when finding id's position in sorted list. */
- if (id < d->hdr.id)
+ if (id < d->id)
break;
}

@@ -501,35 +504,29 @@ static int get_domain_id_from_scope(int cpu, enum resctrl_scope scope)
return -EINVAL;
}

-/*
- * domain_add_cpu - Add a cpu to a resource's domain list.
- *
- * If an existing domain in the resource r's domain list matches the cpu's
- * resource id, add the cpu in the domain.
- *
- * Otherwise, a new domain is allocated and inserted into the right position
- * in the domain list sorted by id in ascending order.
- *
- * The order in the domain list is visible to users when we print entries
- * in the schemata file and schemata input is validated to have the same order
- * as this list.
- */
-static void domain_add_cpu(int cpu, struct rdt_resource *r)
+static void domain_add_cpu_ctrl(int cpu, struct rdt_resource *r)
{
- int id = get_domain_id_from_scope(cpu, r->scope);
+ int id = get_domain_id_from_scope(cpu, r->ctrl_scope);
struct list_head *add_pos = NULL;
struct rdt_hw_domain *hw_dom;
+ struct rdt_domain_hdr *hdr;
struct rdt_domain *d;
int err;

if (id < 0) {
- pr_warn_once("Can't find domain id for CPU:%d scope:%d for resource %s\n",
- cpu, r->scope, r->name);
+ pr_warn_once("Can't find control domain id for CPU:%d scope:%d for resource %s\n",
+ cpu, r->ctrl_scope, r->name);
return;
}
- d = rdt_find_domain(r, id, &add_pos);

- if (d) {
+ hdr = rdt_find_domain(&r->ctrl_domains, id, &add_pos);
+
+ if (hdr) {
+ if (WARN_ON_ONCE(hdr->type != RESCTRL_CTRL_DOMAIN))
+ return;
+
+ d = container_of(hdr, struct rdt_domain, hdr);
+
cpumask_set_cpu(cpu, &d->hdr.cpu_mask);
if (r->cache.arch_has_per_cpu_cfg)
rdt_domain_reconfigure_cdp(r);
@@ -542,48 +539,115 @@ static void domain_add_cpu(int cpu, struct rdt_resource *r)

d = &hw_dom->d_resctrl;
d->hdr.id = id;
+ d->hdr.type = RESCTRL_CTRL_DOMAIN;
cpumask_set_cpu(cpu, &d->hdr.cpu_mask);

rdt_domain_reconfigure_cdp(r);

- if (r->alloc_capable && domain_setup_ctrlval(r, d)) {
+ if (domain_setup_ctrlval(r, d)) {
domain_free(hw_dom);
return;
}

- if (r->mon_capable && arch_domain_mbm_alloc(r->num_rmid, hw_dom)) {
+ list_add_tail(&d->hdr.list, add_pos);
+
+ err = resctrl_online_ctrl_domain(r, d);
+ if (err) {
+ list_del(&d->hdr.list);
+ domain_free(hw_dom);
+ }
+}
+
+static void domain_add_cpu_mon(int cpu, struct rdt_resource *r)
+{
+ int id = get_domain_id_from_scope(cpu, r->mon_scope);
+ struct list_head *add_pos = NULL;
+ struct rdt_hw_domain *hw_dom;
+ struct rdt_domain_hdr *hdr;
+ struct rdt_domain *d;
+ int err;
+
+ if (id < 0) {
+ pr_warn_once("Can't find monitor domain id for CPU:%d scope:%d for resource %s\n",
+ cpu, r->mon_scope, r->name);
+ return;
+ }
+
+ hdr = rdt_find_domain(&r->mon_domains, id, &add_pos);
+ if (IS_ERR(hdr)) {
+ pr_warn("Couldn't find monitor scope id=%d for CPU %d\n", id, cpu);
+ return;
+ }
+
+ if (hdr) {
+ if (WARN_ON_ONCE(hdr->type != RESCTRL_MON_DOMAIN))
+ return;
+
+ d = container_of(hdr, struct rdt_domain, hdr);
+
+ cpumask_set_cpu(cpu, &d->hdr.cpu_mask);
+ return;
+ }
+
+ hw_dom = kzalloc_node(sizeof(*hw_dom), GFP_KERNEL, cpu_to_node(cpu));
+ if (!hw_dom)
+ return;
+
+ d = &hw_dom->d_resctrl;
+ d->hdr.id = id;
+ d->hdr.type = RESCTRL_MON_DOMAIN;
+ cpumask_set_cpu(cpu, &d->hdr.cpu_mask);
+
+ if (arch_domain_mbm_alloc(r->num_rmid, hw_dom)) {
domain_free(hw_dom);
return;
}

list_add_tail(&d->hdr.list, add_pos);

- err = resctrl_online_domain(r, d);
+ err = resctrl_online_mon_domain(r, d);
if (err) {
list_del(&d->hdr.list);
domain_free(hw_dom);
}
}

-static void domain_remove_cpu(int cpu, struct rdt_resource *r)
+/*
+ * domain_add_cpu - Add a cpu to either/both resource's domain lists.
+ */
+static void domain_add_cpu(int cpu, struct rdt_resource *r)
+{
+ if (r->alloc_capable)
+ domain_add_cpu_ctrl(cpu, r);
+ if (r->mon_capable)
+ domain_add_cpu_mon(cpu, r);
+}
+
+static void domain_remove_cpu_ctrl(int cpu, struct rdt_resource *r)
{
- int id = get_domain_id_from_scope(cpu, r->scope);
+ int id = get_domain_id_from_scope(cpu, r->ctrl_scope);
struct rdt_hw_domain *hw_dom;
+ struct rdt_domain_hdr *hdr;
struct rdt_domain *d;

if (id < 0)
return;

- d = rdt_find_domain(r, id, NULL);
- if (!d) {
- pr_warn("Couldn't find cache id for CPU %d\n", cpu);
+ hdr = rdt_find_domain(&r->ctrl_domains, id, NULL);
+ if (IS_ERR_OR_NULL(hdr)) {
+ pr_warn("Couldn't find control scope id=%d for CPU %d\n", id, cpu);
return;
}
+
+ if (WARN_ON_ONCE(hdr->type != RESCTRL_CTRL_DOMAIN))
+ return;
+
+ d = container_of(hdr, struct rdt_domain, hdr);
hw_dom = resctrl_to_arch_dom(d);

cpumask_clear_cpu(cpu, &d->hdr.cpu_mask);
if (cpumask_empty(&d->hdr.cpu_mask)) {
- resctrl_offline_domain(r, d);
+ resctrl_offline_ctrl_domain(r, d);
list_del(&d->hdr.list);

/*
@@ -596,6 +660,38 @@ static void domain_remove_cpu(int cpu, struct rdt_resource *r)

return;
}
+}
+
+static void domain_remove_cpu_mon(int cpu, struct rdt_resource *r)
+{
+ int id = get_domain_id_from_scope(cpu, r->mon_scope);
+ struct rdt_hw_domain *hw_dom;
+ struct rdt_domain_hdr *hdr;
+ struct rdt_domain *d;
+
+ if (id < 0)
+ return;
+
+ hdr = rdt_find_domain(&r->mon_domains, id, NULL);
+ if (IS_ERR_OR_NULL(hdr)) {
+ pr_warn("Couldn't find scope id=%d for CPU %d\n", id, cpu);
+ return;
+ }
+
+ if (WARN_ON_ONCE(hdr->type != RESCTRL_MON_DOMAIN))
+ return;
+
+ d = container_of(hdr, struct rdt_domain, hdr);
+ hw_dom = resctrl_to_arch_dom(d);
+
+ cpumask_clear_cpu(cpu, &d->hdr.cpu_mask);
+ if (cpumask_empty(&d->hdr.cpu_mask)) {
+ resctrl_offline_mon_domain(r, d);
+ list_del(&d->hdr.list);
+ domain_free(hw_dom);
+
+ return;
+ }

if (r == &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl) {
if (is_mbm_enabled() && cpu == d->mbm_work_cpu) {
@@ -610,6 +706,14 @@ static void domain_remove_cpu(int cpu, struct rdt_resource *r)
}
}

+static void domain_remove_cpu(int cpu, struct rdt_resource *r)
+{
+ if (r->alloc_capable)
+ domain_remove_cpu_ctrl(cpu, r);
+ if (r->mon_capable)
+ domain_remove_cpu_mon(cpu, r);
+}
+
static void clear_closid_rmid(int cpu)
{
struct resctrl_pqr_state *state = this_cpu_ptr(&pqr_state);
diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
index 23f8258d36a8..0b4136c42762 100644
--- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
+++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
@@ -226,7 +226,7 @@ static int parse_line(char *line, struct resctrl_schema *s,
return -EINVAL;
}
dom = strim(dom);
- list_for_each_entry(d, &r->domains, hdr.list) {
+ list_for_each_entry(d, &r->ctrl_domains, hdr.list) {
if (d->hdr.id == dom_id) {
data.buf = dom;
data.rdtgrp = rdtgrp;
@@ -318,7 +318,7 @@ int resctrl_arch_update_domains(struct rdt_resource *r, u32 closid)
return -ENOMEM;

msr_param.res = NULL;
- list_for_each_entry(d, &r->domains, hdr.list) {
+ list_for_each_entry(d, &r->ctrl_domains, hdr.list) {
hw_dom = resctrl_to_arch_dom(d);
for (t = 0; t < CDP_NUM_TYPES; t++) {
cfg = &hw_dom->d_resctrl.staged_config[t];
@@ -466,7 +466,7 @@ static void show_doms(struct seq_file *s, struct resctrl_schema *schema, int clo
u32 ctrl_val;

seq_printf(s, "%*s:", max_name_width, schema->name);
- list_for_each_entry(dom, &r->domains, hdr.list) {
+ list_for_each_entry(dom, &r->ctrl_domains, hdr.list) {
if (sep)
seq_puts(s, ";");

@@ -542,6 +542,7 @@ void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
int rdtgroup_mondata_show(struct seq_file *m, void *arg)
{
struct kernfs_open_file *of = m->private;
+ struct rdt_domain_hdr *hdr;
u32 resid, evtid, domid;
struct rdtgroup *rdtgrp;
struct rdt_resource *r;
@@ -562,11 +563,12 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)
evtid = md.u.evtid;

r = &rdt_resources_all[resid].r_resctrl;
- d = rdt_find_domain(r, domid, NULL);
- if (!d) {
+ hdr = rdt_find_domain(&r->mon_domains, domid, NULL);
+ if (!hdr || WARN_ON_ONCE(hdr->type != RESCTRL_MON_DOMAIN)) {
ret = -ENOENT;
goto out;
}
+ d = container_of(hdr, struct rdt_domain, hdr);

mon_event_read(&rr, r, d, rdtgrp, evtid, false);

diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index dd0ea1bc0092..ec5ad926c5dc 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -340,7 +340,7 @@ static void add_rmid_to_limbo(struct rmid_entry *entry)

entry->busy = 0;
cpu = get_cpu();
- list_for_each_entry(d, &r->domains, hdr.list) {
+ list_for_each_entry(d, &r->mon_domains, hdr.list) {
if (cpumask_test_cpu(cpu, &d->hdr.cpu_mask)) {
err = resctrl_arch_rmid_read(r, d, entry->rmid,
QOS_L3_OCCUP_EVENT_ID,
@@ -535,7 +535,7 @@ static void update_mba_bw(struct rdtgroup *rgrp, struct rdt_domain *dom_mbm)
rmid = rgrp->mon.rmid;
pmbm_data = &dom_mbm->mbm_local[rmid];

- dom_mba = get_domain_from_cpu(smp_processor_id(), r_mba);
+ dom_mba = get_ctrl_domain_from_cpu(smp_processor_id(), r_mba);
if (!dom_mba) {
pr_warn_once("Failure to get domain for MBA update\n");
return;
diff --git a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
index fcbd99e2eb66..ed6d59af1cef 100644
--- a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
+++ b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
@@ -292,7 +292,7 @@ static void pseudo_lock_region_clear(struct pseudo_lock_region *plr)
*/
static int pseudo_lock_region_init(struct pseudo_lock_region *plr)
{
- enum resctrl_scope scope = plr->s->res->scope;
+ enum resctrl_scope scope = plr->s->res->ctrl_scope;
struct cpu_cacheinfo *ci;
int ret;
int i;
@@ -856,7 +856,7 @@ bool rdtgroup_pseudo_locked_in_hierarchy(struct rdt_domain *d)
* associated with them.
*/
for_each_alloc_capable_rdt_resource(r) {
- list_for_each_entry(d_i, &r->domains, hdr.list) {
+ list_for_each_entry(d_i, &r->ctrl_domains, hdr.list) {
if (d_i->plr)
cpumask_or(cpu_with_psl, cpu_with_psl,
&d_i->hdr.cpu_mask);
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 04d32602ac33..760013ed1bff 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -91,7 +91,7 @@ void rdt_staged_configs_clear(void)
lockdep_assert_held(&rdtgroup_mutex);

for_each_alloc_capable_rdt_resource(r) {
- list_for_each_entry(dom, &r->domains, hdr.list)
+ list_for_each_entry(dom, &r->ctrl_domains, hdr.list)
memset(dom->staged_config, 0, sizeof(dom->staged_config));
}
}
@@ -984,7 +984,7 @@ static int rdt_bit_usage_show(struct kernfs_open_file *of,

mutex_lock(&rdtgroup_mutex);
hw_shareable = r->cache.shareable_bits;
- list_for_each_entry(dom, &r->domains, hdr.list) {
+ list_for_each_entry(dom, &r->ctrl_domains, hdr.list) {
if (sep)
seq_putc(seq, ';');
sw_shareable = 0;
@@ -1302,7 +1302,7 @@ static bool rdtgroup_mode_test_exclusive(struct rdtgroup *rdtgrp)
if (r->rid == RDT_RESOURCE_MBA || r->rid == RDT_RESOURCE_SMBA)
continue;
has_cache = true;
- list_for_each_entry(d, &r->domains, hdr.list) {
+ list_for_each_entry(d, &r->ctrl_domains, hdr.list) {
ctrl = resctrl_arch_get_config(r, d, closid,
s->conf_type);
if (rdtgroup_cbm_overlaps(s, d, ctrl, closid, false)) {
@@ -1413,13 +1413,13 @@ unsigned int rdtgroup_cbm_to_size(struct rdt_resource *r,
unsigned int size = 0;
int num_b, i;

- if (WARN_ON_ONCE(r->scope != RESCTRL_L2_CACHE && r->scope != RESCTRL_L3_CACHE))
+ if (WARN_ON_ONCE(r->ctrl_scope != RESCTRL_L2_CACHE && r->ctrl_scope != RESCTRL_L3_CACHE))
return size;

num_b = bitmap_weight(&cbm, r->cache.cbm_len);
ci = get_cpu_cacheinfo(cpumask_any(&d->hdr.cpu_mask));
for (i = 0; i < ci->num_leaves; i++) {
- if (ci->info_list[i].level == r->scope) {
+ if (ci->info_list[i].level == r->ctrl_scope) {
size = ci->info_list[i].size / r->cache.cbm_len * num_b;
break;
}
@@ -1477,7 +1477,7 @@ static int rdtgroup_size_show(struct kernfs_open_file *of,
type = schema->conf_type;
sep = false;
seq_printf(s, "%*s:", max_name_width, schema->name);
- list_for_each_entry(d, &r->domains, hdr.list) {
+ list_for_each_entry(d, &r->ctrl_domains, hdr.list) {
if (sep)
seq_putc(s, ';');
if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKSETUP) {
@@ -1566,7 +1566,7 @@ static int mbm_config_show(struct seq_file *s, struct rdt_resource *r, u32 evtid

mutex_lock(&rdtgroup_mutex);

- list_for_each_entry(dom, &r->domains, hdr.list) {
+ list_for_each_entry(dom, &r->mon_domains, hdr.list) {
if (sep)
seq_puts(s, ";");

@@ -1689,7 +1689,7 @@ static int mon_config_write(struct rdt_resource *r, char *tok, u32 evtid)
return -EINVAL;
}

- list_for_each_entry(d, &r->domains, hdr.list) {
+ list_for_each_entry(d, &r->mon_domains, hdr.list) {
if (d->hdr.id == dom_id) {
ret = mbm_config_write_domain(r, d, evtid, val);
if (ret)
@@ -2232,7 +2232,7 @@ static int set_cache_qos_cfg(int level, bool enable)
return -ENOMEM;

r_l = &rdt_resources_all[level].r_resctrl;
- list_for_each_entry(d, &r_l->domains, hdr.list) {
+ list_for_each_entry(d, &r_l->ctrl_domains, hdr.list) {
if (r_l->cache.arch_has_per_cpu_cfg)
/* Pick all the CPUs in the domain instance */
for_each_cpu(cpu, &d->hdr.cpu_mask)
@@ -2317,7 +2317,7 @@ static int set_mba_sc(bool mba_sc)

r->membw.mba_sc = mba_sc;

- list_for_each_entry(d, &r->domains, hdr.list) {
+ list_for_each_entry(d, &r->ctrl_domains, hdr.list) {
for (i = 0; i < num_closid; i++)
d->mbps_val[i] = MBA_MAX_MBPS;
}
@@ -2653,7 +2653,7 @@ static int rdt_get_tree(struct fs_context *fc)

if (is_mbm_enabled()) {
r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
- list_for_each_entry(dom, &r->domains, hdr.list)
+ list_for_each_entry(dom, &r->mon_domains, hdr.list)
mbm_setup_overflow_handler(dom, MBM_OVERFLOW_INTERVAL);
}

@@ -2777,10 +2777,10 @@ static int reset_all_ctrls(struct rdt_resource *r)

/*
* Disable resource control for this resource by setting all
- * CBMs in all domains to the maximum mask value. Pick one CPU
+ * CBMs in all ctrl_domains to the maximum mask value. Pick one CPU
* from each domain to update the MSRs below.
*/
- list_for_each_entry(d, &r->domains, hdr.list) {
+ list_for_each_entry(d, &r->ctrl_domains, hdr.list) {
hw_dom = resctrl_to_arch_dom(d);
cpumask_set_cpu(cpumask_any(&d->hdr.cpu_mask), cpu_mask);

@@ -3050,7 +3050,7 @@ static int mkdir_mondata_subdir_alldom(struct kernfs_node *parent_kn,
struct rdt_domain *dom;
int ret;

- list_for_each_entry(dom, &r->domains, hdr.list) {
+ list_for_each_entry(dom, &r->mon_domains, hdr.list) {
ret = mkdir_mondata_subdir(parent_kn, dom, r, prgrp);
if (ret)
return ret;
@@ -3232,7 +3232,7 @@ static int rdtgroup_init_cat(struct resctrl_schema *s, u32 closid)
struct rdt_domain *d;
int ret;

- list_for_each_entry(d, &s->res->domains, hdr.list) {
+ list_for_each_entry(d, &s->res->ctrl_domains, hdr.list) {
ret = __init_one_rdt_domain(d, s, closid);
if (ret < 0)
return ret;
@@ -3247,7 +3247,7 @@ static void rdtgroup_init_mba(struct rdt_resource *r, u32 closid)
struct resctrl_staged_config *cfg;
struct rdt_domain *d;

- list_for_each_entry(d, &r->domains, hdr.list) {
+ list_for_each_entry(d, &r->ctrl_domains, hdr.list) {
if (is_mba_sc(r)) {
d->mbps_val[closid] = MBA_MAX_MBPS;
continue;
@@ -3849,15 +3849,17 @@ static void domain_destroy_mon_state(struct rdt_domain *d)
kfree(d->mbm_local);
}

-void resctrl_offline_domain(struct rdt_resource *r, struct rdt_domain *d)
+void resctrl_offline_ctrl_domain(struct rdt_resource *r, struct rdt_domain *d)
{
lockdep_assert_held(&rdtgroup_mutex);

if (supports_mba_mbps() && r->rid == RDT_RESOURCE_MBA)
mba_sc_domain_destroy(r, d);
+}

- if (!r->mon_capable)
- return;
+void resctrl_offline_mon_domain(struct rdt_resource *r, struct rdt_domain *d)
+{
+ lockdep_assert_held(&rdtgroup_mutex);

/*
* If resctrl is mounted, remove all the
@@ -3914,18 +3916,21 @@ static int domain_setup_mon_state(struct rdt_resource *r, struct rdt_domain *d)
return 0;
}

-int resctrl_online_domain(struct rdt_resource *r, struct rdt_domain *d)
+int resctrl_online_ctrl_domain(struct rdt_resource *r, struct rdt_domain *d)
{
- int err;
-
lockdep_assert_held(&rdtgroup_mutex);

if (supports_mba_mbps() && r->rid == RDT_RESOURCE_MBA)
- /* RDT_RESOURCE_MBA is never mon_capable */
return mba_sc_domain_allocate(r, d);

- if (!r->mon_capable)
- return 0;
+ return 0;
+}
+
+int resctrl_online_mon_domain(struct rdt_resource *r, struct rdt_domain *d)
+{
+ int err;
+
+ lockdep_assert_held(&rdtgroup_mutex);

err = domain_setup_mon_state(r, d);
if (err)
--
2.41.0

2023-10-31 21:17:51

by Tony Luck

Subject: [PATCH v10 4/8] x86/resctrl: Split the rdt_domain and rdt_hw_domain structures

The same rdt_domain structure is used for both control and monitor
functions. This wastes memory, because some of the fields are only
used by control functions, while most are only used by monitor
functions.

Split into separate rdt_ctrl_domain and rdt_mon_domain structures with
just the fields required for control and monitoring respectively.

Similarly, split the rdt_hw_domain structure into rdt_hw_ctrl_domain
and rdt_hw_mon_domain.

Signed-off-by: Tony Luck <[email protected]>
---
Changes since v9:
Comment was against patch 4, but is now fixed in patch #2: cpu_mask
is included in the common header.
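
Illustration only, not part of the patch: a minimal user-space sketch
of the pattern this split relies on, where each domain type embeds a
common header and the arch-private wrapper is recovered from the
generic structure with container_of(). All demo_* names and the
reduced field lists are hypothetical stand-ins chosen for this note;
the container_of() macro here is a simplified user-space version of
the kernel one, without its type checking.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified user-space stand-in for the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, reduced versions of the structures in the patch. */
enum demo_domain_type { DEMO_CTRL_DOMAIN, DEMO_MON_DOMAIN };

struct demo_domain_hdr {
	int id;
	enum demo_domain_type type;
};

/* Control domain: only the fields control code needs. */
struct demo_ctrl_domain {
	struct demo_domain_hdr hdr;
	unsigned int *mbps_val;
};

/* Monitor domain: only the fields monitor code needs. */
struct demo_mon_domain {
	struct demo_domain_hdr hdr;
	unsigned long *rmid_busy_llc;
	int mbm_work_cpu;
	int cqm_work_cpu;
};

/* Arch-private wrappers, one per domain type. */
struct demo_hw_ctrl_domain {
	struct demo_ctrl_domain d_resctrl;
	unsigned int *ctrl_val;
};

struct demo_hw_mon_domain {
	struct demo_mon_domain d_resctrl;
	void *arch_mbm_total;
	void *arch_mbm_local;
};

/* Map a generic control domain back to its arch-private wrapper. */
static inline struct demo_hw_ctrl_domain *
demo_to_arch_ctrl_dom(struct demo_ctrl_domain *d)
{
	assert(d != NULL);
	return container_of(d, struct demo_hw_ctrl_domain, d_resctrl);
}

/* Map a generic monitor domain back to its arch-private wrapper. */
static inline struct demo_hw_mon_domain *
demo_to_arch_mon_dom(struct demo_mon_domain *d)
{
	assert(d != NULL);
	return container_of(d, struct demo_hw_mon_domain, d_resctrl);
}

/*
 * Recover a control domain from the common header. Returns NULL when
 * the header belongs to a monitor domain, loosely mirroring the
 * hdr->type checks that precede container_of() in the patch.
 */
static inline struct demo_ctrl_domain *
demo_hdr_to_ctrl(struct demo_domain_hdr *hdr)
{
	if (hdr->type != DEMO_CTRL_DOMAIN)
		return NULL;
	return container_of(hdr, struct demo_ctrl_domain, hdr);
}
```

Because the two wrappers are now distinct types, passing a monitor
domain to a control-only helper becomes a compile-time type mismatch
rather than a silent misuse of unused fields.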

include/linux/resctrl.h | 50 +++++++------
arch/x86/kernel/cpu/resctrl/internal.h | 60 ++++++++++------
arch/x86/kernel/cpu/resctrl/core.c | 87 ++++++++++++-----------
arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 32 ++++-----
arch/x86/kernel/cpu/resctrl/monitor.c | 40 +++++------
arch/x86/kernel/cpu/resctrl/pseudo_lock.c | 6 +-
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 62 ++++++++--------
7 files changed, 184 insertions(+), 153 deletions(-)

diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 35e700edc6e6..36503e8870cd 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -72,7 +72,25 @@ struct rdt_domain_hdr {
};

/**
- * struct rdt_domain - group of CPUs sharing a resctrl resource
+ * struct rdt_ctrl_domain - group of CPUs sharing a resctrl control resource
+ * @hdr: common header for different domain types
+ * @cpu_mask: which CPUs share this resource
+ * @plr: pseudo-locked region (if any) associated with domain
+ * @staged_config: parsed configuration to be applied
+ * @mbps_val: When mba_sc is enabled, this holds the array of user
+ * specified control values for mba_sc in MBps, indexed
+ * by closid
+ */
+struct rdt_ctrl_domain {
+ struct rdt_domain_hdr hdr;
+ struct cpumask cpu_mask;
+ struct pseudo_lock_region *plr;
+ struct resctrl_staged_config staged_config[CDP_NUM_TYPES];
+ u32 *mbps_val;
+};
+
+/**
+ * struct rdt_mon_domain - group of CPUs sharing a resctrl monitor resource
* @hdr: common header for different domain types
* @rmid_busy_llc: bitmap of which limbo RMIDs are above threshold
* @mbm_total: saved state for MBM total bandwidth
@@ -81,13 +99,8 @@ struct rdt_domain_hdr {
* @cqm_limbo: worker to periodically read CQM h/w counters
* @mbm_work_cpu: worker CPU for MBM h/w counters
* @cqm_work_cpu: worker CPU for CQM h/w counters
- * @plr: pseudo-locked region (if any) associated with domain
- * @staged_config: parsed configuration to be applied
- * @mbps_val: When mba_sc is enabled, this holds the array of user
- * specified control values for mba_sc in MBps, indexed
- * by closid
*/
-struct rdt_domain {
+struct rdt_mon_domain {
struct rdt_domain_hdr hdr;
unsigned long *rmid_busy_llc;
struct mbm_state *mbm_total;
@@ -96,9 +109,6 @@ struct rdt_domain {
struct delayed_work cqm_limbo;
int mbm_work_cpu;
int cqm_work_cpu;
- struct pseudo_lock_region *plr;
- struct resctrl_staged_config staged_config[CDP_NUM_TYPES];
- u32 *mbps_val;
};

/**
@@ -202,7 +212,7 @@ struct rdt_resource {
const char *format_str;
int (*parse_ctrlval)(struct rdt_parse_data *data,
struct resctrl_schema *s,
- struct rdt_domain *d);
+ struct rdt_ctrl_domain *d);
struct list_head evt_list;
unsigned long fflags;
bool cdp_capable;
@@ -236,15 +246,15 @@ int resctrl_arch_update_domains(struct rdt_resource *r, u32 closid);
* Update the ctrl_val and apply this config right now.
* Must be called on one of the domain's CPUs.
*/
-int resctrl_arch_update_one(struct rdt_resource *r, struct rdt_domain *d,
+int resctrl_arch_update_one(struct rdt_resource *r, struct rdt_ctrl_domain *d,
u32 closid, enum resctrl_conf_type t, u32 cfg_val);

-u32 resctrl_arch_get_config(struct rdt_resource *r, struct rdt_domain *d,
+u32 resctrl_arch_get_config(struct rdt_resource *r, struct rdt_ctrl_domain *d,
u32 closid, enum resctrl_conf_type type);
-int resctrl_online_ctrl_domain(struct rdt_resource *r, struct rdt_domain *d);
-int resctrl_online_mon_domain(struct rdt_resource *r, struct rdt_domain *d);
-void resctrl_offline_ctrl_domain(struct rdt_resource *r, struct rdt_domain *d);
-void resctrl_offline_mon_domain(struct rdt_resource *r, struct rdt_domain *d);
+int resctrl_online_ctrl_domain(struct rdt_resource *r, struct rdt_ctrl_domain *d);
+int resctrl_online_mon_domain(struct rdt_resource *r, struct rdt_mon_domain *d);
+void resctrl_offline_ctrl_domain(struct rdt_resource *r, struct rdt_ctrl_domain *d);
+void resctrl_offline_mon_domain(struct rdt_resource *r, struct rdt_mon_domain *d);

/**
* resctrl_arch_rmid_read() - Read the eventid counter corresponding to rmid
@@ -260,7 +270,7 @@ void resctrl_offline_mon_domain(struct rdt_resource *r, struct rdt_domain *d);
* Return:
* 0 on success, or -EIO, -EINVAL etc on error.
*/
-int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain *d,
+int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_mon_domain *d,
u32 rmid, enum resctrl_event_id eventid, u64 *val);

/**
@@ -273,7 +283,7 @@ int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain *d,
*
* This can be called from any CPU.
*/
-void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_domain *d,
+void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_mon_domain *d,
u32 rmid, enum resctrl_event_id eventid);

/**
@@ -285,7 +295,7 @@ void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_domain *d,
*
* This can be called from any CPU.
*/
-void resctrl_arch_reset_rmid_all(struct rdt_resource *r, struct rdt_domain *d);
+void resctrl_arch_reset_rmid_all(struct rdt_resource *r, struct rdt_mon_domain *d);

extern unsigned int resctrl_rmid_realloc_threshold;
extern unsigned int resctrl_rmid_realloc_limit;
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 24bf9d7989a9..ce3a70657842 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -107,7 +107,7 @@ union mon_data_bits {
struct rmid_read {
struct rdtgroup *rgrp;
struct rdt_resource *r;
- struct rdt_domain *d;
+ struct rdt_mon_domain *d;
enum resctrl_event_id evtid;
bool first;
int err;
@@ -192,7 +192,7 @@ struct mongroup {
*/
struct pseudo_lock_region {
struct resctrl_schema *s;
- struct rdt_domain *d;
+ struct rdt_ctrl_domain *d;
u32 cbm;
wait_queue_head_t lock_thread_wq;
int thread_done;
@@ -319,25 +319,41 @@ struct arch_mbm_state {
};

/**
- * struct rdt_hw_domain - Arch private attributes of a set of CPUs that share
- * a resource
+ * struct rdt_hw_ctrl_domain - Arch private attributes of a set of CPUs that share
+ * a resource for a control function
* @d_resctrl: Properties exposed to the resctrl file system
* @ctrl_val: array of cache or mem ctrl values (indexed by CLOSID)
+ *
+ * Members of this structure are accessed via helpers that provide abstraction.
+ */
+struct rdt_hw_ctrl_domain {
+ struct rdt_ctrl_domain d_resctrl;
+ u32 *ctrl_val;
+};
+
+/**
+ * struct rdt_hw_mon_domain - Arch private attributes of a set of CPUs that share
+ * a resource for a monitor function
+ * @d_resctrl: Properties exposed to the resctrl file system
* @arch_mbm_total: arch private state for MBM total bandwidth
* @arch_mbm_local: arch private state for MBM local bandwidth
*
* Members of this structure are accessed via helpers that provide abstraction.
*/
-struct rdt_hw_domain {
- struct rdt_domain d_resctrl;
- u32 *ctrl_val;
+struct rdt_hw_mon_domain {
+ struct rdt_mon_domain d_resctrl;
struct arch_mbm_state *arch_mbm_total;
struct arch_mbm_state *arch_mbm_local;
};

-static inline struct rdt_hw_domain *resctrl_to_arch_dom(struct rdt_domain *r)
+static inline struct rdt_hw_ctrl_domain *resctrl_to_arch_ctrl_dom(struct rdt_ctrl_domain *r)
+{
+ return container_of(r, struct rdt_hw_ctrl_domain, d_resctrl);
+}
+
+static inline struct rdt_hw_mon_domain *resctrl_to_arch_mon_dom(struct rdt_mon_domain *r)
{
- return container_of(r, struct rdt_hw_domain, d_resctrl);
+ return container_of(r, struct rdt_hw_mon_domain, d_resctrl);
}

/**
@@ -405,7 +421,7 @@ struct rdt_hw_resource {
struct rdt_resource r_resctrl;
u32 num_closid;
unsigned int msr_base;
- void (*msr_update) (struct rdt_domain *d, struct msr_param *m,
+ void (*msr_update) (struct rdt_ctrl_domain *d, struct msr_param *m,
struct rdt_resource *r);
unsigned int mon_scale;
unsigned int mbm_width;
@@ -418,9 +434,9 @@ static inline struct rdt_hw_resource *resctrl_to_arch_res(struct rdt_resource *r
}

int parse_cbm(struct rdt_parse_data *data, struct resctrl_schema *s,
- struct rdt_domain *d);
+ struct rdt_ctrl_domain *d);
int parse_bw(struct rdt_parse_data *data, struct resctrl_schema *s,
- struct rdt_domain *d);
+ struct rdt_ctrl_domain *d);

extern struct mutex rdtgroup_mutex;

@@ -526,21 +542,21 @@ ssize_t rdtgroup_schemata_write(struct kernfs_open_file *of,
char *buf, size_t nbytes, loff_t off);
int rdtgroup_schemata_show(struct kernfs_open_file *of,
struct seq_file *s, void *v);
-bool rdtgroup_cbm_overlaps(struct resctrl_schema *s, struct rdt_domain *d,
+bool rdtgroup_cbm_overlaps(struct resctrl_schema *s, struct rdt_ctrl_domain *d,
unsigned long cbm, int closid, bool exclusive);
-unsigned int rdtgroup_cbm_to_size(struct rdt_resource *r, struct rdt_domain *d,
+unsigned int rdtgroup_cbm_to_size(struct rdt_resource *r, struct rdt_ctrl_domain *d,
unsigned long cbm);
enum rdtgrp_mode rdtgroup_mode_by_closid(int closid);
int rdtgroup_tasks_assigned(struct rdtgroup *r);
int rdtgroup_locksetup_enter(struct rdtgroup *rdtgrp);
int rdtgroup_locksetup_exit(struct rdtgroup *rdtgrp);
-bool rdtgroup_cbm_overlaps_pseudo_locked(struct rdt_domain *d, unsigned long cbm);
-bool rdtgroup_pseudo_locked_in_hierarchy(struct rdt_domain *d);
+bool rdtgroup_cbm_overlaps_pseudo_locked(struct rdt_ctrl_domain *d, unsigned long cbm);
+bool rdtgroup_pseudo_locked_in_hierarchy(struct rdt_ctrl_domain *d);
int rdt_pseudo_lock_init(void);
void rdt_pseudo_lock_release(void);
int rdtgroup_pseudo_lock_create(struct rdtgroup *rdtgrp);
void rdtgroup_pseudo_lock_remove(struct rdtgroup *rdtgrp);
-struct rdt_domain *get_ctrl_domain_from_cpu(int cpu, struct rdt_resource *r);
+struct rdt_ctrl_domain *get_ctrl_domain_from_cpu(int cpu, struct rdt_resource *r);
int closids_supported(void);
void closid_free(int closid);
int alloc_rmid(void);
@@ -550,17 +566,17 @@ bool __init rdt_cpu_has(int flag);
void mon_event_count(void *info);
int rdtgroup_mondata_show(struct seq_file *m, void *arg);
void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
- struct rdt_domain *d, struct rdtgroup *rdtgrp,
+ struct rdt_mon_domain *d, struct rdtgroup *rdtgrp,
int evtid, int first);
-void mbm_setup_overflow_handler(struct rdt_domain *dom,
+void mbm_setup_overflow_handler(struct rdt_mon_domain *dom,
unsigned long delay_ms);
void mbm_handle_overflow(struct work_struct *work);
void __init intel_rdt_mbm_apply_quirk(void);
bool is_mba_sc(struct rdt_resource *r);
-void cqm_setup_limbo_handler(struct rdt_domain *dom, unsigned long delay_ms);
+void cqm_setup_limbo_handler(struct rdt_mon_domain *dom, unsigned long delay_ms);
void cqm_handle_limbo(struct work_struct *work);
-bool has_busy_rmid(struct rdt_resource *r, struct rdt_domain *d);
-void __check_limbo(struct rdt_domain *d, bool force_free);
+bool has_busy_rmid(struct rdt_resource *r, struct rdt_mon_domain *d);
+void __check_limbo(struct rdt_mon_domain *d, bool force_free);
void rdt_domain_reconfigure_cdp(struct rdt_resource *r);
void __init thread_throttle_mode_init(void);
void __init mbm_config_rftype_init(const char *config);
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 8dc2cb49358e..6bae0a658b94 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -49,12 +49,12 @@ int max_name_width, max_data_width;
bool rdt_alloc_capable;

static void
-mba_wrmsr_intel(struct rdt_domain *d, struct msr_param *m,
+mba_wrmsr_intel(struct rdt_ctrl_domain *d, struct msr_param *m,
struct rdt_resource *r);
static void
-cat_wrmsr(struct rdt_domain *d, struct msr_param *m, struct rdt_resource *r);
+cat_wrmsr(struct rdt_ctrl_domain *d, struct msr_param *m, struct rdt_resource *r);
static void
-mba_wrmsr_amd(struct rdt_domain *d, struct msr_param *m,
+mba_wrmsr_amd(struct rdt_ctrl_domain *d, struct msr_param *m,
struct rdt_resource *r);

#define ctrl_domain_init(id) LIST_HEAD_INIT(rdt_resources_all[id].r_resctrl.ctrl_domains)
@@ -307,11 +307,11 @@ static void rdt_get_cdp_l2_config(void)
}

static void
-mba_wrmsr_amd(struct rdt_domain *d, struct msr_param *m, struct rdt_resource *r)
+mba_wrmsr_amd(struct rdt_ctrl_domain *d, struct msr_param *m, struct rdt_resource *r)
{
- unsigned int i;
- struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(d);
+ struct rdt_hw_ctrl_domain *hw_dom = resctrl_to_arch_ctrl_dom(d);
struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
+ unsigned int i;

for (i = m->low; i < m->high; i++)
wrmsrl(hw_res->msr_base + i, hw_dom->ctrl_val[i]);
@@ -332,12 +332,12 @@ static u32 delay_bw_map(unsigned long bw, struct rdt_resource *r)
}

static void
-mba_wrmsr_intel(struct rdt_domain *d, struct msr_param *m,
+mba_wrmsr_intel(struct rdt_ctrl_domain *d, struct msr_param *m,
struct rdt_resource *r)
{
- unsigned int i;
- struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(d);
+ struct rdt_hw_ctrl_domain *hw_dom = resctrl_to_arch_ctrl_dom(d);
struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
+ unsigned int i;

/* Write the delay values for mba. */
for (i = m->low; i < m->high; i++)
@@ -345,19 +345,19 @@ mba_wrmsr_intel(struct rdt_domain *d, struct msr_param *m,
}

static void
-cat_wrmsr(struct rdt_domain *d, struct msr_param *m, struct rdt_resource *r)
+cat_wrmsr(struct rdt_ctrl_domain *d, struct msr_param *m, struct rdt_resource *r)
{
- unsigned int i;
- struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(d);
+ struct rdt_hw_ctrl_domain *hw_dom = resctrl_to_arch_ctrl_dom(d);
struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
+ unsigned int i;

for (i = m->low; i < m->high; i++)
wrmsrl(hw_res->msr_base + i, hw_dom->ctrl_val[i]);
}

-struct rdt_domain *get_ctrl_domain_from_cpu(int cpu, struct rdt_resource *r)
+struct rdt_ctrl_domain *get_ctrl_domain_from_cpu(int cpu, struct rdt_resource *r)
{
- struct rdt_domain *d;
+ struct rdt_ctrl_domain *d;

list_for_each_entry(d, &r->ctrl_domains, hdr.list) {
/* Find the domain that contains this CPU */
@@ -379,7 +379,7 @@ void rdt_ctrl_update(void *arg)
struct rdt_hw_resource *hw_res = resctrl_to_arch_res(m->res);
struct rdt_resource *r = m->res;
int cpu = smp_processor_id();
- struct rdt_domain *d;
+ struct rdt_ctrl_domain *d;

d = get_ctrl_domain_from_cpu(cpu, r);
if (d) {
@@ -434,18 +434,23 @@ static void setup_default_ctrlval(struct rdt_resource *r, u32 *dc)
*dc = r->default_ctrl;
}

-static void domain_free(struct rdt_hw_domain *hw_dom)
+static void ctrl_domain_free(struct rdt_hw_ctrl_domain *hw_dom)
+{
+ kfree(hw_dom->ctrl_val);
+ kfree(hw_dom);
+}
+
+static void mon_domain_free(struct rdt_hw_mon_domain *hw_dom)
{
kfree(hw_dom->arch_mbm_total);
kfree(hw_dom->arch_mbm_local);
- kfree(hw_dom->ctrl_val);
kfree(hw_dom);
}

-static int domain_setup_ctrlval(struct rdt_resource *r, struct rdt_domain *d)
+static int domain_setup_ctrlval(struct rdt_resource *r, struct rdt_ctrl_domain *d)
{
+ struct rdt_hw_ctrl_domain *hw_dom = resctrl_to_arch_ctrl_dom(d);
struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
- struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(d);
struct msr_param m;
u32 *dc;

@@ -468,7 +473,7 @@ static int domain_setup_ctrlval(struct rdt_resource *r, struct rdt_domain *d)
* @num_rmid: The size of the MBM counter array
* @hw_dom: The domain that owns the allocated arrays
*/
-static int arch_domain_mbm_alloc(u32 num_rmid, struct rdt_hw_domain *hw_dom)
+static int arch_domain_mbm_alloc(u32 num_rmid, struct rdt_hw_mon_domain *hw_dom)
{
size_t tsize;

@@ -507,10 +512,10 @@ static int get_domain_id_from_scope(int cpu, enum resctrl_scope scope)
static void domain_add_cpu_ctrl(int cpu, struct rdt_resource *r)
{
int id = get_domain_id_from_scope(cpu, r->ctrl_scope);
+ struct rdt_hw_ctrl_domain *hw_dom;
struct list_head *add_pos = NULL;
- struct rdt_hw_domain *hw_dom;
struct rdt_domain_hdr *hdr;
- struct rdt_domain *d;
+ struct rdt_ctrl_domain *d;
int err;

if (id < 0) {
@@ -525,7 +530,7 @@ static void domain_add_cpu_ctrl(int cpu, struct rdt_resource *r)
if (WARN_ON_ONCE(hdr->type != RESCTRL_CTRL_DOMAIN))
return;

- d = container_of(hdr, struct rdt_domain, hdr);
+ d = container_of(hdr, struct rdt_ctrl_domain, hdr);

cpumask_set_cpu(cpu, &d->hdr.cpu_mask);
if (r->cache.arch_has_per_cpu_cfg)
@@ -545,7 +550,7 @@ static void domain_add_cpu_ctrl(int cpu, struct rdt_resource *r)
rdt_domain_reconfigure_cdp(r);

if (domain_setup_ctrlval(r, d)) {
- domain_free(hw_dom);
+ ctrl_domain_free(hw_dom);
return;
}

@@ -554,17 +559,17 @@ static void domain_add_cpu_ctrl(int cpu, struct rdt_resource *r)
err = resctrl_online_ctrl_domain(r, d);
if (err) {
list_del(&d->hdr.list);
- domain_free(hw_dom);
+ ctrl_domain_free(hw_dom);
}
}

static void domain_add_cpu_mon(int cpu, struct rdt_resource *r)
{
int id = get_domain_id_from_scope(cpu, r->mon_scope);
+ struct rdt_hw_mon_domain *hw_dom;
struct list_head *add_pos = NULL;
- struct rdt_hw_domain *hw_dom;
struct rdt_domain_hdr *hdr;
- struct rdt_domain *d;
+ struct rdt_mon_domain *d;
int err;

if (id < 0) {
@@ -583,7 +588,7 @@ static void domain_add_cpu_mon(int cpu, struct rdt_resource *r)
if (WARN_ON_ONCE(hdr->type != RESCTRL_MON_DOMAIN))
return;

- d = container_of(hdr, struct rdt_domain, hdr);
+ d = container_of(hdr, struct rdt_mon_domain, hdr);

cpumask_set_cpu(cpu, &d->hdr.cpu_mask);
return;
@@ -599,7 +604,7 @@ static void domain_add_cpu_mon(int cpu, struct rdt_resource *r)
cpumask_set_cpu(cpu, &d->hdr.cpu_mask);

if (arch_domain_mbm_alloc(r->num_rmid, hw_dom)) {
- domain_free(hw_dom);
+ mon_domain_free(hw_dom);
return;
}

@@ -608,7 +613,7 @@ static void domain_add_cpu_mon(int cpu, struct rdt_resource *r)
err = resctrl_online_mon_domain(r, d);
if (err) {
list_del(&d->hdr.list);
- domain_free(hw_dom);
+ mon_domain_free(hw_dom);
}
}

@@ -626,9 +631,9 @@ static void domain_add_cpu(int cpu, struct rdt_resource *r)
static void domain_remove_cpu_ctrl(int cpu, struct rdt_resource *r)
{
int id = get_domain_id_from_scope(cpu, r->ctrl_scope);
- struct rdt_hw_domain *hw_dom;
+ struct rdt_hw_ctrl_domain *hw_dom;
struct rdt_domain_hdr *hdr;
- struct rdt_domain *d;
+ struct rdt_ctrl_domain *d;

if (id < 0)
return;
@@ -642,8 +647,8 @@ static void domain_remove_cpu_ctrl(int cpu, struct rdt_resource *r)
if (WARN_ON_ONCE(hdr->type != RESCTRL_CTRL_DOMAIN))
return;

- d = container_of(hdr, struct rdt_domain, hdr);
- hw_dom = resctrl_to_arch_dom(d);
+ d = container_of(hdr, struct rdt_ctrl_domain, hdr);
+ hw_dom = resctrl_to_arch_ctrl_dom(d);

cpumask_clear_cpu(cpu, &d->hdr.cpu_mask);
if (cpumask_empty(&d->hdr.cpu_mask)) {
@@ -651,12 +656,12 @@ static void domain_remove_cpu_ctrl(int cpu, struct rdt_resource *r)
list_del(&d->hdr.list);

/*
- * rdt_domain "d" is going to be freed below, so clear
+ * rdt_ctrl_domain "d" is going to be freed below, so clear
* its pointer from pseudo_lock_region struct.
*/
if (d->plr)
d->plr->d = NULL;
- domain_free(hw_dom);
+ ctrl_domain_free(hw_dom);

return;
}
@@ -665,9 +670,9 @@ static void domain_remove_cpu_ctrl(int cpu, struct rdt_resource *r)
static void domain_remove_cpu_mon(int cpu, struct rdt_resource *r)
{
int id = get_domain_id_from_scope(cpu, r->mon_scope);
- struct rdt_hw_domain *hw_dom;
+ struct rdt_hw_mon_domain *hw_dom;
struct rdt_domain_hdr *hdr;
- struct rdt_domain *d;
+ struct rdt_mon_domain *d;

if (id < 0)
return;
@@ -681,14 +686,14 @@ static void domain_remove_cpu_mon(int cpu, struct rdt_resource *r)
if (WARN_ON_ONCE(hdr->type != RESCTRL_MON_DOMAIN))
return;

- d = container_of(hdr, struct rdt_domain, hdr);
- hw_dom = resctrl_to_arch_dom(d);
+ d = container_of(hdr, struct rdt_mon_domain, hdr);
+ hw_dom = resctrl_to_arch_mon_dom(d);

cpumask_clear_cpu(cpu, &d->hdr.cpu_mask);
if (cpumask_empty(&d->hdr.cpu_mask)) {
resctrl_offline_mon_domain(r, d);
list_del(&d->hdr.list);
- domain_free(hw_dom);
+ mon_domain_free(hw_dom);

return;
}
diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
index 0b4136c42762..08fc97ce4135 100644
--- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
+++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
@@ -58,7 +58,7 @@ static bool bw_validate(char *buf, unsigned long *data, struct rdt_resource *r)
}

int parse_bw(struct rdt_parse_data *data, struct resctrl_schema *s,
- struct rdt_domain *d)
+ struct rdt_ctrl_domain *d)
{
struct resctrl_staged_config *cfg;
u32 closid = data->rdtgrp->closid;
@@ -137,7 +137,7 @@ static bool cbm_validate(char *buf, u32 *data, struct rdt_resource *r)
* resource type.
*/
int parse_cbm(struct rdt_parse_data *data, struct resctrl_schema *s,
- struct rdt_domain *d)
+ struct rdt_ctrl_domain *d)
{
struct rdtgroup *rdtgrp = data->rdtgrp;
struct resctrl_staged_config *cfg;
@@ -206,8 +206,8 @@ static int parse_line(char *line, struct resctrl_schema *s,
struct resctrl_staged_config *cfg;
struct rdt_resource *r = s->res;
struct rdt_parse_data data;
+ struct rdt_ctrl_domain *d;
char *dom = NULL, *id;
- struct rdt_domain *d;
unsigned long dom_id;

if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKSETUP &&
@@ -267,11 +267,11 @@ static u32 get_config_index(u32 closid, enum resctrl_conf_type type)
}
}

-static bool apply_config(struct rdt_hw_domain *hw_dom,
+static bool apply_config(struct rdt_hw_ctrl_domain *hw_dom,
struct resctrl_staged_config *cfg, u32 idx,
cpumask_var_t cpu_mask)
{
- struct rdt_domain *dom = &hw_dom->d_resctrl;
+ struct rdt_ctrl_domain *dom = &hw_dom->d_resctrl;

if (cfg->new_ctrl != hw_dom->ctrl_val[idx]) {
cpumask_set_cpu(cpumask_any(&dom->hdr.cpu_mask), cpu_mask);
@@ -283,11 +283,11 @@ static bool apply_config(struct rdt_hw_domain *hw_dom,
return false;
}

-int resctrl_arch_update_one(struct rdt_resource *r, struct rdt_domain *d,
+int resctrl_arch_update_one(struct rdt_resource *r, struct rdt_ctrl_domain *d,
u32 closid, enum resctrl_conf_type t, u32 cfg_val)
{
+ struct rdt_hw_ctrl_domain *hw_dom = resctrl_to_arch_ctrl_dom(d);
struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
- struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(d);
u32 idx = get_config_index(closid, t);
struct msr_param msr_param;

@@ -307,11 +307,11 @@ int resctrl_arch_update_one(struct rdt_resource *r, struct rdt_domain *d,
int resctrl_arch_update_domains(struct rdt_resource *r, u32 closid)
{
struct resctrl_staged_config *cfg;
- struct rdt_hw_domain *hw_dom;
+ struct rdt_hw_ctrl_domain *hw_dom;
struct msr_param msr_param;
+ struct rdt_ctrl_domain *d;
enum resctrl_conf_type t;
cpumask_var_t cpu_mask;
- struct rdt_domain *d;
u32 idx;

if (!zalloc_cpumask_var(&cpu_mask, GFP_KERNEL))
@@ -319,7 +319,7 @@ int resctrl_arch_update_domains(struct rdt_resource *r, u32 closid)

msr_param.res = NULL;
list_for_each_entry(d, &r->ctrl_domains, hdr.list) {
- hw_dom = resctrl_to_arch_dom(d);
+ hw_dom = resctrl_to_arch_ctrl_dom(d);
for (t = 0; t < CDP_NUM_TYPES; t++) {
cfg = &hw_dom->d_resctrl.staged_config[t];
if (!cfg->have_new_ctrl)
@@ -449,10 +449,10 @@ ssize_t rdtgroup_schemata_write(struct kernfs_open_file *of,
return ret ?: nbytes;
}

-u32 resctrl_arch_get_config(struct rdt_resource *r, struct rdt_domain *d,
+u32 resctrl_arch_get_config(struct rdt_resource *r, struct rdt_ctrl_domain *d,
u32 closid, enum resctrl_conf_type type)
{
- struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(d);
+ struct rdt_hw_ctrl_domain *hw_dom = resctrl_to_arch_ctrl_dom(d);
u32 idx = get_config_index(closid, type);

return hw_dom->ctrl_val[idx];
@@ -461,7 +461,7 @@ u32 resctrl_arch_get_config(struct rdt_resource *r, struct rdt_domain *d,
static void show_doms(struct seq_file *s, struct resctrl_schema *schema, int closid)
{
struct rdt_resource *r = schema->res;
- struct rdt_domain *dom;
+ struct rdt_ctrl_domain *dom;
bool sep = false;
u32 ctrl_val;

@@ -523,7 +523,7 @@ int rdtgroup_schemata_show(struct kernfs_open_file *of,
}

void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
- struct rdt_domain *d, struct rdtgroup *rdtgrp,
+ struct rdt_mon_domain *d, struct rdtgroup *rdtgrp,
int evtid, int first)
{
/*
@@ -543,11 +543,11 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)
{
struct kernfs_open_file *of = m->private;
struct rdt_domain_hdr *hdr;
+ struct rdt_mon_domain *d;
u32 resid, evtid, domid;
struct rdtgroup *rdtgrp;
struct rdt_resource *r;
union mon_data_bits md;
- struct rdt_domain *d;
struct rmid_read rr;
int ret = 0;

@@ -568,7 +568,7 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)
ret = -ENOENT;
goto out;
}
- d = container_of(hdr, struct rdt_domain, hdr);
+ d = container_of(hdr, struct rdt_mon_domain, hdr);

mon_event_read(&rr, r, d, rdtgrp, evtid, false);

diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index ec5ad926c5dc..4e145f5620b0 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -170,7 +170,7 @@ static int __rmid_read(u32 rmid, enum resctrl_event_id eventid, u64 *val)
return 0;
}

-static struct arch_mbm_state *get_arch_mbm_state(struct rdt_hw_domain *hw_dom,
+static struct arch_mbm_state *get_arch_mbm_state(struct rdt_hw_mon_domain *hw_dom,
u32 rmid,
enum resctrl_event_id eventid)
{
@@ -189,10 +189,10 @@ static struct arch_mbm_state *get_arch_mbm_state(struct rdt_hw_domain *hw_dom,
return NULL;
}

-void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_domain *d,
+void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_mon_domain *d,
u32 rmid, enum resctrl_event_id eventid)
{
- struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(d);
+ struct rdt_hw_mon_domain *hw_dom = resctrl_to_arch_mon_dom(d);
struct arch_mbm_state *am;

am = get_arch_mbm_state(hw_dom, rmid, eventid);
@@ -208,9 +208,9 @@ void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_domain *d,
* Assumes that hardware counters are also reset and thus that there is
* no need to record initial non-zero counts.
*/
-void resctrl_arch_reset_rmid_all(struct rdt_resource *r, struct rdt_domain *d)
+void resctrl_arch_reset_rmid_all(struct rdt_resource *r, struct rdt_mon_domain *d)
{
- struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(d);
+ struct rdt_hw_mon_domain *hw_dom = resctrl_to_arch_mon_dom(d);

if (is_mbm_total_enabled())
memset(hw_dom->arch_mbm_total, 0,
@@ -229,11 +229,11 @@ static u64 mbm_overflow_count(u64 prev_msr, u64 cur_msr, unsigned int width)
return chunks >> shift;
}

-int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain *d,
+int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_mon_domain *d,
u32 rmid, enum resctrl_event_id eventid, u64 *val)
{
+ struct rdt_hw_mon_domain *hw_dom = resctrl_to_arch_mon_dom(d);
struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
- struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(d);
struct arch_mbm_state *am;
u64 msr_val, chunks;
int ret;
@@ -266,7 +266,7 @@ int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain *d,
* decrement the count. If the busy count gets to zero on an RMID, we
* free the RMID
*/
-void __check_limbo(struct rdt_domain *d, bool force_free)
+void __check_limbo(struct rdt_mon_domain *d, bool force_free)
{
struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
struct rmid_entry *entry;
@@ -305,7 +305,7 @@ void __check_limbo(struct rdt_domain *d, bool force_free)
}
}

-bool has_busy_rmid(struct rdt_resource *r, struct rdt_domain *d)
+bool has_busy_rmid(struct rdt_resource *r, struct rdt_mon_domain *d)
{
return find_first_bit(d->rmid_busy_llc, r->num_rmid) != r->num_rmid;
}
@@ -334,7 +334,7 @@ int alloc_rmid(void)
static void add_rmid_to_limbo(struct rmid_entry *entry)
{
struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
- struct rdt_domain *d;
+ struct rdt_mon_domain *d;
int cpu, err;
u64 val = 0;

@@ -383,7 +383,7 @@ void free_rmid(u32 rmid)
list_add_tail(&entry->list, &rmid_free_lru);
}

-static struct mbm_state *get_mbm_state(struct rdt_domain *d, u32 rmid,
+static struct mbm_state *get_mbm_state(struct rdt_mon_domain *d, u32 rmid,
enum resctrl_event_id evtid)
{
switch (evtid) {
@@ -516,13 +516,13 @@ void mon_event_count(void *info)
* throttle MSRs already have low percentage values. To avoid
* unnecessarily restricting such rdtgroups, we also increase the bandwidth.
*/
-static void update_mba_bw(struct rdtgroup *rgrp, struct rdt_domain *dom_mbm)
+static void update_mba_bw(struct rdtgroup *rgrp, struct rdt_mon_domain *dom_mbm)
{
u32 closid, rmid, cur_msr_val, new_msr_val;
struct mbm_state *pmbm_data, *cmbm_data;
+ struct rdt_ctrl_domain *dom_mba;
u32 cur_bw, delta_bw, user_bw;
struct rdt_resource *r_mba;
- struct rdt_domain *dom_mba;
struct list_head *head;
struct rdtgroup *entry;

@@ -600,7 +600,7 @@ static void update_mba_bw(struct rdtgroup *rgrp, struct rdt_domain *dom_mbm)
}
}

-static void mbm_update(struct rdt_resource *r, struct rdt_domain *d, int rmid)
+static void mbm_update(struct rdt_resource *r, struct rdt_mon_domain *d, int rmid)
{
struct rmid_read rr;

@@ -640,13 +640,13 @@ void cqm_handle_limbo(struct work_struct *work)
{
unsigned long delay = msecs_to_jiffies(CQM_LIMBOCHECK_INTERVAL);
int cpu = smp_processor_id();
+ struct rdt_mon_domain *d;
struct rdt_resource *r;
- struct rdt_domain *d;

mutex_lock(&rdtgroup_mutex);

r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
- d = container_of(work, struct rdt_domain, cqm_limbo.work);
+ d = container_of(work, struct rdt_mon_domain, cqm_limbo.work);

__check_limbo(d, false);

@@ -656,7 +656,7 @@ void cqm_handle_limbo(struct work_struct *work)
mutex_unlock(&rdtgroup_mutex);
}

-void cqm_setup_limbo_handler(struct rdt_domain *dom, unsigned long delay_ms)
+void cqm_setup_limbo_handler(struct rdt_mon_domain *dom, unsigned long delay_ms)
{
unsigned long delay = msecs_to_jiffies(delay_ms);
int cpu;
@@ -672,9 +672,9 @@ void mbm_handle_overflow(struct work_struct *work)
unsigned long delay = msecs_to_jiffies(MBM_OVERFLOW_INTERVAL);
struct rdtgroup *prgrp, *crgrp;
int cpu = smp_processor_id();
+ struct rdt_mon_domain *d;
struct list_head *head;
struct rdt_resource *r;
- struct rdt_domain *d;

mutex_lock(&rdtgroup_mutex);

@@ -682,7 +682,7 @@ void mbm_handle_overflow(struct work_struct *work)
goto out_unlock;

r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
- d = container_of(work, struct rdt_domain, mbm_over.work);
+ d = container_of(work, struct rdt_mon_domain, mbm_over.work);

list_for_each_entry(prgrp, &rdt_all_groups, rdtgroup_list) {
mbm_update(r, d, prgrp->mon.rmid);
@@ -701,7 +701,7 @@ void mbm_handle_overflow(struct work_struct *work)
mutex_unlock(&rdtgroup_mutex);
}

-void mbm_setup_overflow_handler(struct rdt_domain *dom, unsigned long delay_ms)
+void mbm_setup_overflow_handler(struct rdt_mon_domain *dom, unsigned long delay_ms)
{
unsigned long delay = msecs_to_jiffies(delay_ms);
int cpu;
diff --git a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
index ed6d59af1cef..08d35f828bc3 100644
--- a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
+++ b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
@@ -814,7 +814,7 @@ int rdtgroup_locksetup_exit(struct rdtgroup *rdtgrp)
* Return: true if @cbm overlaps with pseudo-locked region on @d, false
* otherwise.
*/
-bool rdtgroup_cbm_overlaps_pseudo_locked(struct rdt_domain *d, unsigned long cbm)
+bool rdtgroup_cbm_overlaps_pseudo_locked(struct rdt_ctrl_domain *d, unsigned long cbm)
{
unsigned int cbm_len;
unsigned long cbm_b;
@@ -841,11 +841,11 @@ bool rdtgroup_cbm_overlaps_pseudo_locked(struct rdt_domain *d, unsigned long cbm
* if it is not possible to test due to memory allocation issue,
* false otherwise.
*/
-bool rdtgroup_pseudo_locked_in_hierarchy(struct rdt_domain *d)
+bool rdtgroup_pseudo_locked_in_hierarchy(struct rdt_ctrl_domain *d)
{
+ struct rdt_ctrl_domain *d_i;
cpumask_var_t cpu_with_psl;
struct rdt_resource *r;
- struct rdt_domain *d_i;
bool ret = false;

if (!zalloc_cpumask_var(&cpu_with_psl, GFP_KERNEL))
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 760013ed1bff..21bbd832f3f2 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -85,8 +85,8 @@ void rdt_last_cmd_printf(const char *fmt, ...)

void rdt_staged_configs_clear(void)
{
+ struct rdt_ctrl_domain *dom;
struct rdt_resource *r;
- struct rdt_domain *dom;

lockdep_assert_held(&rdtgroup_mutex);

@@ -976,7 +976,7 @@ static int rdt_bit_usage_show(struct kernfs_open_file *of,
unsigned long sw_shareable = 0, hw_shareable = 0;
unsigned long exclusive = 0, pseudo_locked = 0;
struct rdt_resource *r = s->res;
- struct rdt_domain *dom;
+ struct rdt_ctrl_domain *dom;
int i, hwb, swb, excl, psl;
enum rdtgrp_mode mode;
bool sep = false;
@@ -1205,7 +1205,7 @@ static int rdt_has_sparse_bitmasks_show(struct kernfs_open_file *of,
*
* Return: false if CBM does not overlap, true if it does.
*/
-static bool __rdtgroup_cbm_overlaps(struct rdt_resource *r, struct rdt_domain *d,
+static bool __rdtgroup_cbm_overlaps(struct rdt_resource *r, struct rdt_ctrl_domain *d,
unsigned long cbm, int closid,
enum resctrl_conf_type type, bool exclusive)
{
@@ -1260,7 +1260,7 @@ static bool __rdtgroup_cbm_overlaps(struct rdt_resource *r, struct rdt_domain *d
*
* Return: true if CBM overlap detected, false if there is no overlap
*/
-bool rdtgroup_cbm_overlaps(struct resctrl_schema *s, struct rdt_domain *d,
+bool rdtgroup_cbm_overlaps(struct resctrl_schema *s, struct rdt_ctrl_domain *d,
unsigned long cbm, int closid, bool exclusive)
{
enum resctrl_conf_type peer_type = resctrl_peer_type(s->conf_type);
@@ -1291,10 +1291,10 @@ bool rdtgroup_cbm_overlaps(struct resctrl_schema *s, struct rdt_domain *d,
static bool rdtgroup_mode_test_exclusive(struct rdtgroup *rdtgrp)
{
int closid = rdtgrp->closid;
+ struct rdt_ctrl_domain *d;
struct resctrl_schema *s;
struct rdt_resource *r;
bool has_cache = false;
- struct rdt_domain *d;
u32 ctrl;

list_for_each_entry(s, &resctrl_schema_all, list) {
@@ -1407,7 +1407,7 @@ static ssize_t rdtgroup_mode_write(struct kernfs_open_file *of,
* bitmap functions work correctly.
*/
unsigned int rdtgroup_cbm_to_size(struct rdt_resource *r,
- struct rdt_domain *d, unsigned long cbm)
+ struct rdt_ctrl_domain *d, unsigned long cbm)
{
struct cpu_cacheinfo *ci;
unsigned int size = 0;
@@ -1439,9 +1439,9 @@ static int rdtgroup_size_show(struct kernfs_open_file *of,
{
struct resctrl_schema *schema;
enum resctrl_conf_type type;
+ struct rdt_ctrl_domain *d;
struct rdtgroup *rdtgrp;
struct rdt_resource *r;
- struct rdt_domain *d;
unsigned int size;
int ret = 0;
u32 closid;
@@ -1553,7 +1553,7 @@ static void mon_event_config_read(void *info)
mon_info->mon_config = msrval & MAX_EVT_CONFIG_BITS;
}

-static void mondata_config_read(struct rdt_domain *d, struct mon_config_info *mon_info)
+static void mondata_config_read(struct rdt_mon_domain *d, struct mon_config_info *mon_info)
{
smp_call_function_any(&d->hdr.cpu_mask, mon_event_config_read, mon_info, 1);
}
@@ -1561,7 +1561,7 @@ static void mondata_config_read(struct rdt_domain *d, struct mon_config_info *mo
static int mbm_config_show(struct seq_file *s, struct rdt_resource *r, u32 evtid)
{
struct mon_config_info mon_info = {0};
- struct rdt_domain *dom;
+ struct rdt_mon_domain *dom;
bool sep = false;

mutex_lock(&rdtgroup_mutex);
@@ -1618,7 +1618,7 @@ static void mon_event_config_write(void *info)
}

static int mbm_config_write_domain(struct rdt_resource *r,
- struct rdt_domain *d, u32 evtid, u32 val)
+ struct rdt_mon_domain *d, u32 evtid, u32 val)
{
struct mon_config_info mon_info = {0};
int ret = 0;
@@ -1668,7 +1668,7 @@ static int mon_config_write(struct rdt_resource *r, char *tok, u32 evtid)
{
char *dom_str = NULL, *id_str;
unsigned long dom_id, val;
- struct rdt_domain *d;
+ struct rdt_mon_domain *d;
int ret = 0;

next:
@@ -2216,9 +2216,9 @@ static inline bool is_mba_linear(void)
static int set_cache_qos_cfg(int level, bool enable)
{
void (*update)(void *arg);
+ struct rdt_ctrl_domain *d;
struct rdt_resource *r_l;
cpumask_var_t cpu_mask;
- struct rdt_domain *d;
int cpu;

if (level == RDT_RESOURCE_L3)
@@ -2265,7 +2265,7 @@ void rdt_domain_reconfigure_cdp(struct rdt_resource *r)
l3_qos_cfg_update(&hw_res->cdp_enabled);
}

-static int mba_sc_domain_allocate(struct rdt_resource *r, struct rdt_domain *d)
+static int mba_sc_domain_allocate(struct rdt_resource *r, struct rdt_ctrl_domain *d)
{
u32 num_closid = resctrl_arch_get_num_closid(r);
int cpu = cpumask_any(&d->hdr.cpu_mask);
@@ -2283,7 +2283,7 @@ static int mba_sc_domain_allocate(struct rdt_resource *r, struct rdt_domain *d)
}

static void mba_sc_domain_destroy(struct rdt_resource *r,
- struct rdt_domain *d)
+ struct rdt_ctrl_domain *d)
{
kfree(d->mbps_val);
d->mbps_val = NULL;
@@ -2309,7 +2309,7 @@ static int set_mba_sc(bool mba_sc)
{
struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_MBA].r_resctrl;
u32 num_closid = resctrl_arch_get_num_closid(r);
- struct rdt_domain *d;
+ struct rdt_ctrl_domain *d;
int i;

if (!supports_mba_mbps() || mba_sc == is_mba_sc(r))
@@ -2578,7 +2578,7 @@ static int rdt_get_tree(struct fs_context *fc)
{
struct rdt_fs_context *ctx = rdt_fc2context(fc);
unsigned long flags = RFTYPE_CTRL_BASE;
- struct rdt_domain *dom;
+ struct rdt_mon_domain *dom;
struct rdt_resource *r;
int ret;

@@ -2762,10 +2762,10 @@ static int rdt_init_fs_context(struct fs_context *fc)
static int reset_all_ctrls(struct rdt_resource *r)
{
struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
- struct rdt_hw_domain *hw_dom;
+ struct rdt_hw_ctrl_domain *hw_dom;
struct msr_param msr_param;
+ struct rdt_ctrl_domain *d;
cpumask_var_t cpu_mask;
- struct rdt_domain *d;
int i;

if (!zalloc_cpumask_var(&cpu_mask, GFP_KERNEL))
@@ -2781,7 +2781,7 @@ static int reset_all_ctrls(struct rdt_resource *r)
* from each domain to update the MSRs below.
*/
list_for_each_entry(d, &r->ctrl_domains, hdr.list) {
- hw_dom = resctrl_to_arch_dom(d);
+ hw_dom = resctrl_to_arch_ctrl_dom(d);
cpumask_set_cpu(cpumask_any(&d->hdr.cpu_mask), cpu_mask);

for (i = 0; i < hw_res->num_closid; i++)
@@ -2976,7 +2976,7 @@ static void rmdir_mondata_subdir_allrdtgrp(struct rdt_resource *r,
}

static int mkdir_mondata_subdir(struct kernfs_node *parent_kn,
- struct rdt_domain *d,
+ struct rdt_mon_domain *d,
struct rdt_resource *r, struct rdtgroup *prgrp)
{
union mon_data_bits priv;
@@ -3025,7 +3025,7 @@ static int mkdir_mondata_subdir(struct kernfs_node *parent_kn,
* and "monitor" groups with given domain id.
*/
static void mkdir_mondata_subdir_allrdtgrp(struct rdt_resource *r,
- struct rdt_domain *d)
+ struct rdt_mon_domain *d)
{
struct kernfs_node *parent_kn;
struct rdtgroup *prgrp, *crgrp;
@@ -3047,7 +3047,7 @@ static int mkdir_mondata_subdir_alldom(struct kernfs_node *parent_kn,
struct rdt_resource *r,
struct rdtgroup *prgrp)
{
- struct rdt_domain *dom;
+ struct rdt_mon_domain *dom;
int ret;

list_for_each_entry(dom, &r->mon_domains, hdr.list) {
@@ -3149,7 +3149,7 @@ static u32 cbm_ensure_valid(u32 _val, struct rdt_resource *r)
* Set the RDT domain up to start off with all usable allocations. That is,
* all shareable and unused bits. All-zero CBM is invalid.
*/
-static int __init_one_rdt_domain(struct rdt_domain *d, struct resctrl_schema *s,
+static int __init_one_rdt_domain(struct rdt_ctrl_domain *d, struct resctrl_schema *s,
u32 closid)
{
enum resctrl_conf_type peer_type = resctrl_peer_type(s->conf_type);
@@ -3229,7 +3229,7 @@ static int __init_one_rdt_domain(struct rdt_domain *d, struct resctrl_schema *s,
*/
static int rdtgroup_init_cat(struct resctrl_schema *s, u32 closid)
{
- struct rdt_domain *d;
+ struct rdt_ctrl_domain *d;
int ret;

list_for_each_entry(d, &s->res->ctrl_domains, hdr.list) {
@@ -3245,7 +3245,7 @@ static int rdtgroup_init_cat(struct resctrl_schema *s, u32 closid)
static void rdtgroup_init_mba(struct rdt_resource *r, u32 closid)
{
struct resctrl_staged_config *cfg;
- struct rdt_domain *d;
+ struct rdt_ctrl_domain *d;

list_for_each_entry(d, &r->ctrl_domains, hdr.list) {
if (is_mba_sc(r)) {
@@ -3842,14 +3842,14 @@ static void __init rdtgroup_setup_default(void)
mutex_unlock(&rdtgroup_mutex);
}

-static void domain_destroy_mon_state(struct rdt_domain *d)
+static void domain_destroy_mon_state(struct rdt_mon_domain *d)
{
bitmap_free(d->rmid_busy_llc);
kfree(d->mbm_total);
kfree(d->mbm_local);
}

-void resctrl_offline_ctrl_domain(struct rdt_resource *r, struct rdt_domain *d)
+void resctrl_offline_ctrl_domain(struct rdt_resource *r, struct rdt_ctrl_domain *d)
{
lockdep_assert_held(&rdtgroup_mutex);

@@ -3857,7 +3857,7 @@ void resctrl_offline_ctrl_domain(struct rdt_resource *r, struct rdt_domain *d)
mba_sc_domain_destroy(r, d);
}

-void resctrl_offline_mon_domain(struct rdt_resource *r, struct rdt_domain *d)
+void resctrl_offline_mon_domain(struct rdt_resource *r, struct rdt_mon_domain *d)
{
lockdep_assert_held(&rdtgroup_mutex);

@@ -3886,7 +3886,7 @@ void resctrl_offline_mon_domain(struct rdt_resource *r, struct rdt_domain *d)
domain_destroy_mon_state(d);
}

-static int domain_setup_mon_state(struct rdt_resource *r, struct rdt_domain *d)
+static int domain_setup_mon_state(struct rdt_resource *r, struct rdt_mon_domain *d)
{
size_t tsize;

@@ -3916,7 +3916,7 @@ static int domain_setup_mon_state(struct rdt_resource *r, struct rdt_domain *d)
return 0;
}

-int resctrl_online_ctrl_domain(struct rdt_resource *r, struct rdt_domain *d)
+int resctrl_online_ctrl_domain(struct rdt_resource *r, struct rdt_ctrl_domain *d)
{
lockdep_assert_held(&rdtgroup_mutex);

@@ -3926,7 +3926,7 @@ int resctrl_online_ctrl_domain(struct rdt_resource *r, struct rdt_domain *d)
return 0;
}

-int resctrl_online_mon_domain(struct rdt_resource *r, struct rdt_domain *d)
+int resctrl_online_mon_domain(struct rdt_resource *r, struct rdt_mon_domain *d)
{
int err;

--
2.41.0

2023-10-31 21:17:53

by Tony Luck

Subject: [PATCH v10 8/8] x86/resctrl: Update documentation with Sub-NUMA cluster changes

With Sub-NUMA Cluster (SNC) mode enabled, the scope of monitoring resources
is per NUMA node instead of per L3 cache. The numeric suffixes of directories
with "L3" in their name then refer to Sub-NUMA nodes instead of L3 cache ids.

Users should be aware that SNC mode also affects the amount of L3 cache
available for allocation within each SNC node.

Reviewed-by: Peter Newman <[email protected]>
Signed-off-by: Tony Luck <[email protected]>
---
No changes since v9
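
Illustration only, not part of this patch: a minimal user-space sketch of
how a value in a resctrl "size" file could scale when the L3 cache is
divided between SNC nodes. The helper name cbm_to_size_snc() and its
snc_nodes parameter are assumptions made for this example. The arithmetic
follows the existing rdtgroup_cbm_to_size() calculation (cache size divided
by mask length, times bits set) with one extra division by the number of
SNC nodes.

```c
#include <stdint.h>

/* Count the bits set in a capacity bitmask (CBM). */
static unsigned int cbm_weight(uint32_t cbm)
{
	unsigned int n = 0;

	while (cbm) {
		cbm &= cbm - 1;	/* clear the lowest set bit */
		n++;
	}
	return n;
}

/*
 * Hypothetical helper: bytes of L3 cache covered by @cbm on one SNC node.
 * @cache_size: total size of the L3 cache in bytes
 * @cbm_len:    number of bits in a full capacity bitmask
 * @snc_nodes:  SNC nodes sharing the L3 cache (1 when SNC is disabled)
 */
static unsigned int cbm_to_size_snc(unsigned int cache_size,
				    unsigned int cbm_len,
				    unsigned int snc_nodes, uint32_t cbm)
{
	if (!cbm_len || !snc_nodes)
		return 0;

	return cache_size / snc_nodes / cbm_len * cbm_weight(cbm);
}
```

With a 60 MiB L3 cache and a 20-bit mask, the mask 0x1f covers 15 MiB
when SNC is disabled and 7.5 MiB per node with two SNC nodes, under the
assumptions above.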

Documentation/arch/x86/resctrl.rst | 23 ++++++++++++++++++++---
1 file changed, 20 insertions(+), 3 deletions(-)

diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
index a6279df64a9d..d1db200db5f9 100644
--- a/Documentation/arch/x86/resctrl.rst
+++ b/Documentation/arch/x86/resctrl.rst
@@ -366,9 +366,9 @@ When control is enabled all CTRL_MON groups will also contain:
When monitoring is enabled all MON groups will also contain:

"mon_data":
- This contains a set of files organized by L3 domain and by
- RDT event. E.g. on a system with two L3 domains there will
- be subdirectories "mon_L3_00" and "mon_L3_01". Each of these
+ This contains a set of files organized by L3 domain or by NUMA
+ node (depending on whether Sub-NUMA Cluster (SNC) mode is disabled
+ or enabled respectively) and by RDT event. Each of these
directories have one file per event (e.g. "llc_occupancy",
"mbm_total_bytes", and "mbm_local_bytes"). In a MON group these
files provide a read out of the current value of the event for
@@ -478,6 +478,23 @@ if non-contiguous 1s value is supported. On a system with a 20-bit mask
each bit represents 5% of the capacity of the cache. You could partition
the cache into four equal parts with masks: 0x1f, 0x3e0, 0x7c00, 0xf8000.

+Notes on Sub-NUMA Cluster mode
+==============================
+When SNC mode is enabled Linux may load balance tasks between Sub-NUMA
+nodes much more readily than between regular NUMA nodes since the CPUs
+on Sub-NUMA nodes share the same L3 cache and the system may report
+the NUMA distance between Sub-NUMA nodes with a lower value than used
+for regular NUMA nodes. Users who do not bind tasks to the CPUs of a
+specific Sub-NUMA node must read the "llc_occupancy", "mbm_total_bytes",
+and "mbm_local_bytes" for all Sub-NUMA nodes where the tasks may execute
+to get the full view of traffic for which the tasks were the source.
+
+The cache allocation feature still provides the same number of
+bits in a mask to control allocation into the L3 cache. But each
+of those ways has its capacity reduced because the cache is divided
+between the SNC nodes. The values reported in the resctrl
+"size" files are adjusted accordingly.
+
Memory bandwidth Allocation and monitoring
==========================================

--
2.41.0

2023-10-31 21:17:58

by Tony Luck

Subject: [PATCH v10 1/8] x86/resctrl: Prepare for new domain scope

Resctrl resources operate on subsets of CPUs in the system with the
defining attribute of each subset being an instance of a particular
level of cache. E.g. all CPUs sharing an L3 cache would be part of the
same domain.

In preparation for features that are scoped at the NUMA node level,
change the code from explicit references to "cache_level" to a more
generic scope. At this point the only options for this scope are groups
of CPUs that share an L2 cache or L3 cache.

Clean up the error handling when looking up domains. Report invalid ids
before calling rdt_find_domain() in preparation for better messages when
scope can be other than cache scope. This means that rdt_find_domain()
will never return an error, so remove the error checks from the call sites.

Signed-off-by: Tony Luck <[email protected]>
---
Changes since v9
New test for invalid domain id before calling rdt_find_domain() means that
error handling in that function and at all call-sites can be simplified.
In pseudo_lock_region_init() use the new enum resctrl_scope for local variable.
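
Illustration only, not part of this patch: a minimal user-space sketch of
the lookup order described above. The domain id is validated first, so the
search itself only ever returns a domain or NULL. The topology table,
fake_cacheinfo_id() (a stand-in for get_cpu_cacheinfo_id()), the simplified
struct domain, find_domain() and cpu_to_domain() are assumptions made for
this example.

```c
#include <errno.h>
#include <stddef.h>

enum resctrl_scope {
	RESCTRL_L2_CACHE = 2,
	RESCTRL_L3_CACHE = 3,
};

/* Made-up topology: 4 CPUs, cache ids indexed by [cpu][cache level]. */
#define NR_FAKE_CPUS 4
static const int fake_cache_id[NR_FAKE_CPUS][4] = {
	{ -1, -1, 0, 0 },
	{ -1, -1, 0, 0 },
	{ -1, -1, 1, 0 },
	{ -1, -1, 1, 0 },
};

/* Stand-in for get_cpu_cacheinfo_id(): negative when no id is known. */
static int fake_cacheinfo_id(int cpu, int level)
{
	if (cpu < 0 || cpu >= NR_FAKE_CPUS || level < 0 || level > 3)
		return -1;
	return fake_cache_id[cpu][level];
}

/* Same shape as the helper added by the patch, using the stand-in above. */
static int get_domain_id_from_scope(int cpu, enum resctrl_scope scope)
{
	switch (scope) {
	case RESCTRL_L2_CACHE:
	case RESCTRL_L3_CACHE:
		return fake_cacheinfo_id(cpu, scope);
	default:
		break;
	}

	return -EINVAL;
}

/* Simplified domain: a singly linked list kept sorted by id. */
struct domain {
	int id;
	struct domain *next;
};

/* Returns the matching domain or NULL, never an error pointer. */
static struct domain *find_domain(struct domain *head, int id)
{
	struct domain *d;

	for (d = head; d; d = d->next) {
		if (d->id == id)
			return d;
		if (id < d->id)	/* passed the sorted position */
			break;
	}
	return NULL;
}

/* Caller pattern: reject invalid ids before searching the list. */
static struct domain *cpu_to_domain(struct domain *head, int cpu,
				    enum resctrl_scope scope)
{
	int id = get_domain_id_from_scope(cpu, scope);

	if (id < 0)
		return NULL;
	return find_domain(head, id);
}
```

Because the negative id is caught by the caller, find_domain() needs no
error path of its own, which is the simplification the call sites rely on.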

include/linux/resctrl.h | 9 +++--
arch/x86/kernel/cpu/resctrl/core.c | 40 +++++++++++++++--------
arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 2 +-
arch/x86/kernel/cpu/resctrl/pseudo_lock.c | 6 +++-
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 5 ++-
5 files changed, 44 insertions(+), 18 deletions(-)

diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 66942d7fba7f..7d4eb7df611d 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -144,13 +144,18 @@ struct resctrl_membw {
struct rdt_parse_data;
struct resctrl_schema;

+enum resctrl_scope {
+ RESCTRL_L2_CACHE = 2,
+ RESCTRL_L3_CACHE = 3,
+};
+
/**
* struct rdt_resource - attributes of a resctrl resource
* @rid: The index of the resource
* @alloc_capable: Is allocation available on this machine
* @mon_capable: Is monitor feature available on this machine
* @num_rmid: Number of RMIDs available
- * @cache_level: Which cache level defines scope of this resource
+ * @scope: Scope of this resource
* @cache: Cache allocation related data
* @membw: If the component has bandwidth controls, their properties.
* @domains: All domains for this resource
@@ -168,7 +173,7 @@ struct rdt_resource {
bool alloc_capable;
bool mon_capable;
int num_rmid;
- int cache_level;
+ enum resctrl_scope scope;
struct resctrl_cache cache;
struct resctrl_membw membw;
struct list_head domains;
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 19e0681f0435..47f92390edbb 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -65,7 +65,7 @@ struct rdt_hw_resource rdt_resources_all[] = {
.r_resctrl = {
.rid = RDT_RESOURCE_L3,
.name = "L3",
- .cache_level = 3,
+ .scope = RESCTRL_L3_CACHE,
.domains = domain_init(RDT_RESOURCE_L3),
.parse_ctrlval = parse_cbm,
.format_str = "%d=%0*x",
@@ -79,7 +79,7 @@ struct rdt_hw_resource rdt_resources_all[] = {
.r_resctrl = {
.rid = RDT_RESOURCE_L2,
.name = "L2",
- .cache_level = 2,
+ .scope = RESCTRL_L2_CACHE,
.domains = domain_init(RDT_RESOURCE_L2),
.parse_ctrlval = parse_cbm,
.format_str = "%d=%0*x",
@@ -93,7 +93,7 @@ struct rdt_hw_resource rdt_resources_all[] = {
.r_resctrl = {
.rid = RDT_RESOURCE_MBA,
.name = "MB",
- .cache_level = 3,
+ .scope = RESCTRL_L3_CACHE,
.domains = domain_init(RDT_RESOURCE_MBA),
.parse_ctrlval = parse_bw,
.format_str = "%d=%*u",
@@ -105,7 +105,7 @@ struct rdt_hw_resource rdt_resources_all[] = {
.r_resctrl = {
.rid = RDT_RESOURCE_SMBA,
.name = "SMBA",
- .cache_level = 3,
+ .scope = RESCTRL_L3_CACHE,
.domains = domain_init(RDT_RESOURCE_SMBA),
.parse_ctrlval = parse_bw,
.format_str = "%d=%*u",
@@ -401,9 +401,6 @@ struct rdt_domain *rdt_find_domain(struct rdt_resource *r, int id,
struct rdt_domain *d;
struct list_head *l;

- if (id < 0)
- return ERR_PTR(-ENODEV);
-
list_for_each(l, &r->domains) {
d = list_entry(l, struct rdt_domain, list);
/* When id is found, return its domain. */
@@ -491,6 +488,19 @@ static int arch_domain_mbm_alloc(u32 num_rmid, struct rdt_hw_domain *hw_dom)
return 0;
}

+static int get_domain_id_from_scope(int cpu, enum resctrl_scope scope)
+{
+ switch (scope) {
+ case RESCTRL_L2_CACHE:
+ case RESCTRL_L3_CACHE:
+ return get_cpu_cacheinfo_id(cpu, scope);
+ default:
+ break;
+ }
+
+ return -EINVAL;
+}
+
/*
* domain_add_cpu - Add a cpu to a resource's domain list.
*
@@ -506,17 +516,18 @@ static int arch_domain_mbm_alloc(u32 num_rmid, struct rdt_hw_domain *hw_dom)
*/
static void domain_add_cpu(int cpu, struct rdt_resource *r)
{
- int id = get_cpu_cacheinfo_id(cpu, r->cache_level);
+ int id = get_domain_id_from_scope(cpu, r->scope);
struct list_head *add_pos = NULL;
struct rdt_hw_domain *hw_dom;
struct rdt_domain *d;
int err;

- d = rdt_find_domain(r, id, &add_pos);
- if (IS_ERR(d)) {
- pr_warn("Couldn't find cache id for CPU %d\n", cpu);
+ if (id < 0) {
+ pr_warn_once("Can't find domain id for CPU:%d scope:%d for resource %s\n",
+ cpu, r->scope, r->name);
return;
}
+ d = rdt_find_domain(r, id, &add_pos);

if (d) {
cpumask_set_cpu(cpu, &d->cpu_mask);
@@ -556,12 +567,15 @@ static void domain_add_cpu(int cpu, struct rdt_resource *r)

static void domain_remove_cpu(int cpu, struct rdt_resource *r)
{
- int id = get_cpu_cacheinfo_id(cpu, r->cache_level);
+ int id = get_domain_id_from_scope(cpu, r->scope);
struct rdt_hw_domain *hw_dom;
struct rdt_domain *d;

+ if (id < 0)
+ return;
+
d = rdt_find_domain(r, id, NULL);
- if (IS_ERR_OR_NULL(d)) {
+ if (!d) {
pr_warn("Couldn't find cache id for CPU %d\n", cpu);
return;
}
diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
index beccb0e87ba7..3f8891d57fac 100644
--- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
+++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
@@ -563,7 +563,7 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)

r = &rdt_resources_all[resid].r_resctrl;
d = rdt_find_domain(r, domid, NULL);
- if (IS_ERR_OR_NULL(d)) {
+ if (!d) {
ret = -ENOENT;
goto out;
}
diff --git a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
index 8f559eeae08e..2a682da9f43a 100644
--- a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
+++ b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
@@ -292,10 +292,14 @@ static void pseudo_lock_region_clear(struct pseudo_lock_region *plr)
*/
static int pseudo_lock_region_init(struct pseudo_lock_region *plr)
{
+ enum resctrl_scope scope = plr->s->res->scope;
struct cpu_cacheinfo *ci;
int ret;
int i;

+ if (WARN_ON_ONCE(scope != RESCTRL_L2_CACHE && scope != RESCTRL_L3_CACHE))
+ return -ENODEV;
+
/* Pick the first cpu we find that is associated with the cache. */
plr->cpu = cpumask_first(&plr->d->cpu_mask);

@@ -311,7 +315,7 @@ static int pseudo_lock_region_init(struct pseudo_lock_region *plr)
plr->size = rdtgroup_cbm_to_size(plr->s->res, plr->d, plr->cbm);

for (i = 0; i < ci->num_leaves; i++) {
- if (ci->info_list[i].level == plr->s->res->cache_level) {
+ if (ci->info_list[i].level == scope) {
plr->line_size = ci->info_list[i].coherency_line_size;
return 0;
}
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 69a1de92384a..c44be64d65ec 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1413,10 +1413,13 @@ unsigned int rdtgroup_cbm_to_size(struct rdt_resource *r,
unsigned int size = 0;
int num_b, i;

+ if (WARN_ON_ONCE(r->scope != RESCTRL_L2_CACHE && r->scope != RESCTRL_L3_CACHE))
+ return size;
+
num_b = bitmap_weight(&cbm, r->cache.cbm_len);
ci = get_cpu_cacheinfo(cpumask_any(&d->cpu_mask));
for (i = 0; i < ci->num_leaves; i++) {
- if (ci->info_list[i].level == r->cache_level) {
+ if (ci->info_list[i].level == r->scope) {
size = ci->info_list[i].size / r->cache.cbm_len * num_b;
break;
}
--
2.41.0

2023-10-31 21:18:01

by Tony Luck

Subject: [PATCH v10 2/8] x86/resctrl: Prepare to split rdt_domain structure

The rdt_domain structure is used for both control and monitor features.
It is about to be split into separate structures for these two usages
because the scope for control and monitoring features for a resource
will be different for future resources.

To allow for common code that scans a list of domains looking for a
specific domain id, move all the common fields ("list", "id", "cpu_mask")
into their own structure within the rdt_domain structure.

Signed-off-by: Tony Luck <[email protected]>
---
Changes since v9
Include *all* common fields in the rdt_domain_hdr. Defer adding "type" until it is
used later in part #3.
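
Illustration only, not part of this patch: a minimal user-space sketch of
why a shared header helps. One search routine can walk a list of headers
and the caller converts the result back to its own domain type. The names
domain_hdr, ctrl_domain, mon_domain and find_domain_hdr() are assumptions
made for this example. The kernel "list" is reduced to a next pointer and
"cpu_mask" to an unsigned long to keep it self-contained.

```c
#include <stddef.h>

/* Common fields shared by every domain type. */
struct domain_hdr {
	struct domain_hdr *next;	/* stands in for the list linkage */
	int id;
	unsigned long cpu_mask;		/* stands in for struct cpumask */
};

/* Two domain types that embed the same header as a member. */
struct ctrl_domain {
	struct domain_hdr hdr;
	unsigned int ctrl_val;
};

struct mon_domain {
	struct domain_hdr hdr;
	unsigned long long mbm_total;
};

/* User-space equivalent of the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* One search routine for any list of headers, sorted by id. */
static struct domain_hdr *find_domain_hdr(struct domain_hdr *head, int id)
{
	struct domain_hdr *h;

	for (h = head; h; h = h->next) {
		if (h->id == id)
			return h;
		if (id < h->id)	/* passed the sorted position */
			break;
	}
	return NULL;
}
```

A caller that knows which list it searched can recover its own type with
container_of(h, struct mon_domain, hdr) or the ctrl_domain equivalent.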

include/linux/resctrl.h | 16 ++++--
arch/x86/kernel/cpu/resctrl/core.c | 26 +++++-----
arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 22 ++++-----
arch/x86/kernel/cpu/resctrl/monitor.c | 10 ++--
arch/x86/kernel/cpu/resctrl/pseudo_lock.c | 14 +++---
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 60 +++++++++++------------
6 files changed, 78 insertions(+), 70 deletions(-)

diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 7d4eb7df611d..c4067150a6b7 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -53,10 +53,20 @@ struct resctrl_staged_config {
};

/**
- * struct rdt_domain - group of CPUs sharing a resctrl resource
+ * struct rdt_domain_hdr - common header for different domain types
* @list: all instances of this resource
* @id: unique id for this instance
* @cpu_mask: which CPUs share this resource
+ */
+struct rdt_domain_hdr {
+ struct list_head list;
+ int id;
+ struct cpumask cpu_mask;
+};
+
+/**
+ * struct rdt_domain - group of CPUs sharing a resctrl resource
+ * @hdr: common header for different domain types
* @rmid_busy_llc: bitmap of which limbo RMIDs are above threshold
* @mbm_total: saved state for MBM total bandwidth
* @mbm_local: saved state for MBM local bandwidth
@@ -71,9 +81,7 @@ struct resctrl_staged_config {
* by closid
*/
struct rdt_domain {
- struct list_head list;
- int id;
- struct cpumask cpu_mask;
+ struct rdt_domain_hdr hdr;
unsigned long *rmid_busy_llc;
struct mbm_state *mbm_total;
struct mbm_state *mbm_local;
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 47f92390edbb..c26ecb2e415f 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -356,9 +356,9 @@ struct rdt_domain *get_domain_from_cpu(int cpu, struct rdt_resource *r)
{
struct rdt_domain *d;

- list_for_each_entry(d, &r->domains, list) {
+ list_for_each_entry(d, &r->domains, hdr.list) {
/* Find the domain that contains this CPU */
- if (cpumask_test_cpu(cpu, &d->cpu_mask))
+ if (cpumask_test_cpu(cpu, &d->hdr.cpu_mask))
return d;
}

@@ -402,12 +402,12 @@ struct rdt_domain *rdt_find_domain(struct rdt_resource *r, int id,
struct list_head *l;

list_for_each(l, &r->domains) {
- d = list_entry(l, struct rdt_domain, list);
+ d = list_entry(l, struct rdt_domain, hdr.list);
/* When id is found, return its domain. */
- if (id == d->id)
+ if (id == d->hdr.id)
return d;
/* Stop searching when finding id's position in sorted list. */
- if (id < d->id)
+ if (id < d->hdr.id)
break;
}

@@ -530,7 +530,7 @@ static void domain_add_cpu(int cpu, struct rdt_resource *r)
d = rdt_find_domain(r, id, &add_pos);

if (d) {
- cpumask_set_cpu(cpu, &d->cpu_mask);
+ cpumask_set_cpu(cpu, &d->hdr.cpu_mask);
if (r->cache.arch_has_per_cpu_cfg)
rdt_domain_reconfigure_cdp(r);
return;
@@ -541,8 +541,8 @@ static void domain_add_cpu(int cpu, struct rdt_resource *r)
return;

d = &hw_dom->d_resctrl;
- d->id = id;
- cpumask_set_cpu(cpu, &d->cpu_mask);
+ d->hdr.id = id;
+ cpumask_set_cpu(cpu, &d->hdr.cpu_mask);

rdt_domain_reconfigure_cdp(r);

@@ -556,11 +556,11 @@ static void domain_add_cpu(int cpu, struct rdt_resource *r)
return;
}

- list_add_tail(&d->list, add_pos);
+ list_add_tail(&d->hdr.list, add_pos);

err = resctrl_online_domain(r, d);
if (err) {
- list_del(&d->list);
+ list_del(&d->hdr.list);
domain_free(hw_dom);
}
}
@@ -581,10 +581,10 @@ static void domain_remove_cpu(int cpu, struct rdt_resource *r)
}
hw_dom = resctrl_to_arch_dom(d);

- cpumask_clear_cpu(cpu, &d->cpu_mask);
- if (cpumask_empty(&d->cpu_mask)) {
+ cpumask_clear_cpu(cpu, &d->hdr.cpu_mask);
+ if (cpumask_empty(&d->hdr.cpu_mask)) {
resctrl_offline_domain(r, d);
- list_del(&d->list);
+ list_del(&d->hdr.list);

/*
* rdt_domain "d" is going to be freed below, so clear
diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
index 3f8891d57fac..23f8258d36a8 100644
--- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
+++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
@@ -67,7 +67,7 @@ int parse_bw(struct rdt_parse_data *data, struct resctrl_schema *s,

cfg = &d->staged_config[s->conf_type];
if (cfg->have_new_ctrl) {
- rdt_last_cmd_printf("Duplicate domain %d\n", d->id);
+ rdt_last_cmd_printf("Duplicate domain %d\n", d->hdr.id);
return -EINVAL;
}

@@ -146,7 +146,7 @@ int parse_cbm(struct rdt_parse_data *data, struct resctrl_schema *s,

cfg = &d->staged_config[s->conf_type];
if (cfg->have_new_ctrl) {
- rdt_last_cmd_printf("Duplicate domain %d\n", d->id);
+ rdt_last_cmd_printf("Duplicate domain %d\n", d->hdr.id);
return -EINVAL;
}

@@ -226,8 +226,8 @@ static int parse_line(char *line, struct resctrl_schema *s,
return -EINVAL;
}
dom = strim(dom);
- list_for_each_entry(d, &r->domains, list) {
- if (d->id == dom_id) {
+ list_for_each_entry(d, &r->domains, hdr.list) {
+ if (d->hdr.id == dom_id) {
data.buf = dom;
data.rdtgrp = rdtgrp;
if (r->parse_ctrlval(&data, s, d))
@@ -274,7 +274,7 @@ static bool apply_config(struct rdt_hw_domain *hw_dom,
struct rdt_domain *dom = &hw_dom->d_resctrl;

if (cfg->new_ctrl != hw_dom->ctrl_val[idx]) {
- cpumask_set_cpu(cpumask_any(&dom->cpu_mask), cpu_mask);
+ cpumask_set_cpu(cpumask_any(&dom->hdr.cpu_mask), cpu_mask);
hw_dom->ctrl_val[idx] = cfg->new_ctrl;

return true;
@@ -291,7 +291,7 @@ int resctrl_arch_update_one(struct rdt_resource *r, struct rdt_domain *d,
u32 idx = get_config_index(closid, t);
struct msr_param msr_param;

- if (!cpumask_test_cpu(smp_processor_id(), &d->cpu_mask))
+ if (!cpumask_test_cpu(smp_processor_id(), &d->hdr.cpu_mask))
return -EINVAL;

hw_dom->ctrl_val[idx] = cfg_val;
@@ -318,7 +318,7 @@ int resctrl_arch_update_domains(struct rdt_resource *r, u32 closid)
return -ENOMEM;

msr_param.res = NULL;
- list_for_each_entry(d, &r->domains, list) {
+ list_for_each_entry(d, &r->domains, hdr.list) {
hw_dom = resctrl_to_arch_dom(d);
for (t = 0; t < CDP_NUM_TYPES; t++) {
cfg = &hw_dom->d_resctrl.staged_config[t];
@@ -466,7 +466,7 @@ static void show_doms(struct seq_file *s, struct resctrl_schema *schema, int clo
u32 ctrl_val;

seq_printf(s, "%*s:", max_name_width, schema->name);
- list_for_each_entry(dom, &r->domains, list) {
+ list_for_each_entry(dom, &r->domains, hdr.list) {
if (sep)
seq_puts(s, ";");

@@ -476,7 +476,7 @@ static void show_doms(struct seq_file *s, struct resctrl_schema *schema, int clo
ctrl_val = resctrl_arch_get_config(r, dom, closid,
schema->conf_type);

- seq_printf(s, r->format_str, dom->id, max_data_width,
+ seq_printf(s, r->format_str, dom->hdr.id, max_data_width,
ctrl_val);
sep = true;
}
@@ -505,7 +505,7 @@ int rdtgroup_schemata_show(struct kernfs_open_file *of,
} else {
seq_printf(s, "%s:%d=%x\n",
rdtgrp->plr->s->res->name,
- rdtgrp->plr->d->id,
+ rdtgrp->plr->d->hdr.id,
rdtgrp->plr->cbm);
}
} else {
@@ -536,7 +536,7 @@ void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
rr->val = 0;
rr->first = first;

- smp_call_function_any(&d->cpu_mask, mon_event_count, rr, 1);
+ smp_call_function_any(&d->hdr.cpu_mask, mon_event_count, rr, 1);
}

int rdtgroup_mondata_show(struct seq_file *m, void *arg)
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index f136ac046851..dd0ea1bc0092 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -238,7 +238,7 @@ int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain *d,
u64 msr_val, chunks;
int ret;

- if (!cpumask_test_cpu(smp_processor_id(), &d->cpu_mask))
+ if (!cpumask_test_cpu(smp_processor_id(), &d->hdr.cpu_mask))
return -EINVAL;

ret = __rmid_read(rmid, eventid, &msr_val);
@@ -340,8 +340,8 @@ static void add_rmid_to_limbo(struct rmid_entry *entry)

entry->busy = 0;
cpu = get_cpu();
- list_for_each_entry(d, &r->domains, list) {
- if (cpumask_test_cpu(cpu, &d->cpu_mask)) {
+ list_for_each_entry(d, &r->domains, hdr.list) {
+ if (cpumask_test_cpu(cpu, &d->hdr.cpu_mask)) {
err = resctrl_arch_rmid_read(r, d, entry->rmid,
QOS_L3_OCCUP_EVENT_ID,
&val);
@@ -661,7 +661,7 @@ void cqm_setup_limbo_handler(struct rdt_domain *dom, unsigned long delay_ms)
unsigned long delay = msecs_to_jiffies(delay_ms);
int cpu;

- cpu = cpumask_any(&dom->cpu_mask);
+ cpu = cpumask_any(&dom->hdr.cpu_mask);
dom->cqm_work_cpu = cpu;

schedule_delayed_work_on(cpu, &dom->cqm_limbo, delay);
@@ -708,7 +708,7 @@ void mbm_setup_overflow_handler(struct rdt_domain *dom, unsigned long delay_ms)

if (!static_branch_likely(&rdt_mon_enable_key))
return;
- cpu = cpumask_any(&dom->cpu_mask);
+ cpu = cpumask_any(&dom->hdr.cpu_mask);
dom->mbm_work_cpu = cpu;
schedule_delayed_work_on(cpu, &dom->mbm_over, delay);
}
diff --git a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
index 2a682da9f43a..fcbd99e2eb66 100644
--- a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
+++ b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
@@ -221,7 +221,7 @@ static int pseudo_lock_cstates_constrain(struct pseudo_lock_region *plr)
int cpu;
int ret;

- for_each_cpu(cpu, &plr->d->cpu_mask) {
+ for_each_cpu(cpu, &plr->d->hdr.cpu_mask) {
pm_req = kzalloc(sizeof(*pm_req), GFP_KERNEL);
if (!pm_req) {
rdt_last_cmd_puts("Failure to allocate memory for PM QoS\n");
@@ -301,7 +301,7 @@ static int pseudo_lock_region_init(struct pseudo_lock_region *plr)
return -ENODEV;

/* Pick the first cpu we find that is associated with the cache. */
- plr->cpu = cpumask_first(&plr->d->cpu_mask);
+ plr->cpu = cpumask_first(&plr->d->hdr.cpu_mask);

if (!cpu_online(plr->cpu)) {
rdt_last_cmd_printf("CPU %u associated with cache not online\n",
@@ -856,10 +856,10 @@ bool rdtgroup_pseudo_locked_in_hierarchy(struct rdt_domain *d)
* associated with them.
*/
for_each_alloc_capable_rdt_resource(r) {
- list_for_each_entry(d_i, &r->domains, list) {
+ list_for_each_entry(d_i, &r->domains, hdr.list) {
if (d_i->plr)
cpumask_or(cpu_with_psl, cpu_with_psl,
- &d_i->cpu_mask);
+ &d_i->hdr.cpu_mask);
}
}

@@ -867,7 +867,7 @@ bool rdtgroup_pseudo_locked_in_hierarchy(struct rdt_domain *d)
* Next test if new pseudo-locked region would intersect with
* existing region.
*/
- if (cpumask_intersects(&d->cpu_mask, cpu_with_psl))
+ if (cpumask_intersects(&d->hdr.cpu_mask, cpu_with_psl))
ret = true;

free_cpumask_var(cpu_with_psl);
@@ -1199,7 +1199,7 @@ static int pseudo_lock_measure_cycles(struct rdtgroup *rdtgrp, int sel)
}

plr->thread_done = 0;
- cpu = cpumask_first(&plr->d->cpu_mask);
+ cpu = cpumask_first(&plr->d->hdr.cpu_mask);
if (!cpu_online(cpu)) {
ret = -ENODEV;
goto out;
@@ -1529,7 +1529,7 @@ static int pseudo_lock_dev_mmap(struct file *filp, struct vm_area_struct *vma)
* may be scheduled elsewhere and invalidate entries in the
* pseudo-locked region.
*/
- if (!cpumask_subset(current->cpus_ptr, &plr->d->cpu_mask)) {
+ if (!cpumask_subset(current->cpus_ptr, &plr->d->hdr.cpu_mask)) {
mutex_unlock(&rdtgroup_mutex);
return -EINVAL;
}
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index c44be64d65ec..04d32602ac33 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -91,7 +91,7 @@ void rdt_staged_configs_clear(void)
lockdep_assert_held(&rdtgroup_mutex);

for_each_alloc_capable_rdt_resource(r) {
- list_for_each_entry(dom, &r->domains, list)
+ list_for_each_entry(dom, &r->domains, hdr.list)
memset(dom->staged_config, 0, sizeof(dom->staged_config));
}
}
@@ -295,7 +295,7 @@ static int rdtgroup_cpus_show(struct kernfs_open_file *of,
rdt_last_cmd_puts("Cache domain offline\n");
ret = -ENODEV;
} else {
- mask = &rdtgrp->plr->d->cpu_mask;
+ mask = &rdtgrp->plr->d->hdr.cpu_mask;
seq_printf(s, is_cpu_list(of) ?
"%*pbl\n" : "%*pb\n",
cpumask_pr_args(mask));
@@ -984,12 +984,12 @@ static int rdt_bit_usage_show(struct kernfs_open_file *of,

mutex_lock(&rdtgroup_mutex);
hw_shareable = r->cache.shareable_bits;
- list_for_each_entry(dom, &r->domains, list) {
+ list_for_each_entry(dom, &r->domains, hdr.list) {
if (sep)
seq_putc(seq, ';');
sw_shareable = 0;
exclusive = 0;
- seq_printf(seq, "%d=", dom->id);
+ seq_printf(seq, "%d=", dom->hdr.id);
for (i = 0; i < closids_supported(); i++) {
if (!closid_allocated(i))
continue;
@@ -1302,7 +1302,7 @@ static bool rdtgroup_mode_test_exclusive(struct rdtgroup *rdtgrp)
if (r->rid == RDT_RESOURCE_MBA || r->rid == RDT_RESOURCE_SMBA)
continue;
has_cache = true;
- list_for_each_entry(d, &r->domains, list) {
+ list_for_each_entry(d, &r->domains, hdr.list) {
ctrl = resctrl_arch_get_config(r, d, closid,
s->conf_type);
if (rdtgroup_cbm_overlaps(s, d, ctrl, closid, false)) {
@@ -1417,7 +1417,7 @@ unsigned int rdtgroup_cbm_to_size(struct rdt_resource *r,
return size;

num_b = bitmap_weight(&cbm, r->cache.cbm_len);
- ci = get_cpu_cacheinfo(cpumask_any(&d->cpu_mask));
+ ci = get_cpu_cacheinfo(cpumask_any(&d->hdr.cpu_mask));
for (i = 0; i < ci->num_leaves; i++) {
if (ci->info_list[i].level == r->scope) {
size = ci->info_list[i].size / r->cache.cbm_len * num_b;
@@ -1465,7 +1465,7 @@ static int rdtgroup_size_show(struct kernfs_open_file *of,
size = rdtgroup_cbm_to_size(rdtgrp->plr->s->res,
rdtgrp->plr->d,
rdtgrp->plr->cbm);
- seq_printf(s, "%d=%u\n", rdtgrp->plr->d->id, size);
+ seq_printf(s, "%d=%u\n", rdtgrp->plr->d->hdr.id, size);
}
goto out;
}
@@ -1477,7 +1477,7 @@ static int rdtgroup_size_show(struct kernfs_open_file *of,
type = schema->conf_type;
sep = false;
seq_printf(s, "%*s:", max_name_width, schema->name);
- list_for_each_entry(d, &r->domains, list) {
+ list_for_each_entry(d, &r->domains, hdr.list) {
if (sep)
seq_putc(s, ';');
if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKSETUP) {
@@ -1495,7 +1495,7 @@ static int rdtgroup_size_show(struct kernfs_open_file *of,
else
size = rdtgroup_cbm_to_size(r, d, ctrl);
}
- seq_printf(s, "%d=%u", d->id, size);
+ seq_printf(s, "%d=%u", d->hdr.id, size);
sep = true;
}
seq_putc(s, '\n');
@@ -1555,7 +1555,7 @@ static void mon_event_config_read(void *info)

static void mondata_config_read(struct rdt_domain *d, struct mon_config_info *mon_info)
{
- smp_call_function_any(&d->cpu_mask, mon_event_config_read, mon_info, 1);
+ smp_call_function_any(&d->hdr.cpu_mask, mon_event_config_read, mon_info, 1);
}

static int mbm_config_show(struct seq_file *s, struct rdt_resource *r, u32 evtid)
@@ -1566,7 +1566,7 @@ static int mbm_config_show(struct seq_file *s, struct rdt_resource *r, u32 evtid

mutex_lock(&rdtgroup_mutex);

- list_for_each_entry(dom, &r->domains, list) {
+ list_for_each_entry(dom, &r->domains, hdr.list) {
if (sep)
seq_puts(s, ";");

@@ -1574,7 +1574,7 @@ static int mbm_config_show(struct seq_file *s, struct rdt_resource *r, u32 evtid
mon_info.evtid = evtid;
mondata_config_read(dom, &mon_info);

- seq_printf(s, "%d=0x%02x", dom->id, mon_info.mon_config);
+ seq_printf(s, "%d=0x%02x", dom->hdr.id, mon_info.mon_config);
sep = true;
}
seq_puts(s, "\n");
@@ -1646,7 +1646,7 @@ static int mbm_config_write_domain(struct rdt_resource *r,
* are scoped at the domain level. Writing any of these MSRs
* on one CPU is observed by all the CPUs in the domain.
*/
- smp_call_function_any(&d->cpu_mask, mon_event_config_write,
+ smp_call_function_any(&d->hdr.cpu_mask, mon_event_config_write,
&mon_info, 1);

/*
@@ -1689,8 +1689,8 @@ static int mon_config_write(struct rdt_resource *r, char *tok, u32 evtid)
return -EINVAL;
}

- list_for_each_entry(d, &r->domains, list) {
- if (d->id == dom_id) {
+ list_for_each_entry(d, &r->domains, hdr.list) {
+ if (d->hdr.id == dom_id) {
ret = mbm_config_write_domain(r, d, evtid, val);
if (ret)
return -EINVAL;
@@ -2232,14 +2232,14 @@ static int set_cache_qos_cfg(int level, bool enable)
return -ENOMEM;

r_l = &rdt_resources_all[level].r_resctrl;
- list_for_each_entry(d, &r_l->domains, list) {
+ list_for_each_entry(d, &r_l->domains, hdr.list) {
if (r_l->cache.arch_has_per_cpu_cfg)
/* Pick all the CPUs in the domain instance */
- for_each_cpu(cpu, &d->cpu_mask)
+ for_each_cpu(cpu, &d->hdr.cpu_mask)
cpumask_set_cpu(cpu, cpu_mask);
else
/* Pick one CPU from each domain instance to update MSR */
- cpumask_set_cpu(cpumask_any(&d->cpu_mask), cpu_mask);
+ cpumask_set_cpu(cpumask_any(&d->hdr.cpu_mask), cpu_mask);
}

/* Update QOS_CFG MSR on all the CPUs in cpu_mask */
@@ -2268,7 +2268,7 @@ void rdt_domain_reconfigure_cdp(struct rdt_resource *r)
static int mba_sc_domain_allocate(struct rdt_resource *r, struct rdt_domain *d)
{
u32 num_closid = resctrl_arch_get_num_closid(r);
- int cpu = cpumask_any(&d->cpu_mask);
+ int cpu = cpumask_any(&d->hdr.cpu_mask);
int i;

d->mbps_val = kcalloc_node(num_closid, sizeof(*d->mbps_val),
@@ -2317,7 +2317,7 @@ static int set_mba_sc(bool mba_sc)

r->membw.mba_sc = mba_sc;

- list_for_each_entry(d, &r->domains, list) {
+ list_for_each_entry(d, &r->domains, hdr.list) {
for (i = 0; i < num_closid; i++)
d->mbps_val[i] = MBA_MAX_MBPS;
}
@@ -2653,7 +2653,7 @@ static int rdt_get_tree(struct fs_context *fc)

if (is_mbm_enabled()) {
r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
- list_for_each_entry(dom, &r->domains, list)
+ list_for_each_entry(dom, &r->domains, hdr.list)
mbm_setup_overflow_handler(dom, MBM_OVERFLOW_INTERVAL);
}

@@ -2780,9 +2780,9 @@ static int reset_all_ctrls(struct rdt_resource *r)
* CBMs in all domains to the maximum mask value. Pick one CPU
* from each domain to update the MSRs below.
*/
- list_for_each_entry(d, &r->domains, list) {
+ list_for_each_entry(d, &r->domains, hdr.list) {
hw_dom = resctrl_to_arch_dom(d);
- cpumask_set_cpu(cpumask_any(&d->cpu_mask), cpu_mask);
+ cpumask_set_cpu(cpumask_any(&d->hdr.cpu_mask), cpu_mask);

for (i = 0; i < hw_res->num_closid; i++)
hw_dom->ctrl_val[i] = r->default_ctrl;
@@ -2986,7 +2986,7 @@ static int mkdir_mondata_subdir(struct kernfs_node *parent_kn,
char name[32];
int ret;

- sprintf(name, "mon_%s_%02d", r->name, d->id);
+ sprintf(name, "mon_%s_%02d", r->name, d->hdr.id);
/* create the directory */
kn = kernfs_create_dir(parent_kn, name, parent_kn->mode, prgrp);
if (IS_ERR(kn))
@@ -3002,7 +3002,7 @@ static int mkdir_mondata_subdir(struct kernfs_node *parent_kn,
}

priv.u.rid = r->rid;
- priv.u.domid = d->id;
+ priv.u.domid = d->hdr.id;
list_for_each_entry(mevt, &r->evt_list, list) {
priv.u.evtid = mevt->evtid;
ret = mon_addfile(kn, mevt->name, priv.priv);
@@ -3050,7 +3050,7 @@ static int mkdir_mondata_subdir_alldom(struct kernfs_node *parent_kn,
struct rdt_domain *dom;
int ret;

- list_for_each_entry(dom, &r->domains, list) {
+ list_for_each_entry(dom, &r->domains, hdr.list) {
ret = mkdir_mondata_subdir(parent_kn, dom, r, prgrp);
if (ret)
return ret;
@@ -3209,7 +3209,7 @@ static int __init_one_rdt_domain(struct rdt_domain *d, struct resctrl_schema *s,
*/
tmp_cbm = cfg->new_ctrl;
if (bitmap_weight(&tmp_cbm, r->cache.cbm_len) < r->cache.min_cbm_bits) {
- rdt_last_cmd_printf("No space on %s:%d\n", s->name, d->id);
+ rdt_last_cmd_printf("No space on %s:%d\n", s->name, d->hdr.id);
return -ENOSPC;
}
cfg->have_new_ctrl = true;
@@ -3232,7 +3232,7 @@ static int rdtgroup_init_cat(struct resctrl_schema *s, u32 closid)
struct rdt_domain *d;
int ret;

- list_for_each_entry(d, &s->res->domains, list) {
+ list_for_each_entry(d, &s->res->domains, hdr.list) {
ret = __init_one_rdt_domain(d, s, closid);
if (ret < 0)
return ret;
@@ -3247,7 +3247,7 @@ static void rdtgroup_init_mba(struct rdt_resource *r, u32 closid)
struct resctrl_staged_config *cfg;
struct rdt_domain *d;

- list_for_each_entry(d, &r->domains, list) {
+ list_for_each_entry(d, &r->domains, hdr.list) {
if (is_mba_sc(r)) {
d->mbps_val[closid] = MBA_MAX_MBPS;
continue;
@@ -3864,7 +3864,7 @@ void resctrl_offline_domain(struct rdt_resource *r, struct rdt_domain *d)
* per domain monitor data directories.
*/
if (static_branch_unlikely(&rdt_mon_enable_key))
- rmdir_mondata_subdir_allrdtgrp(r, d->id);
+ rmdir_mondata_subdir_allrdtgrp(r, d->hdr.id);

if (is_mbm_enabled())
cancel_delayed_work(&d->mbm_over);
--
2.41.0

2023-10-31 21:18:08

by Tony Luck

Subject: [PATCH v10 5/8] x86/resctrl: Add node-scope to the options for feature scope

All currently supported resctrl features are domain scoped, with the
domain scope matching that of the L2 or L3 caches.

Add RESCTRL_NODE as a new option for features that are scoped at the
same granularity as NUMA nodes. This is needed for Intel's Sub-NUMA
Cluster (SNC) feature where monitoring features are node scoped.
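
The scope-to-domain mapping described above can be sketched in user-space
C as follows. This is only an illustration under stated assumptions: the
struct cpu_topo table and the domain_id_from_scope() helper are
hypothetical stand-ins for the kernel's get_cpu_cacheinfo_id() and
cpu_to_node() lookups, and are not part of the patch.

```c
#include <stddef.h>

/* Mirrors the enum in the patch: cache scopes use the cache level as value. */
enum resctrl_scope {
	RESCTRL_L2_CACHE = 2,
	RESCTRL_L3_CACHE = 3,
	RESCTRL_NODE,
};

/* Hypothetical per-CPU topology record standing in for kernel lookups. */
struct cpu_topo {
	int l2_id;
	int l3_id;
	int node_id;
};

/* Return the domain id for a CPU at the given scope, or -1 if unknown. */
static int domain_id_from_scope(const struct cpu_topo *topo, size_t ncpus,
				size_t cpu, enum resctrl_scope scope)
{
	if (cpu >= ncpus)
		return -1;

	switch (scope) {
	case RESCTRL_L2_CACHE:
		return topo[cpu].l2_id;
	case RESCTRL_L3_CACHE:
		return topo[cpu].l3_id;
	case RESCTRL_NODE:
		/* Node scope: the domain id is simply the NUMA node id. */
		return topo[cpu].node_id;
	default:
		break;
	}

	return -1;
}
```

With two CPUs that share one L3 cache but sit on different SNC nodes, the
L3 scope yields a single domain while the node scope yields two.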

Reviewed-by: Peter Newman <[email protected]>
Signed-off-by: Tony Luck <[email protected]>
---
No changes since v9

include/linux/resctrl.h | 1 +
arch/x86/kernel/cpu/resctrl/core.c | 2 ++
2 files changed, 3 insertions(+)

diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 36503e8870cd..f42a5e59027b 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -172,6 +172,7 @@ struct resctrl_schema;
enum resctrl_scope {
RESCTRL_L2_CACHE = 2,
RESCTRL_L3_CACHE = 3,
+ RESCTRL_NODE,
};

/**
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 6bae0a658b94..d2c1aa8411a3 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -502,6 +502,8 @@ static int get_domain_id_from_scope(int cpu, enum resctrl_scope scope)
case RESCTRL_L2_CACHE:
case RESCTRL_L3_CACHE:
return get_cpu_cacheinfo_id(cpu, scope);
+ case RESCTRL_NODE:
+ return cpu_to_node(cpu);
default:
break;
}
--
2.41.0

2023-10-31 21:18:11

by Tony Luck

Subject: [PATCH v10 6/8] x86/resctrl: Introduce snc_nodes_per_l3_cache

Intel Sub-NUMA Cluster (SNC) is a feature that subdivides the CPU cores
and memory controllers on a socket into two or more groups. These are
presented to the operating system as NUMA nodes.

This may enable some workloads to have slightly lower latency to memory
as the memory controller(s) in an SNC node are electrically closer to the
CPU cores on that SNC node. This benefit may be offset by lower bandwidth
since the memory accesses for each core can only be interleaved between
the memory controllers on the same SNC node.

Resctrl monitoring on an Intel system depends upon attaching RMIDs to tasks
to track L3 cache occupancy and memory bandwidth. There is an MSR that
controls how the RMIDs are shared between SNC nodes.

The default mode divides them numerically. E.g. when there are two SNC
nodes on a socket the lower number half of the RMIDs are given to the
first node, the remainder to the second node. This would be difficult
to use with the Linux resctrl interface as specific RMID values assigned
to resctrl groups are not visible to users.

The other mode divides the RMIDs and renumbers the ones on the second
SNC node to start from zero.

Even with this renumbering SNC mode requires several changes in resctrl
behavior for correct operation.

Add a global integer "snc_nodes_per_l3_cache" that will show how many
SNC nodes share each L3 cache. When this is "1", SNC mode is either
not implemented or not enabled. All places that need to check it are
updated to take appropriate action when SNC mode is enabled.

Code that needs to take action when SNC is enabled is:
1) The number of logical RMIDs per L3 cache available for use is the
number of physical RMIDs divided by the number of SNC nodes.
2) Likewise the "mon_scale" value must be divided by the number of SNC
nodes.
3) The RMID renumbering operates when using the value from the
IA32_PQR_ASSOC MSR to count accesses by a task. When reading an RMID
counter, code must adjust from the logical RMID used to the physical
RMID value for the SNC node that it wishes to read and load the
adjusted value into the IA32_QM_EVTSEL MSR.
4) The L3 cache is divided between the SNC nodes. So the value
reported in the resctrl "size" file is divided by the number of SNC
nodes because the effective amount of cache that can be allocated
is reduced by that factor.
5) The "-o mba_MBps" mount option must be disabled in SNC mode
because the monitoring is being done per SNC node, while the
bandwidth allocation is still done at the L3 cache scope.
Trying to use this feedback loop might result in contradictory
changes to the throttling level coming from each of the SNC
node bandwidth measurements.
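
The arithmetic in items 1 to 4 above can be sketched as a small
stand-alone C fragment. It is a hedged illustration only: struct
snc_params and the snc_*() helpers are hypothetical names invented for
this sketch, while the formulas follow the changes to __rmid_read(),
rdt_get_mon_l3_config() and rdtgroup_cbm_to_size() in this patch.

```c
#include <stdint.h>

/* Hypothetical container for the inputs used in this sketch. */
struct snc_params {
	unsigned int nodes_per_l3;	/* 1 means SNC is off or unsupported */
	uint32_t phys_rmids;		/* physical RMIDs per L3 cache */
};

/* Item 1: logical RMIDs available per SNC node. */
static uint32_t snc_logical_rmids(const struct snc_params *p)
{
	return p->phys_rmids / p->nodes_per_l3;
}

/* Item 3: physical RMID to load into IA32_QM_EVTSEL for a given node. */
static uint32_t snc_phys_rmid(const struct snc_params *p, int node,
			      uint32_t logical_rmid)
{
	uint32_t offset = 0;

	if (p->nodes_per_l3 > 1)
		offset = ((uint32_t)node % p->nodes_per_l3) * snc_logical_rmids(p);

	return logical_rmid + offset;
}

/* Items 2 and 4: "mon_scale" and the reported size are divided by node count. */
static uint32_t snc_scale(const struct snc_params *p, uint32_t value)
{
	return value / p->nodes_per_l3;
}
```

For example, with 256 physical RMIDs and two SNC nodes per L3 cache, each
node gets 128 logical RMIDs, and logical RMID 5 on the second node maps
to physical RMID 133.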

Signed-off-by: Tony Luck <[email protected]>
---
Changes since v9
Fixed missing word s/monitoring on Intel/monitoring on an Intel/
Deleted "A later patch" paragraph.
Expanded description of how values are "adjusted" for mon_scale
and cache size.
Changed type of "snc_nodes_per_l3_cache" to "unsigned int".

arch/x86/kernel/cpu/resctrl/internal.h | 2 ++
arch/x86/kernel/cpu/resctrl/core.c | 6 ++++++
arch/x86/kernel/cpu/resctrl/monitor.c | 16 +++++++++++++---
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 5 +++--
4 files changed, 24 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index ce3a70657842..e7a75a439c16 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -446,6 +446,8 @@ DECLARE_STATIC_KEY_FALSE(rdt_alloc_enable_key);

extern struct dentry *debugfs_resctrl;

+extern unsigned int snc_nodes_per_l3_cache;
+
enum resctrl_res_level {
RDT_RESOURCE_L3,
RDT_RESOURCE_L2,
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index d2c1aa8411a3..97d2a5a7dd41 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -48,6 +48,12 @@ int max_name_width, max_data_width;
*/
bool rdt_alloc_capable;

+/*
+ * Number of SNC nodes that share each L3 cache. Default is 1 for
+ * systems that do not support SNC, or have SNC disabled.
+ */
+unsigned int snc_nodes_per_l3_cache = 1;
+
static void
mba_wrmsr_intel(struct rdt_ctrl_domain *d, struct msr_param *m,
struct rdt_resource *r);
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 4e145f5620b0..30b7c3b9b517 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -148,8 +148,18 @@ static inline struct rmid_entry *__rmid_entry(u32 rmid)

static int __rmid_read(u32 rmid, enum resctrl_event_id eventid, u64 *val)
{
+ struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
+ int cpu = smp_processor_id();
+ int rmid_offset = 0;
u64 msr_val;

+ /*
+ * When SNC mode is on, need to compute the offset to read the
+ * physical RMID counter for the node to which this CPU belongs.
+ */
+ if (snc_nodes_per_l3_cache > 1)
+ rmid_offset = (cpu_to_node(cpu) % snc_nodes_per_l3_cache) * r->num_rmid;
+
/*
* As per the SDM, when IA32_QM_EVTSEL.EvtID (bits 7:0) is configured
* with a valid event code for supported resource type and the bits
@@ -158,7 +168,7 @@ static int __rmid_read(u32 rmid, enum resctrl_event_id eventid, u64 *val)
* IA32_QM_CTR.Error (bit 63) and IA32_QM_CTR.Unavailable (bit 62)
* are error bits.
*/
- wrmsr(MSR_IA32_QM_EVTSEL, eventid, rmid);
+ wrmsr(MSR_IA32_QM_EVTSEL, eventid, rmid + rmid_offset);
rdmsrl(MSR_IA32_QM_CTR, msr_val);

if (msr_val & RMID_VAL_ERROR)
@@ -783,8 +793,8 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
int ret;

resctrl_rmid_realloc_limit = boot_cpu_data.x86_cache_size * 1024;
- hw_res->mon_scale = boot_cpu_data.x86_cache_occ_scale;
- r->num_rmid = boot_cpu_data.x86_cache_max_rmid + 1;
+ hw_res->mon_scale = boot_cpu_data.x86_cache_occ_scale / snc_nodes_per_l3_cache;
+ r->num_rmid = (boot_cpu_data.x86_cache_max_rmid + 1) / snc_nodes_per_l3_cache;
hw_res->mbm_width = MBM_CNTR_WIDTH_BASE;

if (mbm_offset > 0 && mbm_offset <= MBM_CNTR_WIDTH_OFFSET_MAX)
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 21bbd832f3f2..79d57dade568 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1425,7 +1425,7 @@ unsigned int rdtgroup_cbm_to_size(struct rdt_resource *r,
}
}

- return size;
+ return size / snc_nodes_per_l3_cache;
}

/*
@@ -2298,7 +2298,8 @@ static bool supports_mba_mbps(void)
struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_MBA].r_resctrl;

return (is_mbm_local_enabled() &&
- r->alloc_capable && is_mba_linear());
+ r->alloc_capable && is_mba_linear() &&
+ snc_nodes_per_l3_cache == 1);
}

/*
--
2.41.0

2023-10-31 21:18:39

by Tony Luck

Subject: [PATCH v10 7/8] x86/resctrl: Sub NUMA Cluster detection and enable

There isn't a simple hardware bit that indicates whether a CPU is
running in Sub NUMA Cluster (SNC) mode. Infer the state by comparing
the ratio of NUMA nodes to L3 cache instances.

When SNC mode is detected, reconfigure the RMID counters by updating
the MSR_RMID_SNC_CONFIG MSR on each socket as CPUs are seen.

Clearing bit zero of the MSR divides the RMIDs and renumbers the ones
on the second SNC node to start from zero.
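
The ratio-based inference described above can be sketched in user-space
C. This is an illustration under assumptions, not the kernel code: the
input array, the MAX_L3_IDS bound and the snc_nodes_per_l3() helper are
hypothetical, and the sketch takes one L3 cache id per NUMA node (with -1
marking a memory-only node) instead of walking the real topology.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_L3_IDS 64	/* hypothetical bound on L3 cache ids for this sketch */

/*
 * node_l3_id[i] holds the L3 cache id of the first CPU on node i,
 * or -1 when node i has no CPUs (memory-only node).
 * Returns the number of CPU-bearing nodes per L3 cache, or 1 when
 * the ratio cannot be determined.
 */
static unsigned int snc_nodes_per_l3(const int *node_l3_id, size_t num_nodes)
{
	bool seen[MAX_L3_IDS] = { false };
	size_t cpu_nodes = 0, num_l3 = 0;

	for (size_t i = 0; i < num_nodes; i++) {
		int id = node_l3_id[i];

		if (id < 0)
			continue;	/* memory-only node, not counted */
		if (id >= MAX_L3_IDS)
			return 1;	/* out of range for this sketch */

		cpu_nodes++;
		if (!seen[id]) {
			seen[id] = true;
			num_l3++;
		}
	}

	if (!num_l3)
		return 1;

	return (unsigned int)(cpu_nodes / num_l3);
}
```

A result greater than 1 corresponds to the case where the patch switches
the L3 monitor scope to RESCTRL_NODE.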

Signed-off-by: Tony Luck <[email protected]>
---
Changes since v9
Expand h/w to hardware (commit and code comments)
Remove "earlier commit" reference
s/counnter/counter/
Check for offline CPUs and warn user SNC detection may be broken.

arch/x86/include/asm/msr-index.h | 1 +
arch/x86/kernel/cpu/resctrl/core.c | 100 ++++++++++++++++++++++++++++-
2 files changed, 99 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index e3fa9cecd599..4285a5ee81fe 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -1109,6 +1109,7 @@
#define MSR_IA32_QM_CTR 0xc8e
#define MSR_IA32_PQR_ASSOC 0xc8f
#define MSR_IA32_L3_CBM_BASE 0xc90
+#define MSR_RMID_SNC_CONFIG 0xca0
#define MSR_IA32_L2_CBM_BASE 0xd10
#define MSR_IA32_MBA_THRTL_BASE 0xd50

diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 97d2a5a7dd41..034f9797e1fb 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -16,11 +16,14 @@

#define pr_fmt(fmt) "resctrl: " fmt

+#include <linux/cpu.h>
#include <linux/slab.h>
#include <linux/err.h>
#include <linux/cacheinfo.h>
#include <linux/cpuhotplug.h>
+#include <linux/mod_devicetable.h>

+#include <asm/cpu_device_id.h>
#include <asm/intel-family.h>
#include <asm/resctrl.h>
#include "internal.h"
@@ -184,10 +187,10 @@ bool is_mba_sc(struct rdt_resource *r)

/*
* rdt_get_mb_table() - get a mapping of bandwidth(b/w) percentage values
- * exposed to user interface and the h/w understandable delay values.
+ * exposed to user interface and the hardware understandable delay values.
*
* The non-linear delay values have the granularity of power of two
- * and also the h/w does not guarantee a curve for configured delay
+ * and also the hardware does not guarantee a curve for configured delay
* values vs. actual b/w enforced.
* Hence we need a mapping that is pre calibrated so the user can
* express the memory b/w as a percentage value.
@@ -738,11 +741,42 @@ static void clear_closid_rmid(int cpu)
wrmsr(MSR_IA32_PQR_ASSOC, 0, 0);
}

+/*
+ * The power-on reset value of MSR_RMID_SNC_CONFIG is 0x1
+ * which indicates that RMIDs are configured in legacy mode.
+ * This mode is incompatible with Linux resctrl semantics
+ * as RMIDs are partitioned between SNC nodes, which requires
+ * a user to know which RMID is allocated to a task.
+ * Clearing bit 0 reconfigures the RMID counters for use
+ * in Sub NUMA Cluster mode. This mode is better for Linux.
+ * The RMID space is divided between all SNC nodes with the
+ * RMIDs renumbered to start from zero in each node when
+ * counting operations from tasks. Code to read the counters
+ * must adjust RMID counter numbers based on SNC node. See
+ * __rmid_read() for code that does this.
+ */
+static void snc_remap_rmids(int cpu)
+{
+ u64 val;
+
+ /* Only need to enable once per package. */
+ if (cpumask_first(topology_core_cpumask(cpu)) != cpu)
+ return;
+
+ rdmsrl(MSR_RMID_SNC_CONFIG, val);
+ val &= ~BIT_ULL(0);
+ wrmsrl(MSR_RMID_SNC_CONFIG, val);
+}
+
static int resctrl_online_cpu(unsigned int cpu)
{
struct rdt_resource *r;

mutex_lock(&rdtgroup_mutex);
+
+ if (snc_nodes_per_l3_cache > 1)
+ snc_remap_rmids(cpu);
+
for_each_capable_rdt_resource(r)
domain_add_cpu(cpu, r);
/* The cpu is set in default rdtgroup after online. */
@@ -997,11 +1031,73 @@ static __init bool get_rdt_resources(void)
return (rdt_mon_capable || rdt_alloc_capable);
}

+/* CPU models that support MSR_RMID_SNC_CONFIG */
+static const struct x86_cpu_id snc_cpu_ids[] __initconst = {
+ X86_MATCH_INTEL_FAM6_MODEL(ICELAKE_X, 0),
+ X86_MATCH_INTEL_FAM6_MODEL(SAPPHIRERAPIDS_X, 0),
+ X86_MATCH_INTEL_FAM6_MODEL(EMERALDRAPIDS_X, 0),
+ X86_MATCH_INTEL_FAM6_MODEL(GRANITERAPIDS_X, 0),
+ {}
+};
+
+/*
+ * There isn't a simple hardware bit that indicates whether a CPU is running
+ * in Sub NUMA Cluster (SNC) mode. Infer the state by comparing the
+ * ratio of NUMA nodes to L3 cache instances.
+ * It is not possible to accurately determine SNC state if the system is
+ * booted with a maxcpus=N parameter. That distorts the ratio of SNC nodes
+ * to L3 caches. It will be OK if system is booted with hyperthreading
+ * disabled (since this doesn't affect the ratio).
+ */
+static __init int snc_get_config(void)
+{
+ unsigned long *node_caches;
+ int mem_only_nodes = 0;
+ int cpu, node, ret;
+ int num_l3_caches;
+
+ if (!x86_match_cpu(snc_cpu_ids))
+ return 1;
+
+ node_caches = bitmap_zalloc(nr_node_ids, GFP_KERNEL);
+ if (!node_caches)
+ return 1;
+
+ cpus_read_lock();
+
+ if (num_online_cpus() != num_present_cpus())
+ pr_warn("Some CPUs offline, SNC detection may be incorrect\n");
+
+ for_each_node(node) {
+ cpu = cpumask_first(cpumask_of_node(node));
+ if (cpu < nr_cpu_ids)
+ set_bit(get_cpu_cacheinfo_id(cpu, 3), node_caches);
+ else
+ mem_only_nodes++;
+ }
+ cpus_read_unlock();
+
+ num_l3_caches = bitmap_weight(node_caches, nr_node_ids);
+ kfree(node_caches);
+
+ if (!num_l3_caches)
+ return 1;
+
+ ret = (nr_node_ids - mem_only_nodes) / num_l3_caches;
+
+ if (ret > 1)
+ rdt_resources_all[RDT_RESOURCE_L3].r_resctrl.mon_scope = RESCTRL_NODE;
+
+ return ret;
+}
+
static __init void rdt_init_res_defs_intel(void)
{
struct rdt_hw_resource *hw_res;
struct rdt_resource *r;

+ snc_nodes_per_l3_cache = snc_get_config();
+
for_each_rdt_resource(r) {
hw_res = resctrl_to_arch_res(r);

--
2.41.0

2023-11-01 19:54:17

by Tony Luck

Subject: RE: [PATCH v10 3/8] x86/resctrl: Prepare for different scope for control/monitor operations

+/*
+ * domain_add_cpu - Add a cpu to either/both resource's domain lists.
+ */
+static void domain_add_cpu(int cpu, struct rdt_resource *r)

Bother. Missed one comment that needs s/cpu/CPU/

-Tony

2023-11-07 00:31:58

by Reinette Chatre

Subject: Re: [PATCH v10 1/8] x86/resctrl: Prepare for new domain scope

Hi Tony,

On 10/31/2023 2:17 PM, Tony Luck wrote:
> Resctrl resources operate on subsets of CPUs in the system with the
> defining attribute of each subset being an instance of a particular
> level of cache. E.g. all CPUs sharing an L3 cache would be part of the
> same domain.
>
> In preparation for features that are scoped at the NUMA node level
> change the code from explicit references to "cache_level" to a more
> generic scope. At this point the only options for this scope are groups
> of CPUs that share an L2 cache or L3 cache.
>
> Clean up the error handling when looking up domains. Report invalid id's
> before calling rdt_find_domain() in preparation for better messages when
> scope can be other than cache scope. This means that rdt_find_domain()
> will never return an error. So remove checks for error from the callsites.
>
> Signed-off-by: Tony Luck <[email protected]>
> ---
> Changes since v9
> New test for invalid domain id before calling rdt_find_domain() means that
> error handling in that function and at all call-sites can be simplified.

These changes do not appear to be consistent in this series. Simplifying the
call-sites is indeed done in this patch but this work seems to be undone in
patch 3 where it reverts back to the previous error handling in
domain_add_cpu_mon(), domain_remove_cpu_ctrl(), and domain_remove_cpu_mon().

> In pseudo_lock_region_init() use the new enum resctrl_scope for local variable.
>
> include/linux/resctrl.h | 9 +++--
> arch/x86/kernel/cpu/resctrl/core.c | 40 +++++++++++++++--------
> arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 2 +-
> arch/x86/kernel/cpu/resctrl/pseudo_lock.c | 6 +++-
> arch/x86/kernel/cpu/resctrl/rdtgroup.c | 5 ++-
> 5 files changed, 44 insertions(+), 18 deletions(-)
>

...

> @@ -506,17 +516,18 @@ static int arch_domain_mbm_alloc(u32 num_rmid, struct rdt_hw_domain *hw_dom)
> */
> static void domain_add_cpu(int cpu, struct rdt_resource *r)
> {
> - int id = get_cpu_cacheinfo_id(cpu, r->cache_level);
> + int id = get_domain_id_from_scope(cpu, r->scope);
> struct list_head *add_pos = NULL;
> struct rdt_hw_domain *hw_dom;
> struct rdt_domain *d;
> int err;
>
> - d = rdt_find_domain(r, id, &add_pos);
> - if (IS_ERR(d)) {
> - pr_warn("Couldn't find cache id for CPU %d\n", cpu);
> + if (id < 0) {
> + pr_warn_once("Can't find domain id for CPU:%d scope:%d for resource %s\n",
> + cpu, r->scope, r->name);
> return;
> }

Please add empty line here ...

> + d = rdt_find_domain(r, id, &add_pos);
>

... and remove this empty line.

> if (d) {
> cpumask_set_cpu(cpu, &d->cpu_mask);
> @@ -556,12 +567,15 @@ static void domain_add_cpu(int cpu, struct rdt_resource *r)
>
> static void domain_remove_cpu(int cpu, struct rdt_resource *r)
> {
> - int id = get_cpu_cacheinfo_id(cpu, r->cache_level);
> + int id = get_domain_id_from_scope(cpu, r->scope);
> struct rdt_hw_domain *hw_dom;
> struct rdt_domain *d;
>
> + if (id < 0)
> + return;
> +
> d = rdt_find_domain(r, id, NULL);
> - if (IS_ERR_OR_NULL(d)) {
> + if (!d) {
> pr_warn("Couldn't find cache id for CPU %d\n", cpu);

This error message is no longer accurate.

> return;
> }


Reinette

2023-11-07 00:33:08

by Reinette Chatre

Subject: Re: [PATCH v10 3/8] x86/resctrl: Prepare for different scope for control/monitor operations

Hi Tony,

On 10/31/2023 2:17 PM, Tony Luck wrote:
> Resctrl assumes that control and monitor operations on a resource are
> performed at the same scope.
>
> Prepare for systems that use different scope (specifically Intel needs
> to split the RDT_RESOURCE_L3 resource to use L3 scope for cache control
> and NODE scope for cache occupancy and memory bandwidth monitoring).
>
> Create separate domain lists for control and monitor operations.
>
> Note that errors during initialization of either control or monitor
> functions on a domain would previously result in that domain being
> excluded from both control and monitor operations. Now the domains are
> allocated independently it is no longer required to disable both control
> and monitor operations if either fail.
>
> Signed-off-by: Tony Luck <[email protected]>
> ---
> Changes since v9
> Fix commit to be specific that only the RDT_RESOURCE_L3 resource is going
> to have different monitor and control scope.
> Rename get_domain_from_cpu() -> get_ctrl_domain_from_cpu()
> Rewrite comment for rdt_find_domain().
> Add "type" field to rdt_domain_hdr structure.
> Delete the /* RDT_RESOURCE_MBA is never mon_capable */ comment.
>
> include/linux/resctrl.h | 25 ++-
> arch/x86/kernel/cpu/resctrl/internal.h | 6 +-
> arch/x86/kernel/cpu/resctrl/core.c | 206 ++++++++++++++++------
> arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 12 +-
> arch/x86/kernel/cpu/resctrl/monitor.c | 4 +-
> arch/x86/kernel/cpu/resctrl/pseudo_lock.c | 4 +-
> arch/x86/kernel/cpu/resctrl/rdtgroup.c | 55 +++---
> 7 files changed, 218 insertions(+), 94 deletions(-)
>
> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
> index c4067150a6b7..35e700edc6e6 100644
> --- a/include/linux/resctrl.h
> +++ b/include/linux/resctrl.h
> @@ -52,15 +52,22 @@ struct resctrl_staged_config {
> bool have_new_ctrl;
> };
>
> +enum resctrl_domain_type {
> + RESCTRL_CTRL_DOMAIN,
> + RESCTRL_MON_DOMAIN,
> +};
> +
> /**
> * struct rdt_domain_hdr - common header for different domain types
> * @list: all instances of this resource
> * @id: unique id for this instance
> + * @type: type of this instance
> * @cpu_mask: which CPUs share this resource
> */
> struct rdt_domain_hdr {
> struct list_head list;
> int id;
> + enum resctrl_domain_type type;
> struct cpumask cpu_mask;
> };
>
> @@ -163,10 +170,12 @@ enum resctrl_scope {
> * @alloc_capable: Is allocation available on this machine
> * @mon_capable: Is monitor feature available on this machine
> * @num_rmid: Number of RMIDs available
> - * @scope: Scope of this resource
> + * @ctrl_scope: Scope of this resource for control functions
> + * @mon_scope: Scope of this resource for monitor functions
> * @cache: Cache allocation related data
> * @membw: If the component has bandwidth controls, their properties.
> - * @domains: All domains for this resource
> + * @ctrl_domains: Control domains for this resource
> + * @mon_domains: Monitor domains for this resource
> * @name: Name to use in "schemata" file.
> * @data_width: Character width of data when displaying
> * @default_ctrl: Specifies default cache cbm or memory B/W percent.
> @@ -181,10 +190,12 @@ struct rdt_resource {
> bool alloc_capable;
> bool mon_capable;
> int num_rmid;
> - enum resctrl_scope scope;
> + enum resctrl_scope ctrl_scope;
> + enum resctrl_scope mon_scope;
> struct resctrl_cache cache;
> struct resctrl_membw membw;
> - struct list_head domains;
> + struct list_head ctrl_domains;
> + struct list_head mon_domains;
> char *name;
> int data_width;
> u32 default_ctrl;
> @@ -230,8 +241,10 @@ int resctrl_arch_update_one(struct rdt_resource *r, struct rdt_domain *d,
>
> u32 resctrl_arch_get_config(struct rdt_resource *r, struct rdt_domain *d,
> u32 closid, enum resctrl_conf_type type);
> -int resctrl_online_domain(struct rdt_resource *r, struct rdt_domain *d);
> -void resctrl_offline_domain(struct rdt_resource *r, struct rdt_domain *d);
> +int resctrl_online_ctrl_domain(struct rdt_resource *r, struct rdt_domain *d);
> +int resctrl_online_mon_domain(struct rdt_resource *r, struct rdt_domain *d);
> +void resctrl_offline_ctrl_domain(struct rdt_resource *r, struct rdt_domain *d);
> +void resctrl_offline_mon_domain(struct rdt_resource *r, struct rdt_domain *d);
>
> /**
> * resctrl_arch_rmid_read() - Read the eventid counter corresponding to rmid
> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
> index a4f1aa15f0a2..24bf9d7989a9 100644
> --- a/arch/x86/kernel/cpu/resctrl/internal.h
> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> @@ -520,8 +520,8 @@ void rdtgroup_kn_unlock(struct kernfs_node *kn);
> int rdtgroup_kn_mode_restrict(struct rdtgroup *r, const char *name);
> int rdtgroup_kn_mode_restore(struct rdtgroup *r, const char *name,
> umode_t mask);
> -struct rdt_domain *rdt_find_domain(struct rdt_resource *r, int id,
> - struct list_head **pos);
> +struct rdt_domain_hdr *rdt_find_domain(struct list_head *h, int id,
> + struct list_head **pos);
> ssize_t rdtgroup_schemata_write(struct kernfs_open_file *of,
> char *buf, size_t nbytes, loff_t off);
> int rdtgroup_schemata_show(struct kernfs_open_file *of,
> @@ -540,7 +540,7 @@ int rdt_pseudo_lock_init(void);
> void rdt_pseudo_lock_release(void);
> int rdtgroup_pseudo_lock_create(struct rdtgroup *rdtgrp);
> void rdtgroup_pseudo_lock_remove(struct rdtgroup *rdtgrp);
> -struct rdt_domain *get_domain_from_cpu(int cpu, struct rdt_resource *r);
> +struct rdt_domain *get_ctrl_domain_from_cpu(int cpu, struct rdt_resource *r);
> int closids_supported(void);
> void closid_free(int closid);
> int alloc_rmid(void);
> diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
> index c26ecb2e415f..8dc2cb49358e 100644
> --- a/arch/x86/kernel/cpu/resctrl/core.c
> +++ b/arch/x86/kernel/cpu/resctrl/core.c
> @@ -57,7 +57,8 @@ static void
> mba_wrmsr_amd(struct rdt_domain *d, struct msr_param *m,
> struct rdt_resource *r);
>
> -#define domain_init(id) LIST_HEAD_INIT(rdt_resources_all[id].r_resctrl.domains)
> +#define ctrl_domain_init(id) LIST_HEAD_INIT(rdt_resources_all[id].r_resctrl.ctrl_domains)
> +#define mon_domain_init(id) LIST_HEAD_INIT(rdt_resources_all[id].r_resctrl.mon_domains)
>
> struct rdt_hw_resource rdt_resources_all[] = {
> [RDT_RESOURCE_L3] =
> @@ -65,8 +66,10 @@ struct rdt_hw_resource rdt_resources_all[] = {
> .r_resctrl = {
> .rid = RDT_RESOURCE_L3,
> .name = "L3",
> - .scope = RESCTRL_L3_CACHE,
> - .domains = domain_init(RDT_RESOURCE_L3),
> + .ctrl_scope = RESCTRL_L3_CACHE,
> + .mon_scope = RESCTRL_L3_CACHE,
> + .ctrl_domains = ctrl_domain_init(RDT_RESOURCE_L3),
> + .mon_domains = mon_domain_init(RDT_RESOURCE_L3),
> .parse_ctrlval = parse_cbm,
> .format_str = "%d=%0*x",
> .fflags = RFTYPE_RES_CACHE,
> @@ -79,8 +82,8 @@ struct rdt_hw_resource rdt_resources_all[] = {
> .r_resctrl = {
> .rid = RDT_RESOURCE_L2,
> .name = "L2",
> - .scope = RESCTRL_L2_CACHE,
> - .domains = domain_init(RDT_RESOURCE_L2),
> + .ctrl_scope = RESCTRL_L2_CACHE,
> + .ctrl_domains = ctrl_domain_init(RDT_RESOURCE_L2),
> .parse_ctrlval = parse_cbm,
> .format_str = "%d=%0*x",
> .fflags = RFTYPE_RES_CACHE,
> @@ -93,8 +96,8 @@ struct rdt_hw_resource rdt_resources_all[] = {
> .r_resctrl = {
> .rid = RDT_RESOURCE_MBA,
> .name = "MB",
> - .scope = RESCTRL_L3_CACHE,
> - .domains = domain_init(RDT_RESOURCE_MBA),
> + .ctrl_scope = RESCTRL_L3_CACHE,
> + .ctrl_domains = ctrl_domain_init(RDT_RESOURCE_MBA),
> .parse_ctrlval = parse_bw,
> .format_str = "%d=%*u",
> .fflags = RFTYPE_RES_MB,
> @@ -105,8 +108,8 @@ struct rdt_hw_resource rdt_resources_all[] = {
> .r_resctrl = {
> .rid = RDT_RESOURCE_SMBA,
> .name = "SMBA",
> - .scope = RESCTRL_L3_CACHE,
> - .domains = domain_init(RDT_RESOURCE_SMBA),
> + .ctrl_scope = RESCTRL_L3_CACHE,
> + .ctrl_domains = ctrl_domain_init(RDT_RESOURCE_SMBA),
> .parse_ctrlval = parse_bw,
> .format_str = "%d=%*u",
> .fflags = RFTYPE_RES_MB,
> @@ -352,11 +355,11 @@ cat_wrmsr(struct rdt_domain *d, struct msr_param *m, struct rdt_resource *r)
> wrmsrl(hw_res->msr_base + i, hw_dom->ctrl_val[i]);
> }
>
> -struct rdt_domain *get_domain_from_cpu(int cpu, struct rdt_resource *r)
> +struct rdt_domain *get_ctrl_domain_from_cpu(int cpu, struct rdt_resource *r)
> {
> struct rdt_domain *d;
>
> - list_for_each_entry(d, &r->domains, hdr.list) {
> + list_for_each_entry(d, &r->ctrl_domains, hdr.list) {
> /* Find the domain that contains this CPU */
> if (cpumask_test_cpu(cpu, &d->hdr.cpu_mask))
> return d;
> @@ -378,7 +381,7 @@ void rdt_ctrl_update(void *arg)
> int cpu = smp_processor_id();
> struct rdt_domain *d;
>
> - d = get_domain_from_cpu(cpu, r);
> + d = get_ctrl_domain_from_cpu(cpu, r);
> if (d) {
> hw_res->msr_update(d, m, r);
> return;
> @@ -388,26 +391,26 @@ void rdt_ctrl_update(void *arg)
> }
>
> /*
> - * rdt_find_domain - Find a domain in a resource that matches input resource id
> + * rdt_find_domain - Search for a domain id in a resource domain list.
> *
> - * Search resource r's domain list to find the resource id. If the resource
> - * id is found in a domain, return the domain. Otherwise, if requested by
> - * caller, return the first domain whose id is bigger than the input id.
> + * Search the list to find the resource id. If the domain id is found

This still talks about searching for a "resource id". And the "otherwise"
is not accurate ... it returns NULL if domain id cannot be found. The
"otherwise" refers to a value returned in a parameter, not the function
return value.

How about:
Search the domain list to find the domain id. If the domain id
is found, return the domain. NULL otherwise.
If the domain id is not found (and NULL returned) then the first
domain with id bigger than the input id can be returned to the
caller via @pos.

Please feel free to improve.
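
The semantics described in the suggested comment can be sketched in user space as below. This is only an illustration, not the kernel code: `dom_hdr` and `find_domain` are hypothetical stand-ins, and a plain singly linked list replaces `struct list_head`. As a result, a NULL `*pos` here means "no larger id, append at the tail", whereas a `list_head` walk would end at the list head instead.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct rdt_domain_hdr (singly linked for brevity). */
struct dom_hdr {
	int id;
	struct dom_hdr *next;
};

/*
 * Search a list sorted by ascending id for a domain id.
 *
 * Return value: the matching entry, or NULL if the id is not present.
 * Side channel: on a miss, if pos is non-NULL, *pos is set to the first
 * entry whose id is bigger than the input id (NULL if there is none).
 * On a hit, *pos is left untouched.
 */
static struct dom_hdr *find_domain(struct dom_hdr *head, int id,
				   struct dom_hdr **pos)
{
	struct dom_hdr *d;

	for (d = head; d; d = d->next) {
		/* When id is found, return its domain. */
		if (d->id == id)
			return d;
		/* Stop searching at id's position in the sorted list. */
		if (d->id > id)
			break;
	}

	if (pos)
		*pos = d;

	return NULL;
}
```

The point of the sketch is that the function return value and the value handed back through `pos` are separate channels, which is what the comment wording needs to make clear.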

> + * in a domain, return the domain. Otherwise, if requested by caller,
> + * return the first domain whose id is bigger than the input id.
> * The domain list is sorted by id in ascending order.
> */
> -struct rdt_domain *rdt_find_domain(struct rdt_resource *r, int id,
> - struct list_head **pos)
> +struct rdt_domain_hdr *rdt_find_domain(struct list_head *h, int id,
> + struct list_head **pos)
> {
> - struct rdt_domain *d;
> + struct rdt_domain_hdr *d;
> struct list_head *l;
>
> - list_for_each(l, &r->domains) {
> - d = list_entry(l, struct rdt_domain, hdr.list);
> + list_for_each(l, h) {
> + d = list_entry(l, struct rdt_domain_hdr, list);
> /* When id is found, return its domain. */
> - if (id == d->hdr.id)
> + if (id == d->id)
> return d;
> /* Stop searching when finding id's position in sorted list. */
> - if (id < d->hdr.id)
> + if (id < d->id)
> break;
> }
>
> @@ -501,35 +504,29 @@ static int get_domain_id_from_scope(int cpu, enum resctrl_scope scope)
> return -EINVAL;
> }
>
> -/*
> - * domain_add_cpu - Add a cpu to a resource's domain list.
> - *
> - * If an existing domain in the resource r's domain list matches the cpu's
> - * resource id, add the cpu in the domain.
> - *
> - * Otherwise, a new domain is allocated and inserted into the right position
> - * in the domain list sorted by id in ascending order.
> - *
> - * The order in the domain list is visible to users when we print entries
> - * in the schemata file and schemata input is validated to have the same order
> - * as this list.
> - */
> -static void domain_add_cpu(int cpu, struct rdt_resource *r)
> +static void domain_add_cpu_ctrl(int cpu, struct rdt_resource *r)
> {
> - int id = get_domain_id_from_scope(cpu, r->scope);
> + int id = get_domain_id_from_scope(cpu, r->ctrl_scope);
> struct list_head *add_pos = NULL;
> struct rdt_hw_domain *hw_dom;
> + struct rdt_domain_hdr *hdr;
> struct rdt_domain *d;
> int err;
>
> if (id < 0) {
> - pr_warn_once("Can't find domain id for CPU:%d scope:%d for resource %s\n",
> - cpu, r->scope, r->name);
> + pr_warn_once("Can't find control domain id for CPU:%d scope:%d for resource %s\n",
> + cpu, r->ctrl_scope, r->name);
> return;
> }
> - d = rdt_find_domain(r, id, &add_pos);
>
> - if (d) {
> + hdr = rdt_find_domain(&r->ctrl_domains, id, &add_pos);
> +

Please remove empty line.

> + if (hdr) {
> + if (WARN_ON_ONCE(hdr->type != RESCTRL_CTRL_DOMAIN))
> + return;
> +
> + d = container_of(hdr, struct rdt_domain, hdr);
> +
> cpumask_set_cpu(cpu, &d->hdr.cpu_mask);
> if (r->cache.arch_has_per_cpu_cfg)
> rdt_domain_reconfigure_cdp(r);
> @@ -542,48 +539,115 @@ static void domain_add_cpu(int cpu, struct rdt_resource *r)
>
> d = &hw_dom->d_resctrl;
> d->hdr.id = id;
> + d->hdr.type = RESCTRL_CTRL_DOMAIN;
> cpumask_set_cpu(cpu, &d->hdr.cpu_mask);
>
> rdt_domain_reconfigure_cdp(r);
>
> - if (r->alloc_capable && domain_setup_ctrlval(r, d)) {
> + if (domain_setup_ctrlval(r, d)) {
> domain_free(hw_dom);
> return;
> }
>
> - if (r->mon_capable && arch_domain_mbm_alloc(r->num_rmid, hw_dom)) {
> + list_add_tail(&d->hdr.list, add_pos);
> +
> + err = resctrl_online_ctrl_domain(r, d);
> + if (err) {
> + list_del(&d->hdr.list);
> + domain_free(hw_dom);
> + }
> +}
> +
> +static void domain_add_cpu_mon(int cpu, struct rdt_resource *r)
> +{
> + int id = get_domain_id_from_scope(cpu, r->mon_scope);
> + struct list_head *add_pos = NULL;
> + struct rdt_hw_domain *hw_dom;
> + struct rdt_domain_hdr *hdr;
> + struct rdt_domain *d;
> + int err;
> +
> + if (id < 0) {
> + pr_warn_once("Can't find monitor domain id for CPU:%d scope:%d for resource %s\n",
> + cpu, r->mon_scope, r->name);
> + return;
> + }
> +
> + hdr = rdt_find_domain(&r->mon_domains, id, &add_pos);
> + if (IS_ERR(hdr)) {

Can this happen with rdt_find_domain() no longer returning an error?

It does not seem as though changes were made consistently to this series.
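
To make the question concrete, here is a user-space approximation of the kernel's error-pointer helpers. The lower-case names `is_err`, `is_err_or_null` and `err_ptr` are stand-ins chosen for this sketch, and the `MAX_ERRNO` value of 4095 is assumed to match include/linux/err.h. The sketch shows that NULL is not an error pointer, so a lookup that only ever returns a valid pointer or NULL can never trip an IS_ERR() style check.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed to mirror the kernel's definition in include/linux/err.h. */
#define MAX_ERRNO 4095

/* Encode a negative errno value in a pointer (stand-in for ERR_PTR()). */
static inline void *err_ptr(long error)
{
	return (void *)error;
}

/* True only for pointers in the top MAX_ERRNO bytes of the address space. */
static inline bool is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* True for NULL as well as for encoded errors. */
static inline bool is_err_or_null(const void *ptr)
{
	return !ptr || is_err(ptr);
}
```

With these definitions, `is_err(NULL)` is false while `is_err_or_null(NULL)` is true, so a plain NULL test is sufficient once the lookup stops returning encoded errors.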

> + pr_warn("Couldn't find monitor scope id=%d for CPU %d\n", id, cpu);
> + return;
> + }
> +
> + if (hdr) {
> + if (WARN_ON_ONCE(hdr->type != RESCTRL_MON_DOMAIN))
> + return;
> +
> + d = container_of(hdr, struct rdt_domain, hdr);
> +
> + cpumask_set_cpu(cpu, &d->hdr.cpu_mask);
> + return;
> + }
> +
> + hw_dom = kzalloc_node(sizeof(*hw_dom), GFP_KERNEL, cpu_to_node(cpu));
> + if (!hw_dom)
> + return;
> +
> + d = &hw_dom->d_resctrl;
> + d->hdr.id = id;
> + d->hdr.type = RESCTRL_MON_DOMAIN;
> + cpumask_set_cpu(cpu, &d->hdr.cpu_mask);
> +
> + if (arch_domain_mbm_alloc(r->num_rmid, hw_dom)) {
> domain_free(hw_dom);
> return;
> }
>
> list_add_tail(&d->hdr.list, add_pos);
>
> - err = resctrl_online_domain(r, d);
> + err = resctrl_online_mon_domain(r, d);
> if (err) {
> list_del(&d->hdr.list);
> domain_free(hw_dom);
> }
> }
>
> -static void domain_remove_cpu(int cpu, struct rdt_resource *r)
> +/*
> + * domain_add_cpu - Add a cpu to either/both resource's domain lists.
> + */
> +static void domain_add_cpu(int cpu, struct rdt_resource *r)
> +{
> + if (r->alloc_capable)
> + domain_add_cpu_ctrl(cpu, r);
> + if (r->mon_capable)
> + domain_add_cpu_mon(cpu, r);
> +}
> +
> +static void domain_remove_cpu_ctrl(int cpu, struct rdt_resource *r)
> {
> - int id = get_domain_id_from_scope(cpu, r->scope);
> + int id = get_domain_id_from_scope(cpu, r->ctrl_scope);
> struct rdt_hw_domain *hw_dom;
> + struct rdt_domain_hdr *hdr;
> struct rdt_domain *d;
>
> if (id < 0)
> return;
>
> - d = rdt_find_domain(r, id, NULL);
> - if (!d) {
> - pr_warn("Couldn't find cache id for CPU %d\n", cpu);
> + hdr = rdt_find_domain(&r->ctrl_domains, id, NULL);
> + if (IS_ERR_OR_NULL(hdr)) {
> + pr_warn("Couldn't find control scope id=%d for CPU %d\n", id, cpu);

Why did the !d error checking transition to IS_ERR_OR_NULL() here?

Also, the error message does not sound reasonable for what can be encountered
at this point.

> return;
> }
> +
> + if (WARN_ON_ONCE(hdr->type != RESCTRL_CTRL_DOMAIN))
> + return;
> +
> + d = container_of(hdr, struct rdt_domain, hdr);
> hw_dom = resctrl_to_arch_dom(d);
>
> cpumask_clear_cpu(cpu, &d->hdr.cpu_mask);
> if (cpumask_empty(&d->hdr.cpu_mask)) {
> - resctrl_offline_domain(r, d);
> + resctrl_offline_ctrl_domain(r, d);
> list_del(&d->hdr.list);
>
> /*
> @@ -596,6 +660,38 @@ static void domain_remove_cpu(int cpu, struct rdt_resource *r)
>
> return;
> }
> +}
> +
> +static void domain_remove_cpu_mon(int cpu, struct rdt_resource *r)
> +{
> + int id = get_domain_id_from_scope(cpu, r->mon_scope);
> + struct rdt_hw_domain *hw_dom;
> + struct rdt_domain_hdr *hdr;
> + struct rdt_domain *d;
> +
> + if (id < 0)
> + return;
> +
> + hdr = rdt_find_domain(&r->mon_domains, id, NULL);
> + if (IS_ERR_OR_NULL(hdr)) {
> + pr_warn("Couldn't find scope id=%d for CPU %d\n", id, cpu);

same here

> + return;
> + }
> +
> + if (WARN_ON_ONCE(hdr->type != RESCTRL_MON_DOMAIN))
> + return;
> +
> + d = container_of(hdr, struct rdt_domain, hdr);
> + hw_dom = resctrl_to_arch_dom(d);
> +
> + cpumask_clear_cpu(cpu, &d->hdr.cpu_mask);
> + if (cpumask_empty(&d->hdr.cpu_mask)) {
> + resctrl_offline_mon_domain(r, d);
> + list_del(&d->hdr.list);
> + domain_free(hw_dom);
> +
> + return;
> + }
>
> if (r == &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl) {
> if (is_mbm_enabled() && cpu == d->mbm_work_cpu) {


Reinette

2023-11-07 00:33:10

by Reinette Chatre

Subject: Re: [PATCH v10 4/8] x86/resctrl: Split the rdt_domain and rdt_hw_domain structures

Hi Tony,

On 10/31/2023 2:17 PM, Tony Luck wrote:
> The same rdt_domain structure is used for both control and monitor
> functions. But this results in wasted memory as some of the fields are
> only used by control functions, while most are only used for monitor
> functions.
>
> Split into separate rdt_ctrl_domain and rdt_mon_domain structures with
> just the fields required for control and monitoring respectively.
>
> Similar split of the rdt_hw_domain structure into rdt_hw_ctrl_domain
> and rdt_hw_mon_domain.
>
> Signed-off-by: Tony Luck <[email protected]>
> ---
> Changes since v9
> Comment against patch 4, but now fixed in patch #2. cpu_mask
> is included in common header.
>
> include/linux/resctrl.h | 50 +++++++------
> arch/x86/kernel/cpu/resctrl/internal.h | 60 ++++++++++------
> arch/x86/kernel/cpu/resctrl/core.c | 87 ++++++++++++-----------
> arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 32 ++++-----
> arch/x86/kernel/cpu/resctrl/monitor.c | 40 +++++------
> arch/x86/kernel/cpu/resctrl/pseudo_lock.c | 6 +-
> arch/x86/kernel/cpu/resctrl/rdtgroup.c | 62 ++++++++--------
> 7 files changed, 184 insertions(+), 153 deletions(-)
>
> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
> index 35e700edc6e6..36503e8870cd 100644
> --- a/include/linux/resctrl.h
> +++ b/include/linux/resctrl.h
> @@ -72,7 +72,25 @@ struct rdt_domain_hdr {
> };
>
> /**
> - * struct rdt_domain - group of CPUs sharing a resctrl resource
> + * struct rdt_ctrl_domain - group of CPUs sharing a resctrl control resource
> + * @hdr: common header for different domain types
> + * @cpu_mask: which CPUs share this resource
> + * @plr: pseudo-locked region (if any) associated with domain
> + * @staged_config: parsed configuration to be applied
> + * @mbps_val: When mba_sc is enabled, this holds the array of user
> + * specified control values for mba_sc in MBps, indexed
> + * by closid
> + */
> +struct rdt_ctrl_domain {
> + struct rdt_domain_hdr hdr;
> + struct cpumask cpu_mask;

This patch did not change what it said it changed.

Reinette

2023-11-07 00:33:33

by Reinette Chatre

Subject: Re: [PATCH v10 7/8] x86/resctrl: Sub NUMA Cluster detection and enable

Hi Tony,

On 10/31/2023 2:17 PM, Tony Luck wrote:
> There isn't a simple hardware bit that indicates whether a CPU is
> running in Sub NUMA Cluster (SNC) mode. Infer the state by comparing
> the ratio of NUMA nodes to L3 cache instances.
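
The inference described above can be sketched as a small ratio check. This is only an illustration under stated assumptions: the helper name `snc_nodes_per_l3` is hypothetical, the caller is assumed to already have the two counts, and treating any non-integer ratio as "SNC not detected" is a choice made for this sketch rather than something taken from the patch.

```c
/*
 * Hypothetical helper: infer the number of SNC nodes per L3 cache from
 * the number of NUMA nodes and the number of L3 cache instances.
 *
 * Returns 1 when SNC appears to be disabled or when the counts are
 * inconsistent (for example because some CPUs are offline and a node
 * or cache instance was not counted).
 */
static int snc_nodes_per_l3(int num_nodes, int num_l3)
{
	if (num_nodes <= 0 || num_l3 <= 0)
		return 1;

	/* A ratio that is not a whole number cannot describe SNC mode. */
	if (num_nodes % num_l3)
		return 1;

	return num_nodes / num_l3;
}
```

For example, four NUMA nodes sharing two L3 caches gives a ratio of 2, while two nodes and two caches gives 1, meaning SNC is not in use.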
>
> When SNC mode is detected, reconfigure the RMID counters by updating
> the MSR_RMID_SNC_CONFIG MSR on each socket as CPUs are seen.
>
> Clearing bit zero of the MSR divides the RMIDs and renumbers the ones
> on the second SNC node to start from zero.
>
> Signed-off-by: Tony Luck <[email protected]>
> ---
> Changes since v9
> Expand h/w to hardware (commit and code comments)
> Remove "earlier commit" reference
> s/counnter/counter/
> Check for offline CPUs and warn user SNC detection may be broken.
>
> arch/x86/include/asm/msr-index.h | 1 +
> arch/x86/kernel/cpu/resctrl/core.c | 100 ++++++++++++++++++++++++++++-
> 2 files changed, 99 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
> index e3fa9cecd599..4285a5ee81fe 100644
> --- a/arch/x86/include/asm/msr-index.h
> +++ b/arch/x86/include/asm/msr-index.h
> @@ -1109,6 +1109,7 @@
> #define MSR_IA32_QM_CTR 0xc8e
> #define MSR_IA32_PQR_ASSOC 0xc8f
> #define MSR_IA32_L3_CBM_BASE 0xc90
> +#define MSR_RMID_SNC_CONFIG 0xca0
> #define MSR_IA32_L2_CBM_BASE 0xd10
> #define MSR_IA32_MBA_THRTL_BASE 0xd50
>
> diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
> index 97d2a5a7dd41..034f9797e1fb 100644
> --- a/arch/x86/kernel/cpu/resctrl/core.c
> +++ b/arch/x86/kernel/cpu/resctrl/core.c
> @@ -16,11 +16,14 @@
>
> #define pr_fmt(fmt) "resctrl: " fmt
>
> +#include <linux/cpu.h>
> #include <linux/slab.h>
> #include <linux/err.h>
> #include <linux/cacheinfo.h>
> #include <linux/cpuhotplug.h>
> +#include <linux/mod_devicetable.h>
>
> +#include <asm/cpu_device_id.h>
> #include <asm/intel-family.h>
> #include <asm/resctrl.h>
> #include "internal.h"
> @@ -184,10 +187,10 @@ bool is_mba_sc(struct rdt_resource *r)
>
> /*
> * rdt_get_mb_table() - get a mapping of bandwidth(b/w) percentage values
> - * exposed to user interface and the h/w understandable delay values.
> + * exposed to user interface and the hardware understandable delay values.
> *
> * The non-linear delay values have the granularity of power of two
> - * and also the h/w does not guarantee a curve for configured delay
> + * and also the hardware does not guarantee a curve for configured delay
> * values vs. actual b/w enforced.
> * Hence we need a mapping that is pre calibrated so the user can
> * express the memory b/w as a percentage value.

This seems out of place here. If you want to make such a global change,
it can be done as a separate patch. This work can simply use "hardware"
consistently in the areas it changes.

Reinette