2024-03-27 20:06:20

by Luck, Tony

Subject: [PATCH 00/10] Add support for Sub-NUMA cluster (SNC) systems

This series applies on top of v6.9-rc1 plus these two patches:

Link: https://lore.kernel.org/all/[email protected]/

The Sub-NUMA cluster (SNC) feature on some Intel processors partitions
the CPUs that share an L3 cache into two or more sets. This plays havoc
with the Resource Director Technology (RDT) monitoring features. Prior
to this patch series Intel has advised that SNC and RDT are incompatible.

Some of these CPUs support an MSR that can partition the RMID counters in
the same way. This allows the monitoring features to be used, with the
caveat that Linux may migrate tasks more frequently between SNC nodes
than between "regular" NUMA nodes, so reading counters from all SNC
nodes may be needed to get a complete picture of activity for a task.

Cache and memory bandwidth allocation features continue to operate at
the scope of the L3 cache.

This is a new approach triggered by the discussions that started with
"How can users tell that SNC is enabled?" but then drifted into
whether users of the legacy interface would really get what they
expected when reading from monitor files in the mon_L3_* directories.

During that discussion I'd mentioned providing monitor values both at
the L3 level and for each SNC node. That would provide full ABI
compatibility while also giving the finer-grained reporting from each
SNC node.

The implementation sets up a new rdt_resource to track monitor resources
with domains for each SNC node. This resource is only used when SNC
mode is detected.

On SNC systems there is a parent-child relationship between the
old L3 resource and the new SUBL3 resource. Reading from legacy
files like mon_data/mon_L3_00/llc_occupancy reads and sums the RMID
counters from all "child" domains in the SUBL3 resource. E.g. on
an SNC3 system:

$ grep . mon_L3_01/llc_occupancy mon_L3_01/*/llc_occupancy
mon_L3_01/llc_occupancy:413097984
mon_L3_01/mon_SUBL3_03/llc_occupancy:141484032
mon_L3_01/mon_SUBL3_04/llc_occupancy:135659520
mon_L3_01/mon_SUBL3_05/llc_occupancy:135954432

So the mon_L3_01 file shows the total L3 occupancy, which is the sum
of the cache occupancy on each of the SNC nodes that share that L3
cache instance.
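As a quick sanity check on the example output above, the three SUBL3 values do add up exactly to the parent L3 value (a toy illustration in Python, not kernel code):

```python
# Sample values from the SNC3 example above: the parent mon_L3_01
# llc_occupancy is the sum of its three mon_SUBL3_* children.
subl3 = {
    "mon_SUBL3_03": 141484032,
    "mon_SUBL3_04": 135659520,
    "mon_SUBL3_05": 135954432,
}

l3_occupancy = sum(subl3.values())
print(l3_occupancy)  # 413097984, matching mon_L3_01/llc_occupancy
```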

Patch 0001 has been salvaged from the previous postings.
All the rest are new.

Signed-off-by: Tony Luck <[email protected]>

Tony Luck (10):
x86/resctrl: Prepare for new domain scope
x86/resctrl: Add new rdt_resource for sub-node monitoring
x86/resctrl: Add new "enabled" state for monitor resources
x86/resctrl: Add pointer to enabled monitor resource
x86/resctrl: Add parent/child information to rdt_resource and
rdt_domain
x86/resctrl: Update mkdir_mondata_subdir() for Sub-NUMA domains
x86/resctrl: Update rmdir_mondata_subdir_allrdtgrp() for Sub-NUMA
domains
x86/resctrl: Mark L3 monitor files with summation flag.
x86/resctrl: Update __mon_event_count() for Sub-NUMA domains
x86/resctrl: Determine Sub-NUMA configuration

include/linux/resctrl.h | 20 ++-
arch/x86/include/asm/msr-index.h | 1 +
arch/x86/kernel/cpu/resctrl/internal.h | 23 ++-
arch/x86/kernel/cpu/resctrl/core.c | 76 +++++++---
arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 3 +-
arch/x86/kernel/cpu/resctrl/monitor.c | 136 +++++++++++++++--
arch/x86/kernel/cpu/resctrl/pseudo_lock.c | 6 +-
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 170 +++++++++++++++++-----
8 files changed, 364 insertions(+), 71 deletions(-)

--
2.44.0



2024-03-27 20:06:32

by Luck, Tony

Subject: [PATCH 06/10] x86/resctrl: Update mkdir_mondata_subdir() for Sub-NUMA domains

When Sub-NUMA domain mode is enabled, all monitoring happens at the
SUBL3 resource level. Changes are required to populate the "mon_data"
directories in the resctrl file system.

When creating the first SUBL3 directory, the code must first create the
parent mon_L3_XX directory and fill it with files for each of the
monitoring functions. Subsequent SUBL3 creation requests simply use the
directory that was created by the first request.
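The get-or-create flow described above can be sketched with a toy model (plain Python standing in for the kernfs API; the directory names follow the resctrl convention but the functions are illustrative only):

```python
# Toy model of the mkdir flow: the first SUBL3 request under an L3
# domain creates the parent mon_L3_XX directory; later requests for
# sibling SNC nodes find and reuse it.
mon_data = {}  # maps parent directory name -> set of child dir names

def mkdir_mondata_subdir(l3_id, snc_id):
    pname = f"mon_L3_{l3_id:02d}"
    if pname not in mon_data:      # first request: create the parent
        mon_data[pname] = set()
    # every request creates its own SUBL3 child directory
    mon_data[pname].add(f"mon_SUBL3_{snc_id:02d}")

for snc_id in (3, 4, 5):           # SNC3: three nodes share one L3
    mkdir_mondata_subdir(1, snc_id)

print(mon_data)  # one parent directory with three SUBL3 children
```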

Signed-off-by: Tony Luck <[email protected]>
---
arch/x86/kernel/cpu/resctrl/internal.h | 12 +++++
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 74 ++++++++++++++++++--------
2 files changed, 65 insertions(+), 21 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 21d81f51838f..7f05502454c5 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -456,6 +456,18 @@ static inline struct rdt_hw_resource *resctrl_to_arch_res(struct rdt_resource *r
return container_of(r, struct rdt_hw_resource, r_resctrl);
}

+static inline struct rdt_domain *get_parent_domain(struct rdt_resource *r,
+ struct rdt_domain *d)
+{
+ struct rdt_domain *pd;
+
+ list_for_each_entry(pd, &r->parent->domains, list)
+ if (pd->id == d->parent)
+ return pd;
+
+ return NULL;
+}
+
int parse_cbm(struct rdt_parse_data *data, struct resctrl_schema *s,
struct rdt_domain *d);
int parse_bw(struct rdt_parse_data *data, struct resctrl_schema *s,
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index f2af58a791a4..22effd8dc207 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -3026,31 +3026,16 @@ static void rmdir_mondata_subdir_allrdtgrp(struct rdt_resource *r,
}
}

-static int mkdir_mondata_subdir(struct kernfs_node *parent_kn,
- struct rdt_domain *d,
- struct rdt_resource *r, struct rdtgroup *prgrp)
+static int mon_add_all_files(struct kernfs_node *kn, struct rdt_domain *d,
+ struct rdt_resource *r, struct rdtgroup *prgrp)
{
union mon_data_bits priv;
- struct kernfs_node *kn;
struct mon_evt *mevt;
struct rmid_read rr;
- char name[32];
int ret;

- sprintf(name, "mon_%s_%02d", r->name, d->id);
- /* create the directory */
- kn = kernfs_create_dir(parent_kn, name, parent_kn->mode, prgrp);
- if (IS_ERR(kn))
- return PTR_ERR(kn);
-
- ret = rdtgroup_kn_set_ugid(kn);
- if (ret)
- goto out_destroy;
-
- if (WARN_ON(list_empty(&r->evt_list))) {
- ret = -EPERM;
- goto out_destroy;
- }
+ if (WARN_ON(list_empty(&r->evt_list)))
+ return -EPERM;

priv.u.rid = r->rid;
priv.u.domid = d->id;
@@ -3058,11 +3043,58 @@ static int mkdir_mondata_subdir(struct kernfs_node *parent_kn,
priv.u.evtid = mevt->evtid;
ret = mon_addfile(kn, mevt->name, priv.priv);
if (ret)
- goto out_destroy;
+ return ret;

if (is_mbm_event(mevt->evtid))
mon_event_read(&rr, r, d, prgrp, mevt->evtid, true);
}
+
+ return 0;
+}
+
+static int mkdir_mondata_subdir(struct kernfs_node *parent_kn,
+ struct rdt_domain *d,
+ struct rdt_resource *r, struct rdtgroup *prgrp)
+{
+ struct kernfs_node *kn = parent_kn, *ckn;
+ struct rdt_domain *pd;
+ char name[32];
+ int ret;
+
+ if (r->parent) {
+ pd = get_parent_domain(r, d);
+ if (!pd)
+ return 0;
+ sprintf(name, "mon_%s_%02d", r->parent->name, pd->id);
+ kn = kernfs_find_and_get_ns(parent_kn, name, NULL);
+ if (kn) {
+ parent_kn = kn;
+ goto subdir;
+ }
+ kn = kernfs_create_dir(parent_kn, name, parent_kn->mode, prgrp);
+ if (IS_ERR(kn))
+ return PTR_ERR(kn);
+ ret = rdtgroup_kn_set_ugid(kn);
+ if (ret)
+ goto out_destroy;
+ ret = mon_add_all_files(kn, d, r, prgrp);
+ if (ret)
+ goto out_destroy;
+ }
+subdir:
+ sprintf(name, "mon_%s_%02d", r->name, d->id);
+ /* create the directory */
+ ckn = kernfs_create_dir(kn, name, parent_kn->mode, prgrp);
+ if (IS_ERR(ckn))
+ goto out_destroy;
+
+ ret = rdtgroup_kn_set_ugid(ckn);
+ if (ret)
+ goto out_destroy;
+
+ ret = mon_add_all_files(ckn, d, r, prgrp);
+ if (ret)
+ goto out_destroy;
kernfs_activate(kn);
return 0;

@@ -3078,8 +3110,8 @@ static int mkdir_mondata_subdir(struct kernfs_node *parent_kn,
static void mkdir_mondata_subdir_allrdtgrp(struct rdt_resource *r,
struct rdt_domain *d)
{
- struct kernfs_node *parent_kn;
struct rdtgroup *prgrp, *crgrp;
+ struct kernfs_node *parent_kn;
struct list_head *head;

list_for_each_entry(prgrp, &rdt_all_groups, rdtgroup_list) {
--
2.44.0


2024-03-27 20:06:47

by Luck, Tony

Subject: [PATCH 04/10] x86/resctrl: Add pointer to enabled monitor resource

Sub-NUMA cluster enabled systems do all monitoring using the SUBL3
resource. Others use the regular L3 resource.

Replace the hard-coded rdt_resources_all[RDT_RESOURCE_L3].r_resctrl
references with a new pointer "rdt_l3_mon_resource".

Signed-off-by: Tony Luck <[email protected]>
---
arch/x86/kernel/cpu/resctrl/internal.h | 1 +
arch/x86/kernel/cpu/resctrl/core.c | 2 +-
arch/x86/kernel/cpu/resctrl/monitor.c | 19 ++++++++++++-------
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 7 +++----
4 files changed, 17 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 5fcff861e185..21d81f51838f 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -464,6 +464,7 @@ int parse_bw(struct rdt_parse_data *data, struct resctrl_schema *s,
extern struct mutex rdtgroup_mutex;

extern struct rdt_hw_resource rdt_resources_all[];
+extern struct rdt_resource *rdt_l3_mon_resource;
extern struct rdtgroup rdtgroup_default;
extern struct dentry *debugfs_resctrl;

diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index bfa179f20802..2fa04375da8c 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -996,7 +996,7 @@ late_initcall(resctrl_late_init);

static void __exit resctrl_exit(void)
{
- struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
+ struct rdt_resource *r = rdt_l3_mon_resource;

cpuhp_remove_state(rdt_online);

diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 84a2056190c8..95455cb187eb 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -78,6 +78,9 @@ static struct rmid_entry *rmid_ptrs;
*/
bool rdt_mon_capable;

+/* Resource (L3 or SUBL3) that is base for monitoring */
+struct rdt_resource *rdt_l3_mon_resource;
+
/*
* Global to indicate which monitoring events are enabled.
*/
@@ -321,7 +324,7 @@ static void limbo_release_entry(struct rmid_entry *entry)
*/
void __check_limbo(struct rdt_domain *d, bool force_free)
{
- struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
+ struct rdt_resource *r = rdt_l3_mon_resource;
u32 idx_limit = resctrl_arch_system_num_rmid_idx();
struct rmid_entry *entry;
u32 idx, cur_idx = 1;
@@ -467,7 +470,7 @@ int alloc_rmid(u32 closid)

static void add_rmid_to_limbo(struct rmid_entry *entry)
{
- struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
+ struct rdt_resource *r = rdt_l3_mon_resource;
struct rdt_domain *d;
u32 idx;

@@ -839,7 +842,7 @@ void mbm_handle_overflow(struct work_struct *work)
if (!resctrl_mounted || !resctrl_arch_mon_capable())
goto out_unlock;

- r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
+ r = rdt_l3_mon_resource;
d = container_of(work, struct rdt_domain, mbm_over.work);

list_for_each_entry(prgrp, &rdt_all_groups, rdtgroup_list) {
@@ -1036,7 +1039,9 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
*/
resctrl_rmid_realloc_threshold = resctrl_arch_round_mon_val(threshold);

- ret = dom_data_init(r);
+ rdt_l3_mon_resource = r;
+
+ ret = dom_data_init(rdt_l3_mon_resource);
if (ret)
return ret;

@@ -1057,10 +1062,10 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
}
}

- l3_mon_evt_init(r);
+ l3_mon_evt_init(rdt_l3_mon_resource);

- r->mon_capable = true;
- r->mon_enabled = true;
+ rdt_l3_mon_resource->mon_capable = true;
+ rdt_l3_mon_resource->mon_enabled = true;

return 0;
}
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index ffcafe567b25..a2ebd7e051bb 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -2703,7 +2703,7 @@ static int rdt_get_tree(struct fs_context *fc)
resctrl_mounted = true;

if (is_mbm_enabled()) {
- r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
+ r = rdt_l3_mon_resource;
list_for_each_entry(dom, &r->domains, list)
mbm_setup_overflow_handler(dom, MBM_OVERFLOW_INTERVAL,
RESCTRL_PICK_ANY_CPU);
@@ -4062,7 +4062,6 @@ static void clear_childcpus(struct rdtgroup *r, unsigned int cpu)

void resctrl_offline_cpu(unsigned int cpu)
{
- struct rdt_resource *l3 = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
struct rdtgroup *rdtgrp;
struct rdt_domain *d;

@@ -4074,10 +4073,10 @@ void resctrl_offline_cpu(unsigned int cpu)
}
}

- if (!l3->mon_enabled)
+ if (!rdt_l3_mon_resource->mon_enabled)
goto out_unlock;

- d = get_domain_from_cpu(cpu, l3);
+ d = get_domain_from_cpu(cpu, rdt_l3_mon_resource);
if (d) {
if (is_mbm_enabled() && cpu == d->mbm_work_cpu) {
cancel_delayed_work(&d->mbm_over);
--
2.44.0


2024-03-27 20:06:49

by Luck, Tony

Subject: [PATCH 07/10] x86/resctrl: Update rmdir_mondata_subdir_allrdtgrp() for Sub-NUMA domains

The old code could simply remove the mon_L3_XX directory, with all its
contents, when a domain was taken offline. With Sub-NUMA cluster
enabled this becomes a multi-stage process.

While the parent L3 domain is still online, the offline process for a
SUBL3 domain removes just one directory from mon_L3_XX. When the last
such domain goes offline the whole mon_L3_XX directory must be removed.
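The two-stage removal can be sketched with the same kind of toy directory model (illustrative Python only; in the patch the "parent still online" decision comes from get_parent_domain() returning non-NULL):

```python
# Toy model of the two-stage removal: while sibling SUBL3 domains
# remain online, only the child directory is removed; when the last
# sibling goes offline the whole mon_L3_XX parent is removed.
mon_data = {"mon_L3_01": {"mon_SUBL3_03", "mon_SUBL3_04", "mon_SUBL3_05"}}

def rmdir_mondata_subdir(l3_id, snc_id, parent_still_online):
    pname = f"mon_L3_{l3_id:02d}"
    if parent_still_online:
        mon_data[pname].discard(f"mon_SUBL3_{snc_id:02d}")
    else:                          # last SNC domain: drop the parent too
        del mon_data[pname]

rmdir_mondata_subdir(1, 3, parent_still_online=True)
rmdir_mondata_subdir(1, 4, parent_still_online=True)
rmdir_mondata_subdir(1, 5, parent_still_online=False)
print(mon_data)  # {}
```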

Signed-off-by: Tony Luck <[email protected]>
---
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 46 ++++++++++++++++++++++----
1 file changed, 39 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 22effd8dc207..399dc3175292 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -3012,17 +3012,49 @@ static int mon_addfile(struct kernfs_node *parent_kn, const char *name,
* and monitor groups with given domain id.
*/
static void rmdir_mondata_subdir_allrdtgrp(struct rdt_resource *r,
- unsigned int dom_id)
+ struct rdt_domain *d)
{
struct rdtgroup *prgrp, *crgrp;
- char name[32];
+ char pname[32], name[32];
+ bool remove_all = false;
+ struct kernfs_node *kn;
+ struct rdt_domain *pd;
+
+ sprintf(name, "mon_%s_%02d", r->name, d->id);
+
+ if (!r->parent) {
+ list_for_each_entry(prgrp, &rdt_all_groups, rdtgroup_list) {
+ kernfs_remove_by_name(prgrp->mon.mon_data_kn, name);
+
+ list_for_each_entry(crgrp, &prgrp->mon.crdtgrp_list, mon.crdtgrp_list)
+ kernfs_remove_by_name(crgrp->mon.mon_data_kn, name);
+ }
+ return;
+ }
+
+ sprintf(pname, "mon_%s_%02d", r->parent->name, d->parent);
+ pd = get_parent_domain(r, d);
+ if (!pd)
+ remove_all = true;

list_for_each_entry(prgrp, &rdt_all_groups, rdtgroup_list) {
- sprintf(name, "mon_%s_%02d", r->name, dom_id);
- kernfs_remove_by_name(prgrp->mon.mon_data_kn, name);
+ if (remove_all) {
+ kernfs_remove_by_name(prgrp->mon.mon_data_kn, pname);
+ } else {
+ kn = kernfs_find_and_get_ns(prgrp->mon.mon_data_kn, pname, NULL);
+ if (kn)
+ kernfs_remove_by_name(kn, name);
+ }

- list_for_each_entry(crgrp, &prgrp->mon.crdtgrp_list, mon.crdtgrp_list)
- kernfs_remove_by_name(crgrp->mon.mon_data_kn, name);
+ list_for_each_entry(crgrp, &prgrp->mon.crdtgrp_list, mon.crdtgrp_list) {
+ if (remove_all) {
+ kernfs_remove_by_name(crgrp->mon.mon_data_kn, pname);
+ } else {
+ kn = kernfs_find_and_get_ns(crgrp->mon.mon_data_kn, pname, NULL);
+ if (kn)
+ kernfs_remove_by_name(kn, name);
+ }
+ }
}
}

@@ -3979,7 +4011,7 @@ void resctrl_offline_domain(struct rdt_resource *r, struct rdt_domain *d)
* per domain monitor data directories.
*/
if (resctrl_mounted && resctrl_arch_mon_capable())
- rmdir_mondata_subdir_allrdtgrp(r, d->id);
+ rmdir_mondata_subdir_allrdtgrp(r, d);

if (is_mbm_enabled())
cancel_delayed_work(&d->mbm_over);
--
2.44.0


2024-03-27 20:06:54

by Luck, Tony

Subject: [PATCH 08/10] x86/resctrl: Mark L3 monitor files with summation flag.

When Sub-NUMA cluster mode is enabled, the parent monitor files in the
mon_L3_XX directories need to report the sum across all of their
sibling SUBL3 child domains.

Add a flag to the parent files and pass that down to the routines
that read values from the hardware RMID counters.
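The diff below narrows the rid field from 10 to 9 bits to free one bit for the new flag. The resulting layout of the kernfs private data can be modelled with plain shifts and masks (illustration only; the kernel encodes this with a C bitfield union, and the example field values are made up):

```python
# Model of the mon_data_bits layout after this patch:
# rid (9 bits) | sumdomains (1 bit) | evtid (8 bits) | domid (14 bits)
def pack(rid, sumdomains, evtid, domid):
    assert rid < 2**9 and sumdomains < 2 and evtid < 2**8 and domid < 2**14
    return rid | sumdomains << 9 | evtid << 10 | domid << 18

def unpack(priv):
    return (priv & 0x1ff,           # rid: 9 bits
            priv >> 9 & 1,          # sumdomains: 1 bit
            priv >> 10 & 0xff,      # evtid: 8 bits
            priv >> 18 & 0x3fff)    # domid: 14 bits

priv = pack(rid=3, sumdomains=1, evtid=2, domid=11)
print(unpack(priv))  # (3, 1, 2, 11)
```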

Signed-off-by: Tony Luck <[email protected]>
---
arch/x86/kernel/cpu/resctrl/internal.h | 5 ++++-
arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 1 +
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 8 +++++---
3 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 7f05502454c5..4263fbdf7ee7 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -130,6 +130,7 @@ struct mon_evt {
* @priv: Used to store monitoring event data in @u
* as kernfs private data
* @rid: Resource id associated with the event file
+ * @sumdomains: Report total of all SNC domains sharing an L3 cache
* @evtid: Event id associated with the event file
* @domid: The domain to which the event file belongs
* @u: Name of the bit fields struct
@@ -137,7 +138,8 @@ struct mon_evt {
union mon_data_bits {
void *priv;
struct {
- unsigned int rid : 10;
+ unsigned int rid : 9;
+ unsigned int sumdomains : 1;
enum resctrl_event_id evtid : 8;
unsigned int domid : 14;
} u;
@@ -149,6 +151,7 @@ struct rmid_read {
struct rdt_domain *d;
enum resctrl_event_id evtid;
bool first;
+ bool sumdomains;
int err;
u64 val;
void *arch_mon_ctx;
diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
index 2bf021d42500..fedcf50df6e5 100644
--- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
+++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
@@ -574,6 +574,7 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)
resid = md.u.rid;
domid = md.u.domid;
evtid = md.u.evtid;
+ rr.sumdomains = md.u.sumdomains;

r = &rdt_resources_all[resid].r_resctrl;
d = rdt_find_domain(r, domid, NULL);
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 399dc3175292..3e7f2e36b71e 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -3059,7 +3059,8 @@ static void rmdir_mondata_subdir_allrdtgrp(struct rdt_resource *r,
}

static int mon_add_all_files(struct kernfs_node *kn, struct rdt_domain *d,
- struct rdt_resource *r, struct rdtgroup *prgrp)
+ struct rdt_resource *r, struct rdtgroup *prgrp,
+ bool sumdomains)
{
union mon_data_bits priv;
struct mon_evt *mevt;
@@ -3071,6 +3072,7 @@ static int mon_add_all_files(struct kernfs_node *kn, struct rdt_domain *d,

priv.u.rid = r->rid;
priv.u.domid = d->id;
+ priv.u.sumdomains = sumdomains;
list_for_each_entry(mevt, &r->evt_list, list) {
priv.u.evtid = mevt->evtid;
ret = mon_addfile(kn, mevt->name, priv.priv);
@@ -3109,7 +3111,7 @@ static int mkdir_mondata_subdir(struct kernfs_node *parent_kn,
ret = rdtgroup_kn_set_ugid(kn);
if (ret)
goto out_destroy;
- ret = mon_add_all_files(kn, d, r, prgrp);
+ ret = mon_add_all_files(kn, d, r, prgrp, true);
if (ret)
goto out_destroy;
}
@@ -3124,7 +3126,7 @@ static int mkdir_mondata_subdir(struct kernfs_node *parent_kn,
if (ret)
goto out_destroy;

- ret = mon_add_all_files(ckn, d, r, prgrp);
+ ret = mon_add_all_files(ckn, d, r, prgrp, false);
if (ret)
goto out_destroy;
kernfs_activate(kn);
--
2.44.0


2024-03-27 20:16:30

by Luck, Tony

Subject: [PATCH 09/10] x86/resctrl: Update __mon_event_count() for Sub-NUMA domains

Check the flag to see if the request is for a file that must calculate
the sum of RMID counters from all sibling domains.

Note that when reading RMID counters to sum across SUBL3 domains
it is OK to read from any CPU in the parent L3 domain.
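The per-node RMID adjustment added below in get_node_rmid() amounts to a simple offset into the node's slice of the hardware RMID space. A sketch with hypothetical numbers (SNC3, 64 RMIDs per node):

```python
# On SNC systems the hardware RMID space is divided between the nodes,
# so a logical RMID must be offset by the node's slice before reading
# the hardware counter.
def get_node_rmid(rmid, node, num_siblings, num_rmid):
    return rmid + (node % num_siblings) * num_rmid

# Logical RMID 5 maps to a different hardware RMID on each of the
# three SNC nodes sharing the L3 cache.
hw_rmids = [get_node_rmid(5, node, num_siblings=3, num_rmid=64)
            for node in range(3)]
print(hw_rmids)  # [5, 69, 133]
```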

Signed-off-by: Tony Luck <[email protected]>
---
include/linux/resctrl.h | 2 +
arch/x86/kernel/cpu/resctrl/monitor.c | 56 ++++++++++++++++++++++++---
2 files changed, 53 insertions(+), 5 deletions(-)

diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index b7ade7627e08..4cf256053f53 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -177,6 +177,7 @@ enum resctrl_scope {
* @evt_list: List of monitoring events
* @parent: Parent of this resource
* @child: Child of this resource
+ * @num_siblings: Number of sibling domains that share a parent
* @fflags: flags to choose base and info files
* @cdp_capable: Is the CDP feature available on this resource
*/
@@ -200,6 +201,7 @@ struct rdt_resource {
struct list_head evt_list;
struct rdt_resource *parent;
struct rdt_resource *child;
+ int num_siblings;
unsigned long fflags;
bool cdp_capable;
};
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 95455cb187eb..1ba40d5f5d77 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -279,12 +279,24 @@ int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain *d,
struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(d);
struct arch_mbm_state *am;
u64 msr_val, chunks;
- int ret;
+ int cpu, ret;

resctrl_arch_rmid_read_context_check();

- if (!cpumask_test_cpu(smp_processor_id(), &d->cpu_mask))
- return -EINVAL;
+ /* This sanity check is now heavyweight. Keep it? */
+ cpu = smp_processor_id();
+ if (r->parent) {
+ struct rdt_domain *pd;
+
+ pd = rdt_find_domain(r->parent, d->parent, NULL);
+ if (WARN_ON_ONCE(!pd))
+ return -EINVAL;
+ if (!cpumask_test_cpu(cpu, &pd->cpu_mask))
+ return -EINVAL;
+ } else {
+ if (!cpumask_test_cpu(cpu, &d->cpu_mask))
+ return -EINVAL;
+ }

ret = __rmid_read(rmid, eventid, &msr_val);
if (ret)
@@ -538,7 +550,7 @@ static struct mbm_state *get_mbm_state(struct rdt_domain *d, u32 closid,
}
}

-static int __mon_event_count(u32 closid, u32 rmid, struct rmid_read *rr)
+static int ___mon_event_count(u32 closid, u32 rmid, struct rmid_read *rr, u64 *rrval)
{
struct mbm_state *m;
u64 tval = 0;
@@ -556,11 +568,44 @@ static int __mon_event_count(u32 closid, u32 rmid, struct rmid_read *rr)
if (rr->err)
return rr->err;

- rr->val += tval;
+ *rrval += tval;

return 0;
}

+static u32 get_node_rmid(struct rdt_resource *r, struct rdt_domain *d, u32 rmid)
+{
+ int cpu = cpumask_any(&d->cpu_mask);
+
+ return rmid + (cpu_to_node(cpu) % r->num_siblings) * r->num_rmid;
+}
+
+static int __mon_event_count(u32 closid, u32 rmid, struct rmid_read *rr)
+{
+ struct rdt_domain *d;
+ struct rmid_read tmp;
+ u32 node_rmid;
+ int ret = 0;
+
+ if (!rr->sumdomains) {
+ node_rmid = get_node_rmid(rr->r, rr->d, rmid);
+ return ___mon_event_count(closid, node_rmid, rr, &rr->val);
+ }
+
+ tmp = *rr;
+ list_for_each_entry(d, &rr->r->domains, list) {
+ if (d->parent == rr->d->parent) {
+ tmp.d = d;
+ node_rmid = get_node_rmid(rr->r, d, rmid);
+ ret = ___mon_event_count(closid, node_rmid, &tmp, &rr->val);
+ if (ret)
+ break;
+ }
+ }
+
+ return ret;
+}
+
/*
* mbm_bw_count() - Update bw count from values previously read by
* __mon_event_count().
@@ -1039,6 +1084,7 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
*/
resctrl_rmid_realloc_threshold = resctrl_arch_round_mon_val(threshold);

+ r->num_siblings = 1;
rdt_l3_mon_resource = r;

ret = dom_data_init(rdt_l3_mon_resource);
--
2.44.0


2024-03-27 20:22:46

by Luck, Tony

Subject: [PATCH 03/10] x86/resctrl: Add new "enabled" state for monitor resources

When Sub-NUMA cluster is enabled, resctrl needs to build domain
lists for both the regular L3 resource (scoped at the L3 cache level)
and the SUBL3 resource (scoped at the NUMA node level).

But only one of these resources will be used for all monitoring
functions.

Add a new "enabled" flag to indicate which resource should allocate
space for MBM counters, run the MBM overflow and LLC occupancy limbo
timers, etc.

Signed-off-by: Tony Luck <[email protected]>
---
include/linux/resctrl.h | 2 ++
arch/x86/kernel/cpu/resctrl/internal.h | 4 ++++
arch/x86/kernel/cpu/resctrl/core.c | 4 ++--
arch/x86/kernel/cpu/resctrl/monitor.c | 1 +
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 8 ++++----
5 files changed, 13 insertions(+), 6 deletions(-)

diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index f39a07b27a98..dea79f6a8122 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -161,6 +161,7 @@ enum resctrl_scope {
* @rid: The index of the resource
* @alloc_capable: Is allocation available on this machine
* @mon_capable: Is monitor feature available on this machine
+ * @mon_enabled: Monitor feature enabled for this resource
* @num_rmid: Number of RMIDs available
* @scope: Scope of this resource
* @cache: Cache allocation related data
@@ -179,6 +180,7 @@ struct rdt_resource {
int rid;
bool alloc_capable;
bool mon_capable;
+ bool mon_enabled;
int num_rmid;
enum resctrl_scope scope;
struct resctrl_cache cache;
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 24fad5ecc158..5fcff861e185 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -514,6 +514,10 @@ int resctrl_arch_set_cdp_enabled(enum resctrl_res_level l, bool enable);
for_each_rdt_resource(r) \
if (r->mon_capable)

+#define for_each_mon_enabled_rdt_resource(r) \
+ for_each_rdt_resource(r) \
+ if (r->mon_enabled)
+
/* CPUID.(EAX=10H, ECX=ResID=1).EAX */
union cpuid_0x10_1_eax {
struct {
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 395bcb3199f8..bfa179f20802 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -559,7 +559,7 @@ static void domain_add_cpu(int cpu, struct rdt_resource *r)
return;
}

- if (r->mon_capable && arch_domain_mbm_alloc(r->num_rmid, hw_dom)) {
+ if (r->mon_enabled && arch_domain_mbm_alloc(r->num_rmid, hw_dom)) {
domain_free(hw_dom);
return;
}
@@ -1002,7 +1002,7 @@ static void __exit resctrl_exit(void)

rdtgroup_exit();

- if (r->mon_capable)
+ if (r->mon_enabled)
rdt_put_mon_l3_config();
}

diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index c34a35ec0f03..84a2056190c8 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -1060,6 +1060,7 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
l3_mon_evt_init(r);

r->mon_capable = true;
+ r->mon_enabled = true;

return 0;
}
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index b8588ce88eef..ffcafe567b25 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -3150,7 +3150,7 @@ static int mkdir_mondata_all(struct kernfs_node *parent_kn,
* Create the subdirectories for each domain. Note that all events
* in a domain like L3 are grouped into a resource whose domain is L3
*/
- for_each_mon_capable_rdt_resource(r) {
+ for_each_mon_enabled_rdt_resource(r) {
ret = mkdir_mondata_subdir_alldom(kn, r, prgrp);
if (ret)
goto out_destroy;
@@ -3937,7 +3937,7 @@ void resctrl_offline_domain(struct rdt_resource *r, struct rdt_domain *d)
if (supports_mba_mbps() && r->rid == RDT_RESOURCE_MBA)
mba_sc_domain_destroy(r, d);

- if (!r->mon_capable)
+ if (!r->mon_enabled)
goto out_unlock;

/*
@@ -4011,7 +4011,7 @@ int resctrl_online_domain(struct rdt_resource *r, struct rdt_domain *d)
goto out_unlock;
}

- if (!r->mon_capable)
+ if (!r->mon_enabled)
goto out_unlock;

err = domain_setup_mon_state(r, d);
@@ -4074,7 +4074,7 @@ void resctrl_offline_cpu(unsigned int cpu)
}
}

- if (!l3->mon_capable)
+ if (!l3->mon_enabled)
goto out_unlock;

d = get_domain_from_cpu(cpu, l3);
--
2.44.0


2024-03-27 20:26:20

by Luck, Tony

Subject: [PATCH 10/10] x86/resctrl: Determine Sub-NUMA configuration

There isn't an explicit enumeration of Sub-NUMA cluster mode. Use
the ratio of the number of CPUs that share an L3 cache instance with
CPU 0 to the number of CPUs that share a NUMA node with CPU 0.

When Sub-NUMA cluster mode is enabled, adjust the number of RMIDs and
the scaling factor, set up the parent/child pointers in the L3 and
SUBL3 rdt_resource structures, etc.

As each Sub-NUMA domain is brought online, update MSR_RMID_SNC_CONFIG
to remap RMID counters.
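The detection heuristic reduces to a ratio of two CPU-mask weights, and the same divisor then shrinks the per-node RMID count. A sketch with made-up topology counts (the real code reads cacheinfo and the NUMA topology for CPU 0):

```python
# Infer SNC mode from topology: CPUs sharing CPU 0's L3 cache versus
# CPUs in CPU 0's NUMA node. Equal counts mean SNC is off (ratio 1).
def snc_get_config(cpus_per_l3, cpus_per_node):
    return cpus_per_l3 // cpus_per_node

snc_ways = snc_get_config(cpus_per_l3=72, cpus_per_node=24)
print(snc_ways)  # 3 SNC nodes per L3 cache

# With SNC enabled, the per-L3 RMID space is split between the nodes.
max_rmid = 191                        # hypothetical CPUID-reported value
num_rmid = (max_rmid + 1) // snc_ways
print(num_rmid)  # 64 RMIDs per SNC node
```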

Signed-off-by: Tony Luck <[email protected]>
---
arch/x86/include/asm/msr-index.h | 1 +
arch/x86/kernel/cpu/resctrl/monitor.c | 68 ++++++++++++++++++++++++--
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 26 ++++++++++
3 files changed, 91 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 05956bd8bacf..b54c26016c93 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -1158,6 +1158,7 @@
#define MSR_IA32_QM_CTR 0xc8e
#define MSR_IA32_PQR_ASSOC 0xc8f
#define MSR_IA32_L3_CBM_BASE 0xc90
+#define MSR_RMID_SNC_CONFIG 0xca0
#define MSR_IA32_L2_CBM_BASE 0xd10
#define MSR_IA32_MBA_THRTL_BASE 0xd50

diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 1ba40d5f5d77..757d475158a3 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -17,6 +17,7 @@

#include <linux/cpu.h>
#include <linux/module.h>
+#include <linux/cacheinfo.h>
#include <linux/sizes.h>
#include <linux/slab.h>

@@ -1051,16 +1052,59 @@ static void l3_mon_evt_init(struct rdt_resource *r)
list_add_tail(&mbm_local_event.list, &r->evt_list);
}

+/* CPU models that support MSR_RMID_SNC_CONFIG */
+static const struct x86_cpu_id snc_cpu_ids[] __initconst = {
+ X86_MATCH_INTEL_FAM6_MODEL(ICELAKE_X, 0),
+ X86_MATCH_INTEL_FAM6_MODEL(SAPPHIRERAPIDS_X, 0),
+ X86_MATCH_INTEL_FAM6_MODEL(EMERALDRAPIDS_X, 0),
+ X86_MATCH_INTEL_FAM6_MODEL(GRANITERAPIDS_X, 0),
+ X86_MATCH_INTEL_FAM6_MODEL(ATOM_CRESTMONT_X, 0),
+ {}
+};
+
+static __init int snc_get_config(void)
+{
+ struct cpu_cacheinfo *ci = get_cpu_cacheinfo(0);
+ cpumask_t *l3_cpumask = NULL;
+ const cpumask_t *node0_cpumask;
+ int i;
+
+ if (!x86_match_cpu(snc_cpu_ids))
+ return 1;
+
+ for (i = 0; i < ci->num_leaves; i++) {
+ if (ci->info_list[i].level == 3) {
+ if (ci->info_list[i].attributes & CACHE_ID) {
+ l3_cpumask = &ci->info_list[i].shared_cpu_map;
+ break;
+ }
+ }
+ }
+ if (!l3_cpumask) {
+ pr_info("can't get CPU0 L3 mask\n");
+ return 1;
+ }
+
+ node0_cpumask = cpumask_of_node(cpu_to_node(0));
+
+ return bitmap_weight(cpumask_bits(l3_cpumask), nr_cpu_ids) /
+ bitmap_weight(cpumask_bits(node0_cpumask), nr_cpu_ids);
+}
+
int __init rdt_get_mon_l3_config(struct rdt_resource *r)
{
unsigned int mbm_offset = boot_cpu_data.x86_cache_mbm_width_offset;
struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
unsigned int threshold;
+ int snc_ways;
int ret;

+ snc_ways = snc_get_config();
+ if (snc_ways > 1)
+ pr_info("Sub-NUMA cluster detected with %d nodes per L3 cache\n", snc_ways);
resctrl_rmid_realloc_limit = boot_cpu_data.x86_cache_size * 1024;
- hw_res->mon_scale = boot_cpu_data.x86_cache_occ_scale;
- r->num_rmid = boot_cpu_data.x86_cache_max_rmid + 1;
+ hw_res->mon_scale = boot_cpu_data.x86_cache_occ_scale / snc_ways;
+ r->num_rmid = (boot_cpu_data.x86_cache_max_rmid + 1) / snc_ways;
hw_res->mbm_width = MBM_CNTR_WIDTH_BASE;

if (mbm_offset > 0 && mbm_offset <= MBM_CNTR_WIDTH_OFFSET_MAX)
@@ -1084,8 +1128,23 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
*/
resctrl_rmid_realloc_threshold = resctrl_arch_round_mon_val(threshold);

- r->num_siblings = 1;
- rdt_l3_mon_resource = r;
+ if (snc_ways > 1) {
+ struct rdt_hw_resource *shw_res;
+
+ rdt_l3_mon_resource = &rdt_resources_all[RDT_RESOURCE_SUBL3].r_resctrl;
+ rdt_l3_mon_resource->num_rmid = r->num_rmid;
+ rdt_l3_mon_resource->num_siblings = snc_ways;
+
+ shw_res = resctrl_to_arch_res(rdt_l3_mon_resource);
+ shw_res->mon_scale = hw_res->mon_scale;
+ shw_res->mbm_width = hw_res->mbm_width;
+
+ r->child = rdt_l3_mon_resource;
+ rdt_l3_mon_resource->parent = r;
+ } else {
+ r->num_siblings = 1;
+ rdt_l3_mon_resource = r;
+ }

ret = dom_data_init(rdt_l3_mon_resource);
if (ret)
@@ -1110,6 +1169,7 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)

l3_mon_evt_init(rdt_l3_mon_resource);

+ r->mon_capable = true;
rdt_l3_mon_resource->mon_capable = true;
rdt_l3_mon_resource->mon_enabled = true;

diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 3e7f2e36b71e..b1f79fafa333 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -4067,6 +4067,29 @@ static int domain_setup_mon_state(struct rdt_resource *r, struct rdt_domain *d)
return 0;
}

+/*
+ * The power-on reset value of MSR_RMID_SNC_CONFIG is 0x1
+ * which indicates that RMIDs are configured in legacy mode.
+ * This mode is incompatible with Linux resctrl semantics
+ * as RMIDs are partitioned between SNC nodes, which requires
+ * a user to know which RMID is allocated to a task.
+ * Clearing bit 0 reconfigures the RMID counters for use
+ * in Sub NUMA Cluster mode. This mode is better for Linux.
+ * The RMID space is divided between all SNC nodes with the
+ * RMIDs renumbered to start from zero in each node when
+ * counting operations from tasks. Code to read the counters
+ * must adjust RMID counter numbers based on SNC node. See
+ * __rmid_read() for code that does this.
+ */
+static void snc_remap_rmids(void)
+{
+ u64 val;
+
+ rdmsrl(MSR_RMID_SNC_CONFIG, val);
+ val &= ~BIT_ULL(0);
+ wrmsrl(MSR_RMID_SNC_CONFIG, val);
+}
+
int resctrl_online_domain(struct rdt_resource *r, struct rdt_domain *d)
{
int err = 0;
@@ -4082,6 +4105,9 @@ int resctrl_online_domain(struct rdt_resource *r, struct rdt_domain *d)
if (!r->mon_enabled)
goto out_unlock;

+ if (r->num_siblings > 1)
+ snc_remap_rmids();
+
err = domain_setup_mon_state(r, d);
if (err)
goto out_unlock;
--
2.44.0


2024-05-02 17:01:17

by Reinette Chatre

[permalink] [raw]
Subject: Re: [PATCH 00/10] Add support for Sub-NUMA cluster (SNC) systems

Hi Tony,

On 3/27/2024 1:03 PM, Tony Luck wrote:
> This series on top of v6.9-rc1 plus these two patches:
>
> Link: https://lore.kernel.org/all/[email protected]/
>
> The Sub-NUMA cluster feature on some Intel processors partitions the CPUs
> that share an L3 cache into two or more sets. This plays havoc with the
> Resource Director Technology (RDT) monitoring features. Prior to this
> patch Intel has advised that SNC and RDT are incompatible.
>
> Some of these CPUs support an MSR that can partition the RMID counters in
> the same way. This allows monitoring features to be used, with the caveat
> that users must be aware that Linux may migrate tasks more frequently
> between SNC nodes than between "regular" NUMA nodes, so reading counters
> from all SNC nodes may be needed to get a complete picture of activity
> for tasks.
>
> Cache and memory bandwidth allocation features continue to operate at
> the scope of the L3 cache.
>
> This is a new approach triggered by the discussions that started with
> "How can users tell that SNC is enabled?" but then drifted into
> whether users of the legacy interface would really get what they
> expected when reading from monitor files in the mon_L3_* directories.
>
> During that discussion I'd mentioned providing monitor values for both
> the L3 level, and also for each SNC node. That would provide full ABI
> compatibility while also giving the finer grained reporting from each
> SNC node.
>
> The implementation sets up a new rdt_resource to track monitor resources
> with domains for each SNC node. This resource is only used when SNC
> mode is detected.
>
> On SNC systems there is a parent-child relationship between the
> old L3 resource and the new SUBL3 resource. Reading from legacy
> files like mon_data/mon_L3_00/llc_occupancy reads and sums the RMID
> counters from all "child" domains in the SUBL3 resource. E.g. on
> an SNC3 system:
>
> $ grep . mon_L3_01/llc_occupancy mon_L3_01/*/llc_occupancy
> mon_L3_01/llc_occupancy:413097984
> mon_L3_01/mon_SUBL3_03/llc_occupancy:141484032
> mon_L3_01/mon_SUBL3_04/llc_occupancy:135659520
> mon_L3_01/mon_SUBL3_05/llc_occupancy:135954432
>
> So the L3 occupancy shows the total L3 occupancy which is
> the sum of the cache occupancy on each of the SNC nodes
> that share that L3 cache instance.
>

This is back to the "duplicate a resource" solution that does the
unnecessary duplication of an entire resource and splits data structure
use between two resources, with each managing one portion.
My objections to the foundation on which this solution is
built remain.

This solution continues to build on this foundation in a way that
I find inappropriate. This solution wedges a new architectural concept
("parent" and "child" resources) into resctrl but does not integrate it.
All resources continue to be managed the same (not hierarchical) while
this artificial relationship is only selectively used when convenient
for SNC.

This solution also brings back a concept that was removed and yet
claims that it is "new" (bab6ee736873 ("x86/resctrl: Merge mon_capable
and mon_enabled")). Having gone through the effort to clean this up,
bringing it back deserves a thorough justification. I do not see one,
though, since the concept is used only as a workaround for resctrl to
be able to deal with the unnecessary duplication of the
monitoring resource. Same thing for the rdt_l3_mon_resource global
pointer introduced here that is another workaround to deal with the
unnecessary duplication. These "tricks" to get this solution to work
make this all very difficult to understand and maintain.

Using a "parent" resource to decide if one domain is a "child" of
another domain looks arbitrary to me. Just because the CPU mask of one
domain of one resource is a subset of the CPU mask of another domain of
another resource does not make the domain a child domain of the other.

It looks to me as though the "info" directory will now display two
monitoring resources to user space? I find that to be confusing and
the data exposed in that way is not clear to me ... for example,
both will now allow setting the global max_threshold_occupancy? How
should the number of RMIDs be interpreted?

As I understand it, this rewrite aims to support the new file
hierarchy and to be able to have higher level files contain the sum of
the files below. I do not think a rewrite is necessary to
support this. I do continue to believe that the previous SNC enabling
is appropriate and, as I understand it, support for this new display to user
space can be added to it. For example, could this not be accomplished
by adding one new member to struct rdt_resource that specifies the scope at
which to display the monitor data to user space? mkdir_mondata_subdir() can
use this field to decide how to create the files ... if the "display"
scope is the same as the monitoring scope then files are created as
it is now, if the "display" scope is different (required to be a superset)
then the files are created within a directory that matches the "display"
scope. I do not see why it would be required to have to work from a
stored CPU mask to determine relationships. I believe the id of the
"display" scope to create a sub-directory in can be determined dynamically
at this time. Similarly, the metadata of the files is used to indicate
when a sum is required.

Reinette


2024-05-03 15:39:05

by Luck, Tony

[permalink] [raw]
Subject: RE: [PATCH 00/10] Add support for Sub-NUMA cluster (SNC) systems

> As I understand it, this rewrite aims to support the new file
> hierarchy and to be able to have higher level files contain the sum of
> the files below. I do not think a rewrite is necessary to
> support this. I do continue to believe that the previous SNC enabling
> is appropriate and, as I understand it, support for this new display to user
> space can be added to it. For example, could this not be accomplished
> by adding one new member to struct rdt_resource that specifies the scope at
> which to display the monitor data to user space? mkdir_mondata_subdir() can
> use this field to decide how to create the files ... if the "display"
> scope is the same as the monitoring scope then files are created as
> it is now, if the "display" scope is different (required to be a superset)
> then the files are created within a directory that matches the "display"
> scope. I do not see why it would be required to have to work from a
> stored CPU mask to determine relationships. I believe the id of the
> "display" scope to create a sub-directory in can be determined dynamically
> at this time. Similarly, the metadata of the files is used to indicate
> when a sum is required.

Reinette,

I was concerned about dynamically figuring out the relationships. In
particular I thought it would be much easier to track addition/removal
of domains at both SNC node and L3 cache level if I used the existing
cpu hotplug tracking functions. Hence separate rdt_resources for each.

But after reading your suggestions above, I experimented with building
on top of the v17 series. It doesn't look anywhere near as complex as I had
imagined. I'm going to pursue that today.

Thanks

-Tony