Hi folks,
Thanks to Reinette and Jamie for the comments on v1. Major changes in v2 are
to keep the closid in resctrl_arch_update_domains(), eliminating one patch,
splitting another that was making two sorts of change, and to re-order the
first few patches. See each patches changelog for more.
----
This series re-folds the resctrl code so the CDP resources (L3CODE et al)
behaviour is all contained in the filesystem parts, with a minimum amount
of arch specific code.
Arm have some CPU support for dividing caches into portions, and
applying bandwidth limits at various points in the SoC. The collective term
for these features is MPAM: Memory Partitioning and Monitoring.
MPAM is similar enough to Intel RDT, that it should use the defacto linux
interface: resctrl. This filesystem currently lives under arch/x86, and is
tightly coupled to the architecture.
Ultimately, my plan is to split the existing resctrl code up to have an
arch<->fs abstraction, then move all the bits out to fs/resctrl. From there
MPAM can be wired up.
x86 might have two resources with cache controls, (L2 and L3) but has
extra copies for CDP: L{2,3}{CODE,DATA}, which are marked as enabled
if CDP is enabled for the corresponding cache.
MPAM has an equivalent feature to CDP, but its a property of the CPU,
not the cache. Resctrl needs to have x86's odd/even behaviour, as that
its the ABI, but this isn't how the MPAM hardware works. It is entirely
possible that an in-kernel user of MPAM would not be using CDP, whereas
resctrl is.
Pretending L3CODE and L3DATA are entirely separate resources is a neat
trick, but doing this is specific to x86.
Doing this leaves the arch code in control of various parts of the
filesystem ABI: the resources names, and the way the schemata are parsed.
Allowing this stuff to vary between architectures is bad for user space.
This series collapses the CODE/DATA resources, moving all the user-visible
resctrl ABI into what becomes the filesystem code. CDP becomes the type of
configuration being applied to a cache. This is done by adding a
struct resctrl_schema to the parts of resctrl that will move to fs. This
holds the arch-code resource that is in use for this schema, along with
other properties like the name, and whether the configuration being applied
is CODE/DATA/BOTH.
This lets us fold the extra resources out of the arch code so that they
don't need to be duplicated if the equivalent feature to CDP is missing, or
implemented in a different way.
The first two patches split the resource and domain structs to have an
arch specific 'hw' portion, and the rest that is visible to resctrl.
Future series massage the resctrl code so there are no accesses to 'hw'
structures in the parts of resctrl that will move to fs, providing helpers
where necessary.
This series adds temporary scaffolding, which it removes a few patches
later. This is to allow things like the ctrlval arrays and resources to be
merged separately, which should make is easier to bisect. These things
are marked temporary, and should all be gone by the end of the series.
This series is a little rough around the monitors, would a fake
struct resctrl_schema for the monitors simplify things, or be a source
of bugs?
This series is based on v5.12-rc2, and can be retrieved from:
git://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git mpam/resctrl_merge_cdp/v2
v1 was posted here:
https://lore.kernel.org/lkml/[email protected]/
Parts were previously posted as an RFC here:
https://lore.kernel.org/lkml/[email protected]/
Thanks,
James Morse (24):
x86/resctrl: Split struct rdt_resource
x86/resctrl: Split struct rdt_domain
x86/resctrl: Add a separate schema list for resctrl
x86/resctrl: Pass the schema in info dir's private pointer
x86/resctrl: Label the resources with their configuration type
x86/resctrl: Walk the resctrl schema list instead of an arch list
x86/resctrl: Store the effective num_closid in the schema
x86/resctrl: Add resctrl_arch_get_num_closid()
x86/resctrl: Pass the schema to resctrl filesystem functions
x86/resctrl: Swizzle rdt_resource and resctrl_schema in
pseudo_lock_region
x86/resctrl: Move the schemata names into struct resctrl_schema
x86/resctrl: Group staged configuration into a separate struct
x86/resctrl: Allow different CODE/DATA configurations to be staged
x86/resctrl: Rename update_domains() resctrl_arch_update_domains()
x86/resctrl: Add a helper to read a closid's configuration
x86/resctrl: Add a helper to read/set the CDP configuration
x86/resctrl: Use cdp_enabled in rdt_domain_reconfigure_cdp()
x86/resctrl: Pass configuration type to resctrl_arch_get_config()
x86/resctrl: Make ctrlval arrays the same size
x86/resctrl: Apply offset correction when config is staged
x86/resctrl: Calculate the index from the configuration type
x86/resctrl: Merge the ctrl_val arrays
x86/resctrl: Remove rdt_cdp_peer_get()
x86/resctrl: Merge the CDP resources
arch/x86/kernel/cpu/resctrl/core.c | 272 +++++--------
arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 164 +++++---
arch/x86/kernel/cpu/resctrl/internal.h | 218 +++--------
arch/x86/kernel/cpu/resctrl/monitor.c | 44 ++-
arch/x86/kernel/cpu/resctrl/pseudo_lock.c | 12 +-
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 447 ++++++++++++----------
include/linux/resctrl.h | 176 +++++++++
7 files changed, 741 insertions(+), 592 deletions(-)
--
2.30.0
Now that the configuration can be read from either CDP peer,
rdt_cdp_peer_get() is not needed to to map the resource and search for the
corresponding domain.
As __rdtgroup_cbm_overlaps() takes the configuration type from the schema,
this can be replaced with a second call for the other configuration type
if cdp is enabled.
Reviewed-by: Jamie Iles <[email protected]>
Signed-off-by: James Morse <[email protected]>
Changes since v1:
* Expanded commit mesasge.
---
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 99 ++++----------------------
1 file changed, 14 insertions(+), 85 deletions(-)
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 801bff59db06..36e2905f4da6 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1092,82 +1092,17 @@ static int rdtgroup_mode_show(struct kernfs_open_file *of,
return 0;
}
-/**
- * rdt_cdp_peer_get - Retrieve CDP peer if it exists
- * @r: RDT resource to which RDT domain @d belongs
- * @d: Cache instance for which a CDP peer is requested
- * @r_cdp: RDT resource that shares hardware with @r (RDT resource peer)
- * Used to return the result.
- * @d_cdp: RDT domain that shares hardware with @d (RDT domain peer)
- * Used to return the result.
- * @peer_type: The CDP configuration type of the peer resource.
- *
- * RDT resources are managed independently and by extension the RDT domains
- * (RDT resource instances) are managed independently also. The Code and
- * Data Prioritization (CDP) RDT resources, while managed independently,
- * could refer to the same underlying hardware. For example,
- * RDT_RESOURCE_L2CODE and RDT_RESOURCE_L2DATA both refer to the L2 cache.
- *
- * When provided with an RDT resource @r and an instance of that RDT
- * resource @d rdt_cdp_peer_get() will return if there is a peer RDT
- * resource and the exact instance that shares the same hardware.
- *
- * Return: 0 if a CDP peer was found, <0 on error or if no CDP peer exists.
- * If a CDP peer was found, @r_cdp will point to the peer RDT resource
- * and @d_cdp will point to the peer RDT domain.
- */
-static int rdt_cdp_peer_get(struct rdt_resource *r, struct rdt_domain *d,
- struct rdt_resource **r_cdp,
- struct rdt_domain **d_cdp,
- enum resctrl_conf_type *peer_type)
+static enum resctrl_conf_type resctrl_peer_type(enum resctrl_conf_type my_type)
{
- struct rdt_resource *_r_cdp = NULL;
- struct rdt_domain *_d_cdp = NULL;
- int ret = 0;
-
- switch (r->rid) {
- case RDT_RESOURCE_L3DATA:
- _r_cdp = &rdt_resources_all[RDT_RESOURCE_L3CODE].resctrl;
- *peer_type = CDP_CODE;
- break;
- case RDT_RESOURCE_L3CODE:
- _r_cdp = &rdt_resources_all[RDT_RESOURCE_L3DATA].resctrl;
- *peer_type = CDP_DATA;
- break;
- case RDT_RESOURCE_L2DATA:
- _r_cdp = &rdt_resources_all[RDT_RESOURCE_L2CODE].resctrl;
- *peer_type = CDP_CODE;
- break;
- case RDT_RESOURCE_L2CODE:
- _r_cdp = &rdt_resources_all[RDT_RESOURCE_L2DATA].resctrl;
- *peer_type = CDP_DATA;
- break;
+ switch (my_type) {
+ case CDP_CODE:
+ return CDP_DATA;
+ case CDP_DATA:
+ return CDP_CODE;
default:
- ret = -ENOENT;
- goto out;
- }
-
- /*
- * When a new CPU comes online and CDP is enabled then the new
- * RDT domains (if any) associated with both CDP RDT resources
- * are added in the same CPU online routine while the
- * rdtgroup_mutex is held. It should thus not happen for one
- * RDT domain to exist and be associated with its RDT CDP
- * resource but there is no RDT domain associated with the
- * peer RDT CDP resource. Hence the WARN.
- */
- _d_cdp = rdt_find_domain(_r_cdp, d->id, NULL);
- if (WARN_ON(IS_ERR_OR_NULL(_d_cdp))) {
- _r_cdp = NULL;
- _d_cdp = NULL;
- ret = -EINVAL;
+ case CDP_BOTH:
+ return CDP_BOTH;
}
-
-out:
- *r_cdp = _r_cdp;
- *d_cdp = _d_cdp;
-
- return ret;
}
/**
@@ -1248,19 +1183,16 @@ static bool __rdtgroup_cbm_overlaps(struct rdt_resource *r, struct rdt_domain *d
bool rdtgroup_cbm_overlaps(struct resctrl_schema *s, struct rdt_domain *d,
unsigned long cbm, int closid, bool exclusive)
{
- enum resctrl_conf_type peer_type;
+ enum resctrl_conf_type peer_type = resctrl_peer_type(s->conf_type);
struct rdt_resource *r = s->res;
- struct rdt_resource *r_cdp;
- struct rdt_domain *d_cdp;
if (__rdtgroup_cbm_overlaps(r, d, cbm, closid, s->conf_type,
exclusive))
return true;
- if (rdt_cdp_peer_get(r, d, &r_cdp, &d_cdp, &peer_type) < 0)
+ if (!resctrl_arch_get_cdp_enabled(r->rid))
return false;
-
- return __rdtgroup_cbm_overlaps(r_cdp, d_cdp, cbm, closid, peer_type, exclusive);
+ return __rdtgroup_cbm_overlaps(r, d, cbm, closid, peer_type, exclusive);
}
/**
@@ -2746,11 +2678,9 @@ static u32 cbm_ensure_valid(u32 _val, struct rdt_resource *r)
static int __init_one_rdt_domain(struct rdt_domain *d, struct resctrl_schema *s,
u32 closid)
{
+ enum resctrl_conf_type peer_type = resctrl_peer_type(s->conf_type);
enum resctrl_conf_type t = s->conf_type;
- struct rdt_resource *r_cdp = NULL;
struct resctrl_staged_config *cfg;
- enum resctrl_conf_type peer_type;
- struct rdt_domain *d_cdp = NULL;
struct rdt_resource *r = s->res;
u32 used_b = 0, unused_b = 0;
unsigned long tmp_cbm;
@@ -2758,7 +2688,6 @@ static int __init_one_rdt_domain(struct rdt_domain *d, struct resctrl_schema *s,
u32 peer_ctl, ctrl_val;
int i;
- rdt_cdp_peer_get(r, d, &r_cdp, &d_cdp, &peer_type);
cfg = &d->staged_config[t];
cfg->have_new_ctrl = false;
cfg->new_ctrl = r->cache.shareable_bits;
@@ -2778,8 +2707,8 @@ static int __init_one_rdt_domain(struct rdt_domain *d, struct resctrl_schema *s,
* usage to ensure there is no overlap
* with an exclusive group.
*/
- if (d_cdp)
- resctrl_arch_get_config(r_cdp, d_cdp, i, peer_type, &peer_ctl);
+ if (resctrl_arch_get_cdp_enabled(r->rid))
+ resctrl_arch_get_config(r, d, i, peer_type, &peer_ctl);
else
peer_ctl = 0;
resctrl_arch_get_config(r, d, i, s->conf_type, &ctrl_val);
--
2.30.0
Once the arch code is abstracted from the resctrl filesystem code
the separate schema for CDP are created by the filesystem code. This
means the same resource is used for different schema, or types of
configuration.
Helpers like rdtgroup_cbm_overlaps() need the resctrl_schema to
retrieve the configuration (or configurations). Before these
helpers can be changed to take the schema instead of the resource,
their callers must have the schema on hand.
Change the users of for_each_alloc_enabled_rdt_resource() to walk
the schema instead. Schema were only created for alloc_enabled resources
so these two lists are currently equivalent.
schemata_list_create() and rdt_kill_sb() are ignored. The first
creates the schema list, and will eventually loop over the resource
indexes using an arch helper to retrieve the resource. rdt_kill_sb()
will eventually make use of an arch 'reset everything' helper.
After the filesystem code is moved, rdtgroup_pseudo_locked_in_hierarchy()
remains part of the x86 specific hooks to support psuedo lock. This code
walks each domain, and still does this after the separate resources are
merged.
Reviewed-by: Jamie Iles <[email protected]>
Signed-off-by: James Morse <[email protected]>
---
Changes since v1:
* Expanded commit message
* Split from a larger patch
---
arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 23 +++++++++++++++--------
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 18 ++++++++++++------
2 files changed, 27 insertions(+), 14 deletions(-)
diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
index 2e7466659af3..a6f9548a8a59 100644
--- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
+++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
@@ -287,10 +287,12 @@ static int rdtgroup_parse_resource(char *resname, char *tok,
struct rdtgroup *rdtgrp)
{
struct rdt_hw_resource *hw_res;
+ struct resctrl_schema *s;
struct rdt_resource *r;
- for_each_alloc_enabled_rdt_resource(r) {
- hw_res = resctrl_to_arch_res(r);
+ list_for_each_entry(s, &resctrl_schema_all, list) {
+ r = s->res;
+ hw_res = resctrl_to_arch_res(s->res);
if (!strcmp(resname, r->name) && rdtgrp->closid < hw_res->num_closid)
return parse_line(tok, r, rdtgrp);
}
@@ -301,6 +303,7 @@ static int rdtgroup_parse_resource(char *resname, char *tok,
ssize_t rdtgroup_schemata_write(struct kernfs_open_file *of,
char *buf, size_t nbytes, loff_t off)
{
+ struct resctrl_schema *s;
struct rdtgroup *rdtgrp;
struct rdt_domain *dom;
struct rdt_resource *r;
@@ -331,8 +334,8 @@ ssize_t rdtgroup_schemata_write(struct kernfs_open_file *of,
goto out;
}
- for_each_alloc_enabled_rdt_resource(r) {
- list_for_each_entry(dom, &r->domains, list)
+ list_for_each_entry(s, &resctrl_schema_all, list) {
+ list_for_each_entry(dom, &s->res->domains, list)
dom->have_new_ctrl = false;
}
@@ -353,7 +356,8 @@ ssize_t rdtgroup_schemata_write(struct kernfs_open_file *of,
goto out;
}
- for_each_alloc_enabled_rdt_resource(r) {
+ list_for_each_entry(s, &resctrl_schema_all, list) {
+ r = s->res;
ret = update_domains(r, rdtgrp->closid);
if (ret)
goto out;
@@ -401,6 +405,7 @@ int rdtgroup_schemata_show(struct kernfs_open_file *of,
struct seq_file *s, void *v)
{
struct rdt_hw_resource *hw_res;
+ struct resctrl_schema *schema;
struct rdtgroup *rdtgrp;
struct rdt_resource *r;
int ret = 0;
@@ -409,8 +414,10 @@ int rdtgroup_schemata_show(struct kernfs_open_file *of,
rdtgrp = rdtgroup_kn_lock_live(of->kn);
if (rdtgrp) {
if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKSETUP) {
- for_each_alloc_enabled_rdt_resource(r)
+ list_for_each_entry(schema, &resctrl_schema_all, list) {
+ r = schema->res;
seq_printf(s, "%s:uninitialized\n", r->name);
+ }
} else if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKED) {
if (!rdtgrp->plr->d) {
rdt_last_cmd_clear();
@@ -424,8 +431,8 @@ int rdtgroup_schemata_show(struct kernfs_open_file *of,
}
} else {
closid = rdtgrp->closid;
- for_each_alloc_enabled_rdt_resource(r) {
- hw_res = resctrl_to_arch_res(r);
+ list_for_each_entry(schema, &resctrl_schema_all, list) {
+ hw_res = resctrl_to_arch_res(schema->res);
if (closid < hw_res->num_closid)
show_doms(s, r, closid);
}
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index b5702238797b..4f25c3c6f6e8 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -104,12 +104,12 @@ int closids_supported(void)
static void closid_init(void)
{
struct rdt_hw_resource *hw_res;
- struct rdt_resource *r;
+ struct resctrl_schema *s;
int rdt_min_closid = 32;
/* Compute rdt_min_closid across all resources */
- for_each_alloc_enabled_rdt_resource(r) {
- hw_res = resctrl_to_arch_res(r);
+ list_for_each_entry(s, &resctrl_schema_all, list) {
+ hw_res = resctrl_to_arch_res(s->res);
rdt_min_closid = min(rdt_min_closid, hw_res->num_closid);
}
@@ -1276,11 +1276,13 @@ static bool rdtgroup_mode_test_exclusive(struct rdtgroup *rdtgrp)
{
struct rdt_hw_domain *hw_dom;
int closid = rdtgrp->closid;
+ struct resctrl_schema *s;
struct rdt_resource *r;
bool has_cache = false;
struct rdt_domain *d;
- for_each_alloc_enabled_rdt_resource(r) {
+ list_for_each_entry(s, &resctrl_schema_all, list) {
+ r = s->res;
if (r->rid == RDT_RESOURCE_MBA)
continue;
has_cache = true;
@@ -1418,6 +1420,7 @@ unsigned int rdtgroup_cbm_to_size(struct rdt_resource *r,
static int rdtgroup_size_show(struct kernfs_open_file *of,
struct seq_file *s, void *v)
{
+ struct resctrl_schema *schema;
struct rdt_hw_domain *hw_dom;
struct rdtgroup *rdtgrp;
struct rdt_resource *r;
@@ -1449,7 +1452,8 @@ static int rdtgroup_size_show(struct kernfs_open_file *of,
goto out;
}
- for_each_alloc_enabled_rdt_resource(r) {
+ list_for_each_entry(schema, &resctrl_schema_all, list) {
+ r = schema->res;
sep = false;
seq_printf(s, "%*s:", max_name_width, r->name);
list_for_each_entry(d, &r->domains, list) {
@@ -2815,10 +2819,12 @@ static void rdtgroup_init_mba(struct rdt_resource *r)
/* Initialize the RDT group's allocations. */
static int rdtgroup_init_alloc(struct rdtgroup *rdtgrp)
{
+ struct resctrl_schema *s;
struct rdt_resource *r;
int ret;
- for_each_alloc_enabled_rdt_resource(r) {
+ list_for_each_entry(s, &resctrl_schema_all, list) {
+ r = s->res;
if (r->rid == RDT_RESOURCE_MBA) {
rdtgroup_init_mba(r);
} else {
--
2.30.0
rdt_domain_reconfigure_cdp() infers whether CDP is enabled by
checking the alloc_capable and alloc_enabled flags of the data
resources.
Now that we have an explicit cdp_enabled, use that.
Reviewed-by: Jamie Iles <[email protected]>
Signed-off-by: James Morse <[email protected]>
---
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 55c17adf763c..378cad0020da 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1933,14 +1933,16 @@ static int set_cache_qos_cfg(int level, bool enable)
/* Restore the qos cfg state when a domain comes online */
void rdt_domain_reconfigure_cdp(struct rdt_resource *r)
{
- if (!r->alloc_capable)
+ struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
+
+ if (!hw_res->cdp_capable)
return;
if (r == &rdt_resources_all[RDT_RESOURCE_L2DATA].resctrl)
- l2_qos_cfg_update(&r->alloc_enabled);
+ l2_qos_cfg_update(&hw_res->cdp_enabled);
if (r == &rdt_resources_all[RDT_RESOURCE_L3DATA].resctrl)
- l3_qos_cfg_update(&r->alloc_enabled);
+ l3_qos_cfg_update(&hw_res->cdp_enabled);
}
/*
--
2.30.0
resctrl uses cbm_idx() to map a closid to an index in the
configuration array. This is based on a multiplier and offset
that are held in the resource.
To merge the resources, the resctrl arch code needs to calculate
the index from something else, as there will only be one resource.
Decide based on the staged configuration type. This makes the
static mult and offset parameters redundant.
Reviewed-by: Jamie Iles <[email protected]>
Signed-off-by: James Morse <[email protected]>
---
arch/x86/kernel/cpu/resctrl/core.c | 12 ------------
arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 17 +++++++++++------
include/linux/resctrl.h | 6 ------
3 files changed, 11 insertions(+), 24 deletions(-)
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index cb3186bc248b..8d5c1e9eefa1 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -69,8 +69,6 @@ struct rdt_hw_resource rdt_resources_all[] = {
.cache_level = 3,
.cache = {
.min_cbm_bits = 1,
- .cbm_idx_mult = 1,
- .cbm_idx_offset = 0,
},
.domains = domain_init(RDT_RESOURCE_L3),
.parse_ctrlval = parse_cbm,
@@ -89,8 +87,6 @@ struct rdt_hw_resource rdt_resources_all[] = {
.cache_level = 3,
.cache = {
.min_cbm_bits = 1,
- .cbm_idx_mult = 2,
- .cbm_idx_offset = 0,
},
.domains = domain_init(RDT_RESOURCE_L3DATA),
.parse_ctrlval = parse_cbm,
@@ -109,8 +105,6 @@ struct rdt_hw_resource rdt_resources_all[] = {
.cache_level = 3,
.cache = {
.min_cbm_bits = 1,
- .cbm_idx_mult = 2,
- .cbm_idx_offset = 1,
},
.domains = domain_init(RDT_RESOURCE_L3CODE),
.parse_ctrlval = parse_cbm,
@@ -129,8 +123,6 @@ struct rdt_hw_resource rdt_resources_all[] = {
.cache_level = 2,
.cache = {
.min_cbm_bits = 1,
- .cbm_idx_mult = 1,
- .cbm_idx_offset = 0,
},
.domains = domain_init(RDT_RESOURCE_L2),
.parse_ctrlval = parse_cbm,
@@ -149,8 +141,6 @@ struct rdt_hw_resource rdt_resources_all[] = {
.cache_level = 2,
.cache = {
.min_cbm_bits = 1,
- .cbm_idx_mult = 2,
- .cbm_idx_offset = 0,
},
.domains = domain_init(RDT_RESOURCE_L2DATA),
.parse_ctrlval = parse_cbm,
@@ -169,8 +159,6 @@ struct rdt_hw_resource rdt_resources_all[] = {
.cache_level = 2,
.cache = {
.min_cbm_bits = 1,
- .cbm_idx_mult = 2,
- .cbm_idx_offset = 1,
},
.domains = domain_init(RDT_RESOURCE_L2CODE),
.parse_ctrlval = parse_cbm,
diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
index 12a898d42689..50266b524222 100644
--- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
+++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
@@ -246,12 +246,17 @@ static int parse_line(char *line, struct resctrl_schema *s,
return -EINVAL;
}
-static unsigned int cbm_idx(struct rdt_resource *r, unsigned int closid)
+static u32 get_config_index(u32 closid, enum resctrl_conf_type type)
{
- if (r->rid == RDT_RESOURCE_MBA)
+ switch (type) {
+ default:
+ case CDP_BOTH:
return closid;
-
- return closid * r->cache.cbm_idx_mult + r->cache.cbm_idx_offset;
+ case CDP_CODE:
+ return (closid * 2) + 1;
+ case CDP_DATA:
+ return (closid * 2);
+ }
}
static bool apply_config(struct rdt_hw_domain *hw_dom,
@@ -297,7 +302,7 @@ int resctrl_arch_update_domains(struct rdt_resource *r, u32 closid)
if (!cfg->have_new_ctrl)
continue;
- idx = cbm_idx(r, closid);
+ idx = get_config_index(closid, t);
if (!apply_config(hw_dom, cfg, idx, cpu_mask, mba_sc))
continue;
@@ -430,7 +435,7 @@ void resctrl_arch_get_config(struct rdt_resource *r, struct rdt_domain *d,
u32 closid, enum resctrl_conf_type type, u32 *value)
{
struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(d);
- u32 idx = cbm_idx(r, closid);
+ u32 idx = get_config_index(closid, type);
if (!is_mba_sc(r))
*value = hw_dom->ctrl_val[idx];
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 250e96c073db..abe280f8a76b 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -68,10 +68,6 @@ struct rdt_domain {
* struct resctrl_cache - Cache allocation related data
* @cbm_len: Length of the cache bit mask
* @min_cbm_bits: Minimum number of consecutive bits to be set
- * @cbm_idx_mult: Multiplier of CBM index
- * @cbm_idx_offset: Offset of CBM index. CBM index is computed by:
- * closid * cbm_idx_multi + cbm_idx_offset
- * in a cache bit mask
* @shareable_bits: Bitmask of shareable resource with other
* executing entities
* @arch_has_sparse_bitmaps: True if a bitmap like f00f is valid.
@@ -82,8 +78,6 @@ struct rdt_domain {
struct resctrl_cache {
unsigned int cbm_len;
unsigned int min_cbm_bits;
- unsigned int cbm_idx_mult; // TODO remove this
- unsigned int cbm_idx_offset; // TODO remove this
unsigned int shareable_bits;
bool arch_has_sparse_bitmaps;
bool arch_has_empty_bitmaps;
--
2.30.0
Now that resctrl uses the schema's configuration type is the source of
CODE/DATA configuration styles, and there is only one configuration
array between the three views of the resource, remove the CODE and DATA
aliases.
This allows the alloc_ctrlval_array() and complications around free()ing
the ctrl_val arrays to be removed.
To continue providing the CDP user-interface, the resctrl filesystem
code creates two schema for one resource when CDP is enabled, and
generates the names itself. This ensures the user interface is the
same when another architecture emulates CDPs behaviour.
Reviewed-by: Jamie Iles <[email protected]>
Signed-off-by: James Morse <[email protected]>
---
Changes since v1:
* rdt_get_cdp_config() is kept for its comment.
---
arch/x86/kernel/cpu/resctrl/core.c | 174 ++-----------------------
arch/x86/kernel/cpu/resctrl/internal.h | 4 -
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 122 ++++++++---------
3 files changed, 75 insertions(+), 225 deletions(-)
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 5021a726e87d..7c20d0469b3a 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -78,42 +78,6 @@ struct rdt_hw_resource rdt_resources_all[] = {
.msr_base = MSR_IA32_L3_CBM_BASE,
.msr_update = cat_wrmsr,
},
- [RDT_RESOURCE_L3DATA] =
- {
- .conf_type = CDP_DATA,
- .resctrl = {
- .rid = RDT_RESOURCE_L3DATA,
- .name = "L3DATA",
- .cache_level = 3,
- .cache = {
- .min_cbm_bits = 1,
- },
- .domains = domain_init(RDT_RESOURCE_L3DATA),
- .parse_ctrlval = parse_cbm,
- .format_str = "%d=%0*x",
- .fflags = RFTYPE_RES_CACHE,
- },
- .msr_base = MSR_IA32_L3_CBM_BASE,
- .msr_update = cat_wrmsr,
- },
- [RDT_RESOURCE_L3CODE] =
- {
- .conf_type = CDP_CODE,
- .resctrl = {
- .rid = RDT_RESOURCE_L3CODE,
- .name = "L3CODE",
- .cache_level = 3,
- .cache = {
- .min_cbm_bits = 1,
- },
- .domains = domain_init(RDT_RESOURCE_L3CODE),
- .parse_ctrlval = parse_cbm,
- .format_str = "%d=%0*x",
- .fflags = RFTYPE_RES_CACHE,
- },
- .msr_base = MSR_IA32_L3_CBM_BASE,
- .msr_update = cat_wrmsr,
- },
[RDT_RESOURCE_L2] =
{
.conf_type = CDP_BOTH,
@@ -132,42 +96,6 @@ struct rdt_hw_resource rdt_resources_all[] = {
.msr_base = MSR_IA32_L2_CBM_BASE,
.msr_update = cat_wrmsr,
},
- [RDT_RESOURCE_L2DATA] =
- {
- .conf_type = CDP_DATA,
- .resctrl = {
- .rid = RDT_RESOURCE_L2DATA,
- .name = "L2DATA",
- .cache_level = 2,
- .cache = {
- .min_cbm_bits = 1,
- },
- .domains = domain_init(RDT_RESOURCE_L2DATA),
- .parse_ctrlval = parse_cbm,
- .format_str = "%d=%0*x",
- .fflags = RFTYPE_RES_CACHE,
- },
- .msr_base = MSR_IA32_L2_CBM_BASE,
- .msr_update = cat_wrmsr,
- },
- [RDT_RESOURCE_L2CODE] =
- {
- .conf_type = CDP_CODE,
- .resctrl = {
- .rid = RDT_RESOURCE_L2CODE,
- .name = "L2CODE",
- .cache_level = 2,
- .cache = {
- .min_cbm_bits = 1,
- },
- .domains = domain_init(RDT_RESOURCE_L2CODE),
- .parse_ctrlval = parse_cbm,
- .format_str = "%d=%0*x",
- .fflags = RFTYPE_RES_CACHE,
- },
- .msr_base = MSR_IA32_L2_CBM_BASE,
- .msr_update = cat_wrmsr,
- },
[RDT_RESOURCE_MBA] =
{
.conf_type = CDP_BOTH,
@@ -339,40 +267,24 @@ static void rdt_get_cache_alloc_cfg(int idx, struct rdt_resource *r)
r->alloc_enabled = true;
}
-static void rdt_get_cdp_config(int level, int type)
+static void rdt_get_cdp_config(int level)
{
- struct rdt_resource *r_l = &rdt_resources_all[level].resctrl;
- struct rdt_hw_resource *hw_res_l = resctrl_to_arch_res(r_l);
- struct rdt_resource *r = &rdt_resources_all[type].resctrl;
- struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
-
- hw_res->num_closid = hw_res_l->num_closid;
- r->cache.cbm_len = r_l->cache.cbm_len;
- r->default_ctrl = r_l->default_ctrl;
- r->cache.shareable_bits = r_l->cache.shareable_bits;
- r->data_width = (r->cache.cbm_len + 3) / 4;
- r->alloc_capable = true;
/*
* By default, CDP is disabled. CDP can be enabled by mount parameter
* "cdp" during resctrl file system mount time.
*/
- r->alloc_enabled = false;
rdt_resources_all[level].cdp_enabled = false;
- rdt_resources_all[type].cdp_enabled = false;
rdt_resources_all[level].cdp_capable = true;
- rdt_resources_all[type].cdp_capable = true;
}
static void rdt_get_cdp_l3_config(void)
{
- rdt_get_cdp_config(RDT_RESOURCE_L3, RDT_RESOURCE_L3DATA);
- rdt_get_cdp_config(RDT_RESOURCE_L3, RDT_RESOURCE_L3CODE);
+ rdt_get_cdp_config(RDT_RESOURCE_L3);
}
static void rdt_get_cdp_l2_config(void)
{
- rdt_get_cdp_config(RDT_RESOURCE_L2, RDT_RESOURCE_L2DATA);
- rdt_get_cdp_config(RDT_RESOURCE_L2, RDT_RESOURCE_L2CODE);
+ rdt_get_cdp_config(RDT_RESOURCE_L2);
}
static void
@@ -509,58 +421,6 @@ void setup_default_ctrlval(struct rdt_resource *r, u32 *dc, u32 *dm)
}
}
-static u32 *alloc_ctrlval_array(struct rdt_resource *r, struct rdt_domain *d,
- bool mba_sc)
-{
- /* these are for the underlying hardware, they may not match r/d */
- struct rdt_domain *underlying_domain;
- struct rdt_hw_resource *hw_res;
- struct rdt_hw_domain *hw_dom;
- bool remapped;
-
- switch (r->rid) {
- case RDT_RESOURCE_L3DATA:
- case RDT_RESOURCE_L3CODE:
- hw_res = &rdt_resources_all[RDT_RESOURCE_L3];
- remapped = true;
- break;
- case RDT_RESOURCE_L2DATA:
- case RDT_RESOURCE_L2CODE:
- hw_res = &rdt_resources_all[RDT_RESOURCE_L2];
- remapped = true;
- break;
- default:
- hw_res = resctrl_to_arch_res(r);
- remapped = false;
- }
-
-
- /*
- * If we changed the resource, we need to search for the underlying
- * domain. Doing this for all resources would make it tricky to add the
- * first resource, as domains aren't added to a resource list until
- * after the ctrlval arrays have been allocated.
- */
- if (remapped)
- underlying_domain = rdt_find_domain(&hw_res->resctrl, d->id,
- NULL);
- else
- underlying_domain = d;
- hw_dom = resctrl_to_arch_dom(underlying_domain);
-
- if (mba_sc) {
- if (hw_dom->mbps_val)
- return hw_dom->mbps_val;
- return kmalloc_array(hw_res->num_closid,
- sizeof(*hw_dom->mbps_val), GFP_KERNEL);
- } else {
- if (hw_dom->ctrl_val)
- return hw_dom->ctrl_val;
- return kmalloc_array(hw_res->num_closid,
- sizeof(*hw_dom->ctrl_val), GFP_KERNEL);
- }
-}
-
static int domain_setup_ctrlval(struct rdt_resource *r, struct rdt_domain *d)
{
struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
@@ -568,11 +428,13 @@ static int domain_setup_ctrlval(struct rdt_resource *r, struct rdt_domain *d)
struct msr_param m;
u32 *dc, *dm;
- dc = alloc_ctrlval_array(r, d, false);
+ dc = kmalloc_array(hw_res->num_closid, sizeof(*hw_dom->ctrl_val),
+ GFP_KERNEL);
if (!dc)
return -ENOMEM;
- dm = alloc_ctrlval_array(r, d, true);
+ dm = kmalloc_array(hw_res->num_closid, sizeof(*hw_dom->mbps_val),
+ GFP_KERNEL);
if (!dm) {
kfree(dc);
return -ENOMEM;
@@ -731,14 +593,8 @@ static void domain_remove_cpu(int cpu, struct rdt_resource *r)
if (d->plr)
d->plr->d = NULL;
- /* temporary: these four don't have a unique ctrlval array */
- if ((r->rid != RDT_RESOURCE_L3CODE) &&
- (r->rid != RDT_RESOURCE_L3DATA) &&
- (r->rid != RDT_RESOURCE_L2CODE) &&
- (r->rid != RDT_RESOURCE_L2DATA)) {
- kfree(hw_dom->ctrl_val);
- kfree(hw_dom->mbps_val);
- }
+ kfree(hw_dom->ctrl_val);
+ kfree(hw_dom->mbps_val);
bitmap_free(d->rmid_busy_llc);
kfree(d->mbm_total);
kfree(d->mbm_local);
@@ -1011,11 +867,7 @@ static __init void rdt_init_res_defs_intel(void)
hw_res = resctrl_to_arch_res(r);
if (r->rid == RDT_RESOURCE_L3 ||
- r->rid == RDT_RESOURCE_L3DATA ||
- r->rid == RDT_RESOURCE_L3CODE ||
- r->rid == RDT_RESOURCE_L2 ||
- r->rid == RDT_RESOURCE_L2DATA ||
- r->rid == RDT_RESOURCE_L2CODE) {
+ r->rid == RDT_RESOURCE_L2) {
r->cache.arch_has_sparse_bitmaps = false;
r->cache.arch_has_empty_bitmaps = false;
r->cache.arch_has_per_cpu_cfg = false;
@@ -1035,11 +887,7 @@ static __init void rdt_init_res_defs_amd(void)
hw_res = resctrl_to_arch_res(r);
if (r->rid == RDT_RESOURCE_L3 ||
- r->rid == RDT_RESOURCE_L3DATA ||
- r->rid == RDT_RESOURCE_L3CODE ||
- r->rid == RDT_RESOURCE_L2 ||
- r->rid == RDT_RESOURCE_L2DATA ||
- r->rid == RDT_RESOURCE_L2CODE) {
+ r->rid == RDT_RESOURCE_L2) {
r->cache.arch_has_sparse_bitmaps = true;
r->cache.arch_has_empty_bitmaps = true;
r->cache.arch_has_per_cpu_cfg = true;
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index ac36249f43ba..e06bc5ccf6cd 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -402,11 +402,7 @@ extern struct dentry *debugfs_resctrl;
enum resctrl_res_level {
RDT_RESOURCE_L3,
- RDT_RESOURCE_L3DATA,
- RDT_RESOURCE_L3CODE,
RDT_RESOURCE_L2,
- RDT_RESOURCE_L2DATA,
- RDT_RESOURCE_L2CODE,
RDT_RESOURCE_MBA,
/* Must be the last */
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 36e2905f4da6..1dfb41e1277a 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1880,10 +1880,10 @@ void rdt_domain_reconfigure_cdp(struct rdt_resource *r)
if (!hw_res->cdp_capable)
return;
- if (r == &rdt_resources_all[RDT_RESOURCE_L2DATA].resctrl)
+ if (r->rid == RDT_RESOURCE_L2)
l2_qos_cfg_update(&hw_res->cdp_enabled);
- if (r == &rdt_resources_all[RDT_RESOURCE_L3DATA].resctrl)
+ if (r->rid == RDT_RESOURCE_L3)
l3_qos_cfg_update(&hw_res->cdp_enabled);
}
@@ -1912,68 +1912,42 @@ static int set_mba_sc(bool mba_sc)
return 0;
}
-static int cdp_enable(int level, int data_type, int code_type)
+static int cdp_enable(int level)
{
- struct rdt_resource *r_ldata = &rdt_resources_all[data_type].resctrl;
- struct rdt_resource *r_lcode = &rdt_resources_all[code_type].resctrl;
struct rdt_resource *r_l = &rdt_resources_all[level].resctrl;
int ret;
- if (!r_l->alloc_capable || !r_ldata->alloc_capable ||
- !r_lcode->alloc_capable)
+ if (!r_l->alloc_capable)
return -EINVAL;
ret = set_cache_qos_cfg(level, true);
- if (!ret) {
- r_l->alloc_enabled = false;
- r_ldata->alloc_enabled = true;
- r_lcode->alloc_enabled = true;
+ if (!ret)
rdt_resources_all[level].cdp_enabled = true;
- rdt_resources_all[data_type].cdp_enabled = true;
- rdt_resources_all[code_type].cdp_enabled = true;
- }
+
return ret;
}
-static void cdp_disable(int level, int data_type, int code_type)
+static void cdp_disable(int level)
{
struct rdt_hw_resource *r_hw = &rdt_resources_all[level];
- struct rdt_resource *r = &r_hw->resctrl;
-
- r->alloc_enabled = r->alloc_capable;
if (r_hw->cdp_enabled) {
- rdt_resources_all[data_type].resctrl.alloc_enabled = false;
- rdt_resources_all[code_type].resctrl.alloc_enabled = false;
set_cache_qos_cfg(level, false);
r_hw->cdp_enabled = false;
- rdt_resources_all[data_type].cdp_enabled = false;
- rdt_resources_all[code_type].cdp_enabled = false;
}
}
int resctrl_arch_set_cdp_enabled(enum resctrl_res_level l, bool enable)
{
struct rdt_hw_resource *hw_res = &rdt_resources_all[l];
- enum resctrl_res_level code_type, data_type;
if (!hw_res->cdp_capable)
return -EINVAL;
- if (l == RDT_RESOURCE_L3) {
- code_type = RDT_RESOURCE_L3CODE;
- data_type = RDT_RESOURCE_L3DATA;
- } else if (l == RDT_RESOURCE_L2) {
- code_type = RDT_RESOURCE_L2CODE;
- data_type = RDT_RESOURCE_L2DATA;
- } else {
- return -EINVAL;
- }
-
if (enable)
- return cdp_enable(l, data_type, code_type);
+ return cdp_enable(l);
- cdp_disable(l, data_type, code_type);
+ cdp_disable(l);
return 0;
}
@@ -2072,40 +2046,72 @@ static int rdt_enable_ctx(struct rdt_fs_context *ctx)
return ret;
}
-static int schemata_list_create(void)
+static int schemata_list_add(struct rdt_resource *r, enum resctrl_conf_type type)
{
- struct rdt_resource *r;
struct resctrl_schema *s;
+ const char *suffix = "";
int ret, cl;
- for_each_alloc_enabled_rdt_resource(r) {
- s = kzalloc(sizeof(*s), GFP_KERNEL);
- if (!s)
- return -ENOMEM;
-
- s->res = r;
- s->conf_type = resctrl_to_arch_res(r)->conf_type;
- s->num_closid = resctrl_arch_get_num_closid(r);
- if (resctrl_arch_get_cdp_enabled(r->rid))
- s->num_closid /= 2;
-
- ret = snprintf(s->name, sizeof(s->name), r->name);
- if (ret >= sizeof(s->name)) {
- kfree(s);
- return -EINVAL;
- }
+ s = kzalloc(sizeof(*s), GFP_KERNEL);
+ if (!s)
+ return -ENOMEM;
- cl = strlen(s->name);
- if (cl > max_name_width)
- max_name_width = cl;
+ s->res = r;
+ s->num_closid = resctrl_arch_get_num_closid(r);
+ if (resctrl_arch_get_cdp_enabled(r->rid))
+ s->num_closid /= 2;
- INIT_LIST_HEAD(&s->list);
- list_add(&s->list, &resctrl_schema_all);
+ s->conf_type = type;
+ switch (type) {
+ case CDP_CODE:
+ suffix = "CODE";
+ break;
+ case CDP_DATA:
+ suffix = "DATA";
+ break;
+ case CDP_BOTH:
+ suffix = "";
+ break;
}
+ ret = snprintf(s->name, sizeof(s->name), "%s%s", r->name, suffix);
+ if (ret >= sizeof(s->name)) {
+ kfree(s);
+ return -EINVAL;
+ }
+
+ cl = strlen(s->name);
+ if (cl > max_name_width)
+ max_name_width = cl;
+
+ INIT_LIST_HEAD(&s->list);
+ list_add(&s->list, &resctrl_schema_all);
+
return 0;
}
+static int schemata_list_create(void)
+{
+ struct rdt_resource *r;
+ int ret = 0;
+
+ for_each_alloc_enabled_rdt_resource(r) {
+ if (resctrl_arch_get_cdp_enabled(r->rid)) {
+ ret = schemata_list_add(r, CDP_CODE);
+ if (ret)
+ break;
+
+ ret = schemata_list_add(r, CDP_DATA);
+ } else
+ ret = schemata_list_add(r, CDP_BOTH);
+
+ if (ret)
+ break;
+ }
+
+ return ret;
+}
+
static void schemata_list_destroy(void)
{
struct resctrl_schema *s, *tmp;
--
2.30.0
Hi James,
Thanks for the patches. Few comments below.
On 3/12/21 11:58 AM, James Morse wrote:
> Hi folks,
>
> Thanks to Reinette and Jamie for the comments on v1. Major changes in v2 are
> to keep the closid in resctrl_arch_update_domains(), eliminating one patch,
> splitting another that was making two sorts of change, and to re-order the
> first few patches. See each patches changelog for more.
> ----
>
> This series re-folds the resctrl code so the CDP resources (L3CODE et al)
> behaviour is all contained in the filesystem parts, with a minimum amount
> of arch specific code.
>
> Arm have some CPU support for dividing caches into portions, and
> applying bandwidth limits at various points in the SoC. The collective term
> for these features is MPAM: Memory Partitioning and Monitoring.
>
> MPAM is similar enough to Intel RDT, that it should use the defacto linux
> interface: resctrl. This filesystem currently lives under arch/x86, and is
> tightly coupled to the architecture.
> Ultimately, my plan is to split the existing resctrl code up to have an
> arch<->fs abstraction, then move all the bits out to fs/resctrl. From there
> MPAM can be wired up.
>
> x86 might have two resources with cache controls, (L2 and L3) but has
> extra copies for CDP: L{2,3}{CODE,DATA}, which are marked as enabled
> if CDP is enabled for the corresponding cache.
>
> MPAM has an equivalent feature to CDP, but its a property of the CPU,
> not the cache. Resctrl needs to have x86's odd/even behaviour, as that
> its the ABI, but this isn't how the MPAM hardware works. It is entirely
> possible that an in-kernel user of MPAM would not be using CDP, whereas
> resctrl is.
> Pretending L3CODE and L3DATA are entirely separate resources is a neat
> trick, but doing this is specific to x86.
> Doing this leaves the arch code in control of various parts of the
> filesystem ABI: the resources names, and the way the schemata are parsed.
> Allowing this stuff to vary between architectures is bad for user space.
>
> This series collapses the CODE/DATA resources, moving all the user-visible
> resctrl ABI into what becomes the filesystem code. CDP becomes the type of
> configuration being applied to a cache. This is done by adding a
> struct resctrl_schema to the parts of resctrl that will move to fs. This
> holds the arch-code resource that is in use for this schema, along with
> other properties like the name, and whether the configuration being applied
> is CODE/DATA/BOTH.
I applied your patches on my AMD box. Seeing some difference in the behavior.
Before these patches.
# dmesg |grep -i resctrl
[ 13.076973] resctrl: L3 allocation detected
[ 13.087835] resctrl: L3DATA allocation detected
[ 13.092886] resctrl: L3CODE allocation detected
[ 13.097936] resctrl: MB allocation detected
[ 13.102599] resctrl: L3 monitoring detected
After the patches.
# dmesg |grep -i resctrl
[ 13.076973] resctrl: L3 allocation detected
[ 13.097936] resctrl: MB allocation detected
[ 13.102599] resctrl: L3 monitoring detected
You can see that L3DATA and L3CODE disappeared. I think we should keep the
behavior same for x86(at least).
I am still not clear why we needed resctrl_conf_type
enum resctrl_conf_type {
CDP_BOTH,
CDP_CODE,
CDP_DATA,
};
Right now, I see all the resources are initialized as CDP_BOTH.
[RDT_RESOURCE_L3] =
{
.conf_type = CDP_BOTH,
[RDT_RESOURCE_L2] =
{
.conf_type = CDP_BOTH,
[RDT_RESOURCE_MBA] =
{
.conf_type = CDP_BOTH,
If all the resources are CDP_BOTH, then why we need separate CDP_CODE and
CDP_DATA? Are these going to be different for ARM?
Also initializing RDT_RESOURCE_MBA as CDP_BOTH does not seem right. I dont
think there will CDP support in MBA in future.
>
> This lets us fold the extra resources out of the arch code so that they
> don't need to be duplicated if the equivalent feature to CDP is missing, or
> implemented in a different way.
>
>
> The first two patches split the resource and domain structs to have an
> arch specific 'hw' portion, and the rest that is visible to resctrl.
> Future series massage the resctrl code so there are no accesses to 'hw'
> structures in the parts of resctrl that will move to fs, providing helpers
> where necessary.
>
> This series adds temporary scaffolding, which it removes a few patches
> later. This is to allow things like the ctrlval arrays and resources to be
> merged separately, which should make is easier to bisect. These things
> are marked temporary, and should all be gone by the end of the series.
>
> This series is a little rough around the monitors, would a fake
> struct resctrl_schema for the monitors simplify things, or be a source
> of bugs?
>
> This series is based on v5.12-rc2, and can be retrieved from:
> git://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git mpam/resctrl_merge_cdp/v2
>
> v1 was posted here:
> https://lore.kernel.org/lkml/[email protected]/
>
> Parts were previously posted as an RFC here:
> https://lore.kernel.org/lkml/[email protected]/
>
>
> Thanks,
>
> James Morse (24):
> x86/resctrl: Split struct rdt_resource
> x86/resctrl: Split struct rdt_domain
> x86/resctrl: Add a separate schema list for resctrl
> x86/resctrl: Pass the schema in info dir's private pointer
> x86/resctrl: Label the resources with their configuration type
> x86/resctrl: Walk the resctrl schema list instead of an arch list
> x86/resctrl: Store the effective num_closid in the schema
> x86/resctrl: Add resctrl_arch_get_num_closid()
> x86/resctrl: Pass the schema to resctrl filesystem functions
> x86/resctrl: Swizzle rdt_resource and resctrl_schema in
> pseudo_lock_region
> x86/resctrl: Move the schemata names into struct resctrl_schema
> x86/resctrl: Group staged configuration into a separate struct
> x86/resctrl: Allow different CODE/DATA configurations to be staged
> x86/resctrl: Rename update_domains() resctrl_arch_update_domains()
> x86/resctrl: Add a helper to read a closid's configuration
> x86/resctrl: Add a helper to read/set the CDP configuration
> x86/resctrl: Use cdp_enabled in rdt_domain_reconfigure_cdp()
> x86/resctrl: Pass configuration type to resctrl_arch_get_config()
> x86/resctrl: Make ctrlval arrays the same size
> x86/resctrl: Apply offset correction when config is staged
> x86/resctrl: Calculate the index from the configuration type
> x86/resctrl: Merge the ctrl_val arrays
> x86/resctrl: Remove rdt_cdp_peer_get()
> x86/resctrl: Merge the CDP resources
>
> arch/x86/kernel/cpu/resctrl/core.c | 272 +++++--------
> arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 164 +++++---
> arch/x86/kernel/cpu/resctrl/internal.h | 218 +++--------
> arch/x86/kernel/cpu/resctrl/monitor.c | 44 ++-
> arch/x86/kernel/cpu/resctrl/pseudo_lock.c | 12 +-
> arch/x86/kernel/cpu/resctrl/rdtgroup.c | 447 ++++++++++++----------
> include/linux/resctrl.h | 176 +++++++++
> 7 files changed, 741 insertions(+), 592 deletions(-)
>
Hi James,
On 3/12/2021 9:58 AM, James Morse wrote:
> Once the arch code is abstracted from the resctrl filesystem code
> the separate schema for CDP are created by the filesystem code. This
> means the same resource is used for different schema, or types of
> configuration.
>
> Helpers like rdtgroup_cbm_overlaps() need the resctrl_schema to
> retrieve the configuration (or configurations). Before these
> helpers can be changed to take the schema instead of the resource,
> their callers must have the schema on hand.
>
> Change the users of for_each_alloc_enabled_rdt_resource() to walk
> the schema instead. Schema were only created for alloc_enabled resources
> so these two lists are currently equivalent.
Currently equivalent? Does this mean that at some point they will not be
equivalent and this change will be impacted?
>
> schemata_list_create() and rdt_kill_sb() are ignored. The first
> creates the schema list, and will eventually loop over the resource
> indexes using an arch helper to retrieve the resource. rdt_kill_sb()
> will eventually make use of an arch 'reset everything' helper.
Please elaborate on what "eventually" means here. It does not seem to
indicate this patch series so please clarify that and any impacts.
>
> After the filesystem code is moved, rdtgroup_pseudo_locked_in_hierarchy()
> remains part of the x86 specific hooks to support psuedo lock. This code
psuedo -> pseudo
Thank you
Reinette
Hi Babu,
On 30/03/2021 21:36, Babu Moger wrote:
> On 3/12/21 11:58 AM, James Morse wrote:
>> This series re-folds the resctrl code so the CDP resources (L3CODE et al)
>> behaviour is all contained in the filesystem parts, with a minimum amount
>> of arch specific code.
>>
>> Arm have some CPU support for dividing caches into portions, and
>> applying bandwidth limits at various points in the SoC. The collective term
>> for these features is MPAM: Memory Partitioning and Monitoring.
>>
>> MPAM is similar enough to Intel RDT, that it should use the defacto linux
>> interface: resctrl. This filesystem currently lives under arch/x86, and is
>> tightly coupled to the architecture.
>> Ultimately, my plan is to split the existing resctrl code up to have an
>> arch<->fs abstraction, then move all the bits out to fs/resctrl. From there
>> MPAM can be wired up.
>>
>> x86 might have two resources with cache controls, (L2 and L3) but has
>> extra copies for CDP: L{2,3}{CODE,DATA}, which are marked as enabled
>> if CDP is enabled for the corresponding cache.
>>
>> MPAM has an equivalent feature to CDP, but its a property of the CPU,
>> not the cache. Resctrl needs to have x86's odd/even behaviour, as that
>> its the ABI, but this isn't how the MPAM hardware works. It is entirely
>> possible that an in-kernel user of MPAM would not be using CDP, whereas
>> resctrl is.
>> Pretending L3CODE and L3DATA are entirely separate resources is a neat
>> trick, but doing this is specific to x86.
>> Doing this leaves the arch code in control of various parts of the
>> filesystem ABI: the resources names, and the way the schemata are parsed.
>> Allowing this stuff to vary between architectures is bad for user space.
>>
>> This series collapses the CODE/DATA resources, moving all the user-visible
>> resctrl ABI into what becomes the filesystem code. CDP becomes the type of
>> configuration being applied to a cache. This is done by adding a
>> struct resctrl_schema to the parts of resctrl that will move to fs. This
>> holds the arch-code resource that is in use for this schema, along with
>> other properties like the name, and whether the configuration being applied
>> is CODE/DATA/BOTH.
> I applied your patches on my AMD box.
Great! Thanks for taking a look,
> Seeing some difference in the behavior.
Ooer,
> Before these patches.
>
> # dmesg |grep -i resctrl
> [ 13.076973] resctrl: L3 allocation detected
> [ 13.087835] resctrl: L3DATA allocation detected
> [ 13.092886] resctrl: L3CODE allocation detected
> [ 13.097936] resctrl: MB allocation detected
> [ 13.102599] resctrl: L3 monitoring detected
>
>
> After the patches.
>
> # dmesg |grep -i resctrl
> [ 13.076973] resctrl: L3 allocation detected
> [ 13.097936] resctrl: MB allocation detected
> [ 13.102599] resctrl: L3 monitoring detected
>
> You can see that L3DATA and L3CODE disappeared. I think we should keep the
> behavior same for x86(at least).
This is the kernel log ... what user-space software is parsing that for an expected value?
What happens if the resctrl strings have been overwritten by more kernel log?
I don't think user-space should be relying on this. I'd argue any user-space doing this is
already broken. Is it just the kernel selftest's filter_dmesg()? It doesn't seem to do
anything useful
Whether resctrl is support can be read from /proc/filesystems. CDP is probably a
try-it-and-see. User-space could parse /proc/cpuinfo, but its probably not a good idea.
Its easy to fix, but it seems odd that the kernel has to print things for user-space to
try and parse. (I'd like to point at the user-space software that depends on this)
> I am still not clear why we needed resctrl_conf_type
>
> enum resctrl_conf_type {
> CDP_BOTH,
> CDP_CODE,
> CDP_DATA,
> };
>
> Right now, I see all the resources are initialized as CDP_BOTH.
>
> [RDT_RESOURCE_L3] =
> {
> .conf_type = CDP_BOTH,
> [RDT_RESOURCE_L2] =
> {
> .conf_type = CDP_BOTH,
> [RDT_RESOURCE_MBA] =
> {
> .conf_type = CDP_BOTH,
Ah, those should have been removed in patch 24. Once all the resources are the same, the
resource doesn't need to describe what kind it is.
> If all the resources are CDP_BOTH, then why we need separate CDP_CODE and
> CDP_DATA?
The filesystem code for resctrl that will eventually move out of arch/x86 needs to be able
to describe the type of configuration change being made back to the arch code. The enum
gets used for that.
x86 needs this as it affects which MSRs the configuration value is written to.
> Are these going to be different for ARM?
Nope. Arm's MPAM ends up emulating CDP with the closid values that get applied to
transactions.
> Also initializing RDT_RESOURCE_MBA as CDP_BOTH does not seem right. I dont
> think there will CDP support in MBA in future.
Its not code or data, which makes it both. 'BOTH' is more of a 'do nothing special', there
may be a better name, but I'm not very good at naming things. (any suggestions?)
Thanks,
James
On 4/6/21 12:19 PM, James Morse wrote:
> Hi Babu,
>
> On 30/03/2021 21:36, Babu Moger wrote:
>> On 3/12/21 11:58 AM, James Morse wrote:
>>> This series re-folds the resctrl code so the CDP resources (L3CODE et al)
>>> behaviour is all contained in the filesystem parts, with a minimum amount
>>> of arch specific code.
>>>
>>> Arm have some CPU support for dividing caches into portions, and
>>> applying bandwidth limits at various points in the SoC. The collective term
>>> for these features is MPAM: Memory Partitioning and Monitoring.
>>>
>>> MPAM is similar enough to Intel RDT, that it should use the defacto linux
>>> interface: resctrl. This filesystem currently lives under arch/x86, and is
>>> tightly coupled to the architecture.
>>> Ultimately, my plan is to split the existing resctrl code up to have an
>>> arch<->fs abstraction, then move all the bits out to fs/resctrl. From there
>>> MPAM can be wired up.
>>>
>>> x86 might have two resources with cache controls, (L2 and L3) but has
>>> extra copies for CDP: L{2,3}{CODE,DATA}, which are marked as enabled
>>> if CDP is enabled for the corresponding cache.
>>>
>>> MPAM has an equivalent feature to CDP, but its a property of the CPU,
>>> not the cache. Resctrl needs to have x86's odd/even behaviour, as that
>>> its the ABI, but this isn't how the MPAM hardware works. It is entirely
>>> possible that an in-kernel user of MPAM would not be using CDP, whereas
>>> resctrl is.
>>> Pretending L3CODE and L3DATA are entirely separate resources is a neat
>>> trick, but doing this is specific to x86.
>>> Doing this leaves the arch code in control of various parts of the
>>> filesystem ABI: the resources names, and the way the schemata are parsed.
>>> Allowing this stuff to vary between architectures is bad for user space.
>>>
>>> This series collapses the CODE/DATA resources, moving all the user-visible
>>> resctrl ABI into what becomes the filesystem code. CDP becomes the type of
>>> configuration being applied to a cache. This is done by adding a
>>> struct resctrl_schema to the parts of resctrl that will move to fs. This
>>> holds the arch-code resource that is in use for this schema, along with
>>> other properties like the name, and whether the configuration being applied
>>> is CODE/DATA/BOTH.
>
>
>> I applied your patches on my AMD box.
>
> Great! Thanks for taking a look,
>
>
>> Seeing some difference in the behavior.
>
> Ooer,
>
>
>> Before these patches.
>>
>> # dmesg |grep -i resctrl
>> [ 13.076973] resctrl: L3 allocation detected
>> [ 13.087835] resctrl: L3DATA allocation detected
>> [ 13.092886] resctrl: L3CODE allocation detected
>> [ 13.097936] resctrl: MB allocation detected
>> [ 13.102599] resctrl: L3 monitoring detected
>>
>>
>> After the patches.
>>
>> # dmesg |grep -i resctrl
>> [ 13.076973] resctrl: L3 allocation detected
>> [ 13.097936] resctrl: MB allocation detected
>> [ 13.102599] resctrl: L3 monitoring detected
>>
>> You can see that L3DATA and L3CODE disappeared. I think we should keep the
>> behavior same for x86(at least).
>
> This is the kernel log ... what user-space software is parsing that for an expected value?
> What happens if the resctrl strings have been overwritten by more kernel log?
>
> I don't think user-space should be relying on this. I'd argue any user-space doing this is
> already broken. Is it just the kernel selftest's filter_dmesg()? It doesn't seem to do
> anything useful
>
> Whether resctrl is support can be read from /proc/filesystems. CDP is probably a
> try-it-and-see. User-space could parse /proc/cpuinfo, but its probably not a good idea.
Yes. Agree. Looking at the dmesg may no be right way to figure out all the
support details. As a normal practice, I searched for these texts and
noticed difference. That is why I felt it is best to keep those texts same
as before.
>
>
> Its easy to fix, but it seems odd that the kernel has to print things for user-space to
> try and parse. (I'd like to point at the user-space software that depends on this)
I dont think there is any software that parses the dmesg for these
details. These are info messages for the developers.
>
>
>> I am still not clear why we needed resctrl_conf_type
>>
>> enum resctrl_conf_type {
>> CDP_BOTH,
>> CDP_CODE,
>> CDP_DATA,
>> };
>>
>> Right now, I see all the resources are initialized as CDP_BOTH.
>>
>> [RDT_RESOURCE_L3] =
>> {
>> .conf_type = CDP_BOTH,
>> [RDT_RESOURCE_L2] =
>> {
>> .conf_type = CDP_BOTH,
>> [RDT_RESOURCE_MBA] =
>> {
>> .conf_type = CDP_BOTH,
>
> Ah, those should have been removed in patch 24. Once all the resources are the same, the
> resource doesn't need to describe what kind it is.
>
>
>> If all the resources are CDP_BOTH, then why we need separate CDP_CODE and
>> CDP_DATA?
>
> The filesystem code for resctrl that will eventually move out of arch/x86 needs to be able
> to describe the type of configuration change being made back to the arch code. The enum
> gets used for that.
>
> x86 needs this as it affects which MSRs the configuration value is written to.
>
>
>> Are these going to be different for ARM?
>
> Nope. Arm's MPAM ends up emulating CDP with the closid values that get applied to
> transactions.
>
>
>> Also initializing RDT_RESOURCE_MBA as CDP_BOTH does not seem right. I dont
>> think there will CDP support in MBA in future.
>
> Its not code or data, which makes it both. 'BOTH' is more of a 'do nothing special', there
> may be a better name, but I'm not very good at naming things. (any suggestions?)
Do you think CDP_NONE will make some sense?
>
>
> Thanks,
>
> James
>
Hi Babu,
On 06/04/2021 22:37, Babu Moger wrote:
> On 4/6/21 12:19 PM, James Morse wrote:
>> On 30/03/2021 21:36, Babu Moger wrote:
>>> On 3/12/21 11:58 AM, James Morse wrote:
>>>> This series re-folds the resctrl code so the CDP resources (L3CODE et al)
>>>> behaviour is all contained in the filesystem parts, with a minimum amount
>>>> of arch specific code.
>>>> This series collapses the CODE/DATA resources, moving all the user-visible
>>>> resctrl ABI into what becomes the filesystem code. CDP becomes the type of
>>>> configuration being applied to a cache. This is done by adding a
>>>> struct resctrl_schema to the parts of resctrl that will move to fs. This
>>>> holds the arch-code resource that is in use for this schema, along with
>>>> other properties like the name, and whether the configuration being applied
>>>> is CODE/DATA/BOTH.
>>
>>
>>> I applied your patches on my AMD box.
>>
>> Great! Thanks for taking a look,
>>
>>
>>> Seeing some difference in the behavior.
>>
>> Ooer,
>>
>>
>>> Before these patches.
>>>
>>> # dmesg |grep -i resctrl
>>> [ 13.076973] resctrl: L3 allocation detected
>>> [ 13.087835] resctrl: L3DATA allocation detected
>>> [ 13.092886] resctrl: L3CODE allocation detected
>>> [ 13.097936] resctrl: MB allocation detected
>>> [ 13.102599] resctrl: L3 monitoring detected
>>>
>>>
>>> After the patches.
>>>
>>> # dmesg |grep -i resctrl
>>> [ 13.076973] resctrl: L3 allocation detected
>>> [ 13.097936] resctrl: MB allocation detected
>>> [ 13.102599] resctrl: L3 monitoring detected
>>>
>>> You can see that L3DATA and L3CODE disappeared. I think we should keep the
>>> behavior same for x86(at least).
>>
>> This is the kernel log ... what user-space software is parsing that for an expected value?
>> What happens if the resctrl strings have been overwritten by more kernel log?
>>
>> I don't think user-space should be relying on this. I'd argue any user-space doing this is
>> already broken. Is it just the kernel selftest's filter_dmesg()? It doesn't seem to do
>> anything useful
>>
>> Whether resctrl is support can be read from /proc/filesystems. CDP is probably a
>> try-it-and-see. User-space could parse /proc/cpuinfo, but its probably not a good idea.
> Yes. Agree. Looking at the dmesg may no be right way to figure out all the
> support details. As a normal practice, I searched for these texts and
> noticed difference. That is why I felt it is best to keep those texts same
> as before.
>> Its easy to fix, but it seems odd that the kernel has to print things for user-space to
>> try and parse. (I'd like to point at the user-space software that depends on this)
>
> I dont think there is any software that parses the dmesg for these
> details. These are info messages for the developers.
The kernel log changes all the time, messages at boot aren't something you can depend on
seeing later. Unless there is some user-space software broken by this, I'm afraid I don't
think its a good idea to add extra code to keep it the same.
Printing 'CDP supported by Lx' would be more useful to developers perusing the log. Even
more useful would be exposing feature attributes via sysfs to say what resctrl supports
without having to mount-it-and-see. This can then be used by user-space too.
e.g.:
| cat /sys/fs/ext4/features/fast_commit
>>> I am still not clear why we needed resctrl_conf_type
>>>
>>> enum resctrl_conf_type {
>>> CDP_BOTH,
>>> CDP_CODE,
>>> CDP_DATA,
>>> };
>>>
>>> Right now, I see all the resources are initialized as CDP_BOTH.
>>>
>>> [RDT_RESOURCE_L3] =
>>> {
>>> .conf_type = CDP_BOTH,
>>> [RDT_RESOURCE_L2] =
>>> {
>>> .conf_type = CDP_BOTH,
>>> [RDT_RESOURCE_MBA] =
>>> {
>>> .conf_type = CDP_BOTH,
>>
>> Ah, those should have been removed in patch 24. Once all the resources are the same, the
>> resource doesn't need to describe what kind it is.
>>
>>
>>> If all the resources are CDP_BOTH, then why we need separate CDP_CODE and
>>> CDP_DATA?
>>
>> The filesystem code for resctrl that will eventually move out of arch/x86 needs to be able
>> to describe the type of configuration change being made back to the arch code. The enum
>> gets used for that.
>>
>> x86 needs this as it affects which MSRs the configuration value is written to.
>>
>>
>>> Are these going to be different for ARM?
>>
>> Nope. Arm's MPAM ends up emulating CDP with the closid values that get applied to
>> transactions.
>>
>>
>>> Also initializing RDT_RESOURCE_MBA as CDP_BOTH does not seem right. I dont
>>> think there will CDP support in MBA in future.
>>
>> Its not code or data, which makes it both. 'BOTH' is more of a 'do nothing special', there
>> may be a better name, but I'm not very good at naming things. (any suggestions?)
> Do you think CDP_NONE will make some sense?
If you think that is clearer, sure.
Thanks,
James