2022-09-02 15:58:46

by James Morse

Subject: [PATCH v6 00/21] x86/resctrl: Make resctrl_arch_rmid_read() return values in bytes

Changes in this version:
* Changed supports_mba_mbps() to use is_mbm_local_enabled()

---
The aim of this series is to insert a split between the parts of the monitor
code that the architecture must implement, and those that are part of the
resctrl filesystem. The eventual aim is to move all filesystem parts out
to live in /fs/resctrl, so that resctrl can be wired up for MPAM.

What's MPAM? See the cover letter of a previous series. [1]

The series adds domain online/offline callbacks to allow the filesystem to
manage some of its structures itself, then moves all the 'mba_sc' behaviour
to be part of the filesystem.
This means another architecture doesn't need to provide an mbps_val array.
As it's all software, the resctrl filesystem should be able to do this without
any help from the architecture code.

Finally __rmid_read() is refactored to be the API call that the architecture
provides to read a counter value. All the hardware specific overflow detection,
scaling and value correction should occur behind this helper.
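
As a sketch of the boundary this creates: the filesystem side only ever asks
for a byte count, and never sees chunks, counter widths or correction
factors. The wrapper below is purely illustrative (fs_read_llc_occupancy()
is not part of the series); the resctrl_arch_rmid_read() prototype is the
one this series ends up with in include/linux/resctrl.h.

	/* Hypothetical filesystem-side caller, for illustration only */
	static int fs_read_llc_occupancy(struct rdt_resource *r,
					 struct rdt_domain *d,
					 u32 rmid, u64 *bytes)
	{
		/* The arch helper returns a value already scaled to bytes */
		return resctrl_arch_rmid_read(r, d, rmid,
					      QOS_L3_OCCUP_EVENT_ID, bytes);
	}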


This series is based on v6.0-rc3, and can be retrieved from:
git://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git mpam/resctrl_monitors_in_bytes/v6

[0] git://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git mpam/resctrl_merge_cdp/v7
[1] https://lore.kernel.org/lkml/[email protected]/

Previous versions of this series:
[v1] https://lore.kernel.org/lkml/[email protected]/
[v2] https://lore.kernel.org/lkml/[email protected]/
[v3] https://lore.kernel.org/lkml/[email protected]/
[v4] https://lore.kernel.org/lkml/[email protected]/
[v5] https://lore.kernel.org/lkml/[email protected]/

James Morse (21):
x86/resctrl: Kill off alloc_enabled
x86/resctrl: Merge mon_capable and mon_enabled
x86/resctrl: Add domain online callback for resctrl work
x86/resctrl: Group struct rdt_hw_domain cleanup
x86/resctrl: Add domain offline callback for resctrl work
x86/resctrl: Remove set_mba_sc()s control array re-initialisation
x86/resctrl: Abstract and use supports_mba_mbps()
x86/resctrl: Create mba_sc configuration in the rdt_domain
x86/resctrl: Switch over to the resctrl mbps_val list
x86/resctrl: Remove architecture copy of mbps_val
x86/resctrl: Allow update_mba_bw() to update controls directly
x86/resctrl: Calculate bandwidth from the previous __mon_event_count()
chunks
x86/resctrl: Add per-rmid arch private storage for overflow and chunks
x86/resctrl: Allow per-rmid arch private storage to be reset
x86/resctrl: Abstract __rmid_read()
x86/resctrl: Pass the required parameters into
resctrl_arch_rmid_read()
x86/resctrl: Move mbm_overflow_count() into resctrl_arch_rmid_read()
x86/resctrl: Move get_corrected_mbm_count() into
resctrl_arch_rmid_read()
x86/resctrl: Rename and change the units of resctrl_cqm_threshold
x86/resctrl: Add resctrl_rmid_realloc_limit to abstract x86's
boot_cpu_data
x86/resctrl: Make resctrl_arch_rmid_read() return values in bytes

arch/x86/include/asm/resctrl.h | 9 +
arch/x86/kernel/cpu/resctrl/core.c | 117 ++++-------
arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 75 ++++---
arch/x86/kernel/cpu/resctrl/internal.h | 61 +++---
arch/x86/kernel/cpu/resctrl/monitor.c | 232 ++++++++++++++--------
arch/x86/kernel/cpu/resctrl/pseudo_lock.c | 2 +-
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 216 ++++++++++++++++----
include/linux/resctrl.h | 64 +++++-
8 files changed, 514 insertions(+), 262 deletions(-)

--
2.30.2


2022-09-02 15:59:00

by James Morse

Subject: [PATCH v6 04/21] x86/resctrl: Group struct rdt_hw_domain cleanup

domain_add_cpu() and domain_remove_cpu() need to kfree() the child
arrays that were allocated by domain_setup_ctrlval().

As this memory is moved around and new arrays are created, the
error handling cleanup code becomes noisier.

To simplify this, move all the kfree() calls into a domain_free() helper.
This depends on struct rdt_hw_domain being kzalloc()d, allowing it to
unconditionally kfree() all the child arrays.
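
The kzalloc() matters because kfree(NULL) is a no-op, so domain_free() is
safe to call on a partially initialised structure. A minimal illustration
(not part of the patch):

	struct rdt_hw_domain *hw_dom = kzalloc(sizeof(*hw_dom), GFP_KERNEL);

	/* ->ctrl_val and ->mbps_val are still NULL; kfree(NULL) does nothing */
	domain_free(hw_dom);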

Reviewed-by: Jamie Iles <[email protected]>
Tested-by: Xin Hao <[email protected]>
Reviewed-by: Shaopeng Tan <[email protected]>
Tested-by: Shaopeng Tan <[email protected]>
Tested-by: Cristian Marussi <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Signed-off-by: James Morse <[email protected]>
---
Changes since v2:
* Made domain_free() static.

Changes since v1:
* This patch is new
---
arch/x86/kernel/cpu/resctrl/core.c | 17 ++++++++++-------
1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 25f30148478b..e37889f7a1a5 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -414,6 +414,13 @@ void setup_default_ctrlval(struct rdt_resource *r, u32 *dc, u32 *dm)
}
}

+static void domain_free(struct rdt_hw_domain *hw_dom)
+{
+ kfree(hw_dom->ctrl_val);
+ kfree(hw_dom->mbps_val);
+ kfree(hw_dom);
+}
+
static int domain_setup_ctrlval(struct rdt_resource *r, struct rdt_domain *d)
{
struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
@@ -488,7 +495,7 @@ static void domain_add_cpu(int cpu, struct rdt_resource *r)
rdt_domain_reconfigure_cdp(r);

if (r->alloc_capable && domain_setup_ctrlval(r, d)) {
- kfree(hw_dom);
+ domain_free(hw_dom);
return;
}

@@ -497,9 +504,7 @@ static void domain_add_cpu(int cpu, struct rdt_resource *r)
err = resctrl_online_domain(r, d);
if (err) {
list_del(&d->list);
- kfree(hw_dom->ctrl_val);
- kfree(hw_dom->mbps_val);
- kfree(hw_dom);
+ domain_free(hw_dom);
}
}

@@ -547,12 +552,10 @@ static void domain_remove_cpu(int cpu, struct rdt_resource *r)
if (d->plr)
d->plr->d = NULL;

- kfree(hw_dom->ctrl_val);
- kfree(hw_dom->mbps_val);
bitmap_free(d->rmid_busy_llc);
kfree(d->mbm_total);
kfree(d->mbm_local);
- kfree(hw_dom);
+ domain_free(hw_dom);
return;
}

--
2.30.2

2022-09-02 15:59:13

by James Morse

Subject: [PATCH v6 09/21] x86/resctrl: Switch over to the resctrl mbps_val list

Updates to resctrl's software controller follow the same path as
other configuration updates, but they don't modify the hardware state.
rdtgroup_schemata_write() uses parse_line() and the resource's
parse_ctrlval() function to stage the configuration.
resctrl_arch_update_domains() then updates the mbps_val[] array
instead, and skips the rdt_ctrl_update() call that would update
hardware.

This complicates the interface between resctrl's filesystem parts
and architecture specific code. It should be possible for mba_sc
to be completely implemented by the filesystem parts of resctrl. This
would allow it to work on a second architecture with no additional code.
resctrl_arch_update_domains() using the mbps_val[] array prevents this.

Change parse_bw() to write the configuration value directly to the
mbps_val[] array in the domain structure. Change rdtgroup_schemata_write()
to skip the call to resctrl_arch_update_domains(), meaning all the
mba_sc specific code in resctrl_arch_update_domains() can be removed.
On the read-side, show_doms() and update_mba_bw() are changed to read
the mbps_val[] array from the domain structure. With this,
resctrl_arch_get_config() no longer needs to consider mba_sc resources.

Reviewed-by: Jamie Iles <[email protected]>
Tested-by: Xin Hao <[email protected]>
Reviewed-by: Shaopeng Tan <[email protected]>
Tested-by: Shaopeng Tan <[email protected]>
Tested-by: Cristian Marussi <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Signed-off-by: James Morse <[email protected]>
---
Changes since v4:
* Wording of comments.
* Restored mbps_val[] reset code to set_mba_sc().
* Moved mbps_val[] reset in rdtgroup_init_mba() to reduce noise.

Changes since v3:
* Added the rdtgroup_init_mba() hunk to avoid ~0 being written
to the ctrl_val array, and to only reset mbps_val[] when it's going
to be used.

Changes since v2:
* Fixed some names in the commit message.
* Added missing 'or mbps_val[]' code to rdtgroup_size_show()

Changes since v1:
* Squashed out struct resctrl_mba_sc
* Removed stray paragraphs from commit message
---
arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 44 ++++++++++++++---------
arch/x86/kernel/cpu/resctrl/monitor.c | 10 +++---
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 27 ++++++++++----
3 files changed, 52 insertions(+), 29 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
index 87666275eed9..9f45207a6c74 100644
--- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
+++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
@@ -61,6 +61,7 @@ int parse_bw(struct rdt_parse_data *data, struct resctrl_schema *s,
struct rdt_domain *d)
{
struct resctrl_staged_config *cfg;
+ u32 closid = data->rdtgrp->closid;
struct rdt_resource *r = s->res;
unsigned long bw_val;

@@ -72,6 +73,12 @@ int parse_bw(struct rdt_parse_data *data, struct resctrl_schema *s,

if (!bw_validate(data->buf, &bw_val, r))
return -EINVAL;
+
+ if (is_mba_sc(r)) {
+ d->mbps_val[closid] = bw_val;
+ return 0;
+ }
+
cfg->new_ctrl = bw_val;
cfg->have_new_ctrl = true;

@@ -261,14 +268,13 @@ static u32 get_config_index(u32 closid, enum resctrl_conf_type type)

static bool apply_config(struct rdt_hw_domain *hw_dom,
struct resctrl_staged_config *cfg, u32 idx,
- cpumask_var_t cpu_mask, bool mba_sc)
+ cpumask_var_t cpu_mask)
{
struct rdt_domain *dom = &hw_dom->d_resctrl;
- u32 *dc = !mba_sc ? hw_dom->ctrl_val : hw_dom->mbps_val;

- if (cfg->new_ctrl != dc[idx]) {
+ if (cfg->new_ctrl != hw_dom->ctrl_val[idx]) {
cpumask_set_cpu(cpumask_any(&dom->cpu_mask), cpu_mask);
- dc[idx] = cfg->new_ctrl;
+ hw_dom->ctrl_val[idx] = cfg->new_ctrl;

return true;
}
@@ -284,14 +290,12 @@ int resctrl_arch_update_domains(struct rdt_resource *r, u32 closid)
enum resctrl_conf_type t;
cpumask_var_t cpu_mask;
struct rdt_domain *d;
- bool mba_sc;
int cpu;
u32 idx;

if (!zalloc_cpumask_var(&cpu_mask, GFP_KERNEL))
return -ENOMEM;

- mba_sc = is_mba_sc(r);
msr_param.res = NULL;
list_for_each_entry(d, &r->domains, list) {
hw_dom = resctrl_to_arch_dom(d);
@@ -301,7 +305,7 @@ int resctrl_arch_update_domains(struct rdt_resource *r, u32 closid)
continue;

idx = get_config_index(closid, t);
- if (!apply_config(hw_dom, cfg, idx, cpu_mask, mba_sc))
+ if (!apply_config(hw_dom, cfg, idx, cpu_mask))
continue;

if (!msr_param.res) {
@@ -315,11 +319,7 @@ int resctrl_arch_update_domains(struct rdt_resource *r, u32 closid)
}
}

- /*
- * Avoid writing the control msr with control values when
- * MBA software controller is enabled
- */
- if (cpumask_empty(cpu_mask) || mba_sc)
+ if (cpumask_empty(cpu_mask))
goto done;
cpu = get_cpu();
/* Update resource control msr on this CPU if it's in cpu_mask. */
@@ -406,6 +406,14 @@ ssize_t rdtgroup_schemata_write(struct kernfs_open_file *of,

list_for_each_entry(s, &resctrl_schema_all, list) {
r = s->res;
+
+ /*
+ * Writes to mba_sc resources update the software controller,
+ * not the control msr.
+ */
+ if (is_mba_sc(r))
+ continue;
+
ret = resctrl_arch_update_domains(r, rdtgrp->closid);
if (ret)
goto out;
@@ -433,9 +441,7 @@ u32 resctrl_arch_get_config(struct rdt_resource *r, struct rdt_domain *d,
struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(d);
u32 idx = get_config_index(closid, type);

- if (!is_mba_sc(r))
- return hw_dom->ctrl_val[idx];
- return hw_dom->mbps_val[idx];
+ return hw_dom->ctrl_val[idx];
}

static void show_doms(struct seq_file *s, struct resctrl_schema *schema, int closid)
@@ -450,8 +456,12 @@ static void show_doms(struct seq_file *s, struct resctrl_schema *schema, int clo
if (sep)
seq_puts(s, ";");

- ctrl_val = resctrl_arch_get_config(r, dom, closid,
- schema->conf_type);
+ if (is_mba_sc(r))
+ ctrl_val = dom->mbps_val[closid];
+ else
+ ctrl_val = resctrl_arch_get_config(r, dom, closid,
+ schema->conf_type);
+
seq_printf(s, r->format_str, dom->id, max_data_width,
ctrl_val);
sep = true;
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 497cadf3285d..16028b2f756a 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -447,13 +447,11 @@ static void update_mba_bw(struct rdtgroup *rgrp, struct rdt_domain *dom_mbm)
hw_dom_mba = resctrl_to_arch_dom(dom_mba);

cur_bw = pmbm_data->prev_bw;
- user_bw = resctrl_arch_get_config(r_mba, dom_mba, closid, CDP_NONE);
+ user_bw = dom_mba->mbps_val[closid];
delta_bw = pmbm_data->delta_bw;
- /*
- * resctrl_arch_get_config() chooses the mbps/ctrl value to return
- * based on is_mba_sc(). For now, reach into the hw_dom.
- */
- cur_msr_val = hw_dom_mba->ctrl_val[closid];
+
+ /* MBA resource doesn't support CDP */
+ cur_msr_val = resctrl_arch_get_config(r_mba, dom_mba, closid, CDP_NONE);

/*
* For Ctrl groups read data from child monitor groups.
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index f7ebd019e7a5..55d8a12287c3 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1356,11 +1356,13 @@ static int rdtgroup_size_show(struct kernfs_open_file *of,
struct seq_file *s, void *v)
{
struct resctrl_schema *schema;
+ enum resctrl_conf_type type;
struct rdtgroup *rdtgrp;
struct rdt_resource *r;
struct rdt_domain *d;
unsigned int size;
int ret = 0;
+ u32 closid;
bool sep;
u32 ctrl;

@@ -1386,8 +1388,11 @@ static int rdtgroup_size_show(struct kernfs_open_file *of,
goto out;
}

+ closid = rdtgrp->closid;
+
list_for_each_entry(schema, &resctrl_schema_all, list) {
r = schema->res;
+ type = schema->conf_type;
sep = false;
seq_printf(s, "%*s:", max_name_width, schema->name);
list_for_each_entry(d, &r->domains, list) {
@@ -1396,9 +1401,12 @@ static int rdtgroup_size_show(struct kernfs_open_file *of,
if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKSETUP) {
size = 0;
} else {
- ctrl = resctrl_arch_get_config(r, d,
- rdtgrp->closid,
- schema->conf_type);
+ if (is_mba_sc(r))
+ ctrl = d->mbps_val[closid];
+ else
+ ctrl = resctrl_arch_get_config(r, d,
+ closid,
+ type);
if (r->rid == RDT_RESOURCE_MBA)
size = ctrl;
else
@@ -2818,14 +2826,19 @@ static int rdtgroup_init_cat(struct resctrl_schema *s, u32 closid)
}

/* Initialize MBA resource with default values. */
-static void rdtgroup_init_mba(struct rdt_resource *r)
+static void rdtgroup_init_mba(struct rdt_resource *r, u32 closid)
{
struct resctrl_staged_config *cfg;
struct rdt_domain *d;

list_for_each_entry(d, &r->domains, list) {
+ if (is_mba_sc(r)) {
+ d->mbps_val[closid] = MBA_MAX_MBPS;
+ continue;
+ }
+
cfg = &d->staged_config[CDP_NONE];
- cfg->new_ctrl = is_mba_sc(r) ? MBA_MAX_MBPS : r->default_ctrl;
+ cfg->new_ctrl = r->default_ctrl;
cfg->have_new_ctrl = true;
}
}
@@ -2840,7 +2853,9 @@ static int rdtgroup_init_alloc(struct rdtgroup *rdtgrp)
list_for_each_entry(s, &resctrl_schema_all, list) {
r = s->res;
if (r->rid == RDT_RESOURCE_MBA) {
- rdtgroup_init_mba(r);
+ rdtgroup_init_mba(r, rdtgrp->closid);
+ if (is_mba_sc(r))
+ continue;
} else {
ret = rdtgroup_init_cat(s, rdtgrp->closid);
if (ret < 0)
--
2.30.2

2022-09-02 15:59:28

by James Morse

Subject: [PATCH v6 10/21] x86/resctrl: Remove architecture copy of mbps_val

The resctrl arch code provides a second configuration array mbps_val[]
for the MBA software controller.

Since resctrl switched over to allocating and freeing its own array
when needed, nothing uses the arch code version.

Remove it.

Reviewed-by: Jamie Iles <[email protected]>
Tested-by: Xin Hao <[email protected]>
Reviewed-by: Shaopeng Tan <[email protected]>
Tested-by: Shaopeng Tan <[email protected]>
Tested-by: Cristian Marussi <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Signed-off-by: James Morse <[email protected]>
---
Changes since v2:
* Made setup_default_ctrlval() static.

Changes since v1:
* Fixed spelling mistake
* Capitalisation
---
arch/x86/kernel/cpu/resctrl/core.c | 20 ++++----------------
arch/x86/kernel/cpu/resctrl/internal.h | 3 ---
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 4 +---
3 files changed, 5 insertions(+), 22 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index f69182973175..f0e2820af475 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -397,7 +397,7 @@ struct rdt_domain *rdt_find_domain(struct rdt_resource *r, int id,
return NULL;
}

-void setup_default_ctrlval(struct rdt_resource *r, u32 *dc, u32 *dm)
+static void setup_default_ctrlval(struct rdt_resource *r, u32 *dc)
{
struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
int i;
@@ -406,18 +406,14 @@ void setup_default_ctrlval(struct rdt_resource *r, u32 *dc, u32 *dm)
* Initialize the Control MSRs to having no control.
* For Cache Allocation: Set all bits in cbm
* For Memory Allocation: Set b/w requested to 100%
- * and the bandwidth in MBps to U32_MAX
*/
- for (i = 0; i < hw_res->num_closid; i++, dc++, dm++) {
+ for (i = 0; i < hw_res->num_closid; i++, dc++)
*dc = r->default_ctrl;
- *dm = MBA_MAX_MBPS;
- }
}

static void domain_free(struct rdt_hw_domain *hw_dom)
{
kfree(hw_dom->ctrl_val);
- kfree(hw_dom->mbps_val);
kfree(hw_dom);
}

@@ -426,23 +422,15 @@ static int domain_setup_ctrlval(struct rdt_resource *r, struct rdt_domain *d)
struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(d);
struct msr_param m;
- u32 *dc, *dm;
+ u32 *dc;

dc = kmalloc_array(hw_res->num_closid, sizeof(*hw_dom->ctrl_val),
GFP_KERNEL);
if (!dc)
return -ENOMEM;

- dm = kmalloc_array(hw_res->num_closid, sizeof(*hw_dom->mbps_val),
- GFP_KERNEL);
- if (!dm) {
- kfree(dc);
- return -ENOMEM;
- }
-
hw_dom->ctrl_val = dc;
- hw_dom->mbps_val = dm;
- setup_default_ctrlval(r, dc, dm);
+ setup_default_ctrlval(r, dc);

m.low = 0;
m.high = hw_res->num_closid;
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index a7e2cbce29d5..373aaba53ecd 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -308,14 +308,12 @@ struct mbm_state {
* a resource
* @d_resctrl: Properties exposed to the resctrl file system
* @ctrl_val: array of cache or mem ctrl values (indexed by CLOSID)
- * @mbps_val: When mba_sc is enabled, this holds the bandwidth in MBps
*
* Members of this structure are accessed via helpers that provide abstraction.
*/
struct rdt_hw_domain {
struct rdt_domain d_resctrl;
u32 *ctrl_val;
- u32 *mbps_val;
};

static inline struct rdt_hw_domain *resctrl_to_arch_dom(struct rdt_domain *r)
@@ -529,7 +527,6 @@ void mbm_setup_overflow_handler(struct rdt_domain *dom,
void mbm_handle_overflow(struct work_struct *work);
void __init intel_rdt_mbm_apply_quirk(void);
bool is_mba_sc(struct rdt_resource *r);
-void setup_default_ctrlval(struct rdt_resource *r, u32 *dc, u32 *dm);
u32 delay_bw_map(unsigned long bw, struct rdt_resource *r);
void cqm_setup_limbo_handler(struct rdt_domain *dom, unsigned long delay_ms);
void cqm_handle_limbo(struct work_struct *work);
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 55d8a12287c3..6c33dfe7ea53 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -2370,10 +2370,8 @@ static int reset_all_ctrls(struct rdt_resource *r)
hw_dom = resctrl_to_arch_dom(d);
cpumask_set_cpu(cpumask_any(&d->cpu_mask), cpu_mask);

- for (i = 0; i < hw_res->num_closid; i++) {
+ for (i = 0; i < hw_res->num_closid; i++)
hw_dom->ctrl_val[i] = r->default_ctrl;
- hw_dom->mbps_val[i] = MBA_MAX_MBPS;
- }
}
cpu = get_cpu();
/* Update CBM on this cpu if it's in cpu_mask. */
--
2.30.2

2022-09-02 16:00:21

by James Morse

Subject: [PATCH v6 14/21] x86/resctrl: Allow per-rmid arch private storage to be reset

To abstract the rmid counters into a helper that returns the number
of bytes counted, architecture specific per-rmid state is needed.

It needs to be possible to reset this hidden state, as the values
may outlive the life of an rmid, or the mount time of the filesystem.

mon_event_read() is called with first = true when an rmid is first
allocated in mkdir_mondata_subdir(). Add resctrl_arch_reset_rmid()
and call it from __mon_event_count()'s rr->first check.
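
The hook is declared in include/linux/resctrl.h so another architecture can
provide its own definition. An architecture with no hidden per-rmid state
could make it a no-op; the stub below is hypothetical and only illustrates
the contract:

	/* Hypothetical: counters that need no software overflow/correction state */
	void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_domain *d,
				     u32 rmid, enum resctrl_event_id eventid)
	{
		/* Nothing is cached per rmid, so there is nothing to forget */
	}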

Reviewed-by: Jamie Iles <[email protected]>
Tested-by: Xin Hao <[email protected]>
Reviewed-by: Shaopeng Tan <[email protected]>
Tested-by: Shaopeng Tan <[email protected]>
Tested-by: Cristian Marussi <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Signed-off-by: James Morse <[email protected]>

---
Changes since v1:
* Added WARN_ON_ONCE() for a case that should never happen.
---
arch/x86/kernel/cpu/resctrl/internal.h | 18 ++++---------
arch/x86/kernel/cpu/resctrl/monitor.c | 35 +++++++++++++++++++++++++-
include/linux/resctrl.h | 23 +++++++++++++++++
3 files changed, 62 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 4de8e5bb93e1..b34a1403f033 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -22,14 +22,6 @@

#define L2_QOS_CDP_ENABLE 0x01ULL

-/*
- * Event IDs are used to program IA32_QM_EVTSEL before reading event
- * counter from IA32_QM_CTR
- */
-#define QOS_L3_OCCUP_EVENT_ID 0x01
-#define QOS_L3_MBM_TOTAL_EVENT_ID 0x02
-#define QOS_L3_MBM_LOCAL_EVENT_ID 0x03
-
#define CQM_LIMBOCHECK_INTERVAL 1000

#define MBM_CNTR_WIDTH_BASE 24
@@ -73,7 +65,7 @@ DECLARE_STATIC_KEY_FALSE(rdt_mon_enable_key);
* @list: entry in &rdt_resource->evt_list
*/
struct mon_evt {
- u32 evtid;
+ enum resctrl_event_id evtid;
char *name;
struct list_head list;
};
@@ -90,9 +82,9 @@ struct mon_evt {
union mon_data_bits {
void *priv;
struct {
- unsigned int rid : 10;
- unsigned int evtid : 8;
- unsigned int domid : 14;
+ unsigned int rid : 10;
+ enum resctrl_event_id evtid : 8;
+ unsigned int domid : 14;
} u;
};

@@ -100,7 +92,7 @@ struct rmid_read {
struct rdtgroup *rgrp;
struct rdt_resource *r;
struct rdt_domain *d;
- int evtid;
+ enum resctrl_event_id evtid;
bool first;
u64 val;
};
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 2d81b6cd9632..e9755143492b 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -137,7 +137,37 @@ static inline struct rmid_entry *__rmid_entry(u32 rmid)
return entry;
}

-static u64 __rmid_read(u32 rmid, u32 eventid)
+static struct arch_mbm_state *get_arch_mbm_state(struct rdt_hw_domain *hw_dom,
+ u32 rmid,
+ enum resctrl_event_id eventid)
+{
+ switch (eventid) {
+ case QOS_L3_OCCUP_EVENT_ID:
+ return NULL;
+ case QOS_L3_MBM_TOTAL_EVENT_ID:
+ return &hw_dom->arch_mbm_total[rmid];
+ case QOS_L3_MBM_LOCAL_EVENT_ID:
+ return &hw_dom->arch_mbm_local[rmid];
+ }
+
+ /* Never expect to get here */
+ WARN_ON_ONCE(1);
+
+ return NULL;
+}
+
+void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_domain *d,
+ u32 rmid, enum resctrl_event_id eventid)
+{
+ struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(d);
+ struct arch_mbm_state *am;
+
+ am = get_arch_mbm_state(hw_dom, rmid, eventid);
+ if (am)
+ memset(am, 0, sizeof(*am));
+}
+
+static u64 __rmid_read(u32 rmid, enum resctrl_event_id eventid)
{
u64 val;

@@ -291,6 +321,9 @@ static u64 __mon_event_count(u32 rmid, struct rmid_read *rr)
struct mbm_state *m;
u64 chunks, tval;

+ if (rr->first)
+ resctrl_arch_reset_rmid(rr->r, rr->d, rmid, rr->evtid);
+
tval = __rmid_read(rmid, rr->evtid);
if (tval & (RMID_VAL_ERROR | RMID_VAL_UNAVAIL)) {
return tval;
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index f4c9101df461..818456770176 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -32,6 +32,16 @@ enum resctrl_conf_type {

#define CDP_NUM_TYPES (CDP_DATA + 1)

+/*
+ * Event IDs, the values match those used to program IA32_QM_EVTSEL before
+ * reading IA32_QM_CTR on RDT systems.
+ */
+enum resctrl_event_id {
+ QOS_L3_OCCUP_EVENT_ID = 0x01,
+ QOS_L3_MBM_TOTAL_EVENT_ID = 0x02,
+ QOS_L3_MBM_LOCAL_EVENT_ID = 0x03,
+};
+
/**
* struct resctrl_staged_config - parsed configuration to be applied
* @new_ctrl: new ctrl value to be loaded
@@ -210,4 +220,17 @@ u32 resctrl_arch_get_config(struct rdt_resource *r, struct rdt_domain *d,
int resctrl_online_domain(struct rdt_resource *r, struct rdt_domain *d);
void resctrl_offline_domain(struct rdt_resource *r, struct rdt_domain *d);

+/**
+ * resctrl_arch_reset_rmid() - Reset any private state associated with rmid
+ * and eventid.
+ * @r: The domain's resource.
+ * @d: The rmid's domain.
+ * @rmid: The rmid whose counter values should be reset.
+ * @eventid: The eventid whose counter values should be reset.
+ *
+ * This can be called from any CPU.
+ */
+void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_domain *d,
+ u32 rmid, enum resctrl_event_id eventid);
+
#endif /* _RESCTRL_H */
--
2.30.2

2022-09-02 16:00:41

by James Morse

Subject: [PATCH v6 16/21] x86/resctrl: Pass the required parameters into resctrl_arch_rmid_read()

resctrl_arch_rmid_read() is intended as the function that an
architecture agnostic resctrl filesystem driver can use to
read a value in bytes from a hardware register. Currently the function
returns the MBM values in chunks directly from hardware.

To convert this to bytes, some correction and overflow calculations
are needed. These depend on the resource and domain structures.
Overflow detection requires the old chunks value. None of this
is available to resctrl_arch_rmid_read(). MPAM requires the
resource and domain structures to find the MMIO device that holds
the registers.

Pass the resource and domain to resctrl_arch_rmid_read(). This makes
rmid_dirty() too big. Instead, merge it with its only caller, keeping the
name as a local variable.
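
Passing @r and @d is what lets a different architecture locate its own
state. A purely hypothetical sketch of an MMIO-based implementation (none
of these names exist in this series, they are only for illustration):

	int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain *d,
				   u32 rmid, enum resctrl_event_id eventid,
				   u64 *val)
	{
		struct example_arch_domain *dom;

		/* Only @d can tell us which device's registers to read */
		dom = container_of(d, struct example_arch_domain, d_resctrl);
		*val = readq(dom->mmio_base + example_counter_offset(rmid, eventid));

		return 0;
	}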

Reviewed-by: Jamie Iles <[email protected]>
Tested-by: Xin Hao <[email protected]>
Reviewed-by: Shaopeng Tan <[email protected]>
Tested-by: Shaopeng Tan <[email protected]>
Tested-by: Cristian Marussi <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Signed-off-by: James Morse <[email protected]>
---
Changes since v3:
* Added comment about where resctrl_arch_rmid_read() can be called from.

Changes since v2:
* Typos.
* Kerneldoc fixes.

This is all a little noisy for __mon_event_count(), as the switch
statement work is now before the resctrl_arch_rmid_read() call.
---
arch/x86/kernel/cpu/resctrl/monitor.c | 31 +++++++++++++++------------
include/linux/resctrl.h | 18 +++++++++++++++-
2 files changed, 34 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 51ab76f2dfbc..262141bf4264 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -167,10 +167,14 @@ void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_domain *d,
memset(am, 0, sizeof(*am));
}

-int resctrl_arch_rmid_read(u32 rmid, enum resctrl_event_id eventid, u64 *val)
+int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain *d,
+ u32 rmid, enum resctrl_event_id eventid, u64 *val)
{
u64 msr_val;

+ if (!cpumask_test_cpu(smp_processor_id(), &d->cpu_mask))
+ return -EINVAL;
+
/*
* As per the SDM, when IA32_QM_EVTSEL.EvtID (bits 7:0) is configured
* with a valid event code for supported resource type and the bits
@@ -192,16 +196,6 @@ int resctrl_arch_rmid_read(u32 rmid, enum resctrl_event_id eventid, u64 *val)
return 0;
}

-static bool rmid_dirty(struct rmid_entry *entry)
-{
- u64 val = 0;
-
- if (resctrl_arch_rmid_read(entry->rmid, QOS_L3_OCCUP_EVENT_ID, &val))
- return true;
-
- return val >= resctrl_cqm_threshold;
-}
-
/*
* Check the RMIDs that are marked as busy for this domain. If the
* reported LLC occupancy is below the threshold clear the busy bit and
@@ -213,6 +207,8 @@ void __check_limbo(struct rdt_domain *d, bool force_free)
struct rmid_entry *entry;
struct rdt_resource *r;
u32 crmid = 1, nrmid;
+ bool rmid_dirty;
+ u64 val = 0;

r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;

@@ -228,7 +224,14 @@ void __check_limbo(struct rdt_domain *d, bool force_free)
break;

entry = __rmid_entry(nrmid);
- if (force_free || !rmid_dirty(entry)) {
+
+ if (resctrl_arch_rmid_read(r, d, entry->rmid,
+ QOS_L3_OCCUP_EVENT_ID, &val))
+ rmid_dirty = true;
+ else
+ rmid_dirty = (val >= resctrl_cqm_threshold);
+
+ if (force_free || !rmid_dirty) {
clear_bit(entry->rmid, d->rmid_busy_llc);
if (!--entry->busy) {
rmid_limbo_count--;
@@ -278,7 +281,7 @@ static void add_rmid_to_limbo(struct rmid_entry *entry)
cpu = get_cpu();
list_for_each_entry(d, &r->domains, list) {
if (cpumask_test_cpu(cpu, &d->cpu_mask)) {
- err = resctrl_arch_rmid_read(entry->rmid,
+ err = resctrl_arch_rmid_read(r, d, entry->rmid,
QOS_L3_OCCUP_EVENT_ID,
&val);
if (err || val <= resctrl_cqm_threshold)
@@ -336,7 +339,7 @@ static int __mon_event_count(u32 rmid, struct rmid_read *rr)
if (rr->first)
resctrl_arch_reset_rmid(rr->r, rr->d, rmid, rr->evtid);

- rr->err = resctrl_arch_rmid_read(rmid, rr->evtid, &tval);
+ rr->err = resctrl_arch_rmid_read(rr->r, rr->d, rmid, rr->evtid, &tval);
if (rr->err)
return rr->err;

diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index efe60dd7fd21..7ccfa0d1bb34 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -219,7 +219,23 @@ u32 resctrl_arch_get_config(struct rdt_resource *r, struct rdt_domain *d,
u32 closid, enum resctrl_conf_type type);
int resctrl_online_domain(struct rdt_resource *r, struct rdt_domain *d);
void resctrl_offline_domain(struct rdt_resource *r, struct rdt_domain *d);
-int resctrl_arch_rmid_read(u32 rmid, enum resctrl_event_id eventid, u64 *res);
+
+/**
+ * resctrl_arch_rmid_read() - Read the eventid counter corresponding to rmid
+ * for this resource and domain.
+ * @r: resource that the counter should be read from.
+ * @d: domain that the counter should be read from.
+ * @rmid: rmid of the counter to read.
+ * @eventid: eventid to read, e.g. L3 occupancy.
+ * @val: result of the counter read in chunks.
+ *
+ * Call from process context on a CPU that belongs to domain @d.
+ *
+ * Return:
+ * 0 on success, or -EIO, -EINVAL etc on error.
+ */
+int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain *d,
+ u32 rmid, enum resctrl_event_id eventid, u64 *val);

/**
* resctrl_arch_reset_rmid() - Reset any private state associated with rmid
--
2.30.2

2022-09-02 16:00:42

by James Morse

Subject: [PATCH v6 17/21] x86/resctrl: Move mbm_overflow_count() into resctrl_arch_rmid_read()

resctrl_arch_rmid_read() is intended as the function that an
architecture agnostic resctrl filesystem driver can use to
read a value in bytes from a counter. Currently the function returns
the MBM values in chunks directly from hardware. When reading a bandwidth
counter, mbm_overflow_count() must be used to correct for any possible
overflow.

mbm_overflow_count() is architecture specific, so its behaviour should
be part of resctrl_arch_rmid_read().

Move the mbm_overflow_count() calls into resctrl_arch_rmid_read().
This allows the resctrl filesystem's prev_msr to be removed in
favour of the architecture private version.
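
As a worked example of why the shift arithmetic copes with wrap-around
(illustrative values, assuming the 24-bit base counter width):

	/*
	 * prev_msr = 0xFFFFFE, cur_msr = 0x000001: the counter wrapped.
	 * shift = 64 - 24 = 40, so both values are moved to the top bits,
	 * the subtraction wraps modulo 2^64, and shifting back down gives
	 * 3 chunks rather than an enormous bogus delta.
	 */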

Reviewed-by: Jamie Iles <[email protected]>
Tested-by: Xin Hao <[email protected]>
Reviewed-by: Shaopeng Tan <[email protected]>
Tested-by: Shaopeng Tan <[email protected]>
Tested-by: Cristian Marussi <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Signed-off-by: James Morse <[email protected]>
---
arch/x86/kernel/cpu/resctrl/internal.h | 2 --
arch/x86/kernel/cpu/resctrl/monitor.c | 35 +++++++++++++++-----------
2 files changed, 20 insertions(+), 17 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 1d2e7bd6305f..8039e8aba7de 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -281,7 +281,6 @@ struct rftype {
/**
* struct mbm_state - status for each MBM counter in each domain
* @chunks: Total data moved (multiply by rdt_group.mon_scale to get bytes)
- * @prev_msr: Value of IA32_QM_CTR for this RMID last time we read it
* @prev_bw_chunks: Previous chunks value read for bandwidth calculation
* @prev_bw: The most recent bandwidth in MBps
* @delta_bw: Difference between the current and previous bandwidth
@@ -289,7 +288,6 @@ struct rftype {
*/
struct mbm_state {
u64 chunks;
- u64 prev_msr;
u64 prev_bw_chunks;
u32 prev_bw;
u32 delta_bw;
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 262141bf4264..862a4462ed60 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -167,9 +167,20 @@ void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_domain *d,
memset(am, 0, sizeof(*am));
}

+static u64 mbm_overflow_count(u64 prev_msr, u64 cur_msr, unsigned int width)
+{
+ u64 shift = 64 - width, chunks;
+
+ chunks = (cur_msr << shift) - (prev_msr << shift);
+ return chunks >> shift;
+}
+
int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain *d,
u32 rmid, enum resctrl_event_id eventid, u64 *val)
{
+ struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
+ struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(d);
+ struct arch_mbm_state *am;
u64 msr_val;

if (!cpumask_test_cpu(smp_processor_id(), &d->cpu_mask))
@@ -191,7 +202,13 @@ int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain *d,
if (msr_val & RMID_VAL_UNAVAIL)
return -EINVAL;

- *val = msr_val;
+ am = get_arch_mbm_state(hw_dom, rmid, eventid);
+ if (am) {
+ *val = mbm_overflow_count(am->prev_msr, msr_val, hw_res->mbm_width);
+ am->prev_msr = msr_val;
+ } else {
+ *val = msr_val;
+ }

return 0;
}
@@ -322,19 +339,10 @@ void free_rmid(u32 rmid)
list_add_tail(&entry->list, &rmid_free_lru);
}

-static u64 mbm_overflow_count(u64 prev_msr, u64 cur_msr, unsigned int width)
-{
- u64 shift = 64 - width, chunks;
-
- chunks = (cur_msr << shift) - (prev_msr << shift);
- return chunks >> shift;
-}
-
static int __mon_event_count(u32 rmid, struct rmid_read *rr)
{
- struct rdt_hw_resource *hw_res = resctrl_to_arch_res(rr->r);
struct mbm_state *m;
- u64 chunks, tval = 0;
+ u64 tval = 0;

if (rr->first)
resctrl_arch_reset_rmid(rr->r, rr->d, rmid, rr->evtid);
@@ -363,13 +371,10 @@ static int __mon_event_count(u32 rmid, struct rmid_read *rr)

if (rr->first) {
memset(m, 0, sizeof(struct mbm_state));
- m->prev_msr = tval;
return 0;
}

- chunks = mbm_overflow_count(m->prev_msr, tval, hw_res->mbm_width);
- m->chunks += chunks;
- m->prev_msr = tval;
+ m->chunks += tval;

rr->val += get_corrected_mbm_count(rmid, m->chunks);

--
2.30.2

2022-09-02 16:01:02

by James Morse

Subject: [PATCH v6 18/21] x86/resctrl: Move get_corrected_mbm_count() into resctrl_arch_rmid_read()

resctrl_arch_rmid_read() is intended as the function that an
architecture agnostic resctrl filesystem driver can use to
read a value in bytes from a counter. Currently the function returns
the MBM values in chunks directly from hardware. When reading a bandwidth
counter, get_corrected_mbm_count() must be used to correct the
value read.

get_corrected_mbm_count() is architecture specific, so this work should be
done in resctrl_arch_rmid_read().

Move the function calls. This allows the resctrl filesystem's chunks
value to be removed in favour of the architecture private version.

Reviewed-by: Jamie Iles <[email protected]>
Tested-by: Xin Hao <[email protected]>
Reviewed-by: Shaopeng Tan <[email protected]>
Tested-by: Shaopeng Tan <[email protected]>
Tested-by: Cristian Marussi <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Signed-off-by: James Morse <[email protected]>
---
arch/x86/kernel/cpu/resctrl/internal.h | 4 ++--
arch/x86/kernel/cpu/resctrl/monitor.c | 8 ++++----
2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 8039e8aba7de..bdb55c2fbdd3 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -280,14 +280,12 @@ struct rftype {

/**
* struct mbm_state - status for each MBM counter in each domain
- * @chunks: Total data moved (multiply by rdt_group.mon_scale to get bytes)
* @prev_bw_chunks: Previous chunks value read for bandwidth calculation
* @prev_bw: The most recent bandwidth in MBps
* @delta_bw: Difference between the current and previous bandwidth
* @delta_comp: Indicates whether to compute the delta_bw
*/
struct mbm_state {
- u64 chunks;
u64 prev_bw_chunks;
u32 prev_bw;
u32 delta_bw;
@@ -297,10 +295,12 @@ struct mbm_state {
/**
* struct arch_mbm_state - values used to compute resctrl_arch_rmid_read()s
* return value.
+ * @chunks: Total data moved (multiply by rdt_group.mon_scale to get bytes)
* @prev_msr: Value of IA32_QM_CTR last time it was read for the RMID used to
* find this struct.
*/
struct arch_mbm_state {
+ u64 chunks;
u64 prev_msr;
};

diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 862a4462ed60..27bb4947a176 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -204,7 +204,9 @@ int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain *d,

am = get_arch_mbm_state(hw_dom, rmid, eventid);
if (am) {
- *val = mbm_overflow_count(am->prev_msr, msr_val, hw_res->mbm_width);
+ am->chunks += mbm_overflow_count(am->prev_msr, msr_val,
+ hw_res->mbm_width);
+ *val = get_corrected_mbm_count(rmid, am->chunks);
am->prev_msr = msr_val;
} else {
*val = msr_val;
@@ -374,9 +376,7 @@ static int __mon_event_count(u32 rmid, struct rmid_read *rr)
return 0;
}

- m->chunks += tval;
-
- rr->val += get_corrected_mbm_count(rmid, m->chunks);
+ rr->val += tval;

return 0;
}
--
2.30.2

2022-09-02 16:01:19

by James Morse

Subject: [PATCH v6 20/21] x86/resctrl: Add resctrl_rmid_realloc_limit to abstract x86's boot_cpu_data

resctrl_rmid_realloc_threshold can be set by user-space. The maximum
value is specified by the architecture.

Currently max_threshold_occ_write() reads the maximum value from
boot_cpu_data.x86_cache_size, which is not portable to another
architecture.

Add resctrl_rmid_realloc_limit to describe the maximum size in bytes
that user-space can set the threshold to.
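
boot_cpu_data.x86_cache_size is in KiB, so the limit is converted to bytes
once at init time. Illustrative arithmetic, treating the 35MB LLC from the
existing comment as 35 * 1024 KiB:

	/* x86_cache_size = 35840 (KiB), limit = 35840 * 1024 = 36700160 bytes */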

Reviewed-by: Jamie Iles <[email protected]>
Tested-by: Xin Hao <[email protected]>
Reviewed-by: Shaopeng Tan <[email protected]>
Tested-by: Shaopeng Tan <[email protected]>
Tested-by: Cristian Marussi <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Signed-off-by: James Morse <[email protected]>
---
arch/x86/kernel/cpu/resctrl/monitor.c | 9 +++++++--
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 2 +-
include/linux/resctrl.h | 1 +
3 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index e91afe99b763..8d15568d7121 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -67,6 +67,11 @@ unsigned int rdt_mon_features;
*/
unsigned int resctrl_rmid_realloc_threshold;

+/*
+ * This is the maximum value for the reallocation threshold, in bytes.
+ */
+unsigned int resctrl_rmid_realloc_limit;
+
#define CF(cf) ((unsigned long)(1048576 * (cf) + 0.5))

/*
@@ -747,10 +752,10 @@ int rdt_get_mon_l3_config(struct rdt_resource *r)
{
unsigned int mbm_offset = boot_cpu_data.x86_cache_mbm_width_offset;
struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
- unsigned int cl_size = boot_cpu_data.x86_cache_size;
unsigned int threshold;
int ret;

+ resctrl_rmid_realloc_limit = boot_cpu_data.x86_cache_size * 1024;
hw_res->mon_scale = boot_cpu_data.x86_cache_occ_scale;
r->num_rmid = boot_cpu_data.x86_cache_max_rmid + 1;
hw_res->mbm_width = MBM_CNTR_WIDTH_BASE;
@@ -767,7 +772,7 @@ int rdt_get_mon_l3_config(struct rdt_resource *r)
*
* For a 35MB LLC and 56 RMIDs, this is ~1.8% of the LLC.
*/
- threshold = cl_size * 1024 / r->num_rmid;
+ threshold = resctrl_rmid_realloc_limit / r->num_rmid;

/*
* Because num_rmid may not be a power of two, round the value
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 849bdec37217..e5a48f05e787 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1059,7 +1059,7 @@ static ssize_t max_threshold_occ_write(struct kernfs_open_file *of,
if (ret)
return ret;

- if (bytes > (boot_cpu_data.x86_cache_size * 1024))
+ if (bytes > resctrl_rmid_realloc_limit)
return -EINVAL;

resctrl_rmid_realloc_threshold = resctrl_arch_round_mon_val(bytes);
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 9995d043650a..cb857f753322 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -251,5 +251,6 @@ void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_domain *d,
u32 rmid, enum resctrl_event_id eventid);

extern unsigned int resctrl_rmid_realloc_threshold;
+extern unsigned int resctrl_rmid_realloc_limit;

#endif /* _RESCTRL_H */
--
2.30.2

2022-09-02 16:01:22

by James Morse

Subject: [PATCH v6 21/21] x86/resctrl: Make resctrl_arch_rmid_read() return values in bytes

resctrl_arch_rmid_read() returns a value in chunks, as read from the
hardware. This needs scaling to bytes by mon_scale, as provided by
the architecture code.

Now that resctrl_arch_rmid_read() performs the overflow handling and corrections
itself, it may as well return a value in bytes directly. This allows
the accesses to the architecture specific 'hw' structure to be removed.

Move the mon_scale conversion into resctrl_arch_rmid_read().
mbm_bw_count() is updated to calculate bandwidth from bytes.
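
Illustrative numbers for the new calculation in mbm_bw_count(), which runs
once per second from the overflow worker: if rr->val has advanced by
629145600 bytes (600 MiB) since the previous sample,

	cur_bw = 629145600 / SZ_1M = 600	/* MBps */

whereas the old code needed chunks * hw_res->mon_scale >> 20 to reach the
same result.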

Reviewed-by: Jamie Iles <[email protected]>
Tested-by: Xin Hao <[email protected]>
Reviewed-by: Shaopeng Tan <[email protected]>
Tested-by: Shaopeng Tan <[email protected]>
Tested-by: Cristian Marussi <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Signed-off-by: James Morse <[email protected]>
---
Changes since v3:
* Only set val in one place in resctrl_arch_rmid_read()
* Use SZ_1M to improve readability
---
arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 6 ++----
arch/x86/kernel/cpu/resctrl/internal.h | 4 ++--
arch/x86/kernel/cpu/resctrl/monitor.c | 24 +++++++++++------------
include/linux/resctrl.h | 2 +-
4 files changed, 16 insertions(+), 20 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
index d3f7eb2ac14b..03fc91d8bc9f 100644
--- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
+++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
@@ -549,7 +549,6 @@ void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
int rdtgroup_mondata_show(struct seq_file *m, void *arg)
{
struct kernfs_open_file *of = m->private;
- struct rdt_hw_resource *hw_res;
u32 resid, evtid, domid;
struct rdtgroup *rdtgrp;
struct rdt_resource *r;
@@ -569,8 +568,7 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)
domid = md.u.domid;
evtid = md.u.evtid;

- hw_res = &rdt_resources_all[resid];
- r = &hw_res->r_resctrl;
+ r = &rdt_resources_all[resid].r_resctrl;
d = rdt_find_domain(r, domid, NULL);
if (IS_ERR_OR_NULL(d)) {
ret = -ENOENT;
@@ -584,7 +582,7 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)
else if (rr.err == -EINVAL)
seq_puts(m, "Unavailable\n");
else
- seq_printf(m, "%llu\n", rr.val * hw_res->mon_scale);
+ seq_printf(m, "%llu\n", rr.val);

out:
rdtgroup_kn_unlock(of->kn);
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index c05e9b7cf77a..5f7128686cfd 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -279,13 +279,13 @@ struct rftype {

/**
* struct mbm_state - status for each MBM counter in each domain
- * @prev_bw_chunks: Previous chunks value read for bandwidth calculation
+ * @prev_bw_bytes: Previous bytes value read for bandwidth calculation
* @prev_bw: The most recent bandwidth in MBps
* @delta_bw: Difference between the current and previous bandwidth
* @delta_comp: Indicates whether to compute the delta_bw
*/
struct mbm_state {
- u64 prev_bw_chunks;
+ u64 prev_bw_bytes;
u32 prev_bw;
u32 delta_bw;
bool delta_comp;
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 8d15568d7121..efe0c30d3a12 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -16,6 +16,7 @@
*/

#include <linux/module.h>
+#include <linux/sizes.h>
#include <linux/slab.h>

#include <asm/cpu_device_id.h>
@@ -189,7 +190,7 @@ int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain *d,
struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(d);
struct arch_mbm_state *am;
- u64 msr_val;
+ u64 msr_val, chunks;

if (!cpumask_test_cpu(smp_processor_id(), &d->cpu_mask))
return -EINVAL;
@@ -214,12 +215,14 @@ int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain *d,
if (am) {
am->chunks += mbm_overflow_count(am->prev_msr, msr_val,
hw_res->mbm_width);
- *val = get_corrected_mbm_count(rmid, am->chunks);
+ chunks = get_corrected_mbm_count(rmid, am->chunks);
am->prev_msr = msr_val;
} else {
- *val = msr_val;
+ chunks = msr_val;
}

+ *val = chunks * hw_res->mon_scale;
+
return 0;
}

@@ -232,7 +235,6 @@ int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain *d,
void __check_limbo(struct rdt_domain *d, bool force_free)
{
struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
- struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
struct rmid_entry *entry;
u32 crmid = 1, nrmid;
bool rmid_dirty;
@@ -255,7 +257,6 @@ void __check_limbo(struct rdt_domain *d, bool force_free)
QOS_L3_OCCUP_EVENT_ID, &val)) {
rmid_dirty = true;
} else {
- val *= hw_res->mon_scale;
rmid_dirty = (val >= resctrl_rmid_realloc_threshold);
}

@@ -299,7 +300,6 @@ int alloc_rmid(void)
static void add_rmid_to_limbo(struct rmid_entry *entry)
{
struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
- struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
struct rdt_domain *d;
int cpu, err;
u64 val = 0;
@@ -311,7 +311,6 @@ static void add_rmid_to_limbo(struct rmid_entry *entry)
err = resctrl_arch_rmid_read(r, d, entry->rmid,
QOS_L3_OCCUP_EVENT_ID,
&val);
- val *= hw_res->mon_scale;
if (err || val <= resctrl_rmid_realloc_threshold)
continue;
}
@@ -403,15 +402,14 @@ static int __mon_event_count(u32 rmid, struct rmid_read *rr)
*/
static void mbm_bw_count(u32 rmid, struct rmid_read *rr)
{
- struct rdt_hw_resource *hw_res = resctrl_to_arch_res(rr->r);
struct mbm_state *m = &rr->d->mbm_local[rmid];
- u64 cur_bw, chunks, cur_chunks;
+ u64 cur_bw, bytes, cur_bytes;

- cur_chunks = rr->val;
- chunks = cur_chunks - m->prev_bw_chunks;
- m->prev_bw_chunks = cur_chunks;
+ cur_bytes = rr->val;
+ bytes = cur_bytes - m->prev_bw_bytes;
+ m->prev_bw_bytes = cur_bytes;

- cur_bw = (chunks * hw_res->mon_scale) >> 20;
+ cur_bw = bytes / SZ_1M;

if (m->delta_comp)
m->delta_bw = abs(cur_bw - m->prev_bw);
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index cb857f753322..0cf5b20c6ddf 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -227,7 +227,7 @@ void resctrl_offline_domain(struct rdt_resource *r, struct rdt_domain *d);
* @d: domain that the counter should be read from.
* @rmid: rmid of the counter to read.
* @eventid: eventid to read, e.g. L3 occupancy.
- * @val: result of the counter read in chunks.
+ * @val: result of the counter read in bytes.
*
* Call from process context on a CPU that belongs to domain @d.
*
--
2.30.2

2022-09-02 16:01:26

by James Morse

Subject: [PATCH v6 19/21] x86/resctrl: Rename and change the units of resctrl_cqm_threshold

resctrl_cqm_threshold is stored in a hardware specific chunk size,
but exposed to user-space as bytes.

This means the filesystem parts of resctrl need to know how the hardware
counts, to convert the user provided byte value to chunks. The interface
between the architecture's resctrl code and the filesystem ought to
treat everything as bytes.

Change the unit of resctrl_cqm_threshold to bytes. resctrl_arch_rmid_read()
still returns its value in chunks, so this needs converting to bytes.
As all the users have been touched, rename the variable to
resctrl_rmid_realloc_threshold, which describes what the value is for.

Neither r->num_rmid nor hw_res->mon_scale is guaranteed to be a power
of 2, so the existing code introduces a rounding error relative to resctrl's
theoretical fraction of the cache usage. This behaviour is kept, as it
ensures the user-visible value matches the value read from hardware
when the rmid is considered for reallocation.
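
Illustrative arithmetic for the rounding (hypothetical values): with a
per-RMID share of 655360 bytes and mon_scale = 61440,

	655360 / 61440 = 10	(integer division)
	10 * 61440 = 614400

so resctrl_rmid_realloc_threshold becomes 614400 bytes, a value the
hardware can actually report, which keeps the user-visible threshold in
step with the comparisons made against counter reads.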

Reviewed-by: Jamie Iles <[email protected]>
Tested-by: Xin Hao <[email protected]>
Reviewed-by: Shaopeng Tan <[email protected]>
Tested-by: Shaopeng Tan <[email protected]>
Tested-by: Cristian Marussi <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Signed-off-by: James Morse <[email protected]>
---
Changes since v4:
* Added resctrl_arch_round_mon_val() to fix the user provided value.
* Keep the 'hw counts in chunks' comment.

Changes since v3:
* Preserved the rounding errors.
---
arch/x86/include/asm/resctrl.h | 9 ++++++
arch/x86/kernel/cpu/resctrl/internal.h | 1 -
arch/x86/kernel/cpu/resctrl/monitor.c | 43 ++++++++++++++++----------
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 9 ++----
include/linux/resctrl.h | 2 ++
5 files changed, 39 insertions(+), 25 deletions(-)

diff --git a/arch/x86/include/asm/resctrl.h b/arch/x86/include/asm/resctrl.h
index d60ed0668a59..d24b04ebf950 100644
--- a/arch/x86/include/asm/resctrl.h
+++ b/arch/x86/include/asm/resctrl.h
@@ -81,6 +81,15 @@ static void __resctrl_sched_in(void)
}
}

+static inline unsigned int resctrl_arch_round_mon_val(unsigned int val)
+{
+ unsigned int scale = boot_cpu_data.x86_cache_occ_scale;
+
+ /* h/w works in units of "boot_cpu_data.x86_cache_occ_scale" */
+ val /= scale;
+ return val * scale;
+}
+
static inline void resctrl_sched_in(void)
{
if (static_branch_likely(&rdt_enable_key))
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index bdb55c2fbdd3..c05e9b7cf77a 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -98,7 +98,6 @@ struct rmid_read {
u64 val;
};

-extern unsigned int resctrl_cqm_threshold;
extern bool rdt_alloc_capable;
extern bool rdt_mon_capable;
extern unsigned int rdt_mon_features;
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 27bb4947a176..e91afe99b763 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -17,7 +17,10 @@

#include <linux/module.h>
#include <linux/slab.h>
+
#include <asm/cpu_device_id.h>
+#include <asm/resctrl.h>
+
#include "internal.h"

struct rmid_entry {
@@ -37,8 +40,8 @@ static LIST_HEAD(rmid_free_lru);
* @rmid_limbo_count count of currently unused but (potentially)
* dirty RMIDs.
* This counts RMIDs that no one is currently using but that
- * may have a occupancy value > intel_cqm_threshold. User can change
- * the threshold occupancy value.
+ * may have a occupancy value > resctrl_rmid_realloc_threshold. User can
+ * change the threshold occupancy value.
*/
static unsigned int rmid_limbo_count;

@@ -59,10 +62,10 @@ bool rdt_mon_capable;
unsigned int rdt_mon_features;

/*
- * This is the threshold cache occupancy at which we will consider an
+ * This is the threshold cache occupancy in bytes at which we will consider an
* RMID available for re-allocation.
*/
-unsigned int resctrl_cqm_threshold;
+unsigned int resctrl_rmid_realloc_threshold;

#define CF(cf) ((unsigned long)(1048576 * (cf) + 0.5))

@@ -223,14 +226,13 @@ int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain *d,
*/
void __check_limbo(struct rdt_domain *d, bool force_free)
{
+ struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
+ struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
struct rmid_entry *entry;
- struct rdt_resource *r;
u32 crmid = 1, nrmid;
bool rmid_dirty;
u64 val = 0;

- r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
-
/*
* Skip RMID 0 and start from RMID 1 and check all the RMIDs that
* are marked as busy for occupancy < threshold. If the occupancy
@@ -245,10 +247,12 @@ void __check_limbo(struct rdt_domain *d, bool force_free)
entry = __rmid_entry(nrmid);

if (resctrl_arch_rmid_read(r, d, entry->rmid,
- QOS_L3_OCCUP_EVENT_ID, &val))
+ QOS_L3_OCCUP_EVENT_ID, &val)) {
rmid_dirty = true;
- else
- rmid_dirty = (val >= resctrl_cqm_threshold);
+ } else {
+ val *= hw_res->mon_scale;
+ rmid_dirty = (val >= resctrl_rmid_realloc_threshold);
+ }

if (force_free || !rmid_dirty) {
clear_bit(entry->rmid, d->rmid_busy_llc);
@@ -289,13 +293,12 @@ int alloc_rmid(void)

static void add_rmid_to_limbo(struct rmid_entry *entry)
{
- struct rdt_resource *r;
+ struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
+ struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
struct rdt_domain *d;
int cpu, err;
u64 val = 0;

- r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
-
entry->busy = 0;
cpu = get_cpu();
list_for_each_entry(d, &r->domains, list) {
@@ -303,7 +306,8 @@ static void add_rmid_to_limbo(struct rmid_entry *entry)
err = resctrl_arch_rmid_read(r, d, entry->rmid,
QOS_L3_OCCUP_EVENT_ID,
&val);
- if (err || val <= resctrl_cqm_threshold)
+ val *= hw_res->mon_scale;
+ if (err || val <= resctrl_rmid_realloc_threshold)
continue;
}

@@ -744,6 +748,7 @@ int rdt_get_mon_l3_config(struct rdt_resource *r)
unsigned int mbm_offset = boot_cpu_data.x86_cache_mbm_width_offset;
struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
unsigned int cl_size = boot_cpu_data.x86_cache_size;
+ unsigned int threshold;
int ret;

hw_res->mon_scale = boot_cpu_data.x86_cache_occ_scale;
@@ -762,10 +767,14 @@ int rdt_get_mon_l3_config(struct rdt_resource *r)
*
* For a 35MB LLC and 56 RMIDs, this is ~1.8% of the LLC.
*/
- resctrl_cqm_threshold = cl_size * 1024 / r->num_rmid;
+ threshold = cl_size * 1024 / r->num_rmid;

- /* h/w works in units of "boot_cpu_data.x86_cache_occ_scale" */
- resctrl_cqm_threshold /= hw_res->mon_scale;
+ /*
+ * Because num_rmid may not be a power of two, round the value
+ * to the nearest multiple of hw_res->mon_scale so it matches a
+ * value the hardware will measure. mon_scale may not be a power of 2.
+ */
+ resctrl_rmid_realloc_threshold = resctrl_arch_round_mon_val(threshold);

ret = dom_data_init(r);
if (ret)
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 6c33dfe7ea53..849bdec37217 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1030,10 +1030,7 @@ static int rdt_delay_linear_show(struct kernfs_open_file *of,
static int max_threshold_occ_show(struct kernfs_open_file *of,
struct seq_file *seq, void *v)
{
- struct rdt_resource *r = of->kn->parent->priv;
- struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
-
- seq_printf(seq, "%u\n", resctrl_cqm_threshold * hw_res->mon_scale);
+ seq_printf(seq, "%u\n", resctrl_rmid_realloc_threshold);

return 0;
}
@@ -1055,7 +1052,6 @@ static int rdt_thread_throttle_mode_show(struct kernfs_open_file *of,
static ssize_t max_threshold_occ_write(struct kernfs_open_file *of,
char *buf, size_t nbytes, loff_t off)
{
- struct rdt_hw_resource *hw_res;
unsigned int bytes;
int ret;

@@ -1066,8 +1062,7 @@ static ssize_t max_threshold_occ_write(struct kernfs_open_file *of,
if (bytes > (boot_cpu_data.x86_cache_size * 1024))
return -EINVAL;

- hw_res = resctrl_to_arch_res(of->kn->parent->priv);
- resctrl_cqm_threshold = bytes / hw_res->mon_scale;
+ resctrl_rmid_realloc_threshold = resctrl_arch_round_mon_val(bytes);

return nbytes;
}
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 7ccfa0d1bb34..9995d043650a 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -250,4 +250,6 @@ int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain *d,
void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_domain *d,
u32 rmid, enum resctrl_event_id eventid);

+extern unsigned int resctrl_rmid_realloc_threshold;
+
#endif /* _RESCTRL_H */
--
2.30.2

2022-09-02 16:01:36

by James Morse

Subject: [PATCH v6 12/21] x86/resctrl: Calculate bandwidth from the previous __mon_event_count() chunks

mbm_bw_count() is only called by the mbm_handle_overflow() worker once a
second. It reads the hardware register, calculates the bandwidth and
updates m->prev_bw_msr which is used to hold the previous hardware register
value.

Operating directly on hardware register values makes it difficult to make
this code architecture independent, so that it can be moved to /fs/,
making the mba_sc feature something resctrl supports with no additional
help from the architecture.
Prior to calling mbm_bw_count(), mbm_update() reads from the same hardware
register using __mon_event_count().

Change mbm_bw_count() to use the current chunks value most recently saved
by __mon_event_count(). This removes an extra call to __rmid_read().
Instead of using m->prev_msr to calculate the number of chunks seen,
use the rr->val that was updated by __mon_event_count(). This removes an
extra call to mbm_overflow_count() and get_corrected_mbm_count().
Calculating bandwidth like this means mbm_bw_count() no longer operates
on hardware register values directly.
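
As a worked example (made-up numbers, not values from this patch): if
hw_res->mon_scale is 65536 bytes per chunk and rr->val has advanced by
16384 chunks since the previous invocation one second ago, then
cur_bw = (16384 * 65536) >> 20 = 1024 MBps, which is the same result the
removed register-based path would produce when no overflow or counter
correction applies.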

Reviewed-by: Jamie Iles <[email protected]>
Tested-by: Xin Hao <[email protected]>
Reviewed-by: Shaopeng Tan <[email protected]>
Tested-by: Shaopeng Tan <[email protected]>
Tested-by: Cristian Marussi <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Signed-off-by: James Morse <[email protected]>
---
Changes since v3:
* Reset rr.val before each use to avoid over counting.
* Fixed typos in kerneldoc.

Changes since v2:
* Expanded commit message

Changes since v1:
* This patch was rewritten
---
arch/x86/kernel/cpu/resctrl/internal.h | 4 ++--
arch/x86/kernel/cpu/resctrl/monitor.c | 25 ++++++++++++++++---------
2 files changed, 18 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 3b9e43ba7590..46062099d69e 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -289,7 +289,7 @@ struct rftype {
* struct mbm_state - status for each MBM counter in each domain
* @chunks: Total data moved (multiply by rdt_group.mon_scale to get bytes)
* @prev_msr: Value of IA32_QM_CTR for this RMID last time we read it
- * @prev_bw_msr:Value of previous IA32_QM_CTR for bandwidth counting
+ * @prev_bw_chunks: Previous chunks value read for bandwidth calculation
* @prev_bw: The most recent bandwidth in MBps
* @delta_bw: Difference between the current and previous bandwidth
* @delta_comp: Indicates whether to compute the delta_bw
@@ -297,7 +297,7 @@ struct rftype {
struct mbm_state {
u64 chunks;
u64 prev_msr;
- u64 prev_bw_msr;
+ u64 prev_bw_chunks;
u32 prev_bw;
u32 delta_bw;
bool delta_comp;
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 3e69386cfe00..2d81b6cd9632 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -315,7 +315,7 @@ static u64 __mon_event_count(u32 rmid, struct rmid_read *rr)

if (rr->first) {
memset(m, 0, sizeof(struct mbm_state));
- m->prev_bw_msr = m->prev_msr = tval;
+ m->prev_msr = tval;
return 0;
}

@@ -329,27 +329,32 @@ static u64 __mon_event_count(u32 rmid, struct rmid_read *rr)
}

/*
+ * mbm_bw_count() - Update bw count from values previously read by
+ * __mon_event_count().
+ * @rmid: The rmid used to identify the cached mbm_state.
+ * @rr: The struct rmid_read populated by __mon_event_count().
+ *
* Supporting function to calculate the memory bandwidth
- * and delta bandwidth in MBps.
+ * and delta bandwidth in MBps. The chunks value previously read by
+ * __mon_event_count() is compared with the chunks value from the previous
+ * invocation. This must be called once per second to maintain values in MBps.
*/
static void mbm_bw_count(u32 rmid, struct rmid_read *rr)
{
struct rdt_hw_resource *hw_res = resctrl_to_arch_res(rr->r);
struct mbm_state *m = &rr->d->mbm_local[rmid];
- u64 tval, cur_bw, chunks;
+ u64 cur_bw, chunks, cur_chunks;

- tval = __rmid_read(rmid, rr->evtid);
- if (tval & (RMID_VAL_ERROR | RMID_VAL_UNAVAIL))
- return;
+ cur_chunks = rr->val;
+ chunks = cur_chunks - m->prev_bw_chunks;
+ m->prev_bw_chunks = cur_chunks;

- chunks = mbm_overflow_count(m->prev_bw_msr, tval, hw_res->mbm_width);
- cur_bw = (get_corrected_mbm_count(rmid, chunks) * hw_res->mon_scale) >> 20;
+ cur_bw = (chunks * hw_res->mon_scale) >> 20;

if (m->delta_comp)
m->delta_bw = abs(cur_bw - m->prev_bw);
m->delta_comp = false;
m->prev_bw = cur_bw;
- m->prev_bw_msr = tval;
}

/*
@@ -516,10 +521,12 @@ static void mbm_update(struct rdt_resource *r, struct rdt_domain *d, int rmid)
*/
if (is_mbm_total_enabled()) {
rr.evtid = QOS_L3_MBM_TOTAL_EVENT_ID;
+ rr.val = 0;
__mon_event_count(rmid, &rr);
}
if (is_mbm_local_enabled()) {
rr.evtid = QOS_L3_MBM_LOCAL_EVENT_ID;
+ rr.val = 0;
__mon_event_count(rmid, &rr);

/*
--
2.30.2

2022-09-02 16:02:04

by James Morse

[permalink] [raw]
Subject: [PATCH v6 11/21] x86/resctrl: Allow update_mba_bw() to update controls directly

update_mba_bw() calculates a new control value for the MBA resource
based on the user-provided mbps_val and the current measured
bandwidth. Some control values need remapping by delay_bw_map().

It applies the new value by calling wrmsrl() directly. This needs to be
split out into an architecture-specific helper, so that the remainder
can eventually be moved to /fs/.

Add resctrl_arch_update_one() to apply one configuration value
to the provided resource and domain. This avoids the staging
and cross-calling that is only needed with changes made by
user-space. delay_bw_map() moves to be part of the arch code,
to maintain the 'percentage control' view of MBA resources
in resctrl.
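
For callers that are not already running on a CPU in the target domain, a
minimal sketch of how this helper could be driven is below (hypothetical,
not part of this patch; update_mba_bw() doesn't need it because the
overflow worker already runs on a CPU in the MBA domain it updates):

	struct one_update {
		struct rdt_resource *r;
		struct rdt_domain *d;
		u32 closid;
		u32 val;
		int err;
	};

	static void do_one_update(void *info)
	{
		struct one_update *u = info;

		u->err = resctrl_arch_update_one(u->r, u->d, u->closid,
						 CDP_NONE, u->val);
	}

	/* In the caller, bounce the write to a CPU in the domain and wait: */
	struct one_update update = { .r = r, .d = d, .closid = closid,
				     .val = new_val };

	smp_call_function_any(&d->cpu_mask, do_one_update, &update, 1);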

Reviewed-by: Jamie Iles <[email protected]>
Tested-by: Xin Hao <[email protected]>
Reviewed-by: Shaopeng Tan <[email protected]>
Tested-by: Shaopeng Tan <[email protected]>
Tested-by: Cristian Marussi <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Signed-off-by: James Morse <[email protected]>
---
Changes since v2:
* Call msr_update() directly from resctrl_arch_update_one().

Changes since v1:
* Capitalisation
---
arch/x86/kernel/cpu/resctrl/core.c | 2 +-
arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 21 +++++++++++++++++++++
arch/x86/kernel/cpu/resctrl/internal.h | 1 -
arch/x86/kernel/cpu/resctrl/monitor.c | 13 ++++---------
include/linux/resctrl.h | 8 ++++++++
5 files changed, 34 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index f0e2820af475..90ebb7d71af2 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -296,7 +296,7 @@ mba_wrmsr_amd(struct rdt_domain *d, struct msr_param *m, struct rdt_resource *r)
* that can be written to QOS_MSRs.
* There are currently no SKUs which support non linear delay values.
*/
-u32 delay_bw_map(unsigned long bw, struct rdt_resource *r)
+static u32 delay_bw_map(unsigned long bw, struct rdt_resource *r)
{
if (r->membw.delay_linear)
return MAX_MBA_BW - bw;
diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
index 9f45207a6c74..ece3a1e0e6f2 100644
--- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
+++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
@@ -282,6 +282,27 @@ static bool apply_config(struct rdt_hw_domain *hw_dom,
return false;
}

+int resctrl_arch_update_one(struct rdt_resource *r, struct rdt_domain *d,
+ u32 closid, enum resctrl_conf_type t, u32 cfg_val)
+{
+ struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
+ struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(d);
+ u32 idx = get_config_index(closid, t);
+ struct msr_param msr_param;
+
+ if (!cpumask_test_cpu(smp_processor_id(), &d->cpu_mask))
+ return -EINVAL;
+
+ hw_dom->ctrl_val[idx] = cfg_val;
+
+ msr_param.res = r;
+ msr_param.low = idx;
+ msr_param.high = idx + 1;
+ hw_res->msr_update(d, &msr_param, r);
+
+ return 0;
+}
+
int resctrl_arch_update_domains(struct rdt_resource *r, u32 closid)
{
struct resctrl_staged_config *cfg;
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 373aaba53ecd..3b9e43ba7590 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -527,7 +527,6 @@ void mbm_setup_overflow_handler(struct rdt_domain *dom,
void mbm_handle_overflow(struct work_struct *work);
void __init intel_rdt_mbm_apply_quirk(void);
bool is_mba_sc(struct rdt_resource *r);
-u32 delay_bw_map(unsigned long bw, struct rdt_resource *r);
void cqm_setup_limbo_handler(struct rdt_domain *dom, unsigned long delay_ms);
void cqm_handle_limbo(struct work_struct *work);
bool has_busy_rmid(struct rdt_resource *r, struct rdt_domain *d);
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 16028b2f756a..3e69386cfe00 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -420,10 +420,8 @@ void mon_event_count(void *info)
*/
static void update_mba_bw(struct rdtgroup *rgrp, struct rdt_domain *dom_mbm)
{
- u32 closid, rmid, cur_msr, cur_msr_val, new_msr_val;
+ u32 closid, rmid, cur_msr_val, new_msr_val;
struct mbm_state *pmbm_data, *cmbm_data;
- struct rdt_hw_resource *hw_r_mba;
- struct rdt_hw_domain *hw_dom_mba;
u32 cur_bw, delta_bw, user_bw;
struct rdt_resource *r_mba;
struct rdt_domain *dom_mba;
@@ -433,8 +431,8 @@ static void update_mba_bw(struct rdtgroup *rgrp, struct rdt_domain *dom_mbm)
if (!is_mbm_local_enabled())
return;

- hw_r_mba = &rdt_resources_all[RDT_RESOURCE_MBA];
- r_mba = &hw_r_mba->r_resctrl;
+ r_mba = &rdt_resources_all[RDT_RESOURCE_MBA].r_resctrl;
+
closid = rgrp->closid;
rmid = rgrp->mon.rmid;
pmbm_data = &dom_mbm->mbm_local[rmid];
@@ -444,7 +442,6 @@ static void update_mba_bw(struct rdtgroup *rgrp, struct rdt_domain *dom_mbm)
pr_warn_once("Failure to get domain for MBA update\n");
return;
}
- hw_dom_mba = resctrl_to_arch_dom(dom_mba);

cur_bw = pmbm_data->prev_bw;
user_bw = dom_mba->mbps_val[closid];
@@ -486,9 +483,7 @@ static void update_mba_bw(struct rdtgroup *rgrp, struct rdt_domain *dom_mbm)
return;
}

- cur_msr = hw_r_mba->msr_base + closid;
- wrmsrl(cur_msr, delay_bw_map(new_msr_val, r_mba));
- hw_dom_mba->ctrl_val[closid] = new_msr_val;
+ resctrl_arch_update_one(r_mba, dom_mba, closid, CDP_NONE, new_msr_val);

/*
* Delta values are updated dynamically package wise for each
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 93dfe553b364..f4c9101df461 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -197,6 +197,14 @@ struct resctrl_schema {
/* The number of closid supported by this resource regardless of CDP */
u32 resctrl_arch_get_num_closid(struct rdt_resource *r);
int resctrl_arch_update_domains(struct rdt_resource *r, u32 closid);
+
+/*
+ * Update the ctrl_val and apply this config right now.
+ * Must be called on one of the domain's CPUs.
+ */
+int resctrl_arch_update_one(struct rdt_resource *r, struct rdt_domain *d,
+ u32 closid, enum resctrl_conf_type t, u32 cfg_val);
+
u32 resctrl_arch_get_config(struct rdt_resource *r, struct rdt_domain *d,
u32 closid, enum resctrl_conf_type type);
int resctrl_online_domain(struct rdt_resource *r, struct rdt_domain *d);
--
2.30.2

2022-09-02 16:06:30

by James Morse

[permalink] [raw]
Subject: [PATCH v6 01/21] x86/resctrl: Kill off alloc_enabled

rdt_resources_all[] used to have extra entries for L2CODE/L2DATA.
These were hidden from resctrl by the alloc_enabled value.

Now that the L2/L2CODE/L2DATA resources have been merged together,
alloc_enabled doesn't mean anything: it always has the same value as
alloc_capable, which indicates allocation is supported by this resource.

Remove alloc_enabled and its helpers.

Reviewed-by: Jamie Iles <[email protected]>
Tested-by: Xin Hao <[email protected]>
Reviewed-by: Shaopeng Tan <[email protected]>
Tested-by: Shaopeng Tan <[email protected]>
Tested-by: Cristian Marussi <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Signed-off-by: James Morse <[email protected]>
---
Changes since v3:
* Removed duplicate words from commit message.

Changes since v1:
* Fixed comment in rdtgroup_create_info_dir()
---
arch/x86/kernel/cpu/resctrl/core.c | 4 ----
arch/x86/kernel/cpu/resctrl/internal.h | 4 ----
arch/x86/kernel/cpu/resctrl/pseudo_lock.c | 2 +-
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 6 +++---
include/linux/resctrl.h | 2 --
5 files changed, 4 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index bb1c3f5f60c8..2f87177f1f69 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -147,7 +147,6 @@ static inline void cache_alloc_hsw_probe(void)
r->cache.shareable_bits = 0xc0000;
r->cache.min_cbm_bits = 2;
r->alloc_capable = true;
- r->alloc_enabled = true;

rdt_alloc_capable = true;
}
@@ -211,7 +210,6 @@ static bool __get_mem_config_intel(struct rdt_resource *r)
thread_throttle_mode_init();

r->alloc_capable = true;
- r->alloc_enabled = true;

return true;
}
@@ -242,7 +240,6 @@ static bool __rdt_get_mem_config_amd(struct rdt_resource *r)
r->data_width = 4;

r->alloc_capable = true;
- r->alloc_enabled = true;

return true;
}
@@ -261,7 +258,6 @@ static void rdt_get_cache_alloc_cfg(int idx, struct rdt_resource *r)
r->cache.shareable_bits = ebx & r->default_ctrl;
r->data_width = (r->cache.cbm_len + 3) / 4;
r->alloc_capable = true;
- r->alloc_enabled = true;
}

static void rdt_get_cdp_config(int level)
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 1d647188a43b..53f3d275a98f 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -459,10 +459,6 @@ int resctrl_arch_set_cdp_enabled(enum resctrl_res_level l, bool enable);
for_each_rdt_resource(r) \
if (r->mon_capable)

-#define for_each_alloc_enabled_rdt_resource(r) \
- for_each_rdt_resource(r) \
- if (r->alloc_enabled)
-
#define for_each_mon_enabled_rdt_resource(r) \
for_each_rdt_resource(r) \
if (r->mon_enabled)
diff --git a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
index db813f819ad6..f810969ced4b 100644
--- a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
+++ b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
@@ -835,7 +835,7 @@ bool rdtgroup_pseudo_locked_in_hierarchy(struct rdt_domain *d)
* First determine which cpus have pseudo-locked regions
* associated with them.
*/
- for_each_alloc_enabled_rdt_resource(r) {
+ for_each_alloc_capable_rdt_resource(r) {
list_for_each_entry(d_i, &r->domains, list) {
if (d_i->plr)
cpumask_or(cpu_with_psl, cpu_with_psl,
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index f276aff521e8..526eb933333b 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1756,7 +1756,7 @@ static int rdtgroup_create_info_dir(struct kernfs_node *parent_kn)
if (ret)
goto out_destroy;

- /* loop over enabled controls, these are all alloc_enabled */
+ /* loop over enabled controls, these are all alloc_capable */
list_for_each_entry(s, &resctrl_schema_all, list) {
r = s->res;
fflags = r->fflags | RF_CTRL_INFO;
@@ -2106,7 +2106,7 @@ static int schemata_list_create(void)
struct rdt_resource *r;
int ret = 0;

- for_each_alloc_enabled_rdt_resource(r) {
+ for_each_alloc_capable_rdt_resource(r) {
if (resctrl_arch_get_cdp_enabled(r->rid)) {
ret = schemata_list_add(r, CDP_CODE);
if (ret)
@@ -2452,7 +2452,7 @@ static void rdt_kill_sb(struct super_block *sb)
set_mba_sc(false);

/*Put everything back to default values. */
- for_each_alloc_enabled_rdt_resource(r)
+ for_each_alloc_capable_rdt_resource(r)
reset_all_ctrls(r);
cdp_disable_all();
rmdir_all_sub();
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 21deb5212bbd..386ab3a41500 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -130,7 +130,6 @@ struct resctrl_schema;
/**
* struct rdt_resource - attributes of a resctrl resource
* @rid: The index of the resource
- * @alloc_enabled: Is allocation enabled on this machine
* @mon_enabled: Is monitoring enabled for this feature
* @alloc_capable: Is allocation available on this machine
* @mon_capable: Is monitor feature available on this machine
@@ -150,7 +149,6 @@ struct resctrl_schema;
*/
struct rdt_resource {
int rid;
- bool alloc_enabled;
bool mon_enabled;
bool alloc_capable;
bool mon_capable;
--
2.30.2

2022-09-02 16:09:57

by James Morse

[permalink] [raw]
Subject: [PATCH v6 13/21] x86/resctrl: Add per-rmid arch private storage for overflow and chunks

A renamed __rmid_read() is intended as the function that an
architecture agnostic resctrl filesystem driver can use to
read a value in bytes from a counter. Currently the function returns
the MBM values in chunks directly from hardware. For bandwidth
counters the resctrl filesystem uses this to calculate the number of
bytes ever seen.

MPAM's scaling of counters can be changed at runtime, reducing the
resolution but increasing the range. When this is changed the prev_msr
values need to be converted by the architecture code.

Add an array for per-rmid private storage. The prev_msr and chunks
values will move here to allow resctrl_arch_rmid_read() to always
return the number of bytes read by this counter without assistance
from the filesystem. The values are moved in later patches when
the overflow and correction calls are moved into __rmid_read().
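
As a rough illustration of where this is heading (the helper name below is
made up; the real lookup is added later in the series):

	static struct arch_mbm_state *
	get_mbm_state(struct rdt_hw_domain *hw_dom, u32 rmid,
		      enum resctrl_event_id evtid)
	{
		switch (evtid) {
		case QOS_L3_MBM_TOTAL_EVENT_ID:
			return &hw_dom->arch_mbm_total[rmid];
		case QOS_L3_MBM_LOCAL_EVENT_ID:
			return &hw_dom->arch_mbm_local[rmid];
		default:
			return NULL;
		}
	}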

Reviewed-by: Jamie Iles <[email protected]>
Tested-by: Xin Hao <[email protected]>
Reviewed-by: Shaopeng Tan <[email protected]>
Tested-by: Shaopeng Tan <[email protected]>
Tested-by: Cristian Marussi <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Signed-off-by: James Morse <[email protected]>
---
Changes since v2:
* Capitalisation
* Use __rmid_read() as this patch is earlier in the series.
* kfree() one array in arch_domain_mbm_alloc() when allocating the other
fails, instead of relying on domain_free().
* Remove the documentation that domain_free() has to be called to cleanup
if this call fails.
---
arch/x86/kernel/cpu/resctrl/core.c | 35 ++++++++++++++++++++++++++
arch/x86/kernel/cpu/resctrl/internal.h | 14 +++++++++++
2 files changed, 49 insertions(+)

diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 90ebb7d71af2..de62b0b87ced 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -413,6 +413,8 @@ static void setup_default_ctrlval(struct rdt_resource *r, u32 *dc)

static void domain_free(struct rdt_hw_domain *hw_dom)
{
+ kfree(hw_dom->arch_mbm_total);
+ kfree(hw_dom->arch_mbm_local);
kfree(hw_dom->ctrl_val);
kfree(hw_dom);
}
@@ -438,6 +440,34 @@ static int domain_setup_ctrlval(struct rdt_resource *r, struct rdt_domain *d)
return 0;
}

+/**
+ * arch_domain_mbm_alloc() - Allocate arch private storage for the MBM counters
+ * @num_rmid: The size of the MBM counter array
+ * @hw_dom: The domain that owns the allocated arrays
+ */
+static int arch_domain_mbm_alloc(u32 num_rmid, struct rdt_hw_domain *hw_dom)
+{
+ size_t tsize;
+
+ if (is_mbm_total_enabled()) {
+ tsize = sizeof(*hw_dom->arch_mbm_total);
+ hw_dom->arch_mbm_total = kcalloc(num_rmid, tsize, GFP_KERNEL);
+ if (!hw_dom->arch_mbm_total)
+ return -ENOMEM;
+ }
+ if (is_mbm_local_enabled()) {
+ tsize = sizeof(*hw_dom->arch_mbm_local);
+ hw_dom->arch_mbm_local = kcalloc(num_rmid, tsize, GFP_KERNEL);
+ if (!hw_dom->arch_mbm_local) {
+ kfree(hw_dom->arch_mbm_total);
+ hw_dom->arch_mbm_total = NULL;
+ return -ENOMEM;
+ }
+ }
+
+ return 0;
+}
+
/*
* domain_add_cpu - Add a cpu to a resource's domain list.
*
@@ -487,6 +517,11 @@ static void domain_add_cpu(int cpu, struct rdt_resource *r)
return;
}

+ if (r->mon_capable && arch_domain_mbm_alloc(r->num_rmid, hw_dom)) {
+ domain_free(hw_dom);
+ return;
+ }
+
list_add_tail(&d->list, add_pos);

err = resctrl_online_domain(r, d);
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 46062099d69e..4de8e5bb93e1 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -303,17 +303,31 @@ struct mbm_state {
bool delta_comp;
};

+/**
+ * struct arch_mbm_state - values used to compute resctrl_arch_rmid_read()s
+ * return value.
+ * @prev_msr: Value of IA32_QM_CTR last time it was read for the RMID used to
+ * find this struct.
+ */
+struct arch_mbm_state {
+ u64 prev_msr;
+};
+
/**
* struct rdt_hw_domain - Arch private attributes of a set of CPUs that share
* a resource
* @d_resctrl: Properties exposed to the resctrl file system
* @ctrl_val: array of cache or mem ctrl values (indexed by CLOSID)
+ * @arch_mbm_total: arch private state for MBM total bandwidth
+ * @arch_mbm_local: arch private state for MBM local bandwidth
*
* Members of this structure are accessed via helpers that provide abstraction.
*/
struct rdt_hw_domain {
struct rdt_domain d_resctrl;
u32 *ctrl_val;
+ struct arch_mbm_state *arch_mbm_total;
+ struct arch_mbm_state *arch_mbm_local;
};

static inline struct rdt_hw_domain *resctrl_to_arch_dom(struct rdt_domain *r)
--
2.30.2

2022-09-02 16:11:21

by James Morse

[permalink] [raw]
Subject: [PATCH v6 07/21] x86/resctrl: Abstract and use supports_mba_mbps()

To determine whether the mba_MBps option to resctrl should be supported,
resctrl tests the boot CPU's x86_vendor.

This isn't portable, and needs abstracting behind a helper so this check
can be part of the filesystem code that moves to /fs/.

Re-use the tests set_mba_sc() does to determine if mba_sc is supported
on this system. An 'alloc_capable' test is added so that support for the
controls isn't implied by the 'delay_linear' property, which is always
true for MPAM. Because mbm_update() only updates mba_sc if the mbm_local
counters are enabled, supports_mba_mbps() checks is_mbm_local_enabled()
(instead of is_mbm_enabled(), which checks both).

Reviewed-by: Jamie Iles <[email protected]>
Tested-by: Xin Hao <[email protected]>
Reviewed-by: Shaopeng Tan <[email protected]>
Tested-by: Shaopeng Tan <[email protected]>
Tested-by: Cristian Marussi <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Signed-off-by: James Morse <[email protected]>
---
Changes since v5:
* Changed supports_mba_mbps() to use is_mbm_local_enabled()

Changes since v3:
* Added use in resctrl_online_domain()

Changes since v1:
* Capitalisation
* Added MPAM example in commit message
* Fixed supports_mba_mbps() logic error in rdt_parse_param()
---
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 19 ++++++++++++++-----
1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index b32ceff8325a..4ee26264ecfc 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1890,17 +1890,26 @@ void rdt_domain_reconfigure_cdp(struct rdt_resource *r)
}

/*
- * Enable or disable the MBA software controller
- * which helps user specify bandwidth in MBps.
* MBA software controller is supported only if
* MBM is supported and MBA is in linear scale.
*/
+static bool supports_mba_mbps(void)
+{
+ struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_MBA].r_resctrl;
+
+ return (is_mbm_local_enabled() &&
+ r->alloc_capable && is_mba_linear());
+}
+
+/*
+ * Enable or disable the MBA software controller
+ * which helps user specify bandwidth in MBps.
+ */
static int set_mba_sc(bool mba_sc)
{
struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_MBA].r_resctrl;

- if (!is_mbm_enabled() || !is_mba_linear() ||
- mba_sc == is_mba_sc(r))
+ if (!supports_mba_mbps() || mba_sc == is_mba_sc(r))
return -EINVAL;

r->membw.mba_sc = mba_sc;
@@ -2255,7 +2264,7 @@ static int rdt_parse_param(struct fs_context *fc, struct fs_parameter *param)
ctx->enable_cdpl2 = true;
return 0;
case Opt_mba_mbps:
- if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
+ if (!supports_mba_mbps())
return -EINVAL;
ctx->enable_mba_mbps = true;
return 0;
--
2.30.2

2022-09-02 16:09:57

by James Morse

[permalink] [raw]
Subject: [PATCH v6 02/21] x86/resctrl: Merge mon_capable and mon_enabled

mon_enabled and mon_capable are always set as a pair by
rdt_get_mon_l3_config().

There is no point having two values.

Merge them together.

Reviewed-by: Jamie Iles <[email protected]>
Tested-by: Xin Hao <[email protected]>
Reviewed-by: Shaopeng Tan <[email protected]>
Tested-by: Shaopeng Tan <[email protected]>
Tested-by: Cristian Marussi <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Signed-off-by: James Morse <[email protected]>
---
Changes since v1:
* Removed stray cdp_capable changes.
---
arch/x86/kernel/cpu/resctrl/internal.h | 4 ----
arch/x86/kernel/cpu/resctrl/monitor.c | 1 -
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 8 ++++----
include/linux/resctrl.h | 2 --
4 files changed, 4 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 53f3d275a98f..8828b5c1b6d2 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -459,10 +459,6 @@ int resctrl_arch_set_cdp_enabled(enum resctrl_res_level l, bool enable);
for_each_rdt_resource(r) \
if (r->mon_capable)

-#define for_each_mon_enabled_rdt_resource(r) \
- for_each_rdt_resource(r) \
- if (r->mon_enabled)
-
/* CPUID.(EAX=10H, ECX=ResID=1).EAX */
union cpuid_0x10_1_eax {
struct {
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index eaf25a234ff5..497cadf3285d 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -717,7 +717,6 @@ int rdt_get_mon_l3_config(struct rdt_resource *r)
l3_mon_evt_init(r);

r->mon_capable = true;
- r->mon_enabled = true;

return 0;
}
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 526eb933333b..def7c6681f8b 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1765,7 +1765,7 @@ static int rdtgroup_create_info_dir(struct kernfs_node *parent_kn)
goto out_destroy;
}

- for_each_mon_enabled_rdt_resource(r) {
+ for_each_mon_capable_rdt_resource(r) {
fflags = r->fflags | RF_MON_INFO;
sprintf(name, "%s_MON", r->name);
ret = rdtgroup_mkdir_info_resdir(r, name, fflags);
@@ -2504,7 +2504,7 @@ void rmdir_mondata_subdir_allrdtgrp(struct rdt_resource *r, unsigned int dom_id)
struct rdtgroup *prgrp, *crgrp;
char name[32];

- if (!r->mon_enabled)
+ if (!r->mon_capable)
return;

list_for_each_entry(prgrp, &rdt_all_groups, rdtgroup_list) {
@@ -2572,7 +2572,7 @@ void mkdir_mondata_subdir_allrdtgrp(struct rdt_resource *r,
struct rdtgroup *prgrp, *crgrp;
struct list_head *head;

- if (!r->mon_enabled)
+ if (!r->mon_capable)
return;

list_for_each_entry(prgrp, &rdt_all_groups, rdtgroup_list) {
@@ -2642,7 +2642,7 @@ static int mkdir_mondata_all(struct kernfs_node *parent_kn,
* Create the subdirectories for each domain. Note that all events
* in a domain like L3 are grouped into a resource whose domain is L3
*/
- for_each_mon_enabled_rdt_resource(r) {
+ for_each_mon_capable_rdt_resource(r) {
ret = mkdir_mondata_subdir_alldom(kn, r, prgrp);
if (ret)
goto out_destroy;
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 386ab3a41500..8180c539800d 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -130,7 +130,6 @@ struct resctrl_schema;
/**
* struct rdt_resource - attributes of a resctrl resource
* @rid: The index of the resource
- * @mon_enabled: Is monitoring enabled for this feature
* @alloc_capable: Is allocation available on this machine
* @mon_capable: Is monitor feature available on this machine
* @num_rmid: Number of RMIDs available
@@ -149,7 +148,6 @@ struct resctrl_schema;
*/
struct rdt_resource {
int rid;
- bool mon_enabled;
bool alloc_capable;
bool mon_capable;
int num_rmid;
--
2.30.2

2022-09-02 16:12:26

by James Morse

[permalink] [raw]
Subject: [PATCH v6 06/21] x86/resctrl: Remove set_mba_sc()s control array re-initialisation

set_mba_sc() enables the 'software controller' to regulate the
bandwidth based on the byte counters. This can be managed entirely
in the parts of resctrl that move to /fs/, without any extra
support from the architecture specific code.
set_mba_sc() is called by rdt_enable_ctx() during mount and
umount. It currently resets the arch code's ctrl_val[] and mbps_val[]
arrays.

The ctrl_val[] was already reset when the domain was created,
and by reset_all_ctrls() when the filesystem was last umounted.
Doing the work in set_mba_sc() is not necessary as the values are
already at their defaults due to the creation of the domain, or were
previously reset during umount(), or are about to be reset during umount().

Add a reset of the mbps_val[] in reset_all_ctrls(), allowing the
code in set_mba_sc() that reaches in to the architecture specific
structures to be removed.

Reviewed-by: Jamie Iles <[email protected]>
Tested-by: Xin Hao <[email protected]>
Reviewed-by: Shaopeng Tan <[email protected]>
Tested-by: Shaopeng Tan <[email protected]>
Tested-by: Cristian Marussi <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Signed-off-by: James Morse <[email protected]>
---
Changes since v3:
* Spelling mistakes in commit message.

Changes since v2:
* Moved earlier in the series, added the reset in reset_all_ctrls().
* Rephrased commit message.
---
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 10 +++-------
1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 5830905a92d2..b32ceff8325a 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1898,18 +1898,12 @@ void rdt_domain_reconfigure_cdp(struct rdt_resource *r)
static int set_mba_sc(bool mba_sc)
{
struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_MBA].r_resctrl;
- struct rdt_hw_domain *hw_dom;
- struct rdt_domain *d;

if (!is_mbm_enabled() || !is_mba_linear() ||
mba_sc == is_mba_sc(r))
return -EINVAL;

r->membw.mba_sc = mba_sc;
- list_for_each_entry(d, &r->domains, list) {
- hw_dom = resctrl_to_arch_dom(d);
- setup_default_ctrlval(r, hw_dom->ctrl_val, hw_dom->mbps_val);
- }

return 0;
}
@@ -2327,8 +2321,10 @@ static int reset_all_ctrls(struct rdt_resource *r)
hw_dom = resctrl_to_arch_dom(d);
cpumask_set_cpu(cpumask_any(&d->cpu_mask), cpu_mask);

- for (i = 0; i < hw_res->num_closid; i++)
+ for (i = 0; i < hw_res->num_closid; i++) {
hw_dom->ctrl_val[i] = r->default_ctrl;
+ hw_dom->mbps_val[i] = MBA_MAX_MBPS;
+ }
}
cpu = get_cpu();
/* Update CBM on this cpu if it's in cpu_mask. */
--
2.30.2

2022-09-02 16:12:44

by James Morse

[permalink] [raw]
Subject: [PATCH v6 08/21] x86/resctrl: Create mba_sc configuration in the rdt_domain

To support resctrl's MBA software controller, the architecture must provide
a second configuration array to hold the mbps_val[] from user-space.

This complicates the interface between the architecture specific code and
the filesystem portions of resctrl that will move to /fs/, to allow
multiple architectures to support resctrl.

Make the filesystem parts of resctrl create an array for the mba_sc
values. The software controller can be changed to use this, allowing
the architecture code to only consider the values configured in hardware.

Reviewed-by: Jamie Iles <[email protected]>
Tested-by: Xin Hao <[email protected]>
Reviewed-by: Shaopeng Tan <[email protected]>
Tested-by: Shaopeng Tan <[email protected]>
Tested-by: Cristian Marussi <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Signed-off-by: James Morse <[email protected]>
---
Changes since v4:
* Use supports_mba_mbps() in resctrl_{on,off}line_domain().
* Don't call mba_sc_domain_destroy() for error handling - it can't happen.
* Whitespace.

Changes since v3:
* Always allocate the array.
* Move the array allocation above the r->mon_capable check.

Changes since v2:
* Split the patch in two; the lifetime parts are a separate patch.
* Added reset in set_mba_sc() now that we can't depend on the lifetime.
* Initialise ret in mba_sc_allocate(),
* Made mbps_val allocation/freeing symmetric for cpuhp calls.
* Removed reference to squashed-out struct.
* Preserved kerneldoc for mbps_val.

Changes since v1:
* Added missing error handling to mba_sc_domain_allocate() in
domain_setup_mon_state()
* Added comment about mba_sc_domain_allocate() races
* Squashed out struct resctrl_mba_sc
* Moved mount time alloc/free calls to set_mba_sc().
* Removed mount check in resctrl_offline_domain()
* Reword commit message
---
arch/x86/kernel/cpu/resctrl/internal.h | 1 -
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 39 ++++++++++++++++++++++++++
include/linux/resctrl.h | 7 +++++
3 files changed, 46 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index e12b55f815bf..a7e2cbce29d5 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -36,7 +36,6 @@
#define MBM_OVERFLOW_INTERVAL 1000
#define MAX_MBA_BW 100u
#define MBA_IS_LINEAR 0x4
-#define MBA_MAX_MBPS U32_MAX
#define MAX_MBA_BW_AMD 0x800
#define MBM_CNTR_WIDTH_OFFSET_AMD 20

diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 4ee26264ecfc..f7ebd019e7a5 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1889,6 +1889,30 @@ void rdt_domain_reconfigure_cdp(struct rdt_resource *r)
l3_qos_cfg_update(&hw_res->cdp_enabled);
}

+static int mba_sc_domain_allocate(struct rdt_resource *r, struct rdt_domain *d)
+{
+ u32 num_closid = resctrl_arch_get_num_closid(r);
+ int cpu = cpumask_any(&d->cpu_mask);
+ int i;
+
+ d->mbps_val = kcalloc_node(num_closid, sizeof(*d->mbps_val),
+ GFP_KERNEL, cpu_to_node(cpu));
+ if (!d->mbps_val)
+ return -ENOMEM;
+
+ for (i = 0; i < num_closid; i++)
+ d->mbps_val[i] = MBA_MAX_MBPS;
+
+ return 0;
+}
+
+static void mba_sc_domain_destroy(struct rdt_resource *r,
+ struct rdt_domain *d)
+{
+ kfree(d->mbps_val);
+ d->mbps_val = NULL;
+}
+
/*
* MBA software controller is supported only if
* MBM is supported and MBA is in linear scale.
@@ -1908,12 +1932,20 @@ static bool supports_mba_mbps(void)
static int set_mba_sc(bool mba_sc)
{
struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_MBA].r_resctrl;
+ u32 num_closid = resctrl_arch_get_num_closid(r);
+ struct rdt_domain *d;
+ int i;

if (!supports_mba_mbps() || mba_sc == is_mba_sc(r))
return -EINVAL;

r->membw.mba_sc = mba_sc;

+ list_for_each_entry(d, &r->domains, list) {
+ for (i = 0; i < num_closid; i++)
+ d->mbps_val[i] = MBA_MAX_MBPS;
+ }
+
return 0;
}

@@ -3247,6 +3279,9 @@ void resctrl_offline_domain(struct rdt_resource *r, struct rdt_domain *d)
{
lockdep_assert_held(&rdtgroup_mutex);

+ if (supports_mba_mbps() && r->rid == RDT_RESOURCE_MBA)
+ mba_sc_domain_destroy(r, d);
+
if (!r->mon_capable)
return;

@@ -3311,6 +3346,10 @@ int resctrl_online_domain(struct rdt_resource *r, struct rdt_domain *d)

lockdep_assert_held(&rdtgroup_mutex);

+ if (supports_mba_mbps() && r->rid == RDT_RESOURCE_MBA)
+ /* RDT_RESOURCE_MBA is never mon_capable */
+ return mba_sc_domain_allocate(r, d);
+
if (!r->mon_capable)
return 0;

diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 5d283bdd6162..93dfe553b364 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -15,6 +15,9 @@ int proc_resctrl_show(struct seq_file *m,

#endif

+/* max value for struct rdt_domain's mbps_val */
+#define MBA_MAX_MBPS U32_MAX
+
/**
* enum resctrl_conf_type - The type of configuration.
* @CDP_NONE: No prioritisation, both code and data are controlled or monitored.
@@ -53,6 +56,9 @@ struct resctrl_staged_config {
* @cqm_work_cpu: worker CPU for CQM h/w counters
* @plr: pseudo-locked region (if any) associated with domain
* @staged_config: parsed configuration to be applied
+ * @mbps_val: When mba_sc is enabled, this holds the array of user
+ * specified control values for mba_sc in MBps, indexed
+ * by closid
*/
struct rdt_domain {
struct list_head list;
@@ -67,6 +73,7 @@ struct rdt_domain {
int cqm_work_cpu;
struct pseudo_lock_region *plr;
struct resctrl_staged_config staged_config[CDP_NUM_TYPES];
+ u32 *mbps_val;
};

/**
--
2.30.2

2022-09-02 16:13:03

by James Morse

[permalink] [raw]
Subject: [PATCH v6 05/21] x86/resctrl: Add domain offline callback for resctrl work

Because domains are exposed to user-space via resctrl, the filesystem
must update its state when CPU hotplug callbacks are triggered.

Some of this work is common to any architecture that would support
resctrl, but the work is tied up with the architecture code to
free the memory.

Move the monitor subdir removal and the cancelling of the mbm/limbo
workers into a new resctrl_offline_domain() call. These bits are not
specific to the architecture. Grouping them in one function allows
that code to be moved to /fs/ and re-used by another architecture.

Reviewed-by: Jamie Iles <[email protected]>
Tested-by: Xin Hao <[email protected]>
Reviewed-by: Shaopeng Tan <[email protected]>
Tested-by: Shaopeng Tan <[email protected]>
Tested-by: Cristian Marussi <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Signed-off-by: James Morse <[email protected]>
---
Changes since v2:
* Moved kfree()ing to domain_destroy_mon_state() for later re-use.

Changes since v1:
* Removed a redundant mon_capable check
* Capitalisation
* Removed inline comment
* Added to the commit message
---
arch/x86/kernel/cpu/resctrl/core.c | 26 ++-------------
arch/x86/kernel/cpu/resctrl/internal.h | 2 --
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 45 +++++++++++++++++++++++---
include/linux/resctrl.h | 1 +
4 files changed, 44 insertions(+), 30 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index e37889f7a1a5..f69182973175 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -523,27 +523,8 @@ static void domain_remove_cpu(int cpu, struct rdt_resource *r)

cpumask_clear_cpu(cpu, &d->cpu_mask);
if (cpumask_empty(&d->cpu_mask)) {
- /*
- * If resctrl is mounted, remove all the
- * per domain monitor data directories.
- */
- if (static_branch_unlikely(&rdt_mon_enable_key))
- rmdir_mondata_subdir_allrdtgrp(r, d->id);
+ resctrl_offline_domain(r, d);
list_del(&d->list);
- if (r->mon_capable && is_mbm_enabled())
- cancel_delayed_work(&d->mbm_over);
- if (is_llc_occupancy_enabled() && has_busy_rmid(r, d)) {
- /*
- * When a package is going down, forcefully
- * decrement rmid->ebusy. There is no way to know
- * that the L3 was flushed and hence may lead to
- * incorrect counts in rare scenarios, but leaving
- * the RMID as busy creates RMID leaks if the
- * package never comes back.
- */
- __check_limbo(d, true);
- cancel_delayed_work(&d->cqm_limbo);
- }

/*
* rdt_domain "d" is going to be freed below, so clear
@@ -551,11 +532,8 @@ static void domain_remove_cpu(int cpu, struct rdt_resource *r)
*/
if (d->plr)
d->plr->d = NULL;
-
- bitmap_free(d->rmid_busy_llc);
- kfree(d->mbm_total);
- kfree(d->mbm_local);
domain_free(hw_dom);
+
return;
}

diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index be48a682dbdb..e12b55f815bf 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -522,8 +522,6 @@ void free_rmid(u32 rmid);
int rdt_get_mon_l3_config(struct rdt_resource *r);
void mon_event_count(void *info);
int rdtgroup_mondata_show(struct seq_file *m, void *arg);
-void rmdir_mondata_subdir_allrdtgrp(struct rdt_resource *r,
- unsigned int dom_id);
void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
struct rdt_domain *d, struct rdtgroup *rdtgrp,
int evtid, int first);
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 030a70326ccc..5830905a92d2 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -2499,14 +2499,12 @@ static int mon_addfile(struct kernfs_node *parent_kn, const char *name,
* Remove all subdirectories of mon_data of ctrl_mon groups
* and monitor groups with given domain id.
*/
-void rmdir_mondata_subdir_allrdtgrp(struct rdt_resource *r, unsigned int dom_id)
+static void rmdir_mondata_subdir_allrdtgrp(struct rdt_resource *r,
+ unsigned int dom_id)
{
struct rdtgroup *prgrp, *crgrp;
char name[32];

- if (!r->mon_capable)
- return;
-
list_for_each_entry(prgrp, &rdt_all_groups, rdtgroup_list) {
sprintf(name, "mon_%s_%02d", r->name, dom_id);
kernfs_remove_by_name(prgrp->mon.mon_data_kn, name);
@@ -3233,6 +3231,45 @@ static int __init rdtgroup_setup_root(void)
return ret;
}

+static void domain_destroy_mon_state(struct rdt_domain *d)
+{
+ bitmap_free(d->rmid_busy_llc);
+ kfree(d->mbm_total);
+ kfree(d->mbm_local);
+}
+
+void resctrl_offline_domain(struct rdt_resource *r, struct rdt_domain *d)
+{
+ lockdep_assert_held(&rdtgroup_mutex);
+
+ if (!r->mon_capable)
+ return;
+
+ /*
+ * If resctrl is mounted, remove all the
+ * per domain monitor data directories.
+ */
+ if (static_branch_unlikely(&rdt_mon_enable_key))
+ rmdir_mondata_subdir_allrdtgrp(r, d->id);
+
+ if (is_mbm_enabled())
+ cancel_delayed_work(&d->mbm_over);
+ if (is_llc_occupancy_enabled() && has_busy_rmid(r, d)) {
+ /*
+ * When a package is going down, forcefully
+ * decrement rmid->ebusy. There is no way to know
+ * that the L3 was flushed and hence may lead to
+ * incorrect counts in rare scenarios, but leaving
+ * the RMID as busy creates RMID leaks if the
+ * package never comes back.
+ */
+ __check_limbo(d, true);
+ cancel_delayed_work(&d->cqm_limbo);
+ }
+
+ domain_destroy_mon_state(d);
+}
+
static int domain_setup_mon_state(struct rdt_resource *r, struct rdt_domain *d)
{
size_t tsize;
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index d512455b4c3a..5d283bdd6162 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -193,5 +193,6 @@ int resctrl_arch_update_domains(struct rdt_resource *r, u32 closid);
u32 resctrl_arch_get_config(struct rdt_resource *r, struct rdt_domain *d,
u32 closid, enum resctrl_conf_type type);
int resctrl_online_domain(struct rdt_resource *r, struct rdt_domain *d);
+void resctrl_offline_domain(struct rdt_resource *r, struct rdt_domain *d);

#endif /* _RESCTRL_H */
--
2.30.2

2022-09-02 16:13:05

by James Morse

[permalink] [raw]
Subject: [PATCH v6 03/21] x86/resctrl: Add domain online callback for resctrl work

Because domains are exposed to user-space via resctrl, the filesystem
must update its state when CPU hotplug callbacks are triggered.

Some of this work is common to any architecture that would support
resctrl, but the work is tied up with the architecture code to
allocate the memory.

Move domain_setup_mon_state(), the monitor subdir creation call and the
mbm/limbo workers into a new resctrl_online_domain() call. These bits
are not specific to the architecture. Grouping them in one function
allows that code to be moved to /fs/ and re-used by another architecture.

Reviewed-by: Jamie Iles <[email protected]>
Tested-by: Xin Hao <[email protected]>
Reviewed-by: Shaopeng Tan <[email protected]>
Tested-by: Shaopeng Tan <[email protected]>
Tested-by: Cristian Marussi <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Signed-off-by: James Morse <[email protected]>
---
Changes since v2:
* Fixed kfree(d) rebase artefact.

Changes since v1:
* Capitalisation
* Removed inline comment
* Added to the commit message
---
arch/x86/kernel/cpu/resctrl/core.c | 57 ++++------------------
arch/x86/kernel/cpu/resctrl/internal.h | 2 -
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 65 ++++++++++++++++++++++++--
include/linux/resctrl.h | 1 +
4 files changed, 69 insertions(+), 56 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 2f87177f1f69..25f30148478b 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -443,42 +443,6 @@ static int domain_setup_ctrlval(struct rdt_resource *r, struct rdt_domain *d)
return 0;
}

-static int domain_setup_mon_state(struct rdt_resource *r, struct rdt_domain *d)
-{
- size_t tsize;
-
- if (is_llc_occupancy_enabled()) {
- d->rmid_busy_llc = bitmap_zalloc(r->num_rmid, GFP_KERNEL);
- if (!d->rmid_busy_llc)
- return -ENOMEM;
- INIT_DELAYED_WORK(&d->cqm_limbo, cqm_handle_limbo);
- }
- if (is_mbm_total_enabled()) {
- tsize = sizeof(*d->mbm_total);
- d->mbm_total = kcalloc(r->num_rmid, tsize, GFP_KERNEL);
- if (!d->mbm_total) {
- bitmap_free(d->rmid_busy_llc);
- return -ENOMEM;
- }
- }
- if (is_mbm_local_enabled()) {
- tsize = sizeof(*d->mbm_local);
- d->mbm_local = kcalloc(r->num_rmid, tsize, GFP_KERNEL);
- if (!d->mbm_local) {
- bitmap_free(d->rmid_busy_llc);
- kfree(d->mbm_total);
- return -ENOMEM;
- }
- }
-
- if (is_mbm_enabled()) {
- INIT_DELAYED_WORK(&d->mbm_over, mbm_handle_overflow);
- mbm_setup_overflow_handler(d, MBM_OVERFLOW_INTERVAL);
- }
-
- return 0;
-}
-
/*
* domain_add_cpu - Add a cpu to a resource's domain list.
*
@@ -498,6 +462,7 @@ static void domain_add_cpu(int cpu, struct rdt_resource *r)
struct list_head *add_pos = NULL;
struct rdt_hw_domain *hw_dom;
struct rdt_domain *d;
+ int err;

d = rdt_find_domain(r, id, &add_pos);
if (IS_ERR(d)) {
@@ -527,21 +492,15 @@ static void domain_add_cpu(int cpu, struct rdt_resource *r)
return;
}

- if (r->mon_capable && domain_setup_mon_state(r, d)) {
- kfree(hw_dom->ctrl_val);
- kfree(hw_dom->mbps_val);
- kfree(hw_dom);
- return;
- }
-
list_add_tail(&d->list, add_pos);

- /*
- * If resctrl is mounted, add
- * per domain monitor data directories.
- */
- if (static_branch_unlikely(&rdt_mon_enable_key))
- mkdir_mondata_subdir_allrdtgrp(r, d);
+ err = resctrl_online_domain(r, d);
+ if (err) {
+ list_del(&d->list);
+ kfree(hw_dom->ctrl_val);
+ kfree(hw_dom->mbps_val);
+ kfree(hw_dom);
+ }
}

static void domain_remove_cpu(int cpu, struct rdt_resource *r)
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 8828b5c1b6d2..be48a682dbdb 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -524,8 +524,6 @@ void mon_event_count(void *info);
int rdtgroup_mondata_show(struct seq_file *m, void *arg);
void rmdir_mondata_subdir_allrdtgrp(struct rdt_resource *r,
unsigned int dom_id);
-void mkdir_mondata_subdir_allrdtgrp(struct rdt_resource *r,
- struct rdt_domain *d);
void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
struct rdt_domain *d, struct rdtgroup *rdtgrp,
int evtid, int first);
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index def7c6681f8b..030a70326ccc 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -2565,16 +2565,13 @@ static int mkdir_mondata_subdir(struct kernfs_node *parent_kn,
* Add all subdirectories of mon_data for "ctrl_mon" groups
* and "monitor" groups with given domain id.
*/
-void mkdir_mondata_subdir_allrdtgrp(struct rdt_resource *r,
- struct rdt_domain *d)
+static void mkdir_mondata_subdir_allrdtgrp(struct rdt_resource *r,
+ struct rdt_domain *d)
{
struct kernfs_node *parent_kn;
struct rdtgroup *prgrp, *crgrp;
struct list_head *head;

- if (!r->mon_capable)
- return;
-
list_for_each_entry(prgrp, &rdt_all_groups, rdtgroup_list) {
parent_kn = prgrp->mon.mon_data_kn;
mkdir_mondata_subdir(parent_kn, d, r, prgrp);
@@ -3236,6 +3233,64 @@ static int __init rdtgroup_setup_root(void)
return ret;
}

+static int domain_setup_mon_state(struct rdt_resource *r, struct rdt_domain *d)
+{
+ size_t tsize;
+
+ if (is_llc_occupancy_enabled()) {
+ d->rmid_busy_llc = bitmap_zalloc(r->num_rmid, GFP_KERNEL);
+ if (!d->rmid_busy_llc)
+ return -ENOMEM;
+ }
+ if (is_mbm_total_enabled()) {
+ tsize = sizeof(*d->mbm_total);
+ d->mbm_total = kcalloc(r->num_rmid, tsize, GFP_KERNEL);
+ if (!d->mbm_total) {
+ bitmap_free(d->rmid_busy_llc);
+ return -ENOMEM;
+ }
+ }
+ if (is_mbm_local_enabled()) {
+ tsize = sizeof(*d->mbm_local);
+ d->mbm_local = kcalloc(r->num_rmid, tsize, GFP_KERNEL);
+ if (!d->mbm_local) {
+ bitmap_free(d->rmid_busy_llc);
+ kfree(d->mbm_total);
+ return -ENOMEM;
+ }
+ }
+
+ return 0;
+}
+
+int resctrl_online_domain(struct rdt_resource *r, struct rdt_domain *d)
+{
+ int err;
+
+ lockdep_assert_held(&rdtgroup_mutex);
+
+ if (!r->mon_capable)
+ return 0;
+
+ err = domain_setup_mon_state(r, d);
+ if (err)
+ return err;
+
+ if (is_mbm_enabled()) {
+ INIT_DELAYED_WORK(&d->mbm_over, mbm_handle_overflow);
+ mbm_setup_overflow_handler(d, MBM_OVERFLOW_INTERVAL);
+ }
+
+ if (is_llc_occupancy_enabled())
+ INIT_DELAYED_WORK(&d->cqm_limbo, cqm_handle_limbo);
+
+ /* If resctrl is mounted, add per domain monitor data directories. */
+ if (static_branch_unlikely(&rdt_mon_enable_key))
+ mkdir_mondata_subdir_allrdtgrp(r, d);
+
+ return 0;
+}
+
/*
* rdtgroup_init - rdtgroup initialization
*
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 8180c539800d..d512455b4c3a 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -192,5 +192,6 @@ u32 resctrl_arch_get_num_closid(struct rdt_resource *r);
int resctrl_arch_update_domains(struct rdt_resource *r, u32 closid);
u32 resctrl_arch_get_config(struct rdt_resource *r, struct rdt_domain *d,
u32 closid, enum resctrl_conf_type type);
+int resctrl_online_domain(struct rdt_resource *r, struct rdt_domain *d);

#endif /* _RESCTRL_H */
--
2.30.2

2022-09-02 16:30:09

by James Morse

[permalink] [raw]
Subject: [PATCH v6 15/21] x86/resctrl: Abstract __rmid_read()

__rmid_read() selects the specified eventid and returns the counter
value from the MSR. The error handling is architecture specific, and
handled by the callers, rdtgroup_mondata_show() and __mon_event_count().

Error handling should be done by architecture-specific code, as
a different architecture may have different requirements. MPAM's
counters can report that they are 'not ready', requiring a second
read after a short delay. This should be hidden from resctrl.

Make __rmid_read() the architecture specific function for reading
a counter. Rename it resctrl_arch_rmid_read() and move the error
handling into it.

A read of a counter that the hardware supports, but resctrl does not,
now returns -EINVAL instead of -EIO from the default case in
__mon_event_count(). User-space can't see this change, as resctrl
doesn't expose counters it doesn't support.
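
To illustrate why hiding this behind the interface helps, a hypothetical
(not in-tree) arch implementation of the 'not ready' retry could look
roughly like this; the helper, flag and mask names are stand-ins, not
real MPAM code:

	int resctrl_arch_rmid_read(u32 rmid, enum resctrl_event_id eventid,
				   u64 *val)
	{
		u64 v = arch_read_counter(rmid, eventid); /* stand-in helper */

		if (v & ARCH_COUNTER_NOT_READY) {	/* stand-in flag */
			udelay(10);			/* illustrative delay */
			v = arch_read_counter(rmid, eventid);
		}
		if (v & ARCH_COUNTER_ERROR)		/* stand-in flag */
			return -EIO;

		*val = v & ARCH_COUNTER_VALUE_MASK;	/* stand-in mask */

		return 0;
	}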

Reviewed-by: Jamie Iles <[email protected]>
Tested-by: Xin Hao <[email protected]>
Reviewed-by: Shaopeng Tan <[email protected]>
Tested-by: Shaopeng Tan <[email protected]>
Tested-by: Cristian Marussi <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Signed-off-by: James Morse <[email protected]>
---
Changes since v4:
* Fixed __rmid_read in comment.
* Renamed ret_val ret.

Changes since v3:
* Changed return type of __mon_event_count().
* Clarified comment in mon_event_count()

Changes since v2:
* Capitalisation
* Stray newline restored
* Removed setting rr->val to the error value, replacing it with clearing
the error to hide Unavailable from monitor group reads (and added a
block comment).

Changes since v1:
* Return EINVAL from the impossible case in __mon_event_count() instead
of an x86 hardware specific value.
---
arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 4 +-
arch/x86/kernel/cpu/resctrl/internal.h | 1 +
arch/x86/kernel/cpu/resctrl/monitor.c | 62 ++++++++++++++---------
include/linux/resctrl.h | 1 +
4 files changed, 43 insertions(+), 25 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
index ece3a1e0e6f2..d3f7eb2ac14b 100644
--- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
+++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
@@ -579,9 +579,9 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)

mon_event_read(&rr, r, d, rdtgrp, evtid, false);

- if (rr.val & RMID_VAL_ERROR)
+ if (rr.err == -EIO)
seq_puts(m, "Error\n");
- else if (rr.val & RMID_VAL_UNAVAIL)
+ else if (rr.err == -EINVAL)
seq_puts(m, "Unavailable\n");
else
seq_printf(m, "%llu\n", rr.val * hw_res->mon_scale);
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index b34a1403f033..1d2e7bd6305f 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -94,6 +94,7 @@ struct rmid_read {
struct rdt_domain *d;
enum resctrl_event_id evtid;
bool first;
+ int err;
u64 val;
};

diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index e9755143492b..51ab76f2dfbc 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -167,9 +167,9 @@ void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_domain *d,
memset(am, 0, sizeof(*am));
}

-static u64 __rmid_read(u32 rmid, enum resctrl_event_id eventid)
+int resctrl_arch_rmid_read(u32 rmid, enum resctrl_event_id eventid, u64 *val)
{
- u64 val;
+ u64 msr_val;

/*
* As per the SDM, when IA32_QM_EVTSEL.EvtID (bits 7:0) is configured
@@ -180,14 +180,24 @@ static u64 __rmid_read(u32 rmid, enum resctrl_event_id eventid)
* are error bits.
*/
wrmsr(MSR_IA32_QM_EVTSEL, eventid, rmid);
- rdmsrl(MSR_IA32_QM_CTR, val);
+ rdmsrl(MSR_IA32_QM_CTR, msr_val);

- return val;
+ if (msr_val & RMID_VAL_ERROR)
+ return -EIO;
+ if (msr_val & RMID_VAL_UNAVAIL)
+ return -EINVAL;
+
+ *val = msr_val;
+
+ return 0;
}

static bool rmid_dirty(struct rmid_entry *entry)
{
- u64 val = __rmid_read(entry->rmid, QOS_L3_OCCUP_EVENT_ID);
+ u64 val = 0;
+
+ if (resctrl_arch_rmid_read(entry->rmid, QOS_L3_OCCUP_EVENT_ID, &val))
+ return true;

return val >= resctrl_cqm_threshold;
}
@@ -259,8 +269,8 @@ static void add_rmid_to_limbo(struct rmid_entry *entry)
{
struct rdt_resource *r;
struct rdt_domain *d;
- int cpu;
- u64 val;
+ int cpu, err;
+ u64 val = 0;

r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;

@@ -268,8 +278,10 @@ static void add_rmid_to_limbo(struct rmid_entry *entry)
cpu = get_cpu();
list_for_each_entry(d, &r->domains, list) {
if (cpumask_test_cpu(cpu, &d->cpu_mask)) {
- val = __rmid_read(entry->rmid, QOS_L3_OCCUP_EVENT_ID);
- if (val <= resctrl_cqm_threshold)
+ err = resctrl_arch_rmid_read(entry->rmid,
+ QOS_L3_OCCUP_EVENT_ID,
+ &val);
+ if (err || val <= resctrl_cqm_threshold)
continue;
}

@@ -315,19 +327,19 @@ static u64 mbm_overflow_count(u64 prev_msr, u64 cur_msr, unsigned int width)
return chunks >> shift;
}

-static u64 __mon_event_count(u32 rmid, struct rmid_read *rr)
+static int __mon_event_count(u32 rmid, struct rmid_read *rr)
{
struct rdt_hw_resource *hw_res = resctrl_to_arch_res(rr->r);
struct mbm_state *m;
- u64 chunks, tval;
+ u64 chunks, tval = 0;

if (rr->first)
resctrl_arch_reset_rmid(rr->r, rr->d, rmid, rr->evtid);

- tval = __rmid_read(rmid, rr->evtid);
- if (tval & (RMID_VAL_ERROR | RMID_VAL_UNAVAIL)) {
- return tval;
- }
+ rr->err = resctrl_arch_rmid_read(rmid, rr->evtid, &tval);
+ if (rr->err)
+ return rr->err;
+
switch (rr->evtid) {
case QOS_L3_OCCUP_EVENT_ID:
rr->val += tval;
@@ -341,9 +353,9 @@ static u64 __mon_event_count(u32 rmid, struct rmid_read *rr)
default:
/*
* Code would never reach here because an invalid
- * event id would fail the __rmid_read.
+ * event id would fail in resctrl_arch_rmid_read().
*/
- return RMID_VAL_ERROR;
+ return -EINVAL;
}

if (rr->first) {
@@ -399,11 +411,11 @@ void mon_event_count(void *info)
struct rdtgroup *rdtgrp, *entry;
struct rmid_read *rr = info;
struct list_head *head;
- u64 ret_val;
+ int ret;

rdtgrp = rr->rgrp;

- ret_val = __mon_event_count(rdtgrp->mon.rmid, rr);
+ ret = __mon_event_count(rdtgrp->mon.rmid, rr);

/*
* For Ctrl groups read data from child monitor groups and
@@ -415,13 +427,17 @@ void mon_event_count(void *info)
if (rdtgrp->type == RDTCTRL_GROUP) {
list_for_each_entry(entry, head, mon.crdtgrp_list) {
if (__mon_event_count(entry->mon.rmid, rr) == 0)
- ret_val = 0;
+ ret = 0;
}
}

- /* Report error if none of rmid_reads are successful */
- if (ret_val)
- rr->val = ret_val;
+ /*
+ * __mon_event_count() calls for newly created monitor groups may
+ * report -EINVAL/Unavailable if the monitor hasn't seen any traffic.
+ * Discard error if any of the monitor event reads succeeded.
+ */
+ if (ret == 0)
+ rr->err = 0;
}

/*
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 818456770176..efe60dd7fd21 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -219,6 +219,7 @@ u32 resctrl_arch_get_config(struct rdt_resource *r, struct rdt_domain *d,
u32 closid, enum resctrl_conf_type type);
int resctrl_online_domain(struct rdt_resource *r, struct rdt_domain *d);
void resctrl_offline_domain(struct rdt_resource *r, struct rdt_domain *d);
+int resctrl_arch_rmid_read(u32 rmid, enum resctrl_event_id eventid, u64 *res);

/**
* resctrl_arch_reset_rmid() - Reset any private state associated with rmid
--
2.30.2

2022-09-07 06:52:40

by haoxin

[permalink] [raw]
Subject: Re: [PATCH v6 20/21] x86/resctrl: Add resctrl_rmid_realloc_limit to abstract x86's boot_cpu_data


On 2022/9/2 at 11:48 PM, James Morse wrote:
> resctrl_rmid_realloc_threshold can be set by user-space. The maximum
> value is specified by the architecture.
>
> Currently max_threshold_occ_write() reads the maximum value from
> boot_cpu_data.x86_cache_size, which is not portable to another
> architecture.
>
> Add resctrl_rmid_realloc_limit to describe the maximum size in bytes
> that user-space can set the threshold to.
>
> Reviewed-by: Jamie Iles <[email protected]>
> Tested-by: Xin Hao <[email protected]>
> Reviewed-by: Shaopeng Tan <[email protected]>
> Tested-by: Shaopeng Tan <[email protected]>
> Tested-by: Cristian Marussi <[email protected]>
> Reviewed-by: Reinette Chatre <[email protected]>
> Signed-off-by: James Morse <[email protected]>
> ---
> arch/x86/kernel/cpu/resctrl/monitor.c | 9 +++++++--
> arch/x86/kernel/cpu/resctrl/rdtgroup.c | 2 +-
> include/linux/resctrl.h | 1 +
> 3 files changed, 9 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
> index e91afe99b763..8d15568d7121 100644
> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
> @@ -67,6 +67,11 @@ unsigned int rdt_mon_features;
> */
> unsigned int resctrl_rmid_realloc_threshold;
>
> +/*
> + * This is the maximum value for the reallocation threshold, in bytes.
> + */
> +unsigned int resctrl_rmid_realloc_limit;
> +
> #define CF(cf) ((unsigned long)(1048576 * (cf) + 0.5))
>
> /*
> @@ -747,10 +752,10 @@ int rdt_get_mon_l3_config(struct rdt_resource *r)
> {
> unsigned int mbm_offset = boot_cpu_data.x86_cache_mbm_width_offset;
> struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
> - unsigned int cl_size = boot_cpu_data.x86_cache_size;
> unsigned int threshold;
> int ret;
>
> + resctrl_rmid_realloc_limit = boot_cpu_data.x86_cache_size * 1024;
Use SZ_1K instead?
> hw_res->mon_scale = boot_cpu_data.x86_cache_occ_scale;
> r->num_rmid = boot_cpu_data.x86_cache_max_rmid + 1;
> hw_res->mbm_width = MBM_CNTR_WIDTH_BASE;
> @@ -767,7 +772,7 @@ int rdt_get_mon_l3_config(struct rdt_resource *r)
> *
> * For a 35MB LLC and 56 RMIDs, this is ~1.8% of the LLC.
> */
> - threshold = cl_size * 1024 / r->num_rmid;
> + threshold = resctrl_rmid_realloc_limit / r->num_rmid;
>
> /*
> * Because num_rmid may not be a power of two, round the value
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index 849bdec37217..e5a48f05e787 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -1059,7 +1059,7 @@ static ssize_t max_threshold_occ_write(struct kernfs_open_file *of,
> if (ret)
> return ret;
>
> - if (bytes > (boot_cpu_data.x86_cache_size * 1024))
> + if (bytes > resctrl_rmid_realloc_limit)
> return -EINVAL;
>
> resctrl_rmid_realloc_threshold = resctrl_arch_round_mon_val(bytes);
> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
> index 9995d043650a..cb857f753322 100644
> --- a/include/linux/resctrl.h
> +++ b/include/linux/resctrl.h
> @@ -251,5 +251,6 @@ void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_domain *d,
> u32 rmid, enum resctrl_event_id eventid);
>
> extern unsigned int resctrl_rmid_realloc_threshold;
> +extern unsigned int resctrl_rmid_realloc_limit;
>
> #endif /* _RESCTRL_H */

2022-09-07 06:54:02

by haoxin

[permalink] [raw]
Subject: Re: [PATCH v6 04/21] x86/resctrl: Group struct rdt_hw_domain cleanup


On 02/09/2022 23:48, James Morse wrote:
> domain_add_cpu() and domain_remove_cpu() need to kfree() the child
> arrays that were allocated by domain_setup_ctrlval().
>
> As this memory is moved around, and new arrays are created, adjusting
> the error handling cleanup code becomes noisier.
>
> To simplify this, move all the kfree() calls into a domain_free() helper.
> This depends on struct rdt_hw_domain being kzalloc()d, allowing it to
> unconditionally kfree() all the child arrays.
>
> Reviewed-by: Jamie Iles <[email protected]>
> Tested-by: Xin Hao <[email protected]>
> Reviewed-by: Shaopeng Tan <[email protected]>
> Tested-by: Shaopeng Tan <[email protected]>
> Tested-by: Cristian Marussi <[email protected]>
> Reviewed-by: Reinette Chatre <[email protected]>
> Signed-off-by: James Morse <[email protected]>
> ---
> Changes since v2:
> * Made domain_free() static.
>
> Changes since v1:
> * This patch is new
> ---
> arch/x86/kernel/cpu/resctrl/core.c | 17 ++++++++++-------
> 1 file changed, 10 insertions(+), 7 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
> index 25f30148478b..e37889f7a1a5 100644
> --- a/arch/x86/kernel/cpu/resctrl/core.c
> +++ b/arch/x86/kernel/cpu/resctrl/core.c
> @@ -414,6 +414,13 @@ void setup_default_ctrlval(struct rdt_resource *r, u32 *dc, u32 *dm)
> }
> }
>
> +static void domain_free(struct rdt_hw_domain *hw_dom)
Add inline?
> +{
> + kfree(hw_dom->ctrl_val);
> + kfree(hw_dom->mbps_val);
> + kfree(hw_dom);
> +}
> +
> static int domain_setup_ctrlval(struct rdt_resource *r, struct rdt_domain *d)
> {
> struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
> @@ -488,7 +495,7 @@ static void domain_add_cpu(int cpu, struct rdt_resource *r)
> rdt_domain_reconfigure_cdp(r);
>
> if (r->alloc_capable && domain_setup_ctrlval(r, d)) {
> - kfree(hw_dom);
> + domain_free(hw_dom);
> return;
> }
>
> @@ -497,9 +504,7 @@ static void domain_add_cpu(int cpu, struct rdt_resource *r)
> err = resctrl_online_domain(r, d);
> if (err) {
> list_del(&d->list);
> - kfree(hw_dom->ctrl_val);
> - kfree(hw_dom->mbps_val);
> - kfree(hw_dom);
> + domain_free(hw_dom);
> }
> }
>
> @@ -547,12 +552,10 @@ static void domain_remove_cpu(int cpu, struct rdt_resource *r)
> if (d->plr)
> d->plr->d = NULL;
>
> - kfree(hw_dom->ctrl_val);
> - kfree(hw_dom->mbps_val);
> bitmap_free(d->rmid_busy_llc);
> kfree(d->mbm_total);
> kfree(d->mbm_local);
> - kfree(hw_dom);
> + domain_free(hw_dom);
> return;
> }
>

2022-09-07 07:06:36

by haoxin

[permalink] [raw]
Subject: Re: [PATCH v6 05/21] x86/resctrl: Add domain offline callback for resctrl work


On 02/09/2022 23:48, James Morse wrote:
> Because domains are exposed to user-space via resctrl, the filesystem
> must update its state when CPU hotplug callbacks are triggered.
>
> Some of this work is common to any architecture that would support
> resctrl, but the work is tied up with the architecture code to
> free the memory.
>
> Move the monitor subdir removal and the cancelling of the mbm/limbo
> works into a new resctrl_offline_domain() call. These bits are not
> specific to the architecture. Grouping them in one function allows
> that code to be moved to /fs/ and re-used by another architecture.
>
> Reviewed-by: Jamie Iles <[email protected]>
> Tested-by: Xin Hao <[email protected]>
> Reviewed-by: Shaopeng Tan <[email protected]>
> Tested-by: Shaopeng Tan <[email protected]>
> Tested-by: Cristian Marussi <[email protected]>
> Reviewed-by: Reinette Chatre <[email protected]>
> Signed-off-by: James Morse <[email protected]>
> ---
> Changes since v2:
> * Moved kfree()ing to domain_destroy_mon_state() for later re-use.
>
> Changes since v1:
> * Removed a redundant mon_capable check
> * Capitalisation
> * Removed inline comment
> * Added to the commit message
> ---
> arch/x86/kernel/cpu/resctrl/core.c | 26 ++-------------
> arch/x86/kernel/cpu/resctrl/internal.h | 2 --
> arch/x86/kernel/cpu/resctrl/rdtgroup.c | 45 +++++++++++++++++++++++---
> include/linux/resctrl.h | 1 +
> 4 files changed, 44 insertions(+), 30 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
> index e37889f7a1a5..f69182973175 100644
> --- a/arch/x86/kernel/cpu/resctrl/core.c
> +++ b/arch/x86/kernel/cpu/resctrl/core.c
> @@ -523,27 +523,8 @@ static void domain_remove_cpu(int cpu, struct rdt_resource *r)
>
> cpumask_clear_cpu(cpu, &d->cpu_mask);
> if (cpumask_empty(&d->cpu_mask)) {
> - /*
> - * If resctrl is mounted, remove all the
> - * per domain monitor data directories.
> - */
> - if (static_branch_unlikely(&rdt_mon_enable_key))
> - rmdir_mondata_subdir_allrdtgrp(r, d->id);
> + resctrl_offline_domain(r, d);
> list_del(&d->list);
> - if (r->mon_capable && is_mbm_enabled())
> - cancel_delayed_work(&d->mbm_over);
> - if (is_llc_occupancy_enabled() && has_busy_rmid(r, d)) {
> - /*
> - * When a package is going down, forcefully
> - * decrement rmid->ebusy. There is no way to know
> - * that the L3 was flushed and hence may lead to
> - * incorrect counts in rare scenarios, but leaving
> - * the RMID as busy creates RMID leaks if the
> - * package never comes back.
> - */
> - __check_limbo(d, true);
> - cancel_delayed_work(&d->cqm_limbo);
> - }
>
> /*
> * rdt_domain "d" is going to be freed below, so clear
> @@ -551,11 +532,8 @@ static void domain_remove_cpu(int cpu, struct rdt_resource *r)
> */
> if (d->plr)
> d->plr->d = NULL;
> -
> - bitmap_free(d->rmid_busy_llc);
> - kfree(d->mbm_total);
> - kfree(d->mbm_local);
> domain_free(hw_dom);
> +
> return;
> }
>
> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
> index be48a682dbdb..e12b55f815bf 100644
> --- a/arch/x86/kernel/cpu/resctrl/internal.h
> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> @@ -522,8 +522,6 @@ void free_rmid(u32 rmid);
> int rdt_get_mon_l3_config(struct rdt_resource *r);
> void mon_event_count(void *info);
> int rdtgroup_mondata_show(struct seq_file *m, void *arg);
> -void rmdir_mondata_subdir_allrdtgrp(struct rdt_resource *r,
> - unsigned int dom_id);
> void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
> struct rdt_domain *d, struct rdtgroup *rdtgrp,
> int evtid, int first);
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index 030a70326ccc..5830905a92d2 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -2499,14 +2499,12 @@ static int mon_addfile(struct kernfs_node *parent_kn, const char *name,
> * Remove all subdirectories of mon_data of ctrl_mon groups
> * and monitor groups with given domain id.
> */
> -void rmdir_mondata_subdir_allrdtgrp(struct rdt_resource *r, unsigned int dom_id)
> +static void rmdir_mondata_subdir_allrdtgrp(struct rdt_resource *r,
> + unsigned int dom_id)
> {
> struct rdtgroup *prgrp, *crgrp;
> char name[32];
>
> - if (!r->mon_capable)
> - return;
> -
> list_for_each_entry(prgrp, &rdt_all_groups, rdtgroup_list) {
> sprintf(name, "mon_%s_%02d", r->name, dom_id);
> kernfs_remove_by_name(prgrp->mon.mon_data_kn, name);
> @@ -3233,6 +3231,45 @@ static int __init rdtgroup_setup_root(void)
> return ret;
> }
>
> +static void domain_destroy_mon_state(struct rdt_domain *d)
Add inline?
> +{
> + bitmap_free(d->rmid_busy_llc);
> + kfree(d->mbm_total);
> + kfree(d->mbm_local);
> +}
> +
> +void resctrl_offline_domain(struct rdt_resource *r, struct rdt_domain *d)
> +{
> + lockdep_assert_held(&rdtgroup_mutex);
> +
> + if (!r->mon_capable)
> + return;
> +
> + /*
> + * If resctrl is mounted, remove all the
> + * per domain monitor data directories.
> + */
> + if (static_branch_unlikely(&rdt_mon_enable_key))
> + rmdir_mondata_subdir_allrdtgrp(r, d->id);
> +
> + if (is_mbm_enabled())
> + cancel_delayed_work(&d->mbm_over);
> + if (is_llc_occupancy_enabled() && has_busy_rmid(r, d)) {
> + /*
> + * When a package is going down, forcefully
> + * decrement rmid->ebusy. There is no way to know
> + * that the L3 was flushed and hence may lead to
> + * incorrect counts in rare scenarios, but leaving
> + * the RMID as busy creates RMID leaks if the
> + * package never comes back.
> + */
> + __check_limbo(d, true);
> + cancel_delayed_work(&d->cqm_limbo);
> + }
> +
> + domain_destroy_mon_state(d);
> +}
> +
> static int domain_setup_mon_state(struct rdt_resource *r, struct rdt_domain *d)
> {
> size_t tsize;
> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
> index d512455b4c3a..5d283bdd6162 100644
> --- a/include/linux/resctrl.h
> +++ b/include/linux/resctrl.h
> @@ -193,5 +193,6 @@ int resctrl_arch_update_domains(struct rdt_resource *r, u32 closid);
> u32 resctrl_arch_get_config(struct rdt_resource *r, struct rdt_domain *d,
> u32 closid, enum resctrl_conf_type type);
> int resctrl_online_domain(struct rdt_resource *r, struct rdt_domain *d);
> +void resctrl_offline_domain(struct rdt_resource *r, struct rdt_domain *d);
>
> #endif /* _RESCTRL_H */

2022-09-07 07:22:50

by haoxin

[permalink] [raw]
Subject: Re: [PATCH v6 12/21] x86/resctrl: Calculate bandwidth from the previous __mon_event_count() chunks


On 02/09/2022 23:48, James Morse wrote:
> mbm_bw_count() is only called by the mbm_handle_overflow() worker once a
> second. It reads the hardware register, calculates the bandwidth and
> updates m->prev_bw_msr which is used to hold the previous hardware register
> value.
>
> Operating directly on hardware register values makes it difficult to make
> this code architecture independent, so that it can be moved to /fs/,
> making the mba_sc feature something resctrl supports with no additional
> support from the architecture.
> Prior to calling mbm_bw_count(), mbm_update() reads from the same hardware
> register using __mon_event_count().
>
> Change mbm_bw_count() to use the current chunks value most recently saved
> by __mon_event_count(). This removes an extra call to __rmid_read().
> Instead of using m->prev_msr to calculate the number of chunks seen,
> use the rr->val that was updated by __mon_event_count(). This removes an
> extra call to mbm_overflow_count() and get_corrected_mbm_count().
> Calculating bandwidth like this means mbm_bw_count() no longer operates
> on hardware register values directly.
>
> Reviewed-by: Jamie Iles <[email protected]>
> Tested-by: Xin Hao <[email protected]>
> Reviewed-by: Shaopeng Tan <[email protected]>
> Tested-by: Shaopeng Tan <[email protected]>
> Tested-by: Cristian Marussi <[email protected]>
> Reviewed-by: Reinette Chatre <[email protected]>
> Signed-off-by: James Morse <[email protected]>
> ---
> Changes since v3:
> * Reset rr.val before each use to avoid over counting.
> * Fixed typos in kerneldoc.
>
> Changes since v2:
> * Expanded commit message
>
> Changes since v1:
> * This patch was rewritten
> ---
> arch/x86/kernel/cpu/resctrl/internal.h | 4 ++--
> arch/x86/kernel/cpu/resctrl/monitor.c | 25 ++++++++++++++++---------
> 2 files changed, 18 insertions(+), 11 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
> index 3b9e43ba7590..46062099d69e 100644
> --- a/arch/x86/kernel/cpu/resctrl/internal.h
> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> @@ -289,7 +289,7 @@ struct rftype {
> * struct mbm_state - status for each MBM counter in each domain
> * @chunks: Total data moved (multiply by rdt_group.mon_scale to get bytes)
> * @prev_msr: Value of IA32_QM_CTR for this RMID last time we read it
> - * @prev_bw_msr:Value of previous IA32_QM_CTR for bandwidth counting
> + * @prev_bw_chunks: Previous chunks value read for bandwidth calculation
> * @prev_bw: The most recent bandwidth in MBps
> * @delta_bw: Difference between the current and previous bandwidth
> * @delta_comp: Indicates whether to compute the delta_bw
> @@ -297,7 +297,7 @@ struct rftype {
> struct mbm_state {
> u64 chunks;
> u64 prev_msr;
> - u64 prev_bw_msr;
> + u64 prev_bw_chunks;
> u32 prev_bw;
> u32 delta_bw;
> bool delta_comp;
> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
> index 3e69386cfe00..2d81b6cd9632 100644
> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
> @@ -315,7 +315,7 @@ static u64 __mon_event_count(u32 rmid, struct rmid_read *rr)
>
> if (rr->first) {
> memset(m, 0, sizeof(struct mbm_state));
> - m->prev_bw_msr = m->prev_msr = tval;
> + m->prev_msr = tval;
> return 0;
> }
>
> @@ -329,27 +329,32 @@ static u64 __mon_event_count(u32 rmid, struct rmid_read *rr)
> }
>
> /*
> + * mbm_bw_count() - Update bw count from values previously read by
> + * __mon_event_count().
> + * @rmid: The rmid used to identify the cached mbm_state.
> + * @rr: The struct rmid_read populated by __mon_event_count().
> + *
> * Supporting function to calculate the memory bandwidth
> - * and delta bandwidth in MBps.
> + * and delta bandwidth in MBps. The chunks value previously read by
> + * __mon_event_count() is compared with the chunks value from the previous
> + * invocation. This must be called once per second to maintain values in MBps.
> */
> static void mbm_bw_count(u32 rmid, struct rmid_read *rr)
> {
> struct rdt_hw_resource *hw_res = resctrl_to_arch_res(rr->r);
> struct mbm_state *m = &rr->d->mbm_local[rmid];
> - u64 tval, cur_bw, chunks;
> + u64 cur_bw, chunks, cur_chunks;
>
> - tval = __rmid_read(rmid, rr->evtid);
> - if (tval & (RMID_VAL_ERROR | RMID_VAL_UNAVAIL))
> - return;
> + cur_chunks = rr->val;
> + chunks = cur_chunks - m->prev_bw_chunks;
> + m->prev_bw_chunks = cur_chunks;
>
> - chunks = mbm_overflow_count(m->prev_bw_msr, tval, hw_res->mbm_width);
> - cur_bw = (get_corrected_mbm_count(rmid, chunks) * hw_res->mon_scale) >> 20;
> + cur_bw = (chunks * hw_res->mon_scale) >> 20;
>
> if (m->delta_comp)
> m->delta_bw = abs(cur_bw - m->prev_bw);
> m->delta_comp = false;
> m->prev_bw = cur_bw;
> - m->prev_bw_msr = tval;
> }
>
> /*
> @@ -516,10 +521,12 @@ static void mbm_update(struct rdt_resource *r, struct rdt_domain *d, int rmid)
> */
> if (is_mbm_total_enabled()) {
> rr.evtid = QOS_L3_MBM_TOTAL_EVENT_ID;
> + rr.val = 0;
In mbm_update(), rr.val is not used, so there is no need to initialize it?
> __mon_event_count(rmid, &rr);
> }
> if (is_mbm_local_enabled()) {
> rr.evtid = QOS_L3_MBM_LOCAL_EVENT_ID;
> + rr.val = 0;
ditto.
> __mon_event_count(rmid, &rr);
>
> /*

2022-09-08 17:16:28

by James Morse

[permalink] [raw]
Subject: Re: [PATCH v6 05/21] x86/resctrl: Add domain offline callback for resctrl work

Hi Hao Xin,

On 07/09/2022 07:29, haoxin wrote:
> On 02/09/2022 23:48, James Morse wrote:
>> Because domains are exposed to user-space via resctrl, the filesystem
>> must update its state when CPU hotplug callbacks are triggered.
>>
>> Some of this work is common to any architecture that would support
>> resctrl, but the work is tied up with the architecture code to
>> free the memory.
>>
>> Move the monitor subdir removal and the cancelling of the mbm/limbo
>> works into a new resctrl_offline_domain() call. These bits are not
>> specific to the architecture. Grouping them in one function allows
>> that code to be moved to /fs/ and re-used by another architecture.

>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> index 030a70326ccc..5830905a92d2 100644
>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> @@ -3233,6 +3231,45 @@ static int __init rdtgroup_setup_root(void)
>>       return ret;
>>   }
>>   +static void domain_destroy_mon_state(struct rdt_domain *d)

> Add inline?

As previously noted, the compiler doesn't need to be told that it can inline static functions in a C
file.


Thanks,

James

2022-09-08 17:17:41

by James Morse

[permalink] [raw]
Subject: Re: [PATCH v6 04/21] x86/resctrl: Group struct rdt_hw_domain cleanup

Hi Hao Xin,

On 07/09/2022 07:28, haoxin wrote:
>
> On 02/09/2022 23:48, James Morse wrote:
>> domain_add_cpu() and domain_remove_cpu() need to kfree() the child
>> arrays that were allocated by domain_setup_ctrlval().
>>
>> As this memory is moved around, and new arrays are created, adjusting
>> the error handling cleanup code becomes noisier.
>>
>> To simplify this, move all the kfree() calls into a domain_free() helper.
>> This depends on struct rdt_hw_domain being kzalloc()d, allowing it to
>> unconditionally kfree() all the child arrays.

>> diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
>> index 25f30148478b..e37889f7a1a5 100644
>> --- a/arch/x86/kernel/cpu/resctrl/core.c
>> +++ b/arch/x86/kernel/cpu/resctrl/core.c
>> @@ -414,6 +414,13 @@ void setup_default_ctrlval(struct rdt_resource *r, u32 *dc, u32 *dm)
>>       }
>>   }
>>   +static void domain_free(struct rdt_hw_domain *hw_dom)

> Add inline?

It's best to let the compiler decide this. As this is in a C file, and is declared static,
the compiler is free to duplicate and inline this function as it sees fit. The inline
keyword would only be needed if this were in a header file.

Looking at the built object file - the compiler chose not to duplicate this into the two
callers, presumably because of the size of the function.

Unless it's relied on for correctness, or is on a performance-sensitive path, it's best to let
the compiler make its own decision here.
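
As a rough illustration (hypothetical helpers, not code from the patch), the keyword
only really starts to matter once a helper is defined in a header:

	/* In a .c file: 'static' is enough, the optimiser may inline this on its own. */
	static int example_scale(int chunks, int scale)
	{
		return chunks * scale;
	}

	/* In a header: 'static inline' is the usual kernel pattern, giving every
	 * translation unit that includes it an inlinable local copy.
	 */
	static inline int example_scale_inline(int chunks, int scale)
	{
		return chunks * scale;
	}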


Thanks,

James


>> +{
>> +    kfree(hw_dom->ctrl_val);
>> +    kfree(hw_dom->mbps_val);
>> +    kfree(hw_dom);
>> +}
>> +
>>   static int domain_setup_ctrlval(struct rdt_resource *r, struct rdt_domain *d)
>>   {
>>       struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);

2022-09-08 17:37:39

by James Morse

[permalink] [raw]
Subject: Re: [PATCH v6 20/21] x86/resctrl: Add resctrl_rmid_realloc_limit to abstract x86's boot_cpu_data

Hi Hao Xin,

On 07/09/2022 07:26, haoxin wrote:
> On 02/09/2022 23:48, James Morse wrote:
>> resctrl_rmid_realloc_threshold can be set by user-space. The maximum
>> value is specified by the architecture.
>>
>> Currently max_threshold_occ_write() reads the maximum value from
>> boot_cpu_data.x86_cache_size, which is not portable to another
>> architecture.
>>
>> Add resctrl_rmid_realloc_limit to describe the maximum size in bytes
>> that user-space can set the threshold to.

>> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
>> index e91afe99b763..8d15568d7121 100644
>> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
>> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
>> @@ -747,10 +752,10 @@ int rdt_get_mon_l3_config(struct rdt_resource *r)
>>   {
>>       unsigned int mbm_offset = boot_cpu_data.x86_cache_mbm_width_offset;
>>       struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
>> -    unsigned int cl_size = boot_cpu_data.x86_cache_size;
>>       unsigned int threshold;
>>       int ret;
>>   +    resctrl_rmid_realloc_limit = boot_cpu_data.x86_cache_size * 1024;

> Use SZ_1K instead?

Sure, that's usually more readable. I didn't because the existing code doesn't, and I
wanted it to be obvious where this had come from.

If I re-post the series I'll change this.
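
For reference, the suggested change would presumably read (assuming SZ_1K from
include/linux/sizes.h):

	resctrl_rmid_realloc_limit = boot_cpu_data.x86_cache_size * SZ_1K;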


Thanks for taking a look!

James

>>       hw_res->mon_scale = boot_cpu_data.x86_cache_occ_scale;
>>       r->num_rmid = boot_cpu_data.x86_cache_max_rmid + 1;
>>       hw_res->mbm_width = MBM_CNTR_WIDTH_BASE;
>> @@ -767,7 +772,7 @@ int rdt_get_mon_l3_config(struct rdt_resource *r)
>> *
>> * For a 35MB LLC and 56 RMIDs, this is ~1.8% of the LLC.
>> */
>> - threshold = cl_size * 1024 / r->num_rmid;
>> + threshold = resctrl_rmid_realloc_limit / r->num_rmid;
>> /*
>> * Because num_rmid may not be a power of two, round the value
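
As a quick sanity check of the ~1.8% figure in the quoted comment (illustrative
values only, not code from the patch):

	unsigned int limit = 35 * 1024 * 1024;	/* resctrl_rmid_realloc_limit for a 35MB LLC */
	unsigned int threshold = limit / 56;	/* 56 RMIDs -> 655360 bytes per RMID */
	/* 655360 / (35 * 1024 * 1024) is roughly 0.018, i.e. ~1.8% of the LLC */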

2022-09-08 17:39:20

by James Morse

[permalink] [raw]
Subject: Re: [PATCH v6 12/21] x86/resctrl: Calculate bandwidth from the previous __mon_event_count() chunks

Hi Hao Xin,

On 07/09/2022 07:47, haoxin wrote:
> On 02/09/2022 23:48, James Morse wrote:
>> mbm_bw_count() is only called by the mbm_handle_overflow() worker once a
>> second. It reads the hardware register, calculates the bandwidth and
>> updates m->prev_bw_msr which is used to hold the previous hardware register
>> value.
>>
>> Operating directly on hardware register values makes it difficult to make
>> this code architecture independent, so that it can be moved to /fs/,
>> making the mba_sc feature something resctrl supports with no additional
>> support from the architecture.
>> Prior to calling mbm_bw_count(), mbm_update() reads from the same hardware
>> register using __mon_event_count().
>>
>> Change mbm_bw_count() to use the current chunks value most recently saved
>> by __mon_event_count(). This removes an extra call to __rmid_read().
>> Instead of using m->prev_msr to calculate the number of chunks seen,
>> use the rr->val that was updated by __mon_event_count(). This removes an
>> extra call to mbm_overflow_count() and get_corrected_mbm_count().
>> Calculating bandwidth like this means mbm_bw_count() no longer operates
>> on hardware register values directly.

>> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
>> index 3e69386cfe00..2d81b6cd9632 100644
>> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
>> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c

>> @@ -516,10 +521,12 @@ static void mbm_update(struct rdt_resource *r, struct rdt_domain
>> *d, int rmid)
>>        */
>>       if (is_mbm_total_enabled()) {
>>           rr.evtid = QOS_L3_MBM_TOTAL_EVENT_ID;
>> +        rr.val = 0;

> In mbm_update(), rr.val is not used, so there is no need to initialize it?

>>           __mon_event_count(rmid, &rr);
>>       }
>>       if (is_mbm_local_enabled()) {
>>           rr.evtid = QOS_L3_MBM_LOCAL_EVENT_ID;
>> +        rr.val = 0;

> ditto.

>>           __mon_event_count(rmid, &rr);
>>             /*

No, but this just leaves that problem for someone else to discover the hard way! I think
it's fair for the compiler to complain that addition on an uninitialised field is a bug.

I'd prefer to keep this as it is on the principle of 'least surprise'.


Thanks,

James

2022-09-08 21:39:29

by Reinette Chatre

[permalink] [raw]
Subject: Re: [PATCH v6 12/21] x86/resctrl: Calculate bandwidth from the previous __mon_event_count() chunks



On 9/8/2022 10:00 AM, James Morse wrote:
> Hi Hao Xin,
>
> On 07/09/2022 07:47, haoxin wrote:
>> On 02/09/2022 23:48, James Morse wrote:
>>> mbm_bw_count() is only called by the mbm_handle_overflow() worker once a
>>> second. It reads the hardware register, calculates the bandwidth and
>>> updates m->prev_bw_msr which is used to hold the previous hardware register
>>> value.
>>>
>>> Operating directly on hardware register values makes it difficult to make
>>> this code architecture independent, so that it can be moved to /fs/,
>>> making the mba_sc feature something resctrl supports with no additional
>>> support from the architecture.
>>> Prior to calling mbm_bw_count(), mbm_update() reads from the same hardware
>>> register using __mon_event_count().
>>>
>>> Change mbm_bw_count() to use the current chunks value most recently saved
>>> by __mon_event_count(). This removes an extra call to __rmid_read().
>>> Instead of using m->prev_msr to calculate the number of chunks seen,
>>> use the rr->val that was updated by __mon_event_count(). This removes an
>>> extra call to mbm_overflow_count() and get_corrected_mbm_count().
>>> Calculating bandwidth like this means mbm_bw_count() no longer operates
>>> on hardware register values directly.
>
>>> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
>>> index 3e69386cfe00..2d81b6cd9632 100644
>>> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
>>> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
>
>>> @@ -516,10 +521,12 @@ static void mbm_update(struct rdt_resource *r, struct rdt_domain
>>> *d, int rmid)
>>>        */
>>>       if (is_mbm_total_enabled()) {
>>>           rr.evtid = QOS_L3_MBM_TOTAL_EVENT_ID;
>>> +        rr.val = 0;
>
>> In mbm_update(), rr.val is not used, so there is no need to initialize it?

This patch introduces using rr.val into mbm_bw_count() that is called
from mbm_update(). This patch thus introduces the requirement that rr.val needs
to be initialized.

>
>>>           __mon_event_count(rmid, &rr);
>>>       }
>>>       if (is_mbm_local_enabled()) {
>>>           rr.evtid = QOS_L3_MBM_LOCAL_EVENT_ID;
>>> +        rr.val = 0;
>
>> ditto.
>
>>>           __mon_event_count(rmid, &rr);
>>>             /*
>
> No, but this just leaves that problem for someone else to discover the hard way! I think
> it's fair for the compiler to complain that addition on an uninitialised field is a bug.
>
> I'd prefer to keep this as it is on the principle of 'least surprise'.

Yes, please do keep this. If rr.val is not initialized then the software controller will
use the wrong data to compute the delta bandwidth.
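
Condensing the mbm_update() flow from the patch (a sketch, not the verbatim kernel
code), the concern is roughly:

	rr.evtid = QOS_L3_MBM_TOTAL_EVENT_ID;
	rr.val = 0;			/* discard anything left over from a previous event */
	__mon_event_count(rmid, &rr);	/* accumulates the total-event chunks into rr.val */

	rr.evtid = QOS_L3_MBM_LOCAL_EVENT_ID;
	rr.val = 0;			/* without this, the total-event chunks would still
					 * be sitting in rr.val ... */
	__mon_event_count(rmid, &rr);
	mbm_bw_count(rmid, &rr);	/* ... and would inflate the bandwidth computed here */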

Reinette

2022-09-13 01:44:05

by Shaopeng Tan (Fujitsu)

[permalink] [raw]
Subject: RE: [PATCH v6 00/21] x86/resctrl: Make resctrl_arch_rmid_read() return values in bytes

Hi James,

> Changes in this version?
> * Changed supports_mba_mbps() to use is_mbm_local_enabled()
>
> ---
> The aim of this series is to insert a split between the parts of the monitor code
> that the architecture must implement, and those that are part of the resctrl
> filesystem. The eventual aim is to move all filesystem parts out to live in
> /fs/resctrl, so that resctrl can be wired up for MPAM.
>
> What's MPAM? See the cover letter of a previous series. [1]
>
> The series adds domain online/offline callbacks to allow the filesystem to
> manage some of its structures itself, then moves all the 'mba_sc' behaviour to
> be part of the filesystem.
> This means another architecture doesn't need to provide an mbps_val array.
> As its all software, the resctrl filesystem should be able to do this without any
> help from the architecture code.
>
> Finally __rmid_read() is refactored to be the API call that the architecture
> provides to read a counter value. All the hardware specific overflow detection,
> scaling and value correction should occur behind this helper.
>
>
> This series is based on v6.0-rc3, and can be retrieved from:
> git://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git
> mpam/resctrl_monitors_in_bytes/v6
>
> [0] git://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git
> mpam/resctrl_merge_cdp/v7 [1]
> https://lore.kernel.org/lkml/[email protected]
> /
>
> Previous versions of this series:
> [v1]
> https://lore.kernel.org/lkml/[email protected]
> /
> [v2]
> https://lore.kernel.org/lkml/[email protected]
> /
> [v3]
> https://lore.kernel.org/lkml/[email protected]/
> [v4]
> https://lore.kernel.org/lkml/[email protected]
> /
> [v5]
> https://lore.kernel.org/lkml/[email protected]
> /
>
> James Morse (21):
> x86/resctrl: Kill off alloc_enabled
> x86/resctrl: Merge mon_capable and mon_enabled
> x86/resctrl: Add domain online callback for resctrl work
> x86/resctrl: Group struct rdt_hw_domain cleanup
> x86/resctrl: Add domain offline callback for resctrl work
> x86/resctrl: Remove set_mba_sc()s control array re-initialisation
> x86/resctrl: Abstract and use supports_mba_mbps()
> x86/resctrl: Create mba_sc configuration in the rdt_domain
> x86/resctrl: Switch over to the resctrl mbps_val list
> x86/resctrl: Remove architecture copy of mbps_val
> x86/resctrl: Allow update_mba_bw() to update controls directly
> x86/resctrl: Calculate bandwidth from the previous __mon_event_count()
> chunks
> x86/resctrl: Add per-rmid arch private storage for overflow and chunks
> x86/resctrl: Allow per-rmid arch private storage to be reset
> x86/resctrl: Abstract __rmid_read()
> x86/resctrl: Pass the required parameters into
> resctrl_arch_rmid_read()
> x86/resctrl: Move mbm_overflow_count() into resctrl_arch_rmid_read()
> x86/resctrl: Move get_corrected_mbm_count() into
> resctrl_arch_rmid_read()
> x86/resctrl: Rename and change the units of resctrl_cqm_threshold
> x86/resctrl: Add resctrl_rmid_realloc_limit to abstract x86's
> boot_cpu_data
> x86/resctrl: Make resctrl_arch_rmid_read() return values in bytes
>
> arch/x86/include/asm/resctrl.h | 9 +
> arch/x86/kernel/cpu/resctrl/core.c | 117 ++++-------
> arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 75 ++++---
> arch/x86/kernel/cpu/resctrl/internal.h | 61 +++---
> arch/x86/kernel/cpu/resctrl/monitor.c | 232
> ++++++++++++++--------
> arch/x86/kernel/cpu/resctrl/pseudo_lock.c | 2 +-
> arch/x86/kernel/cpu/resctrl/rdtgroup.c | 216
> ++++++++++++++++----
> include/linux/resctrl.h | 64 +++++-
> 8 files changed, 514 insertions(+), 262 deletions(-)
>
> --
> 2.30.2
I tested this patch series (patch v6) on an Intel(R) Xeon(R) Gold 6254 CPU with the resctrl selftest.
No problems were found.

Reviewed-by: Shaopeng Tan <[email protected]>
Tested-by: Shaopeng Tan <[email protected]>


2022-09-22 09:50:52

by James Morse

[permalink] [raw]
Subject: Re: [PATCH v6 00/21] x86/resctrl: Make resctrl_arch_rmid_read() return values in bytes

Hello!

On 13/09/2022 02:00, [email protected] wrote:
>> Changes in this version?
>> * Changed supports_mba_mbps() to use is_mbm_local_enabled()
>>
>> ---
>> The aim of this series is to insert a split between the parts of the monitor code
>> that the architecture must implement, and those that are part of the resctrl
>> filesystem. The eventual aim is to move all filesystem parts out to live in
>> /fs/resctrl, so that resctrl can be wired up for MPAM.
>>
>> What's MPAM? See the cover letter of a previous series. [1]
>>
>> The series adds domain online/offline callbacks to allow the filesystem to
>> manage some of its structures itself, then moves all the 'mba_sc' behaviour to
>> be part of the filesystem.
>> This means another architecture doesn't need to provide an mbps_val array.
>> As its all software, the resctrl filesystem should be able to do this without any
>> help from the architecture code.
>>
>> Finally __rmid_read() is refactored to be the API call that the architecture
>> provides to read a counter value. All the hardware specific overflow detection,
>> scaling and value correction should occur behind this helper.

> I tested this patch series (patch v6) on an Intel(R) Xeon(R) Gold 6254 CPU with the resctrl selftest.
> No problems were found.
>
> Reviewed-by: Shaopeng Tan <[email protected]>
> Tested-by: Shaopeng Tan <[email protected]>

Thanks for testing it again!


James

Subject: [tip: x86/cache] x86/resctrl: Allow per-rmid arch private storage to be reset

The following commit has been merged into the x86/cache branch of tip:

Commit-ID: fea62d370d7a1ba288d71d0cae7ad47c2a02b839
Gitweb: https://git.kernel.org/tip/fea62d370d7a1ba288d71d0cae7ad47c2a02b839
Author: James Morse <[email protected]>
AuthorDate: Fri, 02 Sep 2022 15:48:22
Committer: Borislav Petkov <[email protected]>
CommitterDate: Fri, 23 Sep 2022 12:49:04 +02:00

x86/resctrl: Allow per-rmid arch private storage to be reset

To abstract the rmid counters into a helper that returns the number
of bytes counted, architecture specific per-rmid state is needed.

It needs to be possible to reset this hidden state, as the values
may outlive the life of an rmid, or the mount time of the filesystem.

mon_event_read() is called with first = true when an rmid is first
allocated in mkdir_mondata_subdir(). Add resctrl_arch_reset_rmid()
and call it from __mon_event_count()'s rr->first check.

Signed-off-by: James Morse <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Jamie Iles <[email protected]>
Reviewed-by: Shaopeng Tan <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Tested-by: Xin Hao <[email protected]>
Tested-by: Shaopeng Tan <[email protected]>
Tested-by: Cristian Marussi <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
arch/x86/kernel/cpu/resctrl/internal.h | 18 +++----------
arch/x86/kernel/cpu/resctrl/monitor.c | 35 ++++++++++++++++++++++++-
include/linux/resctrl.h | 23 ++++++++++++++++-
3 files changed, 62 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 4de8e5b..b34a140 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -22,14 +22,6 @@

#define L2_QOS_CDP_ENABLE 0x01ULL

-/*
- * Event IDs are used to program IA32_QM_EVTSEL before reading event
- * counter from IA32_QM_CTR
- */
-#define QOS_L3_OCCUP_EVENT_ID 0x01
-#define QOS_L3_MBM_TOTAL_EVENT_ID 0x02
-#define QOS_L3_MBM_LOCAL_EVENT_ID 0x03
-
#define CQM_LIMBOCHECK_INTERVAL 1000

#define MBM_CNTR_WIDTH_BASE 24
@@ -73,7 +65,7 @@ DECLARE_STATIC_KEY_FALSE(rdt_mon_enable_key);
* @list: entry in &rdt_resource->evt_list
*/
struct mon_evt {
- u32 evtid;
+ enum resctrl_event_id evtid;
char *name;
struct list_head list;
};
@@ -90,9 +82,9 @@ struct mon_evt {
union mon_data_bits {
void *priv;
struct {
- unsigned int rid : 10;
- unsigned int evtid : 8;
- unsigned int domid : 14;
+ unsigned int rid : 10;
+ enum resctrl_event_id evtid : 8;
+ unsigned int domid : 14;
} u;
};

@@ -100,7 +92,7 @@ struct rmid_read {
struct rdtgroup *rgrp;
struct rdt_resource *r;
struct rdt_domain *d;
- int evtid;
+ enum resctrl_event_id evtid;
bool first;
u64 val;
};
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 2d81b6c..e975514 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -137,7 +137,37 @@ static inline struct rmid_entry *__rmid_entry(u32 rmid)
return entry;
}

-static u64 __rmid_read(u32 rmid, u32 eventid)
+static struct arch_mbm_state *get_arch_mbm_state(struct rdt_hw_domain *hw_dom,
+ u32 rmid,
+ enum resctrl_event_id eventid)
+{
+ switch (eventid) {
+ case QOS_L3_OCCUP_EVENT_ID:
+ return NULL;
+ case QOS_L3_MBM_TOTAL_EVENT_ID:
+ return &hw_dom->arch_mbm_total[rmid];
+ case QOS_L3_MBM_LOCAL_EVENT_ID:
+ return &hw_dom->arch_mbm_local[rmid];
+ }
+
+ /* Never expect to get here */
+ WARN_ON_ONCE(1);
+
+ return NULL;
+}
+
+void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_domain *d,
+ u32 rmid, enum resctrl_event_id eventid)
+{
+ struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(d);
+ struct arch_mbm_state *am;
+
+ am = get_arch_mbm_state(hw_dom, rmid, eventid);
+ if (am)
+ memset(am, 0, sizeof(*am));
+}
+
+static u64 __rmid_read(u32 rmid, enum resctrl_event_id eventid)
{
u64 val;

@@ -291,6 +321,9 @@ static u64 __mon_event_count(u32 rmid, struct rmid_read *rr)
struct mbm_state *m;
u64 chunks, tval;

+ if (rr->first)
+ resctrl_arch_reset_rmid(rr->r, rr->d, rmid, rr->evtid);
+
tval = __rmid_read(rmid, rr->evtid);
if (tval & (RMID_VAL_ERROR | RMID_VAL_UNAVAIL)) {
return tval;
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index f4c9101..8184567 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -32,6 +32,16 @@ enum resctrl_conf_type {

#define CDP_NUM_TYPES (CDP_DATA + 1)

+/*
+ * Event IDs, the values match those used to program IA32_QM_EVTSEL before
+ * reading IA32_QM_CTR on RDT systems.
+ */
+enum resctrl_event_id {
+ QOS_L3_OCCUP_EVENT_ID = 0x01,
+ QOS_L3_MBM_TOTAL_EVENT_ID = 0x02,
+ QOS_L3_MBM_LOCAL_EVENT_ID = 0x03,
+};
+
/**
* struct resctrl_staged_config - parsed configuration to be applied
* @new_ctrl: new ctrl value to be loaded
@@ -210,4 +220,17 @@ u32 resctrl_arch_get_config(struct rdt_resource *r, struct rdt_domain *d,
int resctrl_online_domain(struct rdt_resource *r, struct rdt_domain *d);
void resctrl_offline_domain(struct rdt_resource *r, struct rdt_domain *d);

+/**
+ * resctrl_arch_reset_rmid() - Reset any private state associated with rmid
+ * and eventid.
+ * @r: The domain's resource.
+ * @d: The rmid's domain.
+ * @rmid: The rmid whose counter values should be reset.
+ * @eventid: The eventid whose counter values should be reset.
+ *
+ * This can be called from any CPU.
+ */
+void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_domain *d,
+ u32 rmid, enum resctrl_event_id eventid);
+
#endif /* _RESCTRL_H */

Subject: [tip: x86/cache] x86/resctrl: Add per-rmid arch private storage for overflow and chunks

The following commit has been merged into the x86/cache branch of tip:

Commit-ID: 48dbe31a243d5fc7c07b7f03b48e95ec4696b118
Gitweb: https://git.kernel.org/tip/48dbe31a243d5fc7c07b7f03b48e95ec4696b118
Author: James Morse <[email protected]>
AuthorDate: Fri, 02 Sep 2022 15:48:21
Committer: Borislav Petkov <[email protected]>
CommitterDate: Thu, 22 Sep 2022 17:46:09 +02:00

x86/resctrl: Add per-rmid arch private storage for overflow and chunks

A renamed __rmid_read() is intended as the function that an
architecture agnostic resctrl filesystem driver can use to
read a value in bytes from a counter. Currently the function returns
the MBM values in chunks directly from hardware. For bandwidth
counters the resctrl filesystem uses this to calculate the number of
bytes ever seen.

MPAM's scaling of counters can be changed at runtime, reducing the
resolution but increasing the range. When this is changed the prev_msr
values need to be converted by the architecture code.

Add an array for per-rmid private storage. The prev_msr and chunks
values will move here to allow resctrl_arch_rmid_read() to always
return the number of bytes read by this counter without assistance
from the filesystem. The values are moved in later patches when
the overflow and correction calls are moved into __rmid_read().

Signed-off-by: James Morse <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Jamie Iles <[email protected]>
Reviewed-by: Shaopeng Tan <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Tested-by: Xin Hao <[email protected]>
Tested-by: Shaopeng Tan <[email protected]>
Tested-by: Cristian Marussi <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
arch/x86/kernel/cpu/resctrl/core.c | 35 +++++++++++++++++++++++++-
arch/x86/kernel/cpu/resctrl/internal.h | 14 ++++++++++-
2 files changed, 49 insertions(+)

diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 90ebb7d..de62b0b 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -413,6 +413,8 @@ static void setup_default_ctrlval(struct rdt_resource *r, u32 *dc)

static void domain_free(struct rdt_hw_domain *hw_dom)
{
+ kfree(hw_dom->arch_mbm_total);
+ kfree(hw_dom->arch_mbm_local);
kfree(hw_dom->ctrl_val);
kfree(hw_dom);
}
@@ -438,6 +440,34 @@ static int domain_setup_ctrlval(struct rdt_resource *r, struct rdt_domain *d)
return 0;
}

+/**
+ * arch_domain_mbm_alloc() - Allocate arch private storage for the MBM counters
+ * @num_rmid: The size of the MBM counter array
+ * @hw_dom: The domain that owns the allocated arrays
+ */
+static int arch_domain_mbm_alloc(u32 num_rmid, struct rdt_hw_domain *hw_dom)
+{
+ size_t tsize;
+
+ if (is_mbm_total_enabled()) {
+ tsize = sizeof(*hw_dom->arch_mbm_total);
+ hw_dom->arch_mbm_total = kcalloc(num_rmid, tsize, GFP_KERNEL);
+ if (!hw_dom->arch_mbm_total)
+ return -ENOMEM;
+ }
+ if (is_mbm_local_enabled()) {
+ tsize = sizeof(*hw_dom->arch_mbm_local);
+ hw_dom->arch_mbm_local = kcalloc(num_rmid, tsize, GFP_KERNEL);
+ if (!hw_dom->arch_mbm_local) {
+ kfree(hw_dom->arch_mbm_total);
+ hw_dom->arch_mbm_total = NULL;
+ return -ENOMEM;
+ }
+ }
+
+ return 0;
+}
+
/*
* domain_add_cpu - Add a cpu to a resource's domain list.
*
@@ -487,6 +517,11 @@ static void domain_add_cpu(int cpu, struct rdt_resource *r)
return;
}

+ if (r->mon_capable && arch_domain_mbm_alloc(r->num_rmid, hw_dom)) {
+ domain_free(hw_dom);
+ return;
+ }
+
list_add_tail(&d->list, add_pos);

err = resctrl_online_domain(r, d);
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 4606209..4de8e5b 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -304,16 +304,30 @@ struct mbm_state {
};

/**
+ * struct arch_mbm_state - values used to compute resctrl_arch_rmid_read()s
+ * return value.
+ * @prev_msr: Value of IA32_QM_CTR last time it was read for the RMID used to
+ * find this struct.
+ */
+struct arch_mbm_state {
+ u64 prev_msr;
+};
+
+/**
* struct rdt_hw_domain - Arch private attributes of a set of CPUs that share
* a resource
* @d_resctrl: Properties exposed to the resctrl file system
* @ctrl_val: array of cache or mem ctrl values (indexed by CLOSID)
+ * @arch_mbm_total: arch private state for MBM total bandwidth
+ * @arch_mbm_local: arch private state for MBM local bandwidth
*
* Members of this structure are accessed via helpers that provide abstraction.
*/
struct rdt_hw_domain {
struct rdt_domain d_resctrl;
u32 *ctrl_val;
+ struct arch_mbm_state *arch_mbm_total;
+ struct arch_mbm_state *arch_mbm_local;
};

static inline struct rdt_hw_domain *resctrl_to_arch_dom(struct rdt_domain *r)

Subject: [tip: x86/cache] x86/resctrl: Calculate bandwidth from the previous __mon_event_count() chunks

The following commit has been merged into the x86/cache branch of tip:

Commit-ID: 30442571ec81fb33f7bd8cea5a14afb10b8f442a
Gitweb: https://git.kernel.org/tip/30442571ec81fb33f7bd8cea5a14afb10b8f442a
Author: James Morse <[email protected]>
AuthorDate: Fri, 02 Sep 2022 15:48:20
Committer: Borislav Petkov <[email protected]>
CommitterDate: Thu, 22 Sep 2022 17:44:57 +02:00

x86/resctrl: Calculate bandwidth from the previous __mon_event_count() chunks

mbm_bw_count() is only called by the mbm_handle_overflow() worker once a
second. It reads the hardware register, calculates the bandwidth and
updates m->prev_bw_msr which is used to hold the previous hardware register
value.

Operating directly on hardware register values makes it difficult to make
this code architecture independent, so that it can be moved to /fs/,
making the mba_sc feature something resctrl supports with no additional
support from the architecture.
Prior to calling mbm_bw_count(), mbm_update() reads from the same hardware
register using __mon_event_count().

Change mbm_bw_count() to use the current chunks value most recently saved
by __mon_event_count(). This removes an extra call to __rmid_read().
Instead of using m->prev_msr to calculate the number of chunks seen,
use the rr->val that was updated by __mon_event_count(). This removes an
extra call to mbm_overflow_count() and get_corrected_mbm_count().
Calculating bandwidth like this means mbm_bw_count() no longer operates
on hardware register values directly.

Signed-off-by: James Morse <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Jamie Iles <[email protected]>
Reviewed-by: Shaopeng Tan <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Tested-by: Xin Hao <[email protected]>
Tested-by: Shaopeng Tan <[email protected]>
Tested-by: Cristian Marussi <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
arch/x86/kernel/cpu/resctrl/internal.h | 4 ++--
arch/x86/kernel/cpu/resctrl/monitor.c | 25 ++++++++++++++++---------
2 files changed, 18 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 3b9e43b..4606209 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -289,7 +289,7 @@ struct rftype {
* struct mbm_state - status for each MBM counter in each domain
* @chunks: Total data moved (multiply by rdt_group.mon_scale to get bytes)
* @prev_msr: Value of IA32_QM_CTR for this RMID last time we read it
- * @prev_bw_msr:Value of previous IA32_QM_CTR for bandwidth counting
+ * @prev_bw_chunks: Previous chunks value read for bandwidth calculation
* @prev_bw: The most recent bandwidth in MBps
* @delta_bw: Difference between the current and previous bandwidth
* @delta_comp: Indicates whether to compute the delta_bw
@@ -297,7 +297,7 @@ struct rftype {
struct mbm_state {
u64 chunks;
u64 prev_msr;
- u64 prev_bw_msr;
+ u64 prev_bw_chunks;
u32 prev_bw;
u32 delta_bw;
bool delta_comp;
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 3e69386..2d81b6c 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -315,7 +315,7 @@ static u64 __mon_event_count(u32 rmid, struct rmid_read *rr)

if (rr->first) {
memset(m, 0, sizeof(struct mbm_state));
- m->prev_bw_msr = m->prev_msr = tval;
+ m->prev_msr = tval;
return 0;
}

@@ -329,27 +329,32 @@ static u64 __mon_event_count(u32 rmid, struct rmid_read *rr)
}

/*
+ * mbm_bw_count() - Update bw count from values previously read by
+ * __mon_event_count().
+ * @rmid: The rmid used to identify the cached mbm_state.
+ * @rr: The struct rmid_read populated by __mon_event_count().
+ *
* Supporting function to calculate the memory bandwidth
- * and delta bandwidth in MBps.
+ * and delta bandwidth in MBps. The chunks value previously read by
+ * __mon_event_count() is compared with the chunks value from the previous
+ * invocation. This must be called once per second to maintain values in MBps.
*/
static void mbm_bw_count(u32 rmid, struct rmid_read *rr)
{
struct rdt_hw_resource *hw_res = resctrl_to_arch_res(rr->r);
struct mbm_state *m = &rr->d->mbm_local[rmid];
- u64 tval, cur_bw, chunks;
+ u64 cur_bw, chunks, cur_chunks;

- tval = __rmid_read(rmid, rr->evtid);
- if (tval & (RMID_VAL_ERROR | RMID_VAL_UNAVAIL))
- return;
+ cur_chunks = rr->val;
+ chunks = cur_chunks - m->prev_bw_chunks;
+ m->prev_bw_chunks = cur_chunks;

- chunks = mbm_overflow_count(m->prev_bw_msr, tval, hw_res->mbm_width);
- cur_bw = (get_corrected_mbm_count(rmid, chunks) * hw_res->mon_scale) >> 20;
+ cur_bw = (chunks * hw_res->mon_scale) >> 20;

if (m->delta_comp)
m->delta_bw = abs(cur_bw - m->prev_bw);
m->delta_comp = false;
m->prev_bw = cur_bw;
- m->prev_bw_msr = tval;
}

/*
@@ -516,10 +521,12 @@ static void mbm_update(struct rdt_resource *r, struct rdt_domain *d, int rmid)
*/
if (is_mbm_total_enabled()) {
rr.evtid = QOS_L3_MBM_TOTAL_EVENT_ID;
+ rr.val = 0;
__mon_event_count(rmid, &rr);
}
if (is_mbm_local_enabled()) {
rr.evtid = QOS_L3_MBM_LOCAL_EVENT_ID;
+ rr.val = 0;
__mon_event_count(rmid, &rr);

/*
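
As a worked illustration of the conversion above (values chosen purely for the
arithmetic): mbm_handle_overflow() runs once per second, so "(chunks *
hw_res->mon_scale) >> 20" turns the bytes seen during that second into MBps.
With a hypothetical mon_scale of 65536 bytes per chunk and 16384 new chunks:

	u64 cur_bw = (16384ULL * 65536) >> 20;	/* 2^30 bytes >> 20 == 1024, i.e. ~1024 MBps */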

Subject: [tip: x86/cache] x86/resctrl: Abstract and use supports_mba_mbps()

The following commit has been merged into the x86/cache branch of tip:

Commit-ID: b045c215866393419fb960432ed6be69a0113ee1
Gitweb: https://git.kernel.org/tip/b045c215866393419fb960432ed6be69a0113ee1
Author: James Morse <[email protected]>
AuthorDate: Fri, 02 Sep 2022 15:48:15
Committer: Borislav Petkov <[email protected]>
CommitterDate: Thu, 22 Sep 2022 16:10:11 +02:00

x86/resctrl: Abstract and use supports_mba_mbps()

To determine whether the mba_MBps option to resctrl should be supported,
resctrl tests the boot CPUs' x86_vendor.

This isn't portable, and needs abstracting behind a helper so this check
can be part of the filesystem code that moves to /fs/.

Re-use the tests set_mba_sc() does to determine if the mba_sc is supported
on this system. An 'alloc_capable' test is added so that support for the
controls isn't implied by the 'delay_linear' property, which is always
true for MPAM. Because mbm_update() only updates mba_sc if the mbm_local
counters are enabled, supports_mba_mbps() checks is_mbm_local_enabled()
(instead of using is_mbm_enabled(), which checks both).

Signed-off-by: James Morse <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Jamie Iles <[email protected]>
Reviewed-by: Shaopeng Tan <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Tested-by: Xin Hao <[email protected]>
Tested-by: Shaopeng Tan <[email protected]>
Tested-by: Cristian Marussi <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 19 ++++++++++++++-----
1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index b32ceff..4ee2626 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1890,17 +1890,26 @@ void rdt_domain_reconfigure_cdp(struct rdt_resource *r)
}

/*
- * Enable or disable the MBA software controller
- * which helps user specify bandwidth in MBps.
* MBA software controller is supported only if
* MBM is supported and MBA is in linear scale.
*/
+static bool supports_mba_mbps(void)
+{
+ struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_MBA].r_resctrl;
+
+ return (is_mbm_local_enabled() &&
+ r->alloc_capable && is_mba_linear());
+}
+
+/*
+ * Enable or disable the MBA software controller
+ * which helps user specify bandwidth in MBps.
+ */
static int set_mba_sc(bool mba_sc)
{
struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_MBA].r_resctrl;

- if (!is_mbm_enabled() || !is_mba_linear() ||
- mba_sc == is_mba_sc(r))
+ if (!supports_mba_mbps() || mba_sc == is_mba_sc(r))
return -EINVAL;

r->membw.mba_sc = mba_sc;
@@ -2255,7 +2264,7 @@ static int rdt_parse_param(struct fs_context *fc, struct fs_parameter *param)
ctx->enable_cdpl2 = true;
return 0;
case Opt_mba_mbps:
- if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
+ if (!supports_mba_mbps())
return -EINVAL;
ctx->enable_mba_mbps = true;
return 0;

Subject: [tip: x86/cache] x86/resctrl: Abstract __rmid_read()

The following commit has been merged into the x86/cache branch of tip:

Commit-ID: 4d044c521a63b2cd394ea6e3547012032145e47e
Gitweb: https://git.kernel.org/tip/4d044c521a63b2cd394ea6e3547012032145e47e
Author: James Morse <[email protected]>
AuthorDate: Fri, 02 Sep 2022 15:48:23
Committer: Borislav Petkov <[email protected]>
CommitterDate: Fri, 23 Sep 2022 14:17:20 +02:00

x86/resctrl: Abstract __rmid_read()

__rmid_read() selects the specified eventid and returns the counter
value from the MSR. The error handling is architecture specific, and
handled by the callers, rdtgroup_mondata_show() and __mon_event_count().

Error handling should be handled by architecture specific code, as
a different architecture may have different requirements. MPAM's
counters can report that they are 'not ready', requiring a second
read after a short delay. This should be hidden from resctrl.

Make __rmid_read() the architecture specific function for reading
a counter. Rename it resctrl_arch_rmid_read() and move the error
handling into it.

A read from a counter that hardware supports but resctrl does not
now returns -EINVAL instead of -EIO from the default case in
__mon_event_count(). It isn't possible for user-space to see this
change as resctrl doesn't expose counters it doesn't support.

Signed-off-by: James Morse <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Jamie Iles <[email protected]>
Reviewed-by: Shaopeng Tan <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Tested-by: Xin Hao <[email protected]>
Tested-by: Shaopeng Tan <[email protected]>
Tested-by: Cristian Marussi <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 4 +-
arch/x86/kernel/cpu/resctrl/internal.h | 1 +-
arch/x86/kernel/cpu/resctrl/monitor.c | 62 +++++++++++++---------
include/linux/resctrl.h | 1 +-
4 files changed, 43 insertions(+), 25 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
index 0ab9232..42a1abb 100644
--- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
+++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
@@ -579,9 +579,9 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)

mon_event_read(&rr, r, d, rdtgrp, evtid, false);

- if (rr.val & RMID_VAL_ERROR)
+ if (rr.err == -EIO)
seq_puts(m, "Error\n");
- else if (rr.val & RMID_VAL_UNAVAIL)
+ else if (rr.err == -EINVAL)
seq_puts(m, "Unavailable\n");
else
seq_printf(m, "%llu\n", rr.val * hw_res->mon_scale);
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index b34a140..1d2e7bd 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -94,6 +94,7 @@ struct rmid_read {
struct rdt_domain *d;
enum resctrl_event_id evtid;
bool first;
+ int err;
u64 val;
};

diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index e975514..51ab76f 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -167,9 +167,9 @@ void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_domain *d,
memset(am, 0, sizeof(*am));
}

-static u64 __rmid_read(u32 rmid, enum resctrl_event_id eventid)
+int resctrl_arch_rmid_read(u32 rmid, enum resctrl_event_id eventid, u64 *val)
{
- u64 val;
+ u64 msr_val;

/*
* As per the SDM, when IA32_QM_EVTSEL.EvtID (bits 7:0) is configured
@@ -180,14 +180,24 @@ static u64 __rmid_read(u32 rmid, enum resctrl_event_id eventid)
* are error bits.
*/
wrmsr(MSR_IA32_QM_EVTSEL, eventid, rmid);
- rdmsrl(MSR_IA32_QM_CTR, val);
+ rdmsrl(MSR_IA32_QM_CTR, msr_val);

- return val;
+ if (msr_val & RMID_VAL_ERROR)
+ return -EIO;
+ if (msr_val & RMID_VAL_UNAVAIL)
+ return -EINVAL;
+
+ *val = msr_val;
+
+ return 0;
}

static bool rmid_dirty(struct rmid_entry *entry)
{
- u64 val = __rmid_read(entry->rmid, QOS_L3_OCCUP_EVENT_ID);
+ u64 val = 0;
+
+ if (resctrl_arch_rmid_read(entry->rmid, QOS_L3_OCCUP_EVENT_ID, &val))
+ return true;

return val >= resctrl_cqm_threshold;
}
@@ -259,8 +269,8 @@ static void add_rmid_to_limbo(struct rmid_entry *entry)
{
struct rdt_resource *r;
struct rdt_domain *d;
- int cpu;
- u64 val;
+ int cpu, err;
+ u64 val = 0;

r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;

@@ -268,8 +278,10 @@ static void add_rmid_to_limbo(struct rmid_entry *entry)
cpu = get_cpu();
list_for_each_entry(d, &r->domains, list) {
if (cpumask_test_cpu(cpu, &d->cpu_mask)) {
- val = __rmid_read(entry->rmid, QOS_L3_OCCUP_EVENT_ID);
- if (val <= resctrl_cqm_threshold)
+ err = resctrl_arch_rmid_read(entry->rmid,
+ QOS_L3_OCCUP_EVENT_ID,
+ &val);
+ if (err || val <= resctrl_cqm_threshold)
continue;
}

@@ -315,19 +327,19 @@ static u64 mbm_overflow_count(u64 prev_msr, u64 cur_msr, unsigned int width)
return chunks >> shift;
}

-static u64 __mon_event_count(u32 rmid, struct rmid_read *rr)
+static int __mon_event_count(u32 rmid, struct rmid_read *rr)
{
struct rdt_hw_resource *hw_res = resctrl_to_arch_res(rr->r);
struct mbm_state *m;
- u64 chunks, tval;
+ u64 chunks, tval = 0;

if (rr->first)
resctrl_arch_reset_rmid(rr->r, rr->d, rmid, rr->evtid);

- tval = __rmid_read(rmid, rr->evtid);
- if (tval & (RMID_VAL_ERROR | RMID_VAL_UNAVAIL)) {
- return tval;
- }
+ rr->err = resctrl_arch_rmid_read(rmid, rr->evtid, &tval);
+ if (rr->err)
+ return rr->err;
+
switch (rr->evtid) {
case QOS_L3_OCCUP_EVENT_ID:
rr->val += tval;
@@ -341,9 +353,9 @@ static u64 __mon_event_count(u32 rmid, struct rmid_read *rr)
default:
/*
* Code would never reach here because an invalid
- * event id would fail the __rmid_read.
+ * event id would fail in resctrl_arch_rmid_read().
*/
- return RMID_VAL_ERROR;
+ return -EINVAL;
}

if (rr->first) {
@@ -399,11 +411,11 @@ void mon_event_count(void *info)
struct rdtgroup *rdtgrp, *entry;
struct rmid_read *rr = info;
struct list_head *head;
- u64 ret_val;
+ int ret;

rdtgrp = rr->rgrp;

- ret_val = __mon_event_count(rdtgrp->mon.rmid, rr);
+ ret = __mon_event_count(rdtgrp->mon.rmid, rr);

/*
* For Ctrl groups read data from child monitor groups and
@@ -415,13 +427,17 @@ void mon_event_count(void *info)
if (rdtgrp->type == RDTCTRL_GROUP) {
list_for_each_entry(entry, head, mon.crdtgrp_list) {
if (__mon_event_count(entry->mon.rmid, rr) == 0)
- ret_val = 0;
+ ret = 0;
}
}

- /* Report error if none of rmid_reads are successful */
- if (ret_val)
- rr->val = ret_val;
+ /*
+ * __mon_event_count() calls for newly created monitor groups may
+ * report -EINVAL/Unavailable if the monitor hasn't seen any traffic.
+ * Discard error if any of the monitor event reads succeeded.
+ */
+ if (ret == 0)
+ rr->err = 0;
}

/*
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 8184567..efe60dd 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -219,6 +219,7 @@ u32 resctrl_arch_get_config(struct rdt_resource *r, struct rdt_domain *d,
u32 closid, enum resctrl_conf_type type);
int resctrl_online_domain(struct rdt_resource *r, struct rdt_domain *d);
void resctrl_offline_domain(struct rdt_resource *r, struct rdt_domain *d);
+int resctrl_arch_rmid_read(u32 rmid, enum resctrl_event_id eventid, u64 *res);

/**
* resctrl_arch_reset_rmid() - Reset any private state associated with rmid

Subject: [tip: x86/cache] x86/resctrl: Add resctrl_rmid_realloc_limit to abstract x86's boot_cpu_data

The following commit has been merged into the x86/cache branch of tip:

Commit-ID: d80975e264c8f01518890f3d91ab5bada8fa7f5e
Gitweb: https://git.kernel.org/tip/d80975e264c8f01518890f3d91ab5bada8fa7f5e
Author: James Morse <[email protected]>
AuthorDate: Fri, 02 Sep 2022 15:48:28
Committer: Borislav Petkov <[email protected]>
CommitterDate: Fri, 23 Sep 2022 14:24:16 +02:00

x86/resctrl: Add resctrl_rmid_realloc_limit to abstract x86's boot_cpu_data

resctrl_rmid_realloc_threshold can be set by user-space. The maximum
value is specified by the architecture.

Currently max_threshold_occ_write() reads the maximum value from
boot_cpu_data.x86_cache_size, which is not portable to another
architecture.

Add resctrl_rmid_realloc_limit to describe the maximum size in bytes
that user-space can set the threshold to.
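
Pieced together from the hunks below (not new code), the resulting split is:
the architecture publishes the limit once at probe time, and the filesystem
only ever compares user input against it:

    /* arch probe code, rdt_get_mon_l3_config(): */
    resctrl_rmid_realloc_limit = boot_cpu_data.x86_cache_size * 1024;

    /* filesystem write handler, max_threshold_occ_write(): */
    if (bytes > resctrl_rmid_realloc_limit)
            return -EINVAL;
    resctrl_rmid_realloc_threshold = resctrl_arch_round_mon_val(bytes);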

Signed-off-by: James Morse <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Jamie Iles <[email protected]>
Reviewed-by: Shaopeng Tan <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Tested-by: Xin Hao <[email protected]>
Tested-by: Shaopeng Tan <[email protected]>
Tested-by: Cristian Marussi <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
arch/x86/kernel/cpu/resctrl/monitor.c | 9 +++++++--
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 2 +-
include/linux/resctrl.h | 1 +
3 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index e91afe9..8d15568 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -67,6 +67,11 @@ unsigned int rdt_mon_features;
*/
unsigned int resctrl_rmid_realloc_threshold;

+/*
+ * This is the maximum value for the reallocation threshold, in bytes.
+ */
+unsigned int resctrl_rmid_realloc_limit;
+
#define CF(cf) ((unsigned long)(1048576 * (cf) + 0.5))

/*
@@ -747,10 +752,10 @@ int rdt_get_mon_l3_config(struct rdt_resource *r)
{
unsigned int mbm_offset = boot_cpu_data.x86_cache_mbm_width_offset;
struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
- unsigned int cl_size = boot_cpu_data.x86_cache_size;
unsigned int threshold;
int ret;

+ resctrl_rmid_realloc_limit = boot_cpu_data.x86_cache_size * 1024;
hw_res->mon_scale = boot_cpu_data.x86_cache_occ_scale;
r->num_rmid = boot_cpu_data.x86_cache_max_rmid + 1;
hw_res->mbm_width = MBM_CNTR_WIDTH_BASE;
@@ -767,7 +772,7 @@ int rdt_get_mon_l3_config(struct rdt_resource *r)
*
* For a 35MB LLC and 56 RMIDs, this is ~1.8% of the LLC.
*/
- threshold = cl_size * 1024 / r->num_rmid;
+ threshold = resctrl_rmid_realloc_limit / r->num_rmid;

/*
* Because num_rmid may not be a power of two, round the value
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 849bdec..e5a48f0 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1059,7 +1059,7 @@ static ssize_t max_threshold_occ_write(struct kernfs_open_file *of,
if (ret)
return ret;

- if (bytes > (boot_cpu_data.x86_cache_size * 1024))
+ if (bytes > resctrl_rmid_realloc_limit)
return -EINVAL;

resctrl_rmid_realloc_threshold = resctrl_arch_round_mon_val(bytes);
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 9995d04..cb857f7 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -251,5 +251,6 @@ void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_domain *d,
u32 rmid, enum resctrl_event_id eventid);

extern unsigned int resctrl_rmid_realloc_threshold;
+extern unsigned int resctrl_rmid_realloc_limit;

#endif /* _RESCTRL_H */

Subject: [tip: x86/cache] x86/resctrl: Allow update_mba_bw() to update controls directly

The following commit has been merged into the x86/cache branch of tip:

Commit-ID: ff6357bb50023af2a1dc8f113930082c5252c753
Gitweb: https://git.kernel.org/tip/ff6357bb50023af2a1dc8f113930082c5252c753
Author: James Morse <[email protected]>
AuthorDate: Fri, 02 Sep 2022 15:48:19
Committer: Borislav Petkov <[email protected]>
CommitterDate: Thu, 22 Sep 2022 17:43:44 +02:00

x86/resctrl: Allow update_mba_bw() to update controls directly

update_mba_bw() calculates a new control value for the MBA resource
based on the user provided mbps_val and the current measured
bandwidth. Some control values need remapping by delay_bw_map().

It does this by calling wrmsrl() directly. This needs splitting up,
with the MSR write done by an architecture specific helper, so that
the remainder can eventually be moved to /fs/.

Add resctrl_arch_update_one() to apply one configuration value
to the provided resource and domain. This avoids the staging
and cross-calling that is only needed with changes made by
user-space. delay_bw_map() moves to be part of the arch code,
to maintain the 'percentage control' view of MBA resources
in resctrl.
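
For illustration only, a sketch of the resulting call from update_mba_bw();
the error handling shown here is illustrative, the patch itself ignores the
return value because update_mba_bw() already runs on one of the domain's CPUs:

    int err;

    /* MBA never uses CDP, so the configuration type is always CDP_NONE. */
    err = resctrl_arch_update_one(r_mba, dom_mba, closid, CDP_NONE,
                                  new_msr_val);
    if (err)
            pr_warn_once("MBA update attempted from outside the domain\n");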

Signed-off-by: James Morse <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Jamie Iles <[email protected]>
Reviewed-by: Shaopeng Tan <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Tested-by: Xin Hao <[email protected]>
Tested-by: Shaopeng Tan <[email protected]>
Tested-by: Cristian Marussi <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
arch/x86/kernel/cpu/resctrl/core.c | 2 +-
arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 21 +++++++++++++++++++++
arch/x86/kernel/cpu/resctrl/internal.h | 1 -
arch/x86/kernel/cpu/resctrl/monitor.c | 13 ++++---------
include/linux/resctrl.h | 8 ++++++++
5 files changed, 34 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index f0e2820..90ebb7d 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -296,7 +296,7 @@ mba_wrmsr_amd(struct rdt_domain *d, struct msr_param *m, struct rdt_resource *r)
* that can be written to QOS_MSRs.
* There are currently no SKUs which support non linear delay values.
*/
-u32 delay_bw_map(unsigned long bw, struct rdt_resource *r)
+static u32 delay_bw_map(unsigned long bw, struct rdt_resource *r)
{
if (r->membw.delay_linear)
return MAX_MBA_BW - bw;
diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
index bf9d73c..0ab9232 100644
--- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
+++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
@@ -282,6 +282,27 @@ static bool apply_config(struct rdt_hw_domain *hw_dom,
return false;
}

+int resctrl_arch_update_one(struct rdt_resource *r, struct rdt_domain *d,
+ u32 closid, enum resctrl_conf_type t, u32 cfg_val)
+{
+ struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
+ struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(d);
+ u32 idx = get_config_index(closid, t);
+ struct msr_param msr_param;
+
+ if (!cpumask_test_cpu(smp_processor_id(), &d->cpu_mask))
+ return -EINVAL;
+
+ hw_dom->ctrl_val[idx] = cfg_val;
+
+ msr_param.res = r;
+ msr_param.low = idx;
+ msr_param.high = idx + 1;
+ hw_res->msr_update(d, &msr_param, r);
+
+ return 0;
+}
+
int resctrl_arch_update_domains(struct rdt_resource *r, u32 closid)
{
struct resctrl_staged_config *cfg;
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 373aaba..3b9e43b 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -527,7 +527,6 @@ void mbm_setup_overflow_handler(struct rdt_domain *dom,
void mbm_handle_overflow(struct work_struct *work);
void __init intel_rdt_mbm_apply_quirk(void);
bool is_mba_sc(struct rdt_resource *r);
-u32 delay_bw_map(unsigned long bw, struct rdt_resource *r);
void cqm_setup_limbo_handler(struct rdt_domain *dom, unsigned long delay_ms);
void cqm_handle_limbo(struct work_struct *work);
bool has_busy_rmid(struct rdt_resource *r, struct rdt_domain *d);
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 16028b2..3e69386 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -420,10 +420,8 @@ void mon_event_count(void *info)
*/
static void update_mba_bw(struct rdtgroup *rgrp, struct rdt_domain *dom_mbm)
{
- u32 closid, rmid, cur_msr, cur_msr_val, new_msr_val;
+ u32 closid, rmid, cur_msr_val, new_msr_val;
struct mbm_state *pmbm_data, *cmbm_data;
- struct rdt_hw_resource *hw_r_mba;
- struct rdt_hw_domain *hw_dom_mba;
u32 cur_bw, delta_bw, user_bw;
struct rdt_resource *r_mba;
struct rdt_domain *dom_mba;
@@ -433,8 +431,8 @@ static void update_mba_bw(struct rdtgroup *rgrp, struct rdt_domain *dom_mbm)
if (!is_mbm_local_enabled())
return;

- hw_r_mba = &rdt_resources_all[RDT_RESOURCE_MBA];
- r_mba = &hw_r_mba->r_resctrl;
+ r_mba = &rdt_resources_all[RDT_RESOURCE_MBA].r_resctrl;
+
closid = rgrp->closid;
rmid = rgrp->mon.rmid;
pmbm_data = &dom_mbm->mbm_local[rmid];
@@ -444,7 +442,6 @@ static void update_mba_bw(struct rdtgroup *rgrp, struct rdt_domain *dom_mbm)
pr_warn_once("Failure to get domain for MBA update\n");
return;
}
- hw_dom_mba = resctrl_to_arch_dom(dom_mba);

cur_bw = pmbm_data->prev_bw;
user_bw = dom_mba->mbps_val[closid];
@@ -486,9 +483,7 @@ static void update_mba_bw(struct rdtgroup *rgrp, struct rdt_domain *dom_mbm)
return;
}

- cur_msr = hw_r_mba->msr_base + closid;
- wrmsrl(cur_msr, delay_bw_map(new_msr_val, r_mba));
- hw_dom_mba->ctrl_val[closid] = new_msr_val;
+ resctrl_arch_update_one(r_mba, dom_mba, closid, CDP_NONE, new_msr_val);

/*
* Delta values are updated dynamically package wise for each
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 93dfe55..f4c9101 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -197,6 +197,14 @@ struct resctrl_schema {
/* The number of closid supported by this resource regardless of CDP */
u32 resctrl_arch_get_num_closid(struct rdt_resource *r);
int resctrl_arch_update_domains(struct rdt_resource *r, u32 closid);
+
+/*
+ * Update the ctrl_val and apply this config right now.
+ * Must be called on one of the domain's CPUs.
+ */
+int resctrl_arch_update_one(struct rdt_resource *r, struct rdt_domain *d,
+ u32 closid, enum resctrl_conf_type t, u32 cfg_val);
+
u32 resctrl_arch_get_config(struct rdt_resource *r, struct rdt_domain *d,
u32 closid, enum resctrl_conf_type type);
int resctrl_online_domain(struct rdt_resource *r, struct rdt_domain *d);

Subject: [tip: x86/cache] x86/resctrl: Move get_corrected_mbm_count() into resctrl_arch_rmid_read()

The following commit has been merged into the x86/cache branch of tip:

Commit-ID: 38f72f50d6498ee60ac89deff3686e34ce0c2a32
Gitweb: https://git.kernel.org/tip/38f72f50d6498ee60ac89deff3686e34ce0c2a32
Author: James Morse <[email protected]>
AuthorDate: Fri, 02 Sep 2022 15:48:26
Committer: Borislav Petkov <[email protected]>
CommitterDate: Fri, 23 Sep 2022 14:22:53 +02:00

x86/resctrl: Move get_corrected_mbm_count() into resctrl_arch_rmid_read()

resctrl_arch_rmid_read() is intended as the function that an
architecture agnostic resctrl filesystem driver can use to
read a value in bytes from a counter. Currently the function returns
the MBM values in chunks directly from hardware. When reading a bandwidth
counter, get_corrected_mbm_count() must be used to correct the
value read.

get_corrected_mbm_count() is architecture specific, so this work should be
done in resctrl_arch_rmid_read().

Move the function calls. This allows the resctrl filesystem's chunks
value to be removed in favour of the architecture private version.
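
Shown side by side for clarity (both fragments are taken from the hunks
below), the correction moves out of the filesystem helper and into the
architecture helper:

    /* Before: __mon_event_count() corrected the raw chunk count itself */
    m->chunks += tval;
    rr->val += get_corrected_mbm_count(rmid, m->chunks);

    /* After: resctrl_arch_rmid_read() hands back a corrected value */
    am->chunks += mbm_overflow_count(am->prev_msr, msr_val, hw_res->mbm_width);
    *val = get_corrected_mbm_count(rmid, am->chunks);
    am->prev_msr = msr_val;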

Signed-off-by: James Morse <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Jamie Iles <[email protected]>
Reviewed-by: Shaopeng Tan <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Tested-by: Xin Hao <[email protected]>
Tested-by: Shaopeng Tan <[email protected]>
Tested-by: Cristian Marussi <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
arch/x86/kernel/cpu/resctrl/internal.h | 4 ++--
arch/x86/kernel/cpu/resctrl/monitor.c | 8 ++++----
2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 8039e8a..bdb55c2 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -280,14 +280,12 @@ struct rftype {

/**
* struct mbm_state - status for each MBM counter in each domain
- * @chunks: Total data moved (multiply by rdt_group.mon_scale to get bytes)
* @prev_bw_chunks: Previous chunks value read for bandwidth calculation
* @prev_bw: The most recent bandwidth in MBps
* @delta_bw: Difference between the current and previous bandwidth
* @delta_comp: Indicates whether to compute the delta_bw
*/
struct mbm_state {
- u64 chunks;
u64 prev_bw_chunks;
u32 prev_bw;
u32 delta_bw;
@@ -297,10 +295,12 @@ struct mbm_state {
/**
* struct arch_mbm_state - values used to compute resctrl_arch_rmid_read()s
* return value.
+ * @chunks: Total data moved (multiply by rdt_group.mon_scale to get bytes)
* @prev_msr: Value of IA32_QM_CTR last time it was read for the RMID used to
* find this struct.
*/
struct arch_mbm_state {
+ u64 chunks;
u64 prev_msr;
};

diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 862a446..27bb494 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -204,7 +204,9 @@ int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain *d,

am = get_arch_mbm_state(hw_dom, rmid, eventid);
if (am) {
- *val = mbm_overflow_count(am->prev_msr, msr_val, hw_res->mbm_width);
+ am->chunks += mbm_overflow_count(am->prev_msr, msr_val,
+ hw_res->mbm_width);
+ *val = get_corrected_mbm_count(rmid, am->chunks);
am->prev_msr = msr_val;
} else {
*val = msr_val;
@@ -374,9 +376,7 @@ static int __mon_event_count(u32 rmid, struct rmid_read *rr)
return 0;
}

- m->chunks += tval;
-
- rr->val += get_corrected_mbm_count(rmid, m->chunks);
+ rr->val += tval;

return 0;
}

Subject: [tip: x86/cache] x86/resctrl: Create mba_sc configuration in the rdt_domain

The following commit has been merged into the x86/cache branch of tip:

Commit-ID: 781096d971dfe3c5f9401a300bdb0b148a600584
Gitweb: https://git.kernel.org/tip/781096d971dfe3c5f9401a300bdb0b148a600584
Author: James Morse <[email protected]>
AuthorDate: Fri, 02 Sep 2022 15:48:16
Committer: Borislav Petkov <[email protected]>
CommitterDate: Thu, 22 Sep 2022 17:17:59 +02:00

x86/resctrl: Create mba_sc configuration in the rdt_domain

To support resctrl's MBA software controller, the architecture must provide
a second configuration array to hold the mbps_val[] from user-space.

This complicates the interface between the architecture specific code and
the filesystem portions of resctrl that will move to /fs/, to allow
multiple architectures to support resctrl.

Make the filesystem parts of resctrl create an array for the mba_sc
values. The software controller can be changed to use this, allowing
the architecture code to only consider the values configured in hardware.
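
A sketch of the filesystem-owned array this creates, condensed from
mba_sc_domain_allocate() below; num_closid, cpu and i stand in for values
the caller already has:

    d->mbps_val = kcalloc_node(num_closid, sizeof(*d->mbps_val),
                               GFP_KERNEL, cpu_to_node(cpu));
    if (!d->mbps_val)
            return -ENOMEM;
    for (i = 0; i < num_closid; i++)
            d->mbps_val[i] = MBA_MAX_MBPS;  /* "no limit" until user-space writes one */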

Signed-off-by: James Morse <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Jamie Iles <[email protected]>
Reviewed-by: Shaopeng Tan <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Tested-by: Xin Hao <[email protected]>
Tested-by: Shaopeng Tan <[email protected]>
Tested-by: Cristian Marussi <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
arch/x86/kernel/cpu/resctrl/internal.h | 1 +-
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 39 +++++++++++++++++++++++++-
include/linux/resctrl.h | 7 ++++-
3 files changed, 46 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index e12b55f..a7e2cbc 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -36,7 +36,6 @@
#define MBM_OVERFLOW_INTERVAL 1000
#define MAX_MBA_BW 100u
#define MBA_IS_LINEAR 0x4
-#define MBA_MAX_MBPS U32_MAX
#define MAX_MBA_BW_AMD 0x800
#define MBM_CNTR_WIDTH_OFFSET_AMD 20

diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 4ee2626..f7ebd01 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1889,6 +1889,30 @@ void rdt_domain_reconfigure_cdp(struct rdt_resource *r)
l3_qos_cfg_update(&hw_res->cdp_enabled);
}

+static int mba_sc_domain_allocate(struct rdt_resource *r, struct rdt_domain *d)
+{
+ u32 num_closid = resctrl_arch_get_num_closid(r);
+ int cpu = cpumask_any(&d->cpu_mask);
+ int i;
+
+ d->mbps_val = kcalloc_node(num_closid, sizeof(*d->mbps_val),
+ GFP_KERNEL, cpu_to_node(cpu));
+ if (!d->mbps_val)
+ return -ENOMEM;
+
+ for (i = 0; i < num_closid; i++)
+ d->mbps_val[i] = MBA_MAX_MBPS;
+
+ return 0;
+}
+
+static void mba_sc_domain_destroy(struct rdt_resource *r,
+ struct rdt_domain *d)
+{
+ kfree(d->mbps_val);
+ d->mbps_val = NULL;
+}
+
/*
* MBA software controller is supported only if
* MBM is supported and MBA is in linear scale.
@@ -1908,12 +1932,20 @@ static bool supports_mba_mbps(void)
static int set_mba_sc(bool mba_sc)
{
struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_MBA].r_resctrl;
+ u32 num_closid = resctrl_arch_get_num_closid(r);
+ struct rdt_domain *d;
+ int i;

if (!supports_mba_mbps() || mba_sc == is_mba_sc(r))
return -EINVAL;

r->membw.mba_sc = mba_sc;

+ list_for_each_entry(d, &r->domains, list) {
+ for (i = 0; i < num_closid; i++)
+ d->mbps_val[i] = MBA_MAX_MBPS;
+ }
+
return 0;
}

@@ -3247,6 +3279,9 @@ void resctrl_offline_domain(struct rdt_resource *r, struct rdt_domain *d)
{
lockdep_assert_held(&rdtgroup_mutex);

+ if (supports_mba_mbps() && r->rid == RDT_RESOURCE_MBA)
+ mba_sc_domain_destroy(r, d);
+
if (!r->mon_capable)
return;

@@ -3311,6 +3346,10 @@ int resctrl_online_domain(struct rdt_resource *r, struct rdt_domain *d)

lockdep_assert_held(&rdtgroup_mutex);

+ if (supports_mba_mbps() && r->rid == RDT_RESOURCE_MBA)
+ /* RDT_RESOURCE_MBA is never mon_capable */
+ return mba_sc_domain_allocate(r, d);
+
if (!r->mon_capable)
return 0;

diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 5d283bd..93dfe55 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -15,6 +15,9 @@ int proc_resctrl_show(struct seq_file *m,

#endif

+/* max value for struct rdt_domain's mbps_val */
+#define MBA_MAX_MBPS U32_MAX
+
/**
* enum resctrl_conf_type - The type of configuration.
* @CDP_NONE: No prioritisation, both code and data are controlled or monitored.
@@ -53,6 +56,9 @@ struct resctrl_staged_config {
* @cqm_work_cpu: worker CPU for CQM h/w counters
* @plr: pseudo-locked region (if any) associated with domain
* @staged_config: parsed configuration to be applied
+ * @mbps_val: When mba_sc is enabled, this holds the array of user
+ * specified control values for mba_sc in MBps, indexed
+ * by closid
*/
struct rdt_domain {
struct list_head list;
@@ -67,6 +73,7 @@ struct rdt_domain {
int cqm_work_cpu;
struct pseudo_lock_region *plr;
struct resctrl_staged_config staged_config[CDP_NUM_TYPES];
+ u32 *mbps_val;
};

/**

Subject: [tip: x86/cache] x86/resctrl: Kill off alloc_enabled

The following commit has been merged into the x86/cache branch of tip:

Commit-ID: 4d269ed485298e8a09485a664e7b35b370ab4ada
Gitweb: https://git.kernel.org/tip/4d269ed485298e8a09485a664e7b35b370ab4ada
Author: James Morse <[email protected]>
AuthorDate: Fri, 02 Sep 2022 15:48:09
Committer: Borislav Petkov <[email protected]>
CommitterDate: Thu, 22 Sep 2022 14:34:33 +02:00

x86/resctrl: Kill off alloc_enabled

rdt_resources_all[] used to have extra entries for L2CODE/L2DATA.
These were hidden from resctrl by the alloc_enabled value.

Now that the L2/L2CODE/L2DATA resources have been merged together,
alloc_enabled no longer means anything: it always has the same value as
alloc_capable, which indicates allocation is supported by this resource.

Remove alloc_enabled and its helpers.
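
The change required of callers is purely mechanical, for example in the
rdt_kill_sb() hunk below:

    -   for_each_alloc_enabled_rdt_resource(r)
    +   for_each_alloc_capable_rdt_resource(r)
                reset_all_ctrls(r);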

Signed-off-by: James Morse <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Jamie Iles <[email protected]>
Reviewed-by: Shaopeng Tan <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Tested-by: Xin Hao <[email protected]>
Tested-by: Shaopeng Tan <[email protected]>
Tested-by: Cristian Marussi <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
arch/x86/kernel/cpu/resctrl/core.c | 4 ----
arch/x86/kernel/cpu/resctrl/internal.h | 4 ----
arch/x86/kernel/cpu/resctrl/pseudo_lock.c | 2 +-
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 6 +++---
include/linux/resctrl.h | 2 --
5 files changed, 4 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index bb1c3f5..2f87177 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -147,7 +147,6 @@ static inline void cache_alloc_hsw_probe(void)
r->cache.shareable_bits = 0xc0000;
r->cache.min_cbm_bits = 2;
r->alloc_capable = true;
- r->alloc_enabled = true;

rdt_alloc_capable = true;
}
@@ -211,7 +210,6 @@ static bool __get_mem_config_intel(struct rdt_resource *r)
thread_throttle_mode_init();

r->alloc_capable = true;
- r->alloc_enabled = true;

return true;
}
@@ -242,7 +240,6 @@ static bool __rdt_get_mem_config_amd(struct rdt_resource *r)
r->data_width = 4;

r->alloc_capable = true;
- r->alloc_enabled = true;

return true;
}
@@ -261,7 +258,6 @@ static void rdt_get_cache_alloc_cfg(int idx, struct rdt_resource *r)
r->cache.shareable_bits = ebx & r->default_ctrl;
r->data_width = (r->cache.cbm_len + 3) / 4;
r->alloc_capable = true;
- r->alloc_enabled = true;
}

static void rdt_get_cdp_config(int level)
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 1d64718..53f3d27 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -459,10 +459,6 @@ int resctrl_arch_set_cdp_enabled(enum resctrl_res_level l, bool enable);
for_each_rdt_resource(r) \
if (r->mon_capable)

-#define for_each_alloc_enabled_rdt_resource(r) \
- for_each_rdt_resource(r) \
- if (r->alloc_enabled)
-
#define for_each_mon_enabled_rdt_resource(r) \
for_each_rdt_resource(r) \
if (r->mon_enabled)
diff --git a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
index 4d83989..d961ae3 100644
--- a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
+++ b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
@@ -837,7 +837,7 @@ bool rdtgroup_pseudo_locked_in_hierarchy(struct rdt_domain *d)
* First determine which cpus have pseudo-locked regions
* associated with them.
*/
- for_each_alloc_enabled_rdt_resource(r) {
+ for_each_alloc_capable_rdt_resource(r) {
list_for_each_entry(d_i, &r->domains, list) {
if (d_i->plr)
cpumask_or(cpu_with_psl, cpu_with_psl,
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index f276aff..526eb93 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1756,7 +1756,7 @@ static int rdtgroup_create_info_dir(struct kernfs_node *parent_kn)
if (ret)
goto out_destroy;

- /* loop over enabled controls, these are all alloc_enabled */
+ /* loop over enabled controls, these are all alloc_capable */
list_for_each_entry(s, &resctrl_schema_all, list) {
r = s->res;
fflags = r->fflags | RF_CTRL_INFO;
@@ -2106,7 +2106,7 @@ static int schemata_list_create(void)
struct rdt_resource *r;
int ret = 0;

- for_each_alloc_enabled_rdt_resource(r) {
+ for_each_alloc_capable_rdt_resource(r) {
if (resctrl_arch_get_cdp_enabled(r->rid)) {
ret = schemata_list_add(r, CDP_CODE);
if (ret)
@@ -2452,7 +2452,7 @@ static void rdt_kill_sb(struct super_block *sb)
set_mba_sc(false);

/*Put everything back to default values. */
- for_each_alloc_enabled_rdt_resource(r)
+ for_each_alloc_capable_rdt_resource(r)
reset_all_ctrls(r);
cdp_disable_all();
rmdir_all_sub();
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 21deb52..386ab3a 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -130,7 +130,6 @@ struct resctrl_schema;
/**
* struct rdt_resource - attributes of a resctrl resource
* @rid: The index of the resource
- * @alloc_enabled: Is allocation enabled on this machine
* @mon_enabled: Is monitoring enabled for this feature
* @alloc_capable: Is allocation available on this machine
* @mon_capable: Is monitor feature available on this machine
@@ -150,7 +149,6 @@ struct resctrl_schema;
*/
struct rdt_resource {
int rid;
- bool alloc_enabled;
bool mon_enabled;
bool alloc_capable;
bool mon_capable;

Subject: [tip: x86/cache] x86/resctrl: Remove architecture copy of mbps_val

The following commit has been merged into the x86/cache branch of tip:

Commit-ID: b58d4eb1f199f5a26d8c756d8e74a31c48b90428
Gitweb: https://git.kernel.org/tip/b58d4eb1f199f5a26d8c756d8e74a31c48b90428
Author: James Morse <[email protected]>
AuthorDate: Fri, 02 Sep 2022 15:48:18
Committer: Borislav Petkov <[email protected]>
CommitterDate: Thu, 22 Sep 2022 17:37:16 +02:00

x86/resctrl: Remove architecture copy of mbps_val

The resctrl arch code provides a second configuration array mbps_val[]
for the MBA software controller.

Since resctrl switched over to allocating and freeing its own array
when needed, nothing uses the arch code version.

Remove it.

Signed-off-by: James Morse <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Jamie Iles <[email protected]>
Reviewed-by: Shaopeng Tan <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Tested-by: Xin Hao <[email protected]>
Tested-by: Shaopeng Tan <[email protected]>
Tested-by: Cristian Marussi <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
arch/x86/kernel/cpu/resctrl/core.c | 20 ++++----------------
arch/x86/kernel/cpu/resctrl/internal.h | 3 ---
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 4 +---
3 files changed, 5 insertions(+), 22 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index f691829..f0e2820 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -397,7 +397,7 @@ struct rdt_domain *rdt_find_domain(struct rdt_resource *r, int id,
return NULL;
}

-void setup_default_ctrlval(struct rdt_resource *r, u32 *dc, u32 *dm)
+static void setup_default_ctrlval(struct rdt_resource *r, u32 *dc)
{
struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
int i;
@@ -406,18 +406,14 @@ void setup_default_ctrlval(struct rdt_resource *r, u32 *dc, u32 *dm)
* Initialize the Control MSRs to having no control.
* For Cache Allocation: Set all bits in cbm
* For Memory Allocation: Set b/w requested to 100%
- * and the bandwidth in MBps to U32_MAX
*/
- for (i = 0; i < hw_res->num_closid; i++, dc++, dm++) {
+ for (i = 0; i < hw_res->num_closid; i++, dc++)
*dc = r->default_ctrl;
- *dm = MBA_MAX_MBPS;
- }
}

static void domain_free(struct rdt_hw_domain *hw_dom)
{
kfree(hw_dom->ctrl_val);
- kfree(hw_dom->mbps_val);
kfree(hw_dom);
}

@@ -426,23 +422,15 @@ static int domain_setup_ctrlval(struct rdt_resource *r, struct rdt_domain *d)
struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(d);
struct msr_param m;
- u32 *dc, *dm;
+ u32 *dc;

dc = kmalloc_array(hw_res->num_closid, sizeof(*hw_dom->ctrl_val),
GFP_KERNEL);
if (!dc)
return -ENOMEM;

- dm = kmalloc_array(hw_res->num_closid, sizeof(*hw_dom->mbps_val),
- GFP_KERNEL);
- if (!dm) {
- kfree(dc);
- return -ENOMEM;
- }
-
hw_dom->ctrl_val = dc;
- hw_dom->mbps_val = dm;
- setup_default_ctrlval(r, dc, dm);
+ setup_default_ctrlval(r, dc);

m.low = 0;
m.high = hw_res->num_closid;
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index a7e2cbc..373aaba 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -308,14 +308,12 @@ struct mbm_state {
* a resource
* @d_resctrl: Properties exposed to the resctrl file system
* @ctrl_val: array of cache or mem ctrl values (indexed by CLOSID)
- * @mbps_val: When mba_sc is enabled, this holds the bandwidth in MBps
*
* Members of this structure are accessed via helpers that provide abstraction.
*/
struct rdt_hw_domain {
struct rdt_domain d_resctrl;
u32 *ctrl_val;
- u32 *mbps_val;
};

static inline struct rdt_hw_domain *resctrl_to_arch_dom(struct rdt_domain *r)
@@ -529,7 +527,6 @@ void mbm_setup_overflow_handler(struct rdt_domain *dom,
void mbm_handle_overflow(struct work_struct *work);
void __init intel_rdt_mbm_apply_quirk(void);
bool is_mba_sc(struct rdt_resource *r);
-void setup_default_ctrlval(struct rdt_resource *r, u32 *dc, u32 *dm);
u32 delay_bw_map(unsigned long bw, struct rdt_resource *r);
void cqm_setup_limbo_handler(struct rdt_domain *dom, unsigned long delay_ms);
void cqm_handle_limbo(struct work_struct *work);
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 55d8a12..6c33dfe 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -2370,10 +2370,8 @@ static int reset_all_ctrls(struct rdt_resource *r)
hw_dom = resctrl_to_arch_dom(d);
cpumask_set_cpu(cpumask_any(&d->cpu_mask), cpu_mask);

- for (i = 0; i < hw_res->num_closid; i++) {
+ for (i = 0; i < hw_res->num_closid; i++)
hw_dom->ctrl_val[i] = r->default_ctrl;
- hw_dom->mbps_val[i] = MBA_MAX_MBPS;
- }
}
cpu = get_cpu();
/* Update CBM on this cpu if it's in cpu_mask. */

Subject: [tip: x86/cache] x86/resctrl: Group struct rdt_hw_domain cleanup

The following commit has been merged into the x86/cache branch of tip:

Commit-ID: 7add3af4178d9e25afc8d990a7d1000ccb22b6a0
Gitweb: https://git.kernel.org/tip/7add3af4178d9e25afc8d990a7d1000ccb22b6a0
Author: James Morse <[email protected]>
AuthorDate: Fri, 02 Sep 2022 15:48:12
Committer: Borislav Petkov <[email protected]>
CommitterDate: Thu, 22 Sep 2022 15:27:15 +02:00

x86/resctrl: Group struct rdt_hw_domain cleanup

domain_add_cpu() and domain_remove_cpu() need to kfree() the child
arrays that were allocated by domain_setup_ctrlval().

As this memory is moved around, and new arrays are created, adjusting
the error handling cleanup code becomes noisier.

To simplify this, move all the kfree() calls into a domain_free() helper.
This depends on struct rdt_hw_domain being kzalloc()d, allowing it to
unconditionally kfree() all the child arrays.
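
A sketch of why the unconditional kfree() is safe, assuming the existing
kzalloc() of the arch domain structure (the real allocation site may also
pass a NUMA node):

    hw_dom = kzalloc(sizeof(*hw_dom), GFP_KERNEL);
    if (!hw_dom)
            return;
    ...
    if (r->alloc_capable && domain_setup_ctrlval(r, d)) {
            /* ctrl_val/mbps_val are either valid pointers or still NULL */
            domain_free(hw_dom);
            return;
    }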

Signed-off-by: James Morse <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Jamie Iles <[email protected]>
Reviewed-by: Shaopeng Tan <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Tested-by: Xin Hao <[email protected]>
Tested-by: Shaopeng Tan <[email protected]>
Tested-by: Cristian Marussi <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
arch/x86/kernel/cpu/resctrl/core.c | 17 ++++++++++-------
1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 25f3014..e37889f 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -414,6 +414,13 @@ void setup_default_ctrlval(struct rdt_resource *r, u32 *dc, u32 *dm)
}
}

+static void domain_free(struct rdt_hw_domain *hw_dom)
+{
+ kfree(hw_dom->ctrl_val);
+ kfree(hw_dom->mbps_val);
+ kfree(hw_dom);
+}
+
static int domain_setup_ctrlval(struct rdt_resource *r, struct rdt_domain *d)
{
struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
@@ -488,7 +495,7 @@ static void domain_add_cpu(int cpu, struct rdt_resource *r)
rdt_domain_reconfigure_cdp(r);

if (r->alloc_capable && domain_setup_ctrlval(r, d)) {
- kfree(hw_dom);
+ domain_free(hw_dom);
return;
}

@@ -497,9 +504,7 @@ static void domain_add_cpu(int cpu, struct rdt_resource *r)
err = resctrl_online_domain(r, d);
if (err) {
list_del(&d->list);
- kfree(hw_dom->ctrl_val);
- kfree(hw_dom->mbps_val);
- kfree(hw_dom);
+ domain_free(hw_dom);
}
}

@@ -547,12 +552,10 @@ static void domain_remove_cpu(int cpu, struct rdt_resource *r)
if (d->plr)
d->plr->d = NULL;

- kfree(hw_dom->ctrl_val);
- kfree(hw_dom->mbps_val);
bitmap_free(d->rmid_busy_llc);
kfree(d->mbm_total);
kfree(d->mbm_local);
- kfree(hw_dom);
+ domain_free(hw_dom);
return;
}

Subject: [tip: x86/cache] x86/resctrl: Switch over to the resctrl mbps_val list

The following commit has been merged into the x86/cache branch of tip:

Commit-ID: 6ce1560d35f63a458fead11ac865bc39cea9bc46
Gitweb: https://git.kernel.org/tip/6ce1560d35f63a458fead11ac865bc39cea9bc46
Author: James Morse <[email protected]>
AuthorDate: Fri, 02 Sep 2022 15:48:17
Committer: Borislav Petkov <[email protected]>
CommitterDate: Thu, 22 Sep 2022 17:34:08 +02:00

x86/resctrl: Switch over to the resctrl mbps_val list

Updates to resctrl's software controller follow the same path as
other configuration updates, but they don't modify the hardware state.
rdtgroup_schemata_write() uses parse_line() and the resource's
parse_ctrlval() function to stage the configuration.
resctrl_arch_update_domains() then updates the mbps_val[] array
instead and skips the rdt_ctrl_update() call that would update
hardware.

This complicates the interface between resctrl's filesystem parts
and architecture specific code. It should be possible for mba_sc
to be completely implemented by the filesystem parts of resctrl. This
would allow it to work on a second architecture with no additional code.
resctrl_arch_update_domains() using the mbps_val[] array prevents this.

Change parse_bw() to write the configuration value directly to the
mbps_val[] array in the domain structure. Change rdtgroup_schemata_write()
to skip the call to resctrl_arch_update_domains(), meaning all the
mba_sc specific code in resctrl_arch_update_domains() can be removed.
On the read-side, show_doms() and update_mba_bw() are changed to read
the mbps_val[] array from the domain structure. With this,
resctrl_arch_get_config() no longer needs to consider mba_sc resources.
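
A condensed view of the new parse_bw() behaviour, taken from the hunk below:
writes to an mba_sc resource never reach the staged configuration or the MSRs:

    if (!bw_validate(data->buf, &bw_val, r))
            return -EINVAL;

    if (is_mba_sc(r)) {
            d->mbps_val[closid] = bw_val;   /* software controller only */
            return 0;
    }

    cfg->new_ctrl = bw_val;                 /* staged, later written to the MSR */
    cfg->have_new_ctrl = true;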

Signed-off-by: James Morse <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Jamie Iles <[email protected]>
Reviewed-by: Shaopeng Tan <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Tested-by: Xin Hao <[email protected]>
Tested-by: Shaopeng Tan <[email protected]>
Tested-by: Cristian Marussi <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 44 +++++++++++++---------
arch/x86/kernel/cpu/resctrl/monitor.c | 10 ++---
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 27 ++++++++++----
3 files changed, 52 insertions(+), 29 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
index 8766627..bf9d73c 100644
--- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
+++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
@@ -61,6 +61,7 @@ int parse_bw(struct rdt_parse_data *data, struct resctrl_schema *s,
struct rdt_domain *d)
{
struct resctrl_staged_config *cfg;
+ u32 closid = data->rdtgrp->closid;
struct rdt_resource *r = s->res;
unsigned long bw_val;

@@ -72,6 +73,12 @@ int parse_bw(struct rdt_parse_data *data, struct resctrl_schema *s,

if (!bw_validate(data->buf, &bw_val, r))
return -EINVAL;
+
+ if (is_mba_sc(r)) {
+ d->mbps_val[closid] = bw_val;
+ return 0;
+ }
+
cfg->new_ctrl = bw_val;
cfg->have_new_ctrl = true;

@@ -261,14 +268,13 @@ static u32 get_config_index(u32 closid, enum resctrl_conf_type type)

static bool apply_config(struct rdt_hw_domain *hw_dom,
struct resctrl_staged_config *cfg, u32 idx,
- cpumask_var_t cpu_mask, bool mba_sc)
+ cpumask_var_t cpu_mask)
{
struct rdt_domain *dom = &hw_dom->d_resctrl;
- u32 *dc = !mba_sc ? hw_dom->ctrl_val : hw_dom->mbps_val;

- if (cfg->new_ctrl != dc[idx]) {
+ if (cfg->new_ctrl != hw_dom->ctrl_val[idx]) {
cpumask_set_cpu(cpumask_any(&dom->cpu_mask), cpu_mask);
- dc[idx] = cfg->new_ctrl;
+ hw_dom->ctrl_val[idx] = cfg->new_ctrl;

return true;
}
@@ -284,14 +290,12 @@ int resctrl_arch_update_domains(struct rdt_resource *r, u32 closid)
enum resctrl_conf_type t;
cpumask_var_t cpu_mask;
struct rdt_domain *d;
- bool mba_sc;
int cpu;
u32 idx;

if (!zalloc_cpumask_var(&cpu_mask, GFP_KERNEL))
return -ENOMEM;

- mba_sc = is_mba_sc(r);
msr_param.res = NULL;
list_for_each_entry(d, &r->domains, list) {
hw_dom = resctrl_to_arch_dom(d);
@@ -301,7 +305,7 @@ int resctrl_arch_update_domains(struct rdt_resource *r, u32 closid)
continue;

idx = get_config_index(closid, t);
- if (!apply_config(hw_dom, cfg, idx, cpu_mask, mba_sc))
+ if (!apply_config(hw_dom, cfg, idx, cpu_mask))
continue;

if (!msr_param.res) {
@@ -315,11 +319,7 @@ int resctrl_arch_update_domains(struct rdt_resource *r, u32 closid)
}
}

- /*
- * Avoid writing the control msr with control values when
- * MBA software controller is enabled
- */
- if (cpumask_empty(cpu_mask) || mba_sc)
+ if (cpumask_empty(cpu_mask))
goto done;
cpu = get_cpu();
/* Update resource control msr on this CPU if it's in cpu_mask. */
@@ -406,6 +406,14 @@ ssize_t rdtgroup_schemata_write(struct kernfs_open_file *of,

list_for_each_entry(s, &resctrl_schema_all, list) {
r = s->res;
+
+ /*
+ * Writes to mba_sc resources update the software controller,
+ * not the control MSR.
+ */
+ if (is_mba_sc(r))
+ continue;
+
ret = resctrl_arch_update_domains(r, rdtgrp->closid);
if (ret)
goto out;
@@ -433,9 +441,7 @@ u32 resctrl_arch_get_config(struct rdt_resource *r, struct rdt_domain *d,
struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(d);
u32 idx = get_config_index(closid, type);

- if (!is_mba_sc(r))
- return hw_dom->ctrl_val[idx];
- return hw_dom->mbps_val[idx];
+ return hw_dom->ctrl_val[idx];
}

static void show_doms(struct seq_file *s, struct resctrl_schema *schema, int closid)
@@ -450,8 +456,12 @@ static void show_doms(struct seq_file *s, struct resctrl_schema *schema, int clo
if (sep)
seq_puts(s, ";");

- ctrl_val = resctrl_arch_get_config(r, dom, closid,
- schema->conf_type);
+ if (is_mba_sc(r))
+ ctrl_val = dom->mbps_val[closid];
+ else
+ ctrl_val = resctrl_arch_get_config(r, dom, closid,
+ schema->conf_type);
+
seq_printf(s, r->format_str, dom->id, max_data_width,
ctrl_val);
sep = true;
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 497cadf..16028b2 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -447,13 +447,11 @@ static void update_mba_bw(struct rdtgroup *rgrp, struct rdt_domain *dom_mbm)
hw_dom_mba = resctrl_to_arch_dom(dom_mba);

cur_bw = pmbm_data->prev_bw;
- user_bw = resctrl_arch_get_config(r_mba, dom_mba, closid, CDP_NONE);
+ user_bw = dom_mba->mbps_val[closid];
delta_bw = pmbm_data->delta_bw;
- /*
- * resctrl_arch_get_config() chooses the mbps/ctrl value to return
- * based on is_mba_sc(). For now, reach into the hw_dom.
- */
- cur_msr_val = hw_dom_mba->ctrl_val[closid];
+
+ /* MBA resource doesn't support CDP */
+ cur_msr_val = resctrl_arch_get_config(r_mba, dom_mba, closid, CDP_NONE);

/*
* For Ctrl groups read data from child monitor groups.
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index f7ebd01..55d8a12 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1356,11 +1356,13 @@ static int rdtgroup_size_show(struct kernfs_open_file *of,
struct seq_file *s, void *v)
{
struct resctrl_schema *schema;
+ enum resctrl_conf_type type;
struct rdtgroup *rdtgrp;
struct rdt_resource *r;
struct rdt_domain *d;
unsigned int size;
int ret = 0;
+ u32 closid;
bool sep;
u32 ctrl;

@@ -1386,8 +1388,11 @@ static int rdtgroup_size_show(struct kernfs_open_file *of,
goto out;
}

+ closid = rdtgrp->closid;
+
list_for_each_entry(schema, &resctrl_schema_all, list) {
r = schema->res;
+ type = schema->conf_type;
sep = false;
seq_printf(s, "%*s:", max_name_width, schema->name);
list_for_each_entry(d, &r->domains, list) {
@@ -1396,9 +1401,12 @@ static int rdtgroup_size_show(struct kernfs_open_file *of,
if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKSETUP) {
size = 0;
} else {
- ctrl = resctrl_arch_get_config(r, d,
- rdtgrp->closid,
- schema->conf_type);
+ if (is_mba_sc(r))
+ ctrl = d->mbps_val[closid];
+ else
+ ctrl = resctrl_arch_get_config(r, d,
+ closid,
+ type);
if (r->rid == RDT_RESOURCE_MBA)
size = ctrl;
else
@@ -2818,14 +2826,19 @@ static int rdtgroup_init_cat(struct resctrl_schema *s, u32 closid)
}

/* Initialize MBA resource with default values. */
-static void rdtgroup_init_mba(struct rdt_resource *r)
+static void rdtgroup_init_mba(struct rdt_resource *r, u32 closid)
{
struct resctrl_staged_config *cfg;
struct rdt_domain *d;

list_for_each_entry(d, &r->domains, list) {
+ if (is_mba_sc(r)) {
+ d->mbps_val[closid] = MBA_MAX_MBPS;
+ continue;
+ }
+
cfg = &d->staged_config[CDP_NONE];
- cfg->new_ctrl = is_mba_sc(r) ? MBA_MAX_MBPS : r->default_ctrl;
+ cfg->new_ctrl = r->default_ctrl;
cfg->have_new_ctrl = true;
}
}
@@ -2840,7 +2853,9 @@ static int rdtgroup_init_alloc(struct rdtgroup *rdtgrp)
list_for_each_entry(s, &resctrl_schema_all, list) {
r = s->res;
if (r->rid == RDT_RESOURCE_MBA) {
- rdtgroup_init_mba(r);
+ rdtgroup_init_mba(r, rdtgrp->closid);
+ if (is_mba_sc(r))
+ continue;
} else {
ret = rdtgroup_init_cat(s, rdtgrp->closid);
if (ret < 0)

Subject: [tip: x86/cache] x86/resctrl: Rename and change the units of resctrl_cqm_threshold

The following commit has been merged into the x86/cache branch of tip:

Commit-ID: ae2328b52962531c2d7c6b531022a3eb2d680f17
Gitweb: https://git.kernel.org/tip/ae2328b52962531c2d7c6b531022a3eb2d680f17
Author: James Morse <[email protected]>
AuthorDate: Fri, 02 Sep 2022 15:48:27
Committer: Borislav Petkov <[email protected]>
CommitterDate: Fri, 23 Sep 2022 14:23:41 +02:00

x86/resctrl: Rename and change the units of resctrl_cqm_threshold

resctrl_cqm_threshold is stored in a hardware specific chunk size,
but exposed to user-space as bytes.

This means the filesystem parts of resctrl need to know how the hardware
counts, to convert the user provided byte value to chunks. The interface
between the architecture's resctrl code and the filesystem ought to
treat everything as bytes.

Change the unit of resctrl_cqm_threshold to bytes. resctrl_arch_rmid_read()
still returns its value in chunks, so this needs converting to bytes.
As all the users have been touched, rename the variable to
resctrl_rmid_realloc_threshold, which describes what the value is for.

Neither r->num_rmid nor hw_res->mon_scale is guaranteed to be a power
of 2, so the existing code introduces a rounding error from resctrl's
theoretical fraction of the cache usage. This behaviour is kept as it
ensures the user visible value matches the value read from hardware
when the rmid will be reallocated.
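
A worked example of the new rounding, using the 35MB/56 RMID numbers from
the comment in rdt_get_mon_l3_config(); the mon_scale value is chosen purely
for illustration:

    /*
     * threshold = 35MB / 56 RMIDs = 36700160 / 56 = 655360 bytes
     * resctrl_arch_round_mon_val(655360), with an illustrative
     * mon_scale of 61440:
     *         (655360 / 61440) * 61440 = 10 * 61440 = 614400 bytes
     * so resctrl_rmid_realloc_threshold ends up slightly below the
     * unrounded value, but matches a value the hardware can report.
     */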

Signed-off-by: James Morse <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Jamie Iles <[email protected]>
Reviewed-by: Shaopeng Tan <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Tested-by: Xin Hao <[email protected]>
Tested-by: Shaopeng Tan <[email protected]>
Tested-by: Cristian Marussi <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
arch/x86/include/asm/resctrl.h | 9 +++++-
arch/x86/kernel/cpu/resctrl/internal.h | 1 +-
arch/x86/kernel/cpu/resctrl/monitor.c | 43 +++++++++++++++----------
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 9 +----
include/linux/resctrl.h | 2 +-
5 files changed, 39 insertions(+), 25 deletions(-)

diff --git a/arch/x86/include/asm/resctrl.h b/arch/x86/include/asm/resctrl.h
index d60ed06..d24b04e 100644
--- a/arch/x86/include/asm/resctrl.h
+++ b/arch/x86/include/asm/resctrl.h
@@ -81,6 +81,15 @@ static void __resctrl_sched_in(void)
}
}

+static inline unsigned int resctrl_arch_round_mon_val(unsigned int val)
+{
+ unsigned int scale = boot_cpu_data.x86_cache_occ_scale;
+
+ /* h/w works in units of "boot_cpu_data.x86_cache_occ_scale" */
+ val /= scale;
+ return val * scale;
+}
+
static inline void resctrl_sched_in(void)
{
if (static_branch_likely(&rdt_enable_key))
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index bdb55c2..c05e9b7 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -98,7 +98,6 @@ struct rmid_read {
u64 val;
};

-extern unsigned int resctrl_cqm_threshold;
extern bool rdt_alloc_capable;
extern bool rdt_mon_capable;
extern unsigned int rdt_mon_features;
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 27bb494..e91afe9 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -17,7 +17,10 @@

#include <linux/module.h>
#include <linux/slab.h>
+
#include <asm/cpu_device_id.h>
+#include <asm/resctrl.h>
+
#include "internal.h"

struct rmid_entry {
@@ -37,8 +40,8 @@ static LIST_HEAD(rmid_free_lru);
* @rmid_limbo_count count of currently unused but (potentially)
* dirty RMIDs.
* This counts RMIDs that no one is currently using but that
- * may have a occupancy value > intel_cqm_threshold. User can change
- * the threshold occupancy value.
+ * may have a occupancy value > resctrl_rmid_realloc_threshold. User can
+ * change the threshold occupancy value.
*/
static unsigned int rmid_limbo_count;

@@ -59,10 +62,10 @@ bool rdt_mon_capable;
unsigned int rdt_mon_features;

/*
- * This is the threshold cache occupancy at which we will consider an
+ * This is the threshold cache occupancy in bytes at which we will consider an
* RMID available for re-allocation.
*/
-unsigned int resctrl_cqm_threshold;
+unsigned int resctrl_rmid_realloc_threshold;

#define CF(cf) ((unsigned long)(1048576 * (cf) + 0.5))

@@ -223,14 +226,13 @@ int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain *d,
*/
void __check_limbo(struct rdt_domain *d, bool force_free)
{
+ struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
+ struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
struct rmid_entry *entry;
- struct rdt_resource *r;
u32 crmid = 1, nrmid;
bool rmid_dirty;
u64 val = 0;

- r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
-
/*
* Skip RMID 0 and start from RMID 1 and check all the RMIDs that
* are marked as busy for occupancy < threshold. If the occupancy
@@ -245,10 +247,12 @@ void __check_limbo(struct rdt_domain *d, bool force_free)
entry = __rmid_entry(nrmid);

if (resctrl_arch_rmid_read(r, d, entry->rmid,
- QOS_L3_OCCUP_EVENT_ID, &val))
+ QOS_L3_OCCUP_EVENT_ID, &val)) {
rmid_dirty = true;
- else
- rmid_dirty = (val >= resctrl_cqm_threshold);
+ } else {
+ val *= hw_res->mon_scale;
+ rmid_dirty = (val >= resctrl_rmid_realloc_threshold);
+ }

if (force_free || !rmid_dirty) {
clear_bit(entry->rmid, d->rmid_busy_llc);
@@ -289,13 +293,12 @@ int alloc_rmid(void)

static void add_rmid_to_limbo(struct rmid_entry *entry)
{
- struct rdt_resource *r;
+ struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
+ struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
struct rdt_domain *d;
int cpu, err;
u64 val = 0;

- r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
-
entry->busy = 0;
cpu = get_cpu();
list_for_each_entry(d, &r->domains, list) {
@@ -303,7 +306,8 @@ static void add_rmid_to_limbo(struct rmid_entry *entry)
err = resctrl_arch_rmid_read(r, d, entry->rmid,
QOS_L3_OCCUP_EVENT_ID,
&val);
- if (err || val <= resctrl_cqm_threshold)
+ val *= hw_res->mon_scale;
+ if (err || val <= resctrl_rmid_realloc_threshold)
continue;
}

@@ -744,6 +748,7 @@ int rdt_get_mon_l3_config(struct rdt_resource *r)
unsigned int mbm_offset = boot_cpu_data.x86_cache_mbm_width_offset;
struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
unsigned int cl_size = boot_cpu_data.x86_cache_size;
+ unsigned int threshold;
int ret;

hw_res->mon_scale = boot_cpu_data.x86_cache_occ_scale;
@@ -762,10 +767,14 @@ int rdt_get_mon_l3_config(struct rdt_resource *r)
*
* For a 35MB LLC and 56 RMIDs, this is ~1.8% of the LLC.
*/
- resctrl_cqm_threshold = cl_size * 1024 / r->num_rmid;
+ threshold = cl_size * 1024 / r->num_rmid;

- /* h/w works in units of "boot_cpu_data.x86_cache_occ_scale" */
- resctrl_cqm_threshold /= hw_res->mon_scale;
+ /*
+ * Because num_rmid may not be a power of two, round the value
+ * to the nearest multiple of hw_res->mon_scale so it matches a
+ * value the hardware will measure. mon_scale may not be a power of 2.
+ */
+ resctrl_rmid_realloc_threshold = resctrl_arch_round_mon_val(threshold);

ret = dom_data_init(r);
if (ret)
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 6c33dfe..849bdec 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1030,10 +1030,7 @@ static int rdt_delay_linear_show(struct kernfs_open_file *of,
static int max_threshold_occ_show(struct kernfs_open_file *of,
struct seq_file *seq, void *v)
{
- struct rdt_resource *r = of->kn->parent->priv;
- struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
-
- seq_printf(seq, "%u\n", resctrl_cqm_threshold * hw_res->mon_scale);
+ seq_printf(seq, "%u\n", resctrl_rmid_realloc_threshold);

return 0;
}
@@ -1055,7 +1052,6 @@ static int rdt_thread_throttle_mode_show(struct kernfs_open_file *of,
static ssize_t max_threshold_occ_write(struct kernfs_open_file *of,
char *buf, size_t nbytes, loff_t off)
{
- struct rdt_hw_resource *hw_res;
unsigned int bytes;
int ret;

@@ -1066,8 +1062,7 @@ static ssize_t max_threshold_occ_write(struct kernfs_open_file *of,
if (bytes > (boot_cpu_data.x86_cache_size * 1024))
return -EINVAL;

- hw_res = resctrl_to_arch_res(of->kn->parent->priv);
- resctrl_cqm_threshold = bytes / hw_res->mon_scale;
+ resctrl_rmid_realloc_threshold = resctrl_arch_round_mon_val(bytes);

return nbytes;
}
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 7ccfa0d..9995d04 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -250,4 +250,6 @@ int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain *d,
void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_domain *d,
u32 rmid, enum resctrl_event_id eventid);

+extern unsigned int resctrl_rmid_realloc_threshold;
+
#endif /* _RESCTRL_H */

Subject: [tip: x86/cache] x86/resctrl: Add domain offline callback for resctrl work

The following commit has been merged into the x86/cache branch of tip:

Commit-ID: 798fd4b9ac37fec571f55fb8592497b0dd5f7a73
Gitweb: https://git.kernel.org/tip/798fd4b9ac37fec571f55fb8592497b0dd5f7a73
Author: James Morse <[email protected]>
AuthorDate: Fri, 02 Sep 2022 15:48:13
Committer: Borislav Petkov <[email protected]>
CommitterDate: Thu, 22 Sep 2022 15:42:40 +02:00

x86/resctrl: Add domain offline callback for resctrl work

Because domains are exposed to user-space via resctrl, the filesystem
must update its state when CPU hotplug callbacks are triggered.

Some of this work is common to any architecture that would support
resctrl, but the work is tied up with the architecture code to
free the memory.

Move the monitor subdir removal and the cancelling of the mbm/limbo
works into a new resctrl_offline_domain() call. These bits are not
specific to the architecture. Grouping them in one function allows
that code to be moved to /fs/ and re-used by another architecture.
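
For reference, a sketch of the resulting CPU-offline ordering once the last
CPU leaves a domain, condensed from the domain_remove_cpu() hunk below:

    cpumask_clear_cpu(cpu, &d->cpu_mask);
    if (cpumask_empty(&d->cpu_mask)) {
            resctrl_offline_domain(r, d);   /* fs work: rmdir mon data, cancel mbm/limbo work */
            list_del(&d->list);
            if (d->plr)
                    d->plr->d = NULL;
            domain_free(hw_dom);            /* arch work: free the hardware arrays */
            return;
    }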

Signed-off-by: James Morse <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Jamie Iles <[email protected]>
Reviewed-by: Shaopeng Tan <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Tested-by: Xin Hao <[email protected]>
Tested-by: Shaopeng Tan <[email protected]>
Tested-by: Cristian Marussi <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
arch/x86/kernel/cpu/resctrl/core.c | 26 +-------------
arch/x86/kernel/cpu/resctrl/internal.h | 2 +-
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 45 ++++++++++++++++++++++---
include/linux/resctrl.h | 1 +-
4 files changed, 44 insertions(+), 30 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index e37889f..f691829 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -523,27 +523,8 @@ static void domain_remove_cpu(int cpu, struct rdt_resource *r)

cpumask_clear_cpu(cpu, &d->cpu_mask);
if (cpumask_empty(&d->cpu_mask)) {
- /*
- * If resctrl is mounted, remove all the
- * per domain monitor data directories.
- */
- if (static_branch_unlikely(&rdt_mon_enable_key))
- rmdir_mondata_subdir_allrdtgrp(r, d->id);
+ resctrl_offline_domain(r, d);
list_del(&d->list);
- if (r->mon_capable && is_mbm_enabled())
- cancel_delayed_work(&d->mbm_over);
- if (is_llc_occupancy_enabled() && has_busy_rmid(r, d)) {
- /*
- * When a package is going down, forcefully
- * decrement rmid->ebusy. There is no way to know
- * that the L3 was flushed and hence may lead to
- * incorrect counts in rare scenarios, but leaving
- * the RMID as busy creates RMID leaks if the
- * package never comes back.
- */
- __check_limbo(d, true);
- cancel_delayed_work(&d->cqm_limbo);
- }

/*
* rdt_domain "d" is going to be freed below, so clear
@@ -551,11 +532,8 @@ static void domain_remove_cpu(int cpu, struct rdt_resource *r)
*/
if (d->plr)
d->plr->d = NULL;
-
- bitmap_free(d->rmid_busy_llc);
- kfree(d->mbm_total);
- kfree(d->mbm_local);
domain_free(hw_dom);
+
return;
}

diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index be48a68..e12b55f 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -522,8 +522,6 @@ void free_rmid(u32 rmid);
int rdt_get_mon_l3_config(struct rdt_resource *r);
void mon_event_count(void *info);
int rdtgroup_mondata_show(struct seq_file *m, void *arg);
-void rmdir_mondata_subdir_allrdtgrp(struct rdt_resource *r,
- unsigned int dom_id);
void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
struct rdt_domain *d, struct rdtgroup *rdtgrp,
int evtid, int first);
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 030a703..5830905 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -2499,14 +2499,12 @@ static int mon_addfile(struct kernfs_node *parent_kn, const char *name,
* Remove all subdirectories of mon_data of ctrl_mon groups
* and monitor groups with given domain id.
*/
-void rmdir_mondata_subdir_allrdtgrp(struct rdt_resource *r, unsigned int dom_id)
+static void rmdir_mondata_subdir_allrdtgrp(struct rdt_resource *r,
+ unsigned int dom_id)
{
struct rdtgroup *prgrp, *crgrp;
char name[32];

- if (!r->mon_capable)
- return;
-
list_for_each_entry(prgrp, &rdt_all_groups, rdtgroup_list) {
sprintf(name, "mon_%s_%02d", r->name, dom_id);
kernfs_remove_by_name(prgrp->mon.mon_data_kn, name);
@@ -3233,6 +3231,45 @@ out:
return ret;
}

+static void domain_destroy_mon_state(struct rdt_domain *d)
+{
+ bitmap_free(d->rmid_busy_llc);
+ kfree(d->mbm_total);
+ kfree(d->mbm_local);
+}
+
+void resctrl_offline_domain(struct rdt_resource *r, struct rdt_domain *d)
+{
+ lockdep_assert_held(&rdtgroup_mutex);
+
+ if (!r->mon_capable)
+ return;
+
+ /*
+ * If resctrl is mounted, remove all the
+ * per domain monitor data directories.
+ */
+ if (static_branch_unlikely(&rdt_mon_enable_key))
+ rmdir_mondata_subdir_allrdtgrp(r, d->id);
+
+ if (is_mbm_enabled())
+ cancel_delayed_work(&d->mbm_over);
+ if (is_llc_occupancy_enabled() && has_busy_rmid(r, d)) {
+ /*
+ * When a package is going down, forcefully
+ * decrement rmid->ebusy. There is no way to know
+ * that the L3 was flushed and hence may lead to
+ * incorrect counts in rare scenarios, but leaving
+ * the RMID as busy creates RMID leaks if the
+ * package never comes back.
+ */
+ __check_limbo(d, true);
+ cancel_delayed_work(&d->cqm_limbo);
+ }
+
+ domain_destroy_mon_state(d);
+}
+
static int domain_setup_mon_state(struct rdt_resource *r, struct rdt_domain *d)
{
size_t tsize;
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index d512455..5d283bd 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -193,5 +193,6 @@ int resctrl_arch_update_domains(struct rdt_resource *r, u32 closid);
u32 resctrl_arch_get_config(struct rdt_resource *r, struct rdt_domain *d,
u32 closid, enum resctrl_conf_type type);
int resctrl_online_domain(struct rdt_resource *r, struct rdt_domain *d);
+void resctrl_offline_domain(struct rdt_resource *r, struct rdt_domain *d);

#endif /* _RESCTRL_H */

Subject: [tip: x86/cache] x86/resctrl: Make resctrl_arch_rmid_read() return values in bytes

The following commit has been merged into the x86/cache branch of tip:

Commit-ID: f7b1843eca6fe295ba0c71fc02a3291954078f2b
Gitweb: https://git.kernel.org/tip/f7b1843eca6fe295ba0c71fc02a3291954078f2b
Author: James Morse <[email protected]>
AuthorDate: Fri, 02 Sep 2022 15:48:29
Committer: Borislav Petkov <[email protected]>
CommitterDate: Fri, 23 Sep 2022 14:25:05 +02:00

x86/resctrl: Make resctrl_arch_rmid_read() return values in bytes

resctrl_arch_rmid_read() returns a value in chunks, as read from the
hardware. This needs scaling to bytes by mon_scale, as provided by
the architecture code.

Now that resctrl_arch_rmid_read() performs the overflow and corrections
itself, it may as well return a value in bytes directly. This allows
the accesses to the architecture specific 'hw' structure to be removed.

Move the mon_scale conversion into resctrl_arch_rmid_read().
mbm_bw_count() is updated to calculate bandwidth from bytes.
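
As a quick sanity check on the conversion (the mon_scale value is
chosen purely for illustration): with mon_scale = 65536, a counter
advance of 16384 chunks is 16384 * 65536 = 1073741824 bytes, so the
new cur_bw = bytes / SZ_1M evaluates to 1024, exactly what the old
(chunks * hw_res->mon_scale) >> 20 produced.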

Signed-off-by: James Morse <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Jamie Iles <[email protected]>
Reviewed-by: Shaopeng Tan <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Tested-by: Xin Hao <[email protected]>
Tested-by: Shaopeng Tan <[email protected]>
Tested-by: Cristian Marussi <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 6 ++----
arch/x86/kernel/cpu/resctrl/internal.h | 4 ++--
arch/x86/kernel/cpu/resctrl/monitor.c | 24 ++++++++++------------
include/linux/resctrl.h | 2 +-
4 files changed, 16 insertions(+), 20 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
index 42a1abb..1dafbdc 100644
--- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
+++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
@@ -549,7 +549,6 @@ void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
int rdtgroup_mondata_show(struct seq_file *m, void *arg)
{
struct kernfs_open_file *of = m->private;
- struct rdt_hw_resource *hw_res;
u32 resid, evtid, domid;
struct rdtgroup *rdtgrp;
struct rdt_resource *r;
@@ -569,8 +568,7 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)
domid = md.u.domid;
evtid = md.u.evtid;

- hw_res = &rdt_resources_all[resid];
- r = &hw_res->r_resctrl;
+ r = &rdt_resources_all[resid].r_resctrl;
d = rdt_find_domain(r, domid, NULL);
if (IS_ERR_OR_NULL(d)) {
ret = -ENOENT;
@@ -584,7 +582,7 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)
else if (rr.err == -EINVAL)
seq_puts(m, "Unavailable\n");
else
- seq_printf(m, "%llu\n", rr.val * hw_res->mon_scale);
+ seq_printf(m, "%llu\n", rr.val);

out:
rdtgroup_kn_unlock(of->kn);
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index c05e9b7..5f71286 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -279,13 +279,13 @@ struct rftype {

/**
* struct mbm_state - status for each MBM counter in each domain
- * @prev_bw_chunks: Previous chunks value read for bandwidth calculation
+ * @prev_bw_bytes: Previous bytes value read for bandwidth calculation
* @prev_bw: The most recent bandwidth in MBps
* @delta_bw: Difference between the current and previous bandwidth
* @delta_comp: Indicates whether to compute the delta_bw
*/
struct mbm_state {
- u64 prev_bw_chunks;
+ u64 prev_bw_bytes;
u32 prev_bw;
u32 delta_bw;
bool delta_comp;
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 8d15568..efe0c30 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -16,6 +16,7 @@
*/

#include <linux/module.h>
+#include <linux/sizes.h>
#include <linux/slab.h>

#include <asm/cpu_device_id.h>
@@ -189,7 +190,7 @@ int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain *d,
struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(d);
struct arch_mbm_state *am;
- u64 msr_val;
+ u64 msr_val, chunks;

if (!cpumask_test_cpu(smp_processor_id(), &d->cpu_mask))
return -EINVAL;
@@ -214,12 +215,14 @@ int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain *d,
if (am) {
am->chunks += mbm_overflow_count(am->prev_msr, msr_val,
hw_res->mbm_width);
- *val = get_corrected_mbm_count(rmid, am->chunks);
+ chunks = get_corrected_mbm_count(rmid, am->chunks);
am->prev_msr = msr_val;
} else {
- *val = msr_val;
+ chunks = msr_val;
}

+ *val = chunks * hw_res->mon_scale;
+
return 0;
}

@@ -232,7 +235,6 @@ int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain *d,
void __check_limbo(struct rdt_domain *d, bool force_free)
{
struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
- struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
struct rmid_entry *entry;
u32 crmid = 1, nrmid;
bool rmid_dirty;
@@ -255,7 +257,6 @@ void __check_limbo(struct rdt_domain *d, bool force_free)
QOS_L3_OCCUP_EVENT_ID, &val)) {
rmid_dirty = true;
} else {
- val *= hw_res->mon_scale;
rmid_dirty = (val >= resctrl_rmid_realloc_threshold);
}

@@ -299,7 +300,6 @@ int alloc_rmid(void)
static void add_rmid_to_limbo(struct rmid_entry *entry)
{
struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
- struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
struct rdt_domain *d;
int cpu, err;
u64 val = 0;
@@ -311,7 +311,6 @@ static void add_rmid_to_limbo(struct rmid_entry *entry)
err = resctrl_arch_rmid_read(r, d, entry->rmid,
QOS_L3_OCCUP_EVENT_ID,
&val);
- val *= hw_res->mon_scale;
if (err || val <= resctrl_rmid_realloc_threshold)
continue;
}
@@ -403,15 +402,14 @@ static int __mon_event_count(u32 rmid, struct rmid_read *rr)
*/
static void mbm_bw_count(u32 rmid, struct rmid_read *rr)
{
- struct rdt_hw_resource *hw_res = resctrl_to_arch_res(rr->r);
struct mbm_state *m = &rr->d->mbm_local[rmid];
- u64 cur_bw, chunks, cur_chunks;
+ u64 cur_bw, bytes, cur_bytes;

- cur_chunks = rr->val;
- chunks = cur_chunks - m->prev_bw_chunks;
- m->prev_bw_chunks = cur_chunks;
+ cur_bytes = rr->val;
+ bytes = cur_bytes - m->prev_bw_bytes;
+ m->prev_bw_bytes = cur_bytes;

- cur_bw = (chunks * hw_res->mon_scale) >> 20;
+ cur_bw = bytes / SZ_1M;

if (m->delta_comp)
m->delta_bw = abs(cur_bw - m->prev_bw);
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index cb857f7..0cf5b20 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -227,7 +227,7 @@ void resctrl_offline_domain(struct rdt_resource *r, struct rdt_domain *d);
* @d: domain that the counter should be read from.
* @rmid: rmid of the counter to read.
* @eventid: eventid to read, e.g. L3 occupancy.
- * @val: result of the counter read in chunks.
+ * @val: result of the counter read in bytes.
*
* Call from process context on a CPU that belongs to domain @d.
*

Subject: [tip: x86/cache] x86/resctrl: Remove set_mba_sc()s control array re-initialisation

The following commit has been merged into the x86/cache branch of tip:

Commit-ID: 1644dfe727cb042ef7f2e773015747954fd0e746
Gitweb: https://git.kernel.org/tip/1644dfe727cb042ef7f2e773015747954fd0e746
Author: James Morse <[email protected]>
AuthorDate: Fri, 02 Sep 2022 15:48:14
Committer: Borislav Petkov <[email protected]>
CommitterDate: Thu, 22 Sep 2022 16:08:20 +02:00

x86/resctrl: Remove set_mba_sc()s control array re-initialisation

set_mba_sc() enables the 'software controller' to regulate the bandwidth
based on the byte counters. This can be managed entirely in the parts
of resctrl that move to /fs/, without any extra support from the
architecture specific code. set_mba_sc() is called by rdt_enable_ctx()
during mount and unmount. It currently resets the arch code's ctrl_val[]
and mbps_val[] arrays.

The ctrl_val[] was already reset when the domain was created, and by
reset_all_ctrls() when the filesystem was last unmounted. Doing the work
in set_mba_sc() is unnecessary: the values are already at their defaults
due to the creation of the domain, were previously reset during umount(),
or are about to be reset during umount().

Add a reset of the mbps_val[] in reset_all_ctrls(), allowing the code in
set_mba_sc() that reaches in to the architecture specific structures to
be removed.

Signed-off-by: James Morse <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Jamie Iles <[email protected]>
Reviewed-by: Shaopeng Tan <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Tested-by: Xin Hao <[email protected]>
Tested-by: Shaopeng Tan <[email protected]>
Tested-by: Cristian Marussi <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 10 +++-------
1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 5830905..b32ceff 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1898,18 +1898,12 @@ void rdt_domain_reconfigure_cdp(struct rdt_resource *r)
static int set_mba_sc(bool mba_sc)
{
struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_MBA].r_resctrl;
- struct rdt_hw_domain *hw_dom;
- struct rdt_domain *d;

if (!is_mbm_enabled() || !is_mba_linear() ||
mba_sc == is_mba_sc(r))
return -EINVAL;

r->membw.mba_sc = mba_sc;
- list_for_each_entry(d, &r->domains, list) {
- hw_dom = resctrl_to_arch_dom(d);
- setup_default_ctrlval(r, hw_dom->ctrl_val, hw_dom->mbps_val);
- }

return 0;
}
@@ -2327,8 +2321,10 @@ static int reset_all_ctrls(struct rdt_resource *r)
hw_dom = resctrl_to_arch_dom(d);
cpumask_set_cpu(cpumask_any(&d->cpu_mask), cpu_mask);

- for (i = 0; i < hw_res->num_closid; i++)
+ for (i = 0; i < hw_res->num_closid; i++) {
hw_dom->ctrl_val[i] = r->default_ctrl;
+ hw_dom->mbps_val[i] = MBA_MAX_MBPS;
+ }
}
cpu = get_cpu();
/* Update CBM on this cpu if it's in cpu_mask. */

Subject: [tip: x86/cache] x86/resctrl: Move mbm_overflow_count() into resctrl_arch_rmid_read()

The following commit has been merged into the x86/cache branch of tip:

Commit-ID: 1d81d15db39c2b517bc58f63008c6255dd08aafe
Gitweb: https://git.kernel.org/tip/1d81d15db39c2b517bc58f63008c6255dd08aafe
Author: James Morse <[email protected]>
AuthorDate: Fri, 02 Sep 2022 15:48:25
Committer: Borislav Petkov <[email protected]>
CommitterDate: Fri, 23 Sep 2022 14:22:20 +02:00

x86/resctrl: Move mbm_overflow_count() into resctrl_arch_rmid_read()

resctrl_arch_rmid_read() is intended as the function that an
architecture agnostic resctrl filesystem driver can use to
read a value in bytes from a counter. Currently the function returns
the MBM values in chunks directly from hardware. When reading a bandwidth
counter, mbm_overflow_count() must be used to correct for any possible
overflow.

mbm_overflow_count() is architecture specific, so its behaviour should
be part of resctrl_arch_rmid_read().

Move the mbm_overflow_count() calls into resctrl_arch_rmid_read().
This allows the resctrl filesystem's prev_msr to be removed in
favour of the architecture-private version.
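
For reference, a small worked example of the wrap-around handling in
mbm_overflow_count() (values chosen purely for illustration): with a
24 bit counter width, shift = 64 - 24 = 40. If prev_msr = 0xfffff0 and
cur_msr = 0x000010, then (cur_msr << 40) - (prev_msr << 40) wraps
modulo 2^64, and shifting the result back down by 40 gives 0x20,
i.e. 32 chunks, which is the correct distance even though cur_msr is
numerically smaller than prev_msr.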

Signed-off-by: James Morse <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Jamie Iles <[email protected]>
Reviewed-by: Shaopeng Tan <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Tested-by: Xin Hao <[email protected]>
Tested-by: Shaopeng Tan <[email protected]>
Tested-by: Cristian Marussi <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
arch/x86/kernel/cpu/resctrl/internal.h | 2 +-
arch/x86/kernel/cpu/resctrl/monitor.c | 35 ++++++++++++++-----------
2 files changed, 20 insertions(+), 17 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 1d2e7bd..8039e8a 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -281,7 +281,6 @@ struct rftype {
/**
* struct mbm_state - status for each MBM counter in each domain
* @chunks: Total data moved (multiply by rdt_group.mon_scale to get bytes)
- * @prev_msr: Value of IA32_QM_CTR for this RMID last time we read it
* @prev_bw_chunks: Previous chunks value read for bandwidth calculation
* @prev_bw: The most recent bandwidth in MBps
* @delta_bw: Difference between the current and previous bandwidth
@@ -289,7 +288,6 @@ struct rftype {
*/
struct mbm_state {
u64 chunks;
- u64 prev_msr;
u64 prev_bw_chunks;
u32 prev_bw;
u32 delta_bw;
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 262141b..862a446 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -167,9 +167,20 @@ void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_domain *d,
memset(am, 0, sizeof(*am));
}

+static u64 mbm_overflow_count(u64 prev_msr, u64 cur_msr, unsigned int width)
+{
+ u64 shift = 64 - width, chunks;
+
+ chunks = (cur_msr << shift) - (prev_msr << shift);
+ return chunks >> shift;
+}
+
int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain *d,
u32 rmid, enum resctrl_event_id eventid, u64 *val)
{
+ struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
+ struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(d);
+ struct arch_mbm_state *am;
u64 msr_val;

if (!cpumask_test_cpu(smp_processor_id(), &d->cpu_mask))
@@ -191,7 +202,13 @@ int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain *d,
if (msr_val & RMID_VAL_UNAVAIL)
return -EINVAL;

- *val = msr_val;
+ am = get_arch_mbm_state(hw_dom, rmid, eventid);
+ if (am) {
+ *val = mbm_overflow_count(am->prev_msr, msr_val, hw_res->mbm_width);
+ am->prev_msr = msr_val;
+ } else {
+ *val = msr_val;
+ }

return 0;
}
@@ -322,19 +339,10 @@ void free_rmid(u32 rmid)
list_add_tail(&entry->list, &rmid_free_lru);
}

-static u64 mbm_overflow_count(u64 prev_msr, u64 cur_msr, unsigned int width)
-{
- u64 shift = 64 - width, chunks;
-
- chunks = (cur_msr << shift) - (prev_msr << shift);
- return chunks >> shift;
-}
-
static int __mon_event_count(u32 rmid, struct rmid_read *rr)
{
- struct rdt_hw_resource *hw_res = resctrl_to_arch_res(rr->r);
struct mbm_state *m;
- u64 chunks, tval = 0;
+ u64 tval = 0;

if (rr->first)
resctrl_arch_reset_rmid(rr->r, rr->d, rmid, rr->evtid);
@@ -363,13 +371,10 @@ static int __mon_event_count(u32 rmid, struct rmid_read *rr)

if (rr->first) {
memset(m, 0, sizeof(struct mbm_state));
- m->prev_msr = tval;
return 0;
}

- chunks = mbm_overflow_count(m->prev_msr, tval, hw_res->mbm_width);
- m->chunks += chunks;
- m->prev_msr = tval;
+ m->chunks += tval;

rr->val += get_corrected_mbm_count(rmid, m->chunks);

Subject: [tip: x86/cache] x86/resctrl: Merge mon_capable and mon_enabled

The following commit has been merged into the x86/cache branch of tip:

Commit-ID: bab6ee736873becc0216ba5fd159394e272d01b2
Gitweb: https://git.kernel.org/tip/bab6ee736873becc0216ba5fd159394e272d01b2
Author: James Morse <[email protected]>
AuthorDate: Fri, 02 Sep 2022 15:48:10
Committer: Borislav Petkov <[email protected]>
CommitterDate: Thu, 22 Sep 2022 14:43:08 +02:00

x86/resctrl: Merge mon_capable and mon_enabled

mon_enabled and mon_capable are always set as a pair by
rdt_get_mon_l3_config().

There is no point having two values.

Merge them together.

Signed-off-by: James Morse <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Jamie Iles <[email protected]>
Reviewed-by: Shaopeng Tan <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Tested-by: Xin Hao <[email protected]>
Tested-by: Shaopeng Tan <[email protected]>
Tested-by: Cristian Marussi <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
arch/x86/kernel/cpu/resctrl/internal.h | 4 ----
arch/x86/kernel/cpu/resctrl/monitor.c | 1 -
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 8 ++++----
include/linux/resctrl.h | 2 --
4 files changed, 4 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 53f3d27..8828b5c 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -459,10 +459,6 @@ int resctrl_arch_set_cdp_enabled(enum resctrl_res_level l, bool enable);
for_each_rdt_resource(r) \
if (r->mon_capable)

-#define for_each_mon_enabled_rdt_resource(r) \
- for_each_rdt_resource(r) \
- if (r->mon_enabled)
-
/* CPUID.(EAX=10H, ECX=ResID=1).EAX */
union cpuid_0x10_1_eax {
struct {
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index eaf25a2..497cadf 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -717,7 +717,6 @@ int rdt_get_mon_l3_config(struct rdt_resource *r)
l3_mon_evt_init(r);

r->mon_capable = true;
- r->mon_enabled = true;

return 0;
}
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 526eb93..def7c66 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1765,7 +1765,7 @@ static int rdtgroup_create_info_dir(struct kernfs_node *parent_kn)
goto out_destroy;
}

- for_each_mon_enabled_rdt_resource(r) {
+ for_each_mon_capable_rdt_resource(r) {
fflags = r->fflags | RF_MON_INFO;
sprintf(name, "%s_MON", r->name);
ret = rdtgroup_mkdir_info_resdir(r, name, fflags);
@@ -2504,7 +2504,7 @@ void rmdir_mondata_subdir_allrdtgrp(struct rdt_resource *r, unsigned int dom_id)
struct rdtgroup *prgrp, *crgrp;
char name[32];

- if (!r->mon_enabled)
+ if (!r->mon_capable)
return;

list_for_each_entry(prgrp, &rdt_all_groups, rdtgroup_list) {
@@ -2572,7 +2572,7 @@ void mkdir_mondata_subdir_allrdtgrp(struct rdt_resource *r,
struct rdtgroup *prgrp, *crgrp;
struct list_head *head;

- if (!r->mon_enabled)
+ if (!r->mon_capable)
return;

list_for_each_entry(prgrp, &rdt_all_groups, rdtgroup_list) {
@@ -2642,7 +2642,7 @@ static int mkdir_mondata_all(struct kernfs_node *parent_kn,
* Create the subdirectories for each domain. Note that all events
* in a domain like L3 are grouped into a resource whose domain is L3
*/
- for_each_mon_enabled_rdt_resource(r) {
+ for_each_mon_capable_rdt_resource(r) {
ret = mkdir_mondata_subdir_alldom(kn, r, prgrp);
if (ret)
goto out_destroy;
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 386ab3a..8180c53 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -130,7 +130,6 @@ struct resctrl_schema;
/**
* struct rdt_resource - attributes of a resctrl resource
* @rid: The index of the resource
- * @mon_enabled: Is monitoring enabled for this feature
* @alloc_capable: Is allocation available on this machine
* @mon_capable: Is monitor feature available on this machine
* @num_rmid: Number of RMIDs available
@@ -149,7 +148,6 @@ struct resctrl_schema;
*/
struct rdt_resource {
int rid;
- bool mon_enabled;
bool alloc_capable;
bool mon_capable;
int num_rmid;

Subject: [tip: x86/cache] x86/resctrl: Add domain online callback for resctrl work

The following commit has been merged into the x86/cache branch of tip:

Commit-ID: 3a7232cdf19e39e7f24c493117b373788b348af2
Gitweb: https://git.kernel.org/tip/3a7232cdf19e39e7f24c493117b373788b348af2
Author: James Morse <[email protected]>
AuthorDate: Fri, 02 Sep 2022 15:48:11
Committer: Borislav Petkov <[email protected]>
CommitterDate: Thu, 22 Sep 2022 15:13:27 +02:00

x86/resctrl: Add domain online callback for resctrl work

Because domains are exposed to user-space via resctrl, the filesystem
must update its state when CPU hotplug callbacks are triggered.

Some of this work is common to any architecture that would support
resctrl, but today it is tangled up with the architecture-specific
code that allocates the memory.

Move domain_setup_mon_state(), the monitor subdir creation call and the
mbm/limbo workers into a new resctrl_online_domain() call. These bits
are not specific to the architecture. Grouping them in one function
allows that code to be moved to /fs/ and re-used by another architecture.
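
For illustration, a minimal sketch of the arch-side calling convention
this creates (mirroring the x86 hotplug path in the diff below;
domain_free() stands in for whatever arch-private free an
implementation uses):

	/* Sketch only: the first CPU of a new domain has come online. */
	list_add_tail(&d->list, add_pos);

	err = resctrl_online_domain(r, d);
	if (err) {
		/* The filesystem could not set up the domain: unwind. */
		list_del(&d->list);
		domain_free(hw_dom);
	}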

Signed-off-by: James Morse <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Jamie Iles <[email protected]>
Reviewed-by: Shaopeng Tan <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Tested-by: Xin Hao <[email protected]>
Tested-by: Shaopeng Tan <[email protected]>
Tested-by: Cristian Marussi <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
arch/x86/kernel/cpu/resctrl/core.c | 53 ++------------------
arch/x86/kernel/cpu/resctrl/internal.h | 2 +-
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 65 +++++++++++++++++++++++--
include/linux/resctrl.h | 1 +-
4 files changed, 67 insertions(+), 54 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 2f87177..25f3014 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -443,42 +443,6 @@ static int domain_setup_ctrlval(struct rdt_resource *r, struct rdt_domain *d)
return 0;
}

-static int domain_setup_mon_state(struct rdt_resource *r, struct rdt_domain *d)
-{
- size_t tsize;
-
- if (is_llc_occupancy_enabled()) {
- d->rmid_busy_llc = bitmap_zalloc(r->num_rmid, GFP_KERNEL);
- if (!d->rmid_busy_llc)
- return -ENOMEM;
- INIT_DELAYED_WORK(&d->cqm_limbo, cqm_handle_limbo);
- }
- if (is_mbm_total_enabled()) {
- tsize = sizeof(*d->mbm_total);
- d->mbm_total = kcalloc(r->num_rmid, tsize, GFP_KERNEL);
- if (!d->mbm_total) {
- bitmap_free(d->rmid_busy_llc);
- return -ENOMEM;
- }
- }
- if (is_mbm_local_enabled()) {
- tsize = sizeof(*d->mbm_local);
- d->mbm_local = kcalloc(r->num_rmid, tsize, GFP_KERNEL);
- if (!d->mbm_local) {
- bitmap_free(d->rmid_busy_llc);
- kfree(d->mbm_total);
- return -ENOMEM;
- }
- }
-
- if (is_mbm_enabled()) {
- INIT_DELAYED_WORK(&d->mbm_over, mbm_handle_overflow);
- mbm_setup_overflow_handler(d, MBM_OVERFLOW_INTERVAL);
- }
-
- return 0;
-}
-
/*
* domain_add_cpu - Add a cpu to a resource's domain list.
*
@@ -498,6 +462,7 @@ static void domain_add_cpu(int cpu, struct rdt_resource *r)
struct list_head *add_pos = NULL;
struct rdt_hw_domain *hw_dom;
struct rdt_domain *d;
+ int err;

d = rdt_find_domain(r, id, &add_pos);
if (IS_ERR(d)) {
@@ -527,21 +492,15 @@ static void domain_add_cpu(int cpu, struct rdt_resource *r)
return;
}

- if (r->mon_capable && domain_setup_mon_state(r, d)) {
+ list_add_tail(&d->list, add_pos);
+
+ err = resctrl_online_domain(r, d);
+ if (err) {
+ list_del(&d->list);
kfree(hw_dom->ctrl_val);
kfree(hw_dom->mbps_val);
kfree(hw_dom);
- return;
}
-
- list_add_tail(&d->list, add_pos);
-
- /*
- * If resctrl is mounted, add
- * per domain monitor data directories.
- */
- if (static_branch_unlikely(&rdt_mon_enable_key))
- mkdir_mondata_subdir_allrdtgrp(r, d);
}

static void domain_remove_cpu(int cpu, struct rdt_resource *r)
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 8828b5c..be48a68 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -524,8 +524,6 @@ void mon_event_count(void *info);
int rdtgroup_mondata_show(struct seq_file *m, void *arg);
void rmdir_mondata_subdir_allrdtgrp(struct rdt_resource *r,
unsigned int dom_id);
-void mkdir_mondata_subdir_allrdtgrp(struct rdt_resource *r,
- struct rdt_domain *d);
void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
struct rdt_domain *d, struct rdtgroup *rdtgrp,
int evtid, int first);
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index def7c66..030a703 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -2565,16 +2565,13 @@ out_destroy:
* Add all subdirectories of mon_data for "ctrl_mon" groups
* and "monitor" groups with given domain id.
*/
-void mkdir_mondata_subdir_allrdtgrp(struct rdt_resource *r,
- struct rdt_domain *d)
+static void mkdir_mondata_subdir_allrdtgrp(struct rdt_resource *r,
+ struct rdt_domain *d)
{
struct kernfs_node *parent_kn;
struct rdtgroup *prgrp, *crgrp;
struct list_head *head;

- if (!r->mon_capable)
- return;
-
list_for_each_entry(prgrp, &rdt_all_groups, rdtgroup_list) {
parent_kn = prgrp->mon.mon_data_kn;
mkdir_mondata_subdir(parent_kn, d, r, prgrp);
@@ -3236,6 +3233,64 @@ out:
return ret;
}

+static int domain_setup_mon_state(struct rdt_resource *r, struct rdt_domain *d)
+{
+ size_t tsize;
+
+ if (is_llc_occupancy_enabled()) {
+ d->rmid_busy_llc = bitmap_zalloc(r->num_rmid, GFP_KERNEL);
+ if (!d->rmid_busy_llc)
+ return -ENOMEM;
+ }
+ if (is_mbm_total_enabled()) {
+ tsize = sizeof(*d->mbm_total);
+ d->mbm_total = kcalloc(r->num_rmid, tsize, GFP_KERNEL);
+ if (!d->mbm_total) {
+ bitmap_free(d->rmid_busy_llc);
+ return -ENOMEM;
+ }
+ }
+ if (is_mbm_local_enabled()) {
+ tsize = sizeof(*d->mbm_local);
+ d->mbm_local = kcalloc(r->num_rmid, tsize, GFP_KERNEL);
+ if (!d->mbm_local) {
+ bitmap_free(d->rmid_busy_llc);
+ kfree(d->mbm_total);
+ return -ENOMEM;
+ }
+ }
+
+ return 0;
+}
+
+int resctrl_online_domain(struct rdt_resource *r, struct rdt_domain *d)
+{
+ int err;
+
+ lockdep_assert_held(&rdtgroup_mutex);
+
+ if (!r->mon_capable)
+ return 0;
+
+ err = domain_setup_mon_state(r, d);
+ if (err)
+ return err;
+
+ if (is_mbm_enabled()) {
+ INIT_DELAYED_WORK(&d->mbm_over, mbm_handle_overflow);
+ mbm_setup_overflow_handler(d, MBM_OVERFLOW_INTERVAL);
+ }
+
+ if (is_llc_occupancy_enabled())
+ INIT_DELAYED_WORK(&d->cqm_limbo, cqm_handle_limbo);
+
+ /* If resctrl is mounted, add per domain monitor data directories. */
+ if (static_branch_unlikely(&rdt_mon_enable_key))
+ mkdir_mondata_subdir_allrdtgrp(r, d);
+
+ return 0;
+}
+
/*
* rdtgroup_init - rdtgroup initialization
*
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 8180c53..d512455 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -192,5 +192,6 @@ u32 resctrl_arch_get_num_closid(struct rdt_resource *r);
int resctrl_arch_update_domains(struct rdt_resource *r, u32 closid);
u32 resctrl_arch_get_config(struct rdt_resource *r, struct rdt_domain *d,
u32 closid, enum resctrl_conf_type type);
+int resctrl_online_domain(struct rdt_resource *r, struct rdt_domain *d);

#endif /* _RESCTRL_H */

Subject: [tip: x86/cache] x86/resctrl: Pass the required parameters into resctrl_arch_rmid_read()

The following commit has been merged into the x86/cache branch of tip:

Commit-ID: 8286618aca331bf17323ff3023ca831ac6e4b86f
Gitweb: https://git.kernel.org/tip/8286618aca331bf17323ff3023ca831ac6e4b86f
Author: James Morse <[email protected]>
AuthorDate: Fri, 02 Sep 2022 15:48:24
Committer: Borislav Petkov <[email protected]>
CommitterDate: Fri, 23 Sep 2022 14:21:25 +02:00

x86/resctrl: Pass the required parameters into resctrl_arch_rmid_read()

resctrl_arch_rmid_read() is intended as the function that an
architecture agnostic resctrl filesystem driver can use to
read a value in bytes from a hardware register. Currently the function
returns the MBM values in chunks directly from hardware.

To convert this to bytes, some correction and overflow calculations
are needed. These depend on the resource and domain structures.
Overflow detection requires the old chunks value. None of this
is available to resctrl_arch_rmid_read(). MPAM requires the
resource and domain structures to find the MMIO device that holds
the registers.

Pass the resource and domain to resctrl_arch_rmid_read(). This would
make rmid_dirty() too big, so merge it into its only caller instead,
keeping the name as a local variable.
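
Purely for illustration, a hypothetical non-x86 implementation might
use the new parameters along these lines (every helper name below is
invented, to show why the resource and domain are needed; this is not
the MPAM driver):

int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain *d,
			   u32 rmid, enum resctrl_event_id eventid, u64 *val)
{
	/* Hypothetical: the domain carries the per-domain MMIO mapping. */
	void __iomem *base = arch_domain_mmio_base(d);

	if (!cpumask_test_cpu(smp_processor_id(), &d->cpu_mask))
		return -EINVAL;

	/* Hypothetical register layout: one counter slot per RMID and event. */
	*val = readq(base + arch_counter_offset(r, rmid, eventid));

	return 0;
}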

Signed-off-by: James Morse <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Jamie Iles <[email protected]>
Reviewed-by: Shaopeng Tan <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Tested-by: Xin Hao <[email protected]>
Tested-by: Shaopeng Tan <[email protected]>
Tested-by: Cristian Marussi <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
arch/x86/kernel/cpu/resctrl/monitor.c | 31 ++++++++++++++------------
include/linux/resctrl.h | 18 ++++++++++++++-
2 files changed, 34 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 51ab76f..262141b 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -167,10 +167,14 @@ void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_domain *d,
memset(am, 0, sizeof(*am));
}

-int resctrl_arch_rmid_read(u32 rmid, enum resctrl_event_id eventid, u64 *val)
+int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain *d,
+ u32 rmid, enum resctrl_event_id eventid, u64 *val)
{
u64 msr_val;

+ if (!cpumask_test_cpu(smp_processor_id(), &d->cpu_mask))
+ return -EINVAL;
+
/*
* As per the SDM, when IA32_QM_EVTSEL.EvtID (bits 7:0) is configured
* with a valid event code for supported resource type and the bits
@@ -192,16 +196,6 @@ int resctrl_arch_rmid_read(u32 rmid, enum resctrl_event_id eventid, u64 *val)
return 0;
}

-static bool rmid_dirty(struct rmid_entry *entry)
-{
- u64 val = 0;
-
- if (resctrl_arch_rmid_read(entry->rmid, QOS_L3_OCCUP_EVENT_ID, &val))
- return true;
-
- return val >= resctrl_cqm_threshold;
-}
-
/*
* Check the RMIDs that are marked as busy for this domain. If the
* reported LLC occupancy is below the threshold clear the busy bit and
@@ -213,6 +207,8 @@ void __check_limbo(struct rdt_domain *d, bool force_free)
struct rmid_entry *entry;
struct rdt_resource *r;
u32 crmid = 1, nrmid;
+ bool rmid_dirty;
+ u64 val = 0;

r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;

@@ -228,7 +224,14 @@ void __check_limbo(struct rdt_domain *d, bool force_free)
break;

entry = __rmid_entry(nrmid);
- if (force_free || !rmid_dirty(entry)) {
+
+ if (resctrl_arch_rmid_read(r, d, entry->rmid,
+ QOS_L3_OCCUP_EVENT_ID, &val))
+ rmid_dirty = true;
+ else
+ rmid_dirty = (val >= resctrl_cqm_threshold);
+
+ if (force_free || !rmid_dirty) {
clear_bit(entry->rmid, d->rmid_busy_llc);
if (!--entry->busy) {
rmid_limbo_count--;
@@ -278,7 +281,7 @@ static void add_rmid_to_limbo(struct rmid_entry *entry)
cpu = get_cpu();
list_for_each_entry(d, &r->domains, list) {
if (cpumask_test_cpu(cpu, &d->cpu_mask)) {
- err = resctrl_arch_rmid_read(entry->rmid,
+ err = resctrl_arch_rmid_read(r, d, entry->rmid,
QOS_L3_OCCUP_EVENT_ID,
&val);
if (err || val <= resctrl_cqm_threshold)
@@ -336,7 +339,7 @@ static int __mon_event_count(u32 rmid, struct rmid_read *rr)
if (rr->first)
resctrl_arch_reset_rmid(rr->r, rr->d, rmid, rr->evtid);

- rr->err = resctrl_arch_rmid_read(rmid, rr->evtid, &tval);
+ rr->err = resctrl_arch_rmid_read(rr->r, rr->d, rmid, rr->evtid, &tval);
if (rr->err)
return rr->err;

diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index efe60dd..7ccfa0d 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -219,7 +219,23 @@ u32 resctrl_arch_get_config(struct rdt_resource *r, struct rdt_domain *d,
u32 closid, enum resctrl_conf_type type);
int resctrl_online_domain(struct rdt_resource *r, struct rdt_domain *d);
void resctrl_offline_domain(struct rdt_resource *r, struct rdt_domain *d);
-int resctrl_arch_rmid_read(u32 rmid, enum resctrl_event_id eventid, u64 *res);
+
+/**
+ * resctrl_arch_rmid_read() - Read the eventid counter corresponding to rmid
+ * for this resource and domain.
+ * @r: resource that the counter should be read from.
+ * @d: domain that the counter should be read from.
+ * @rmid: rmid of the counter to read.
+ * @eventid: eventid to read, e.g. L3 occupancy.
+ * @val: result of the counter read in chunks.
+ *
+ * Call from process context on a CPU that belongs to domain @d.
+ *
+ * Return:
+ * 0 on success, or -EIO, -EINVAL etc on error.
+ */
+int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain *d,
+ u32 rmid, enum resctrl_event_id eventid, u64 *val);

/**
* resctrl_arch_reset_rmid() - Reset any private state associated with rmid