2022-06-22 17:25:01

by James Morse

[permalink] [raw]
Subject: [PATCH v5 00/21] x86/resctrl: Make resctrl_arch_rmid_read() return values in bytes

Changes in this version?
* Use supports_mba_mbps() in resctrl_{on,off}line_domain()
* Remove some error handling for errors that can't happen.
* Restored mbps_val[] reset code to set_mba_sc().
* Moved mbps_val[] reset in rdtgroup_init_mba() to reduce noise.
* Added resctrl_arch_round_mon_val() to fix the user provided value.

---
The aim of this series is to insert a split between the parts of the monitor
code that the architecture must implement, and those that are part of the
resctrl filesystem. The eventual aim is to move all filesystem parts out
to live in /fs/resctrl, so that resctrl can be wired up for MPAM.

What's MPAM? See the cover letter of a previous series. [1]

The series adds domain online/offline callbacks to allow the filesystem to
manage some of its structures itself, then moves all the 'mba_sc' behaviour
to be part of the filesystem.
This means another architecture doesn't need to provide an mbps_val array.
As its all software, the resctrl filesystem should be able to do this without
any help from the architecture code.

Finally __rmid_read() is refactored to be the API call that the architecture
provides to read a counter value. All the hardware specific overflow detection,
scaling and value correction should occur behind this helper.


This series is based on v5.19-rc1, and can be retrieved from:
git://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git mpam/resctrl_monitors_in_bytes/v5

[0] git://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git mpam/resctrl_merge_cdp/v7
[1] https://lore.kernel.org/lkml/[email protected]/

[v1] https://lore.kernel.org/lkml/[email protected]/
[v2] https://lore.kernel.org/lkml/[email protected]/
[v3] https://lore.kernel.org/lkml/[email protected]/
[v4] https://lore.kernel.org/lkml/[email protected]/

James Morse (21):
x86/resctrl: Kill off alloc_enabled
x86/resctrl: Merge mon_capable and mon_enabled
x86/resctrl: Add domain online callback for resctrl work
x86/resctrl: Group struct rdt_hw_domain cleanup
x86/resctrl: Add domain offline callback for resctrl work
x86/resctrl: Remove set_mba_sc()s control array re-initialisation
x86/resctrl: Abstract and use supports_mba_mbps()
x86/resctrl: Create mba_sc configuration in the rdt_domain
x86/resctrl: Switch over to the resctrl mbps_val list
x86/resctrl: Remove architecture copy of mbps_val
x86/resctrl: Allow update_mba_bw() to update controls directly
x86/resctrl: Calculate bandwidth from the previous __mon_event_count()
chunks
x86/resctrl: Add per-rmid arch private storage for overflow and chunks
x86/resctrl: Allow per-rmid arch private storage to be reset
x86/resctrl: Abstract __rmid_read()
x86/resctrl: Pass the required parameters into
resctrl_arch_rmid_read()
x86/resctrl: Move mbm_overflow_count() into resctrl_arch_rmid_read()
x86/resctrl: Move get_corrected_mbm_count() into
resctrl_arch_rmid_read()
x86/resctrl: Rename and change the units of resctrl_cqm_threshold
x86/resctrl: Add resctrl_rmid_realloc_limit to abstract x86's
boot_cpu_data
x86/resctrl: Make resctrl_arch_rmid_read() return values in bytes

arch/x86/include/asm/resctrl.h | 9 +
arch/x86/kernel/cpu/resctrl/core.c | 117 ++++-------
arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 75 ++++---
arch/x86/kernel/cpu/resctrl/internal.h | 61 +++---
arch/x86/kernel/cpu/resctrl/monitor.c | 232 ++++++++++++++--------
arch/x86/kernel/cpu/resctrl/pseudo_lock.c | 2 +-
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 216 ++++++++++++++++----
include/linux/resctrl.h | 64 +++++-
8 files changed, 514 insertions(+), 262 deletions(-)

--
2.30.2


2022-06-22 17:26:04

by James Morse

[permalink] [raw]
Subject: [PATCH v5 14/21] x86/resctrl: Allow per-rmid arch private storage to be reset

To abstract the rmid counters into a helper that returns the number
of bytes counted, architecture specific per-rmid state is needed.

It needs to be possible to reset this hidden state, as the values
may outlive the life of an rmid, or the mount time of the filesystem.

mon_event_read() is called with first = true when an rmid is first
allocated in mkdir_mondata_subdir(). Add resctrl_arch_reset_rmid()
and call it from __mon_event_count()'s rr->first check.

Reviewed-by: Jamie Iles <[email protected]>
Tested-by: Xin Hao <[email protected]>
Reviewed-by: Shaopeng Tan <[email protected]>
Tested-by: Shaopeng Tan <[email protected]>
Tested-by: Cristian Marussi <[email protected]>
Signed-off-by: James Morse <[email protected]>

---
Changes since v1:
* Aded WARN_ON_ONCE() for a case that should never happen.
---
arch/x86/kernel/cpu/resctrl/internal.h | 18 ++++---------
arch/x86/kernel/cpu/resctrl/monitor.c | 35 +++++++++++++++++++++++++-
include/linux/resctrl.h | 23 +++++++++++++++++
3 files changed, 62 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 4de8e5bb93e1..b34a1403f033 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -22,14 +22,6 @@

#define L2_QOS_CDP_ENABLE 0x01ULL

-/*
- * Event IDs are used to program IA32_QM_EVTSEL before reading event
- * counter from IA32_QM_CTR
- */
-#define QOS_L3_OCCUP_EVENT_ID 0x01
-#define QOS_L3_MBM_TOTAL_EVENT_ID 0x02
-#define QOS_L3_MBM_LOCAL_EVENT_ID 0x03
-
#define CQM_LIMBOCHECK_INTERVAL 1000

#define MBM_CNTR_WIDTH_BASE 24
@@ -73,7 +65,7 @@ DECLARE_STATIC_KEY_FALSE(rdt_mon_enable_key);
* @list: entry in &rdt_resource->evt_list
*/
struct mon_evt {
- u32 evtid;
+ enum resctrl_event_id evtid;
char *name;
struct list_head list;
};
@@ -90,9 +82,9 @@ struct mon_evt {
union mon_data_bits {
void *priv;
struct {
- unsigned int rid : 10;
- unsigned int evtid : 8;
- unsigned int domid : 14;
+ unsigned int rid : 10;
+ enum resctrl_event_id evtid : 8;
+ unsigned int domid : 14;
} u;
};

@@ -100,7 +92,7 @@ struct rmid_read {
struct rdtgroup *rgrp;
struct rdt_resource *r;
struct rdt_domain *d;
- int evtid;
+ enum resctrl_event_id evtid;
bool first;
u64 val;
};
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 2d81b6cd9632..e9755143492b 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -137,7 +137,37 @@ static inline struct rmid_entry *__rmid_entry(u32 rmid)
return entry;
}

-static u64 __rmid_read(u32 rmid, u32 eventid)
+static struct arch_mbm_state *get_arch_mbm_state(struct rdt_hw_domain *hw_dom,
+ u32 rmid,
+ enum resctrl_event_id eventid)
+{
+ switch (eventid) {
+ case QOS_L3_OCCUP_EVENT_ID:
+ return NULL;
+ case QOS_L3_MBM_TOTAL_EVENT_ID:
+ return &hw_dom->arch_mbm_total[rmid];
+ case QOS_L3_MBM_LOCAL_EVENT_ID:
+ return &hw_dom->arch_mbm_local[rmid];
+ }
+
+ /* Never expect to get here */
+ WARN_ON_ONCE(1);
+
+ return NULL;
+}
+
+void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_domain *d,
+ u32 rmid, enum resctrl_event_id eventid)
+{
+ struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(d);
+ struct arch_mbm_state *am;
+
+ am = get_arch_mbm_state(hw_dom, rmid, eventid);
+ if (am)
+ memset(am, 0, sizeof(*am));
+}
+
+static u64 __rmid_read(u32 rmid, enum resctrl_event_id eventid)
{
u64 val;

@@ -291,6 +321,9 @@ static u64 __mon_event_count(u32 rmid, struct rmid_read *rr)
struct mbm_state *m;
u64 chunks, tval;

+ if (rr->first)
+ resctrl_arch_reset_rmid(rr->r, rr->d, rmid, rr->evtid);
+
tval = __rmid_read(rmid, rr->evtid);
if (tval & (RMID_VAL_ERROR | RMID_VAL_UNAVAIL)) {
return tval;
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index f4c9101df461..818456770176 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -32,6 +32,16 @@ enum resctrl_conf_type {

#define CDP_NUM_TYPES (CDP_DATA + 1)

+/*
+ * Event IDs, the values match those used to program IA32_QM_EVTSEL before
+ * reading IA32_QM_CTR on RDT systems.
+ */
+enum resctrl_event_id {
+ QOS_L3_OCCUP_EVENT_ID = 0x01,
+ QOS_L3_MBM_TOTAL_EVENT_ID = 0x02,
+ QOS_L3_MBM_LOCAL_EVENT_ID = 0x03,
+};
+
/**
* struct resctrl_staged_config - parsed configuration to be applied
* @new_ctrl: new ctrl value to be loaded
@@ -210,4 +220,17 @@ u32 resctrl_arch_get_config(struct rdt_resource *r, struct rdt_domain *d,
int resctrl_online_domain(struct rdt_resource *r, struct rdt_domain *d);
void resctrl_offline_domain(struct rdt_resource *r, struct rdt_domain *d);

+/**
+ * resctrl_arch_reset_rmid() - Reset any private state associated with rmid
+ * and eventid.
+ * @r: The domain's resource.
+ * @d: The rmid's domain.
+ * @rmid: The rmid whose counter values should be reset.
+ * @eventid: The eventid whose counter values should be reset.
+ *
+ * This can be called from any CPU.
+ */
+void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_domain *d,
+ u32 rmid, enum resctrl_event_id eventid);
+
#endif /* _RESCTRL_H */
--
2.30.2

2022-06-22 17:28:35

by James Morse

[permalink] [raw]
Subject: [PATCH v5 16/21] x86/resctrl: Pass the required parameters into resctrl_arch_rmid_read()

resctrl_arch_rmid_read() is intended as the function that an
architecture agnostic resctrl filesystem driver can use to
read a value in bytes from a hardware register. Currently the function
returns the MBM values in chunks directly from hardware.

To convert this to bytes, some correction and overflow calculations
are needed. These depend on the resource and domain structures.
Overflow detection requires the old chunks value. None of this
is available to resctrl_arch_rmid_read(). MPAM requires the
resource and domain structures to find the MMIO device that holds
the registers.

Pass the resource and domain to resctrl_arch_rmid_read(). This makes
rmid_dirty() too big. Instead merge it with its only caller, and the
name is kept as a local variable.

Reviewed-by: Jamie Iles <[email protected]>
Tested-by: Xin Hao <[email protected]>
Reviewed-by: Shaopeng Tan <[email protected]>
Tested-by: Shaopeng Tan <[email protected]>
Tested-by: Cristian Marussi <[email protected]>
Signed-off-by: James Morse <[email protected]>
---
Changes since v3:
* Added comment about where resctrl_arch_rmid_read() can be called from.

Changes since v2:
* Typos.
* Kerneldoc fixes.

This is all a little noisy for __mon_event_count(), as the switch
statement work is now before the resctrl_arch_rmid_read() call.
---
arch/x86/kernel/cpu/resctrl/monitor.c | 31 +++++++++++++++------------
include/linux/resctrl.h | 18 +++++++++++++++-
2 files changed, 34 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 51ab76f2dfbc..262141bf4264 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -167,10 +167,14 @@ void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_domain *d,
memset(am, 0, sizeof(*am));
}

-int resctrl_arch_rmid_read(u32 rmid, enum resctrl_event_id eventid, u64 *val)
+int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain *d,
+ u32 rmid, enum resctrl_event_id eventid, u64 *val)
{
u64 msr_val;

+ if (!cpumask_test_cpu(smp_processor_id(), &d->cpu_mask))
+ return -EINVAL;
+
/*
* As per the SDM, when IA32_QM_EVTSEL.EvtID (bits 7:0) is configured
* with a valid event code for supported resource type and the bits
@@ -192,16 +196,6 @@ int resctrl_arch_rmid_read(u32 rmid, enum resctrl_event_id eventid, u64 *val)
return 0;
}

-static bool rmid_dirty(struct rmid_entry *entry)
-{
- u64 val = 0;
-
- if (resctrl_arch_rmid_read(entry->rmid, QOS_L3_OCCUP_EVENT_ID, &val))
- return true;
-
- return val >= resctrl_cqm_threshold;
-}
-
/*
* Check the RMIDs that are marked as busy for this domain. If the
* reported LLC occupancy is below the threshold clear the busy bit and
@@ -213,6 +207,8 @@ void __check_limbo(struct rdt_domain *d, bool force_free)
struct rmid_entry *entry;
struct rdt_resource *r;
u32 crmid = 1, nrmid;
+ bool rmid_dirty;
+ u64 val = 0;

r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;

@@ -228,7 +224,14 @@ void __check_limbo(struct rdt_domain *d, bool force_free)
break;

entry = __rmid_entry(nrmid);
- if (force_free || !rmid_dirty(entry)) {
+
+ if (resctrl_arch_rmid_read(r, d, entry->rmid,
+ QOS_L3_OCCUP_EVENT_ID, &val))
+ rmid_dirty = true;
+ else
+ rmid_dirty = (val >= resctrl_cqm_threshold);
+
+ if (force_free || !rmid_dirty) {
clear_bit(entry->rmid, d->rmid_busy_llc);
if (!--entry->busy) {
rmid_limbo_count--;
@@ -278,7 +281,7 @@ static void add_rmid_to_limbo(struct rmid_entry *entry)
cpu = get_cpu();
list_for_each_entry(d, &r->domains, list) {
if (cpumask_test_cpu(cpu, &d->cpu_mask)) {
- err = resctrl_arch_rmid_read(entry->rmid,
+ err = resctrl_arch_rmid_read(r, d, entry->rmid,
QOS_L3_OCCUP_EVENT_ID,
&val);
if (err || val <= resctrl_cqm_threshold)
@@ -336,7 +339,7 @@ static int __mon_event_count(u32 rmid, struct rmid_read *rr)
if (rr->first)
resctrl_arch_reset_rmid(rr->r, rr->d, rmid, rr->evtid);

- rr->err = resctrl_arch_rmid_read(rmid, rr->evtid, &tval);
+ rr->err = resctrl_arch_rmid_read(rr->r, rr->d, rmid, rr->evtid, &tval);
if (rr->err)
return rr->err;

diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index efe60dd7fd21..7ccfa0d1bb34 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -219,7 +219,23 @@ u32 resctrl_arch_get_config(struct rdt_resource *r, struct rdt_domain *d,
u32 closid, enum resctrl_conf_type type);
int resctrl_online_domain(struct rdt_resource *r, struct rdt_domain *d);
void resctrl_offline_domain(struct rdt_resource *r, struct rdt_domain *d);
-int resctrl_arch_rmid_read(u32 rmid, enum resctrl_event_id eventid, u64 *res);
+
+/**
+ * resctrl_arch_rmid_read() - Read the eventid counter corresponding to rmid
+ * for this resource and domain.
+ * @r: resource that the counter should be read from.
+ * @d: domain that the counter should be read from.
+ * @rmid: rmid of the counter to read.
+ * @eventid: eventid to read, e.g. L3 occupancy.
+ * @val: result of the counter read in chunks.
+ *
+ * Call from process context on a CPU that belongs to domain @d.
+ *
+ * Return:
+ * 0 on success, or -EIO, -EINVAL etc on error.
+ */
+int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain *d,
+ u32 rmid, enum resctrl_event_id eventid, u64 *val);

/**
* resctrl_arch_reset_rmid() - Reset any private state associated with rmid
--
2.30.2

2022-06-22 17:29:26

by James Morse

[permalink] [raw]
Subject: [PATCH v5 05/21] x86/resctrl: Add domain offline callback for resctrl work

Because domains are exposed to user-space via resctrl, the filesystem
must update its state when CPU hotplug callbacks are triggered.

Some of this work is common to any architecture that would support
resctrl, but the work is tied up with the architecture code to
free the memory.

Move the monitor subdir removal and the cancelling of the mbm/limbo
works into a new resctrl_offline_domain() call. These bits are not
specific to the architecture. Grouping them in one function allows
that code to be moved to /fs/ and re-used by another architecture.

Reviewed-by: Jamie Iles <[email protected]>
Tested-by: Xin Hao <[email protected]>
Reviewed-by: Shaopeng Tan <[email protected]>
Tested-by: Shaopeng Tan <[email protected]>
Tested-by: Cristian Marussi <[email protected]>
Signed-off-by: James Morse <[email protected]>
---
Changes since v2:
* Moved kfree()ing to domain_destroy_mon_state() for later re-use.

Changes since v1:
* Removed a redundant mon_capable check
* Capitalisation
* Removed inline comment
* Added to the commit message
---
arch/x86/kernel/cpu/resctrl/core.c | 26 ++-------------
arch/x86/kernel/cpu/resctrl/internal.h | 2 --
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 45 +++++++++++++++++++++++---
include/linux/resctrl.h | 1 +
4 files changed, 44 insertions(+), 30 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index e37889f7a1a5..f69182973175 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -523,27 +523,8 @@ static void domain_remove_cpu(int cpu, struct rdt_resource *r)

cpumask_clear_cpu(cpu, &d->cpu_mask);
if (cpumask_empty(&d->cpu_mask)) {
- /*
- * If resctrl is mounted, remove all the
- * per domain monitor data directories.
- */
- if (static_branch_unlikely(&rdt_mon_enable_key))
- rmdir_mondata_subdir_allrdtgrp(r, d->id);
+ resctrl_offline_domain(r, d);
list_del(&d->list);
- if (r->mon_capable && is_mbm_enabled())
- cancel_delayed_work(&d->mbm_over);
- if (is_llc_occupancy_enabled() && has_busy_rmid(r, d)) {
- /*
- * When a package is going down, forcefully
- * decrement rmid->ebusy. There is no way to know
- * that the L3 was flushed and hence may lead to
- * incorrect counts in rare scenarios, but leaving
- * the RMID as busy creates RMID leaks if the
- * package never comes back.
- */
- __check_limbo(d, true);
- cancel_delayed_work(&d->cqm_limbo);
- }

/*
* rdt_domain "d" is going to be freed below, so clear
@@ -551,11 +532,8 @@ static void domain_remove_cpu(int cpu, struct rdt_resource *r)
*/
if (d->plr)
d->plr->d = NULL;
-
- bitmap_free(d->rmid_busy_llc);
- kfree(d->mbm_total);
- kfree(d->mbm_local);
domain_free(hw_dom);
+
return;
}

diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index be48a682dbdb..e12b55f815bf 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -522,8 +522,6 @@ void free_rmid(u32 rmid);
int rdt_get_mon_l3_config(struct rdt_resource *r);
void mon_event_count(void *info);
int rdtgroup_mondata_show(struct seq_file *m, void *arg);
-void rmdir_mondata_subdir_allrdtgrp(struct rdt_resource *r,
- unsigned int dom_id);
void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
struct rdt_domain *d, struct rdtgroup *rdtgrp,
int evtid, int first);
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 030a70326ccc..5830905a92d2 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -2499,14 +2499,12 @@ static int mon_addfile(struct kernfs_node *parent_kn, const char *name,
* Remove all subdirectories of mon_data of ctrl_mon groups
* and monitor groups with given domain id.
*/
-void rmdir_mondata_subdir_allrdtgrp(struct rdt_resource *r, unsigned int dom_id)
+static void rmdir_mondata_subdir_allrdtgrp(struct rdt_resource *r,
+ unsigned int dom_id)
{
struct rdtgroup *prgrp, *crgrp;
char name[32];

- if (!r->mon_capable)
- return;
-
list_for_each_entry(prgrp, &rdt_all_groups, rdtgroup_list) {
sprintf(name, "mon_%s_%02d", r->name, dom_id);
kernfs_remove_by_name(prgrp->mon.mon_data_kn, name);
@@ -3233,6 +3231,45 @@ static int __init rdtgroup_setup_root(void)
return ret;
}

+static void domain_destroy_mon_state(struct rdt_domain *d)
+{
+ bitmap_free(d->rmid_busy_llc);
+ kfree(d->mbm_total);
+ kfree(d->mbm_local);
+}
+
+void resctrl_offline_domain(struct rdt_resource *r, struct rdt_domain *d)
+{
+ lockdep_assert_held(&rdtgroup_mutex);
+
+ if (!r->mon_capable)
+ return;
+
+ /*
+ * If resctrl is mounted, remove all the
+ * per domain monitor data directories.
+ */
+ if (static_branch_unlikely(&rdt_mon_enable_key))
+ rmdir_mondata_subdir_allrdtgrp(r, d->id);
+
+ if (is_mbm_enabled())
+ cancel_delayed_work(&d->mbm_over);
+ if (is_llc_occupancy_enabled() && has_busy_rmid(r, d)) {
+ /*
+ * When a package is going down, forcefully
+ * decrement rmid->ebusy. There is no way to know
+ * that the L3 was flushed and hence may lead to
+ * incorrect counts in rare scenarios, but leaving
+ * the RMID as busy creates RMID leaks if the
+ * package never comes back.
+ */
+ __check_limbo(d, true);
+ cancel_delayed_work(&d->cqm_limbo);
+ }
+
+ domain_destroy_mon_state(d);
+}
+
static int domain_setup_mon_state(struct rdt_resource *r, struct rdt_domain *d)
{
size_t tsize;
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index d512455b4c3a..5d283bdd6162 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -193,5 +193,6 @@ int resctrl_arch_update_domains(struct rdt_resource *r, u32 closid);
u32 resctrl_arch_get_config(struct rdt_resource *r, struct rdt_domain *d,
u32 closid, enum resctrl_conf_type type);
int resctrl_online_domain(struct rdt_resource *r, struct rdt_domain *d);
+void resctrl_offline_domain(struct rdt_resource *r, struct rdt_domain *d);

#endif /* _RESCTRL_H */
--
2.30.2

2022-06-22 17:31:31

by James Morse

[permalink] [raw]
Subject: [PATCH v5 20/21] x86/resctrl: Add resctrl_rmid_realloc_limit to abstract x86's boot_cpu_data

resctrl_rmid_realloc_threshold can be set by user-space. The maximum
value is specified by the architecture.

Currently max_threshold_occ_write() reads the maximum value from
boot_cpu_data.x86_cache_size, which is not portable to another
architecture.

Add resctrl_rmid_realloc_limit to describe the maximum size in bytes
that user-space can set the threshold to.

Reviewed-by: Jamie Iles <[email protected]>
Tested-by: Xin Hao <[email protected]>
Reviewed-by: Shaopeng Tan <[email protected]>
Tested-by: Shaopeng Tan <[email protected]>
Tested-by: Cristian Marussi <[email protected]>
Signed-off-by: James Morse <[email protected]>
---
arch/x86/kernel/cpu/resctrl/monitor.c | 9 +++++++--
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 2 +-
include/linux/resctrl.h | 1 +
3 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index e91afe99b763..8d15568d7121 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -67,6 +67,11 @@ unsigned int rdt_mon_features;
*/
unsigned int resctrl_rmid_realloc_threshold;

+/*
+ * This is the maximum value for the reallocation threshold, in bytes.
+ */
+unsigned int resctrl_rmid_realloc_limit;
+
#define CF(cf) ((unsigned long)(1048576 * (cf) + 0.5))

/*
@@ -747,10 +752,10 @@ int rdt_get_mon_l3_config(struct rdt_resource *r)
{
unsigned int mbm_offset = boot_cpu_data.x86_cache_mbm_width_offset;
struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
- unsigned int cl_size = boot_cpu_data.x86_cache_size;
unsigned int threshold;
int ret;

+ resctrl_rmid_realloc_limit = boot_cpu_data.x86_cache_size * 1024;
hw_res->mon_scale = boot_cpu_data.x86_cache_occ_scale;
r->num_rmid = boot_cpu_data.x86_cache_max_rmid + 1;
hw_res->mbm_width = MBM_CNTR_WIDTH_BASE;
@@ -767,7 +772,7 @@ int rdt_get_mon_l3_config(struct rdt_resource *r)
*
* For a 35MB LLC and 56 RMIDs, this is ~1.8% of the LLC.
*/
- threshold = cl_size * 1024 / r->num_rmid;
+ threshold = resctrl_rmid_realloc_limit / r->num_rmid;

/*
* Because num_rmid may not be a power of two, round the value
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index ae77ea5f323a..1b714523dc26 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1059,7 +1059,7 @@ static ssize_t max_threshold_occ_write(struct kernfs_open_file *of,
if (ret)
return ret;

- if (bytes > (boot_cpu_data.x86_cache_size * 1024))
+ if (bytes > resctrl_rmid_realloc_limit)
return -EINVAL;

resctrl_rmid_realloc_threshold = resctrl_arch_round_mon_val(bytes);
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 9995d043650a..cb857f753322 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -251,5 +251,6 @@ void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_domain *d,
u32 rmid, enum resctrl_event_id eventid);

extern unsigned int resctrl_rmid_realloc_threshold;
+extern unsigned int resctrl_rmid_realloc_limit;

#endif /* _RESCTRL_H */
--
2.30.2

2022-06-22 17:32:16

by James Morse

[permalink] [raw]
Subject: [PATCH v5 19/21] x86/resctrl: Rename and change the units of resctrl_cqm_threshold

resctrl_cqm_threshold is stored in a hardware specific chunk size,
but exposed to user-space as bytes.

This means the filesystem parts of resctrl need to know how the hardware
counts, to convert the user provided byte value to chunks. The interface
between the architecture's resctrl code and the filesystem ought to
treat everything as bytes.

Change the unit of resctrl_cqm_threshold to bytes. resctrl_arch_rmid_read()
still returns its value in chunks, so this needs converting to bytes.
As all the users have been touched, rename the variable to
resctrl_rmid_realloc_threshold, which describes what the value is for.

Neither r->num_rmid nor hw_res->mon_scale are guaranteed to be a power
of 2, so the existing code introduces a rounding error from resctrl's
theoretical fraction of the cache usage. This behaviour is kept as it
ensures the user visible value matches the value read from hardware
when the rmid will be reallocated.

Reviewed-by: Jamie Iles <[email protected]>
Tested-by: Xin Hao <[email protected]>
Reviewed-by: Shaopeng Tan <[email protected]>
Tested-by: Shaopeng Tan <[email protected]>
Tested-by: Cristian Marussi <[email protected]>
Signed-off-by: James Morse <[email protected]>
---
Changes since v4:
* Added resctrl_arch_round_mon_val() to fix the user provided value.
* Keep the 'hw counts in chunks' comment.

Changes since v3:
* Preserved the rounding errors.
---
arch/x86/include/asm/resctrl.h | 9 ++++++
arch/x86/kernel/cpu/resctrl/internal.h | 1 -
arch/x86/kernel/cpu/resctrl/monitor.c | 43 ++++++++++++++++----------
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 9 ++----
include/linux/resctrl.h | 2 ++
5 files changed, 39 insertions(+), 25 deletions(-)

diff --git a/arch/x86/include/asm/resctrl.h b/arch/x86/include/asm/resctrl.h
index d60ed0668a59..d24b04ebf950 100644
--- a/arch/x86/include/asm/resctrl.h
+++ b/arch/x86/include/asm/resctrl.h
@@ -81,6 +81,15 @@ static void __resctrl_sched_in(void)
}
}

+static inline unsigned int resctrl_arch_round_mon_val(unsigned int val)
+{
+ unsigned int scale = boot_cpu_data.x86_cache_occ_scale;
+
+ /* h/w works in units of "boot_cpu_data.x86_cache_occ_scale" */
+ val /= scale;
+ return val * scale;
+}
+
static inline void resctrl_sched_in(void)
{
if (static_branch_likely(&rdt_enable_key))
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index bdb55c2fbdd3..c05e9b7cf77a 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -98,7 +98,6 @@ struct rmid_read {
u64 val;
};

-extern unsigned int resctrl_cqm_threshold;
extern bool rdt_alloc_capable;
extern bool rdt_mon_capable;
extern unsigned int rdt_mon_features;
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 27bb4947a176..e91afe99b763 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -17,7 +17,10 @@

#include <linux/module.h>
#include <linux/slab.h>
+
#include <asm/cpu_device_id.h>
+#include <asm/resctrl.h>
+
#include "internal.h"

struct rmid_entry {
@@ -37,8 +40,8 @@ static LIST_HEAD(rmid_free_lru);
* @rmid_limbo_count count of currently unused but (potentially)
* dirty RMIDs.
* This counts RMIDs that no one is currently using but that
- * may have a occupancy value > intel_cqm_threshold. User can change
- * the threshold occupancy value.
+ * may have a occupancy value > resctrl_rmid_realloc_threshold. User can
+ * change the threshold occupancy value.
*/
static unsigned int rmid_limbo_count;

@@ -59,10 +62,10 @@ bool rdt_mon_capable;
unsigned int rdt_mon_features;

/*
- * This is the threshold cache occupancy at which we will consider an
+ * This is the threshold cache occupancy in bytes at which we will consider an
* RMID available for re-allocation.
*/
-unsigned int resctrl_cqm_threshold;
+unsigned int resctrl_rmid_realloc_threshold;

#define CF(cf) ((unsigned long)(1048576 * (cf) + 0.5))

@@ -223,14 +226,13 @@ int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain *d,
*/
void __check_limbo(struct rdt_domain *d, bool force_free)
{
+ struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
+ struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
struct rmid_entry *entry;
- struct rdt_resource *r;
u32 crmid = 1, nrmid;
bool rmid_dirty;
u64 val = 0;

- r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
-
/*
* Skip RMID 0 and start from RMID 1 and check all the RMIDs that
* are marked as busy for occupancy < threshold. If the occupancy
@@ -245,10 +247,12 @@ void __check_limbo(struct rdt_domain *d, bool force_free)
entry = __rmid_entry(nrmid);

if (resctrl_arch_rmid_read(r, d, entry->rmid,
- QOS_L3_OCCUP_EVENT_ID, &val))
+ QOS_L3_OCCUP_EVENT_ID, &val)) {
rmid_dirty = true;
- else
- rmid_dirty = (val >= resctrl_cqm_threshold);
+ } else {
+ val *= hw_res->mon_scale;
+ rmid_dirty = (val >= resctrl_rmid_realloc_threshold);
+ }

if (force_free || !rmid_dirty) {
clear_bit(entry->rmid, d->rmid_busy_llc);
@@ -289,13 +293,12 @@ int alloc_rmid(void)

static void add_rmid_to_limbo(struct rmid_entry *entry)
{
- struct rdt_resource *r;
+ struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
+ struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
struct rdt_domain *d;
int cpu, err;
u64 val = 0;

- r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
-
entry->busy = 0;
cpu = get_cpu();
list_for_each_entry(d, &r->domains, list) {
@@ -303,7 +306,8 @@ static void add_rmid_to_limbo(struct rmid_entry *entry)
err = resctrl_arch_rmid_read(r, d, entry->rmid,
QOS_L3_OCCUP_EVENT_ID,
&val);
- if (err || val <= resctrl_cqm_threshold)
+ val *= hw_res->mon_scale;
+ if (err || val <= resctrl_rmid_realloc_threshold)
continue;
}

@@ -744,6 +748,7 @@ int rdt_get_mon_l3_config(struct rdt_resource *r)
unsigned int mbm_offset = boot_cpu_data.x86_cache_mbm_width_offset;
struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
unsigned int cl_size = boot_cpu_data.x86_cache_size;
+ unsigned int threshold;
int ret;

hw_res->mon_scale = boot_cpu_data.x86_cache_occ_scale;
@@ -762,10 +767,14 @@ int rdt_get_mon_l3_config(struct rdt_resource *r)
*
* For a 35MB LLC and 56 RMIDs, this is ~1.8% of the LLC.
*/
- resctrl_cqm_threshold = cl_size * 1024 / r->num_rmid;
+ threshold = cl_size * 1024 / r->num_rmid;

- /* h/w works in units of "boot_cpu_data.x86_cache_occ_scale" */
- resctrl_cqm_threshold /= hw_res->mon_scale;
+ /*
+ * Because num_rmid may not be a power of two, round the value
+ * to the nearest multiple of hw_res->mon_scale so it matches a
+ * value the hardware will measure. mon_scale may not be a power of 2.
+ */
+ resctrl_rmid_realloc_threshold = resctrl_arch_round_mon_val(threshold);

ret = dom_data_init(r);
if (ret)
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 572fc07c9608..ae77ea5f323a 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1030,10 +1030,7 @@ static int rdt_delay_linear_show(struct kernfs_open_file *of,
static int max_threshold_occ_show(struct kernfs_open_file *of,
struct seq_file *seq, void *v)
{
- struct rdt_resource *r = of->kn->parent->priv;
- struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
-
- seq_printf(seq, "%u\n", resctrl_cqm_threshold * hw_res->mon_scale);
+ seq_printf(seq, "%u\n", resctrl_rmid_realloc_threshold);

return 0;
}
@@ -1055,7 +1052,6 @@ static int rdt_thread_throttle_mode_show(struct kernfs_open_file *of,
static ssize_t max_threshold_occ_write(struct kernfs_open_file *of,
char *buf, size_t nbytes, loff_t off)
{
- struct rdt_hw_resource *hw_res;
unsigned int bytes;
int ret;

@@ -1066,8 +1062,7 @@ static ssize_t max_threshold_occ_write(struct kernfs_open_file *of,
if (bytes > (boot_cpu_data.x86_cache_size * 1024))
return -EINVAL;

- hw_res = resctrl_to_arch_res(of->kn->parent->priv);
- resctrl_cqm_threshold = bytes / hw_res->mon_scale;
+ resctrl_rmid_realloc_threshold = resctrl_arch_round_mon_val(bytes);

return nbytes;
}
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 7ccfa0d1bb34..9995d043650a 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -250,4 +250,6 @@ int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain *d,
void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_domain *d,
u32 rmid, enum resctrl_event_id eventid);

+extern unsigned int resctrl_rmid_realloc_threshold;
+
#endif /* _RESCTRL_H */
--
2.30.2

2022-06-22 17:32:48

by James Morse

[permalink] [raw]
Subject: [PATCH v5 09/21] x86/resctrl: Switch over to the resctrl mbps_val list

Updates to resctrl's software controller follow the same path as
other configuration updates, but they don't modify the hardware state.
rdtgroup_schemata_write() uses parse_line() and the resource's
parse_ctrlval() function to stage the configuration.
resctrl_arch_update_domains() then updates the mbps_val[] array
instead, and resctrl_arch_update_domains() skips the rdt_ctrl_update()
call that would update hardware.

This complicates the interface between resctrl's filesystem parts
and architecture specific code. It should be possible for mba_sc
to be completely implemented by the filesystem parts of resctrl. This
would allow it to work on a second architecture with no additional code.
resctrl_arch_update_domains() using the mbps_val[] array prevents this.

Change parse_bw() to write the configuration value directly to the
mbps_val[] array in the domain structure. Change rdtgroup_schemata_write()
to skip the call to resctrl_arch_update_domains(), meaning all the
mba_sc specific code in resctrl_arch_update_domains() can be removed.
On the read-side, show_doms() and update_mba_bw() are changed to read
the mbps_val[] array from the domain structure. With this,
resctrl_arch_get_config() no longer needs to consider mba_sc resources.

Reviewed-by: Jamie Iles <[email protected]>
Tested-by: Xin Hao <[email protected]>
Reviewed-by: Shaopeng Tan <[email protected]>
Tested-by: Shaopeng Tan <[email protected]>
Tested-by: Cristian Marussi <[email protected]>
Signed-off-by: James Morse <[email protected]>
---
Changes since v4:
* Wording of comments.
* Restored mbps_val[] reset code to set_mba_sc().
* Moved mbps_val[] reset in rdtgroup_init_mba() to reduce noise.

Changes since v3:
* Added the rdtgroup_init_mba() hunk to avoid ~0 being written
to the ctrl_val array, and to only reset mbps_val[] when its going
to be used.

Changes since v2:
* Fixed some names in the commit message.
* Added missing 'or mbps_val[]' code to rdtgroup_size_show()

Changes since v1:
* Squashed out struct resctrl_mba_sc
* Removed stray paragraphs from commit message
---
arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 44 ++++++++++++++---------
arch/x86/kernel/cpu/resctrl/monitor.c | 10 +++---
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 27 ++++++++++----
3 files changed, 52 insertions(+), 29 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
index 87666275eed9..9f45207a6c74 100644
--- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
+++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
@@ -61,6 +61,7 @@ int parse_bw(struct rdt_parse_data *data, struct resctrl_schema *s,
struct rdt_domain *d)
{
struct resctrl_staged_config *cfg;
+ u32 closid = data->rdtgrp->closid;
struct rdt_resource *r = s->res;
unsigned long bw_val;

@@ -72,6 +73,12 @@ int parse_bw(struct rdt_parse_data *data, struct resctrl_schema *s,

if (!bw_validate(data->buf, &bw_val, r))
return -EINVAL;
+
+ if (is_mba_sc(r)) {
+ d->mbps_val[closid] = bw_val;
+ return 0;
+ }
+
cfg->new_ctrl = bw_val;
cfg->have_new_ctrl = true;

@@ -261,14 +268,13 @@ static u32 get_config_index(u32 closid, enum resctrl_conf_type type)

static bool apply_config(struct rdt_hw_domain *hw_dom,
struct resctrl_staged_config *cfg, u32 idx,
- cpumask_var_t cpu_mask, bool mba_sc)
+ cpumask_var_t cpu_mask)
{
struct rdt_domain *dom = &hw_dom->d_resctrl;
- u32 *dc = !mba_sc ? hw_dom->ctrl_val : hw_dom->mbps_val;

- if (cfg->new_ctrl != dc[idx]) {
+ if (cfg->new_ctrl != hw_dom->ctrl_val[idx]) {
cpumask_set_cpu(cpumask_any(&dom->cpu_mask), cpu_mask);
- dc[idx] = cfg->new_ctrl;
+ hw_dom->ctrl_val[idx] = cfg->new_ctrl;

return true;
}
@@ -284,14 +290,12 @@ int resctrl_arch_update_domains(struct rdt_resource *r, u32 closid)
enum resctrl_conf_type t;
cpumask_var_t cpu_mask;
struct rdt_domain *d;
- bool mba_sc;
int cpu;
u32 idx;

if (!zalloc_cpumask_var(&cpu_mask, GFP_KERNEL))
return -ENOMEM;

- mba_sc = is_mba_sc(r);
msr_param.res = NULL;
list_for_each_entry(d, &r->domains, list) {
hw_dom = resctrl_to_arch_dom(d);
@@ -301,7 +305,7 @@ int resctrl_arch_update_domains(struct rdt_resource *r, u32 closid)
continue;

idx = get_config_index(closid, t);
- if (!apply_config(hw_dom, cfg, idx, cpu_mask, mba_sc))
+ if (!apply_config(hw_dom, cfg, idx, cpu_mask))
continue;

if (!msr_param.res) {
@@ -315,11 +319,7 @@ int resctrl_arch_update_domains(struct rdt_resource *r, u32 closid)
}
}

- /*
- * Avoid writing the control msr with control values when
- * MBA software controller is enabled
- */
- if (cpumask_empty(cpu_mask) || mba_sc)
+ if (cpumask_empty(cpu_mask))
goto done;
cpu = get_cpu();
/* Update resource control msr on this CPU if it's in cpu_mask. */
@@ -406,6 +406,14 @@ ssize_t rdtgroup_schemata_write(struct kernfs_open_file *of,

list_for_each_entry(s, &resctrl_schema_all, list) {
r = s->res;
+
+ /*
+ * Writes to mba_sc resources update the software controller,
+ * not the control msr.
+ */
+ if (is_mba_sc(r))
+ continue;
+
ret = resctrl_arch_update_domains(r, rdtgrp->closid);
if (ret)
goto out;
@@ -433,9 +441,7 @@ u32 resctrl_arch_get_config(struct rdt_resource *r, struct rdt_domain *d,
struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(d);
u32 idx = get_config_index(closid, type);

- if (!is_mba_sc(r))
- return hw_dom->ctrl_val[idx];
- return hw_dom->mbps_val[idx];
+ return hw_dom->ctrl_val[idx];
}

static void show_doms(struct seq_file *s, struct resctrl_schema *schema, int closid)
@@ -450,8 +456,12 @@ static void show_doms(struct seq_file *s, struct resctrl_schema *schema, int clo
if (sep)
seq_puts(s, ";");

- ctrl_val = resctrl_arch_get_config(r, dom, closid,
- schema->conf_type);
+ if (is_mba_sc(r))
+ ctrl_val = dom->mbps_val[closid];
+ else
+ ctrl_val = resctrl_arch_get_config(r, dom, closid,
+ schema->conf_type);
+
seq_printf(s, r->format_str, dom->id, max_data_width,
ctrl_val);
sep = true;
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 497cadf3285d..16028b2f756a 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -447,13 +447,11 @@ static void update_mba_bw(struct rdtgroup *rgrp, struct rdt_domain *dom_mbm)
hw_dom_mba = resctrl_to_arch_dom(dom_mba);

cur_bw = pmbm_data->prev_bw;
- user_bw = resctrl_arch_get_config(r_mba, dom_mba, closid, CDP_NONE);
+ user_bw = dom_mba->mbps_val[closid];
delta_bw = pmbm_data->delta_bw;
- /*
- * resctrl_arch_get_config() chooses the mbps/ctrl value to return
- * based on is_mba_sc(). For now, reach into the hw_dom.
- */
- cur_msr_val = hw_dom_mba->ctrl_val[closid];
+
+ /* MBA resource doesn't support CDP */
+ cur_msr_val = resctrl_arch_get_config(r_mba, dom_mba, closid, CDP_NONE);

/*
* For Ctrl groups read data from child monitor groups.
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index bb1faf4f5b38..2b43bb9a9b76 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1356,11 +1356,13 @@ static int rdtgroup_size_show(struct kernfs_open_file *of,
struct seq_file *s, void *v)
{
struct resctrl_schema *schema;
+ enum resctrl_conf_type type;
struct rdtgroup *rdtgrp;
struct rdt_resource *r;
struct rdt_domain *d;
unsigned int size;
int ret = 0;
+ u32 closid;
bool sep;
u32 ctrl;

@@ -1386,8 +1388,11 @@ static int rdtgroup_size_show(struct kernfs_open_file *of,
goto out;
}

+ closid = rdtgrp->closid;
+
list_for_each_entry(schema, &resctrl_schema_all, list) {
r = schema->res;
+ type = schema->conf_type;
sep = false;
seq_printf(s, "%*s:", max_name_width, schema->name);
list_for_each_entry(d, &r->domains, list) {
@@ -1396,9 +1401,12 @@ static int rdtgroup_size_show(struct kernfs_open_file *of,
if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKSETUP) {
size = 0;
} else {
- ctrl = resctrl_arch_get_config(r, d,
- rdtgrp->closid,
- schema->conf_type);
+ if (is_mba_sc(r))
+ ctrl = d->mbps_val[closid];
+ else
+ ctrl = resctrl_arch_get_config(r, d,
+ closid,
+ type);
if (r->rid == RDT_RESOURCE_MBA)
size = ctrl;
else
@@ -2818,14 +2826,19 @@ static int rdtgroup_init_cat(struct resctrl_schema *s, u32 closid)
}

/* Initialize MBA resource with default values. */
-static void rdtgroup_init_mba(struct rdt_resource *r)
+static void rdtgroup_init_mba(struct rdt_resource *r, u32 closid)
{
struct resctrl_staged_config *cfg;
struct rdt_domain *d;

list_for_each_entry(d, &r->domains, list) {
+ if (is_mba_sc(r)) {
+ d->mbps_val[closid] = MBA_MAX_MBPS;
+ continue;
+ }
+
cfg = &d->staged_config[CDP_NONE];
- cfg->new_ctrl = is_mba_sc(r) ? MBA_MAX_MBPS : r->default_ctrl;
+ cfg->new_ctrl = r->default_ctrl;
cfg->have_new_ctrl = true;
}
}
@@ -2840,7 +2853,9 @@ static int rdtgroup_init_alloc(struct rdtgroup *rdtgrp)
list_for_each_entry(s, &resctrl_schema_all, list) {
r = s->res;
if (r->rid == RDT_RESOURCE_MBA) {
- rdtgroup_init_mba(r);
+ rdtgroup_init_mba(r, rdtgrp->closid);
+ if (is_mba_sc(r))
+ continue;
} else {
ret = rdtgroup_init_cat(s, rdtgrp->closid);
if (ret < 0)
--
2.30.2

2022-06-22 17:36:36

by James Morse

[permalink] [raw]
Subject: [PATCH v5 02/21] x86/resctrl: Merge mon_capable and mon_enabled

mon_enabled and mon_capable are always set as a pair by
rdt_get_mon_l3_config().

There is no point having two values.

Merge them together.

Reviewed-by: Jamie Iles <[email protected]>
Tested-by: Xin Hao <[email protected]>
Reviewed-by: Shaopeng Tan <[email protected]>
Tested-by: Shaopeng Tan <[email protected]>
Tested-by: Cristian Marussi <[email protected]>
Signed-off-by: James Morse <[email protected]>
---
Changes since v1:
* Removed stray cdp_capable changes.
---
arch/x86/kernel/cpu/resctrl/internal.h | 4 ----
arch/x86/kernel/cpu/resctrl/monitor.c | 1 -
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 8 ++++----
include/linux/resctrl.h | 2 --
4 files changed, 4 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 53f3d275a98f..8828b5c1b6d2 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -459,10 +459,6 @@ int resctrl_arch_set_cdp_enabled(enum resctrl_res_level l, bool enable);
for_each_rdt_resource(r) \
if (r->mon_capable)

-#define for_each_mon_enabled_rdt_resource(r) \
- for_each_rdt_resource(r) \
- if (r->mon_enabled)
-
/* CPUID.(EAX=10H, ECX=ResID=1).EAX */
union cpuid_0x10_1_eax {
struct {
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index eaf25a234ff5..497cadf3285d 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -717,7 +717,6 @@ int rdt_get_mon_l3_config(struct rdt_resource *r)
l3_mon_evt_init(r);

r->mon_capable = true;
- r->mon_enabled = true;

return 0;
}
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 526eb933333b..def7c6681f8b 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1765,7 +1765,7 @@ static int rdtgroup_create_info_dir(struct kernfs_node *parent_kn)
goto out_destroy;
}

- for_each_mon_enabled_rdt_resource(r) {
+ for_each_mon_capable_rdt_resource(r) {
fflags = r->fflags | RF_MON_INFO;
sprintf(name, "%s_MON", r->name);
ret = rdtgroup_mkdir_info_resdir(r, name, fflags);
@@ -2504,7 +2504,7 @@ void rmdir_mondata_subdir_allrdtgrp(struct rdt_resource *r, unsigned int dom_id)
struct rdtgroup *prgrp, *crgrp;
char name[32];

- if (!r->mon_enabled)
+ if (!r->mon_capable)
return;

list_for_each_entry(prgrp, &rdt_all_groups, rdtgroup_list) {
@@ -2572,7 +2572,7 @@ void mkdir_mondata_subdir_allrdtgrp(struct rdt_resource *r,
struct rdtgroup *prgrp, *crgrp;
struct list_head *head;

- if (!r->mon_enabled)
+ if (!r->mon_capable)
return;

list_for_each_entry(prgrp, &rdt_all_groups, rdtgroup_list) {
@@ -2642,7 +2642,7 @@ static int mkdir_mondata_all(struct kernfs_node *parent_kn,
* Create the subdirectories for each domain. Note that all events
* in a domain like L3 are grouped into a resource whose domain is L3
*/
- for_each_mon_enabled_rdt_resource(r) {
+ for_each_mon_capable_rdt_resource(r) {
ret = mkdir_mondata_subdir_alldom(kn, r, prgrp);
if (ret)
goto out_destroy;
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 386ab3a41500..8180c539800d 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -130,7 +130,6 @@ struct resctrl_schema;
/**
* struct rdt_resource - attributes of a resctrl resource
* @rid: The index of the resource
- * @mon_enabled: Is monitoring enabled for this feature
* @alloc_capable: Is allocation available on this machine
* @mon_capable: Is monitor feature available on this machine
* @num_rmid: Number of RMIDs available
@@ -149,7 +148,6 @@ struct resctrl_schema;
*/
struct rdt_resource {
int rid;
- bool mon_enabled;
bool alloc_capable;
bool mon_capable;
int num_rmid;
--
2.30.2

2022-07-03 16:45:50

by haoxin

[permalink] [raw]
Subject: Re: [PATCH v5 00/21] x86/resctrl: Make resctrl_arch_rmid_read() return values in bytes

Hi  james,

I have a review all of the patches, it looks goot to me, but i also test
them once again, i have a little confusion with my test.

# mkdir p1

# echo "L3:0=001;1=001" > p1/schemata

# [root@iZbp1bu26qv0j3ddyusot3Z p1]# cat schemata
    MB:0=100;1=100
    L3:0=001;1=001

# memhog -r1000000 1000m > /mnt/log &

[1] 53023
[root@iZbp1bu26qv0j3ddyusot3Z p1]# echo 53023 > tasks
[
[root@iZbp1bu26qv0j3ddyusot3Z p1]# cat mon_data/mon_L3_00/llc_occupancy
3833856
[root@iZbp1bu26qv0j3ddyusot3Z p1]# cat mon_data/mon_L3_00/llc_occupancy
3620864
[root@iZbp1bu26qv0j3ddyusot3Z p1]# cat mon_data/mon_L3_00/llc_occupancy
3727360
[root@iZbp1bu26qv0j3ddyusot3Z p1]# cat size
    MB:0=100;1=100
    L3:0=3407872;1=3407872

Obviously, the value has been overflowed,  Can you explain why?

My machine environment is:

Intel(R) Xeon(R) Platinum 8269CY CPU @ 2.50GHz

numactl -H
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
23 24 25 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72
73 74 75 76 77
node 0 size: 191813 MB
node 0 free: 189340 MB
node 1 cpus: 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45
46 47 48 49 50 51 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95
96 97 98 99 100 101 102 103
node 1 size: 193522 MB
node 1 free: 192332 MB
node distances:
node   0   1
  0:  10  21
  1:  21  10

On 6/23/22 12:46 AM, James Morse wrote:
> Changes in this version?
> * Use supports_mba_mbps() in resctrl_{on,off}line_domain()
> * Remove some error handling for errors that can't happen.
> * Restored mbps_val[] reset code to set_mba_sc().
> * Moved mbps_val[] reset in rdtgroup_init_mba() to reduce noise.
> * Added resctrl_arch_round_mon_val() to fix the user provided value.
>
> ---
> The aim of this series is to insert a split between the parts of the monitor
> code that the architecture must implement, and those that are part of the
> resctrl filesystem. The eventual aim is to move all filesystem parts out
> to live in /fs/resctrl, so that resctrl can be wired up for MPAM.
>
> What's MPAM? See the cover letter of a previous series. [1]
>
> The series adds domain online/offline callbacks to allow the filesystem to
> manage some of its structures itself, then moves all the 'mba_sc' behaviour
> to be part of the filesystem.
> This means another architecture doesn't need to provide an mbps_val array.
> As its all software, the resctrl filesystem should be able to do this without
> any help from the architecture code.
>
> Finally __rmid_read() is refactored to be the API call that the architecture
> provides to read a counter value. All the hardware specific overflow detection,
> scaling and value correction should occur behind this helper.
>
>
> This series is based on v5.19-rc1, and can be retrieved from:
> git://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git mpam/resctrl_monitors_in_bytes/v5
>
> [0] git://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git mpam/resctrl_merge_cdp/v7
> [1] https://lore.kernel.org/lkml/[email protected]/
>
> [v1] https://lore.kernel.org/lkml/[email protected]/
> [v2] https://lore.kernel.org/lkml/[email protected]/
> [v3] https://lore.kernel.org/lkml/[email protected]/
> [v4] https://lore.kernel.org/lkml/[email protected]/
>
> James Morse (21):
> x86/resctrl: Kill off alloc_enabled
> x86/resctrl: Merge mon_capable and mon_enabled
> x86/resctrl: Add domain online callback for resctrl work
> x86/resctrl: Group struct rdt_hw_domain cleanup
> x86/resctrl: Add domain offline callback for resctrl work
> x86/resctrl: Remove set_mba_sc()s control array re-initialisation
> x86/resctrl: Abstract and use supports_mba_mbps()
> x86/resctrl: Create mba_sc configuration in the rdt_domain
> x86/resctrl: Switch over to the resctrl mbps_val list
> x86/resctrl: Remove architecture copy of mbps_val
> x86/resctrl: Allow update_mba_bw() to update controls directly
> x86/resctrl: Calculate bandwidth from the previous __mon_event_count()
> chunks
> x86/resctrl: Add per-rmid arch private storage for overflow and chunks
> x86/resctrl: Allow per-rmid arch private storage to be reset
> x86/resctrl: Abstract __rmid_read()
> x86/resctrl: Pass the required parameters into
> resctrl_arch_rmid_read()
> x86/resctrl: Move mbm_overflow_count() into resctrl_arch_rmid_read()
> x86/resctrl: Move get_corrected_mbm_count() into
> resctrl_arch_rmid_read()
> x86/resctrl: Rename and change the units of resctrl_cqm_threshold
> x86/resctrl: Add resctrl_rmid_realloc_limit to abstract x86's
> boot_cpu_data
> x86/resctrl: Make resctrl_arch_rmid_read() return values in bytes
>
> arch/x86/include/asm/resctrl.h | 9 +
> arch/x86/kernel/cpu/resctrl/core.c | 117 ++++-------
> arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 75 ++++---
> arch/x86/kernel/cpu/resctrl/internal.h | 61 +++---
> arch/x86/kernel/cpu/resctrl/monitor.c | 232 ++++++++++++++--------
> arch/x86/kernel/cpu/resctrl/pseudo_lock.c | 2 +-
> arch/x86/kernel/cpu/resctrl/rdtgroup.c | 216 ++++++++++++++++----
> include/linux/resctrl.h | 64 +++++-
> 8 files changed, 514 insertions(+), 262 deletions(-)
>
--
Best Regards!
Xin Hao

2022-08-23 19:08:53

by Reinette Chatre

[permalink] [raw]
Subject: Re: [PATCH v5 00/21] x86/resctrl: Make resctrl_arch_rmid_read() return values in bytes

Hi,

On 7/3/2022 8:54 AM, Xin Hao wrote:
> Hi  james,
>
> I have a review all of the patches, it looks goot to me, but i also test them once again, i have a little confusion with my test.
>
> # mkdir p1
>
> # echo "L3:0=001;1=001" > p1/schemata
>
> # [root@iZbp1bu26qv0j3ddyusot3Z p1]# cat schemata
>     MB:0=100;1=100
>     L3:0=001;1=001
>
> # memhog -r1000000 1000m > /mnt/log &
>
> [1] 53023
> [root@iZbp1bu26qv0j3ddyusot3Z p1]# echo 53023 > tasks
> [
> [root@iZbp1bu26qv0j3ddyusot3Z p1]# cat mon_data/mon_L3_00/llc_occupancy
> 3833856
> [root@iZbp1bu26qv0j3ddyusot3Z p1]# cat mon_data/mon_L3_00/llc_occupancy
> 3620864
> [root@iZbp1bu26qv0j3ddyusot3Z p1]# cat mon_data/mon_L3_00/llc_occupancy
> 3727360
> [root@iZbp1bu26qv0j3ddyusot3Z p1]# cat size
>     MB:0=100;1=100
>     L3:0=3407872;1=3407872
>
> Obviously, the value has been overflowed,  Can you explain why?

Are you seeing different behavior before and after you apply this
series?

I do not think the conclusion should immediately be that there is an
overflow issue. Have you perhaps run into the scenario "Notes on
cache occupancy monitoring and control" described in
Documentation/x86/resctrl.rst?

When "memhog" starts it can allocate to the entire L3 for a while
before it is moved to the constrained resource group. It's cache
lines are not evicted as part of this move so it is not unusual for
it to have more lines in L3 than it is allowed to allocate into.

Understanding the occupancy values require understanding of the workload
as well as the system environment.

Depending on the workload's data usage (for example if it keeps loading
new data - note that if the workload keeps loading the same data and the
data is already present in an area of cache that the workload cannot
allocate into then the data read would still result in a cache hit for the
workload, the data would not be moved to the area the
workload can allocate into) and other workloads on the system (there is
other load present also that evicts the lines owned by the workload) the
L3 occupancy rate should go down after a while to match the space it
can allocate into.

Reinette

2022-08-23 19:25:24

by Reinette Chatre

[permalink] [raw]
Subject: Re: [PATCH v5 00/21] x86/resctrl: Make resctrl_arch_rmid_read() return values in bytes

Hi James,

On 6/22/2022 9:46 AM, James Morse wrote:
> The aim of this series is to insert a split between the parts of the monitor
> code that the architecture must implement, and those that are part of the
> resctrl filesystem. The eventual aim is to move all filesystem parts out
> to live in /fs/resctrl, so that resctrl can be wired up for MPAM.
>
> What's MPAM? See the cover letter of a previous series. [1]
>
> The series adds domain online/offline callbacks to allow the filesystem to
> manage some of its structures itself, then moves all the 'mba_sc' behaviour
> to be part of the filesystem.
> This means another architecture doesn't need to provide an mbps_val array.
> As its all software, the resctrl filesystem should be able to do this without
> any help from the architecture code.
>
> Finally __rmid_read() is refactored to be the API call that the architecture
> provides to read a counter value. All the hardware specific overflow detection,
> scaling and value correction should occur behind this helper.
>

Thank you for your patience as I was offline for a while.

This series looks good to me. I have one remaining comment that I provided
in reply to "[07/21] x86/resctrl: Abstract and use supports_mba_mbps()" where
it seems to me that an existing issue could easily be addressed in the new
function.

I do not have tests for the software controller and only did basic sanity
checks. It would be great if the folks using this feature could test this
series.

Thank you very much. From my side it looks good:

Reviewed-by: Reinette Chatre <[email protected]>

Reinette

2022-09-02 16:22:33

by James Morse

[permalink] [raw]
Subject: Re: [PATCH v5 00/21] x86/resctrl: Make resctrl_arch_rmid_read() return values in bytes

Hi Reinette,

On 23/08/2022 18:20, Reinette Chatre wrote:
> On 6/22/2022 9:46 AM, James Morse wrote:
>> The aim of this series is to insert a split between the parts of the monitor
>> code that the architecture must implement, and those that are part of the
>> resctrl filesystem. The eventual aim is to move all filesystem parts out
>> to live in /fs/resctrl, so that resctrl can be wired up for MPAM.
>>
>> What's MPAM? See the cover letter of a previous series. [1]
>>
>> The series adds domain online/offline callbacks to allow the filesystem to
>> manage some of its structures itself, then moves all the 'mba_sc' behaviour
>> to be part of the filesystem.
>> This means another architecture doesn't need to provide an mbps_val array.
>> As its all software, the resctrl filesystem should be able to do this without
>> any help from the architecture code.
>>
>> Finally __rmid_read() is refactored to be the API call that the architecture
>> provides to read a counter value. All the hardware specific overflow detection,
>> scaling and value correction should occur behind this helper.
>>
>
> Thank you for your patience as I was offline for a while.

No problem,


> This series looks good to me. I have one remaining comment that I provided
> in reply to "[07/21] x86/resctrl: Abstract and use supports_mba_mbps()" where
> it seems to me that an existing issue could easily be addressed in the new
> function.

Yup, that made sense to me.


> I do not have tests for the software controller and only did basic sanity
> checks. It would be great if the folks using this feature could test this
> series.
>
> Thank you very much. From my side it looks good:
>
> Reviewed-by: Reinette Chatre <[email protected]>


Thanks!

James

2022-09-07 06:14:32

by haoxin

[permalink] [raw]
Subject: Re: [PATCH v5 00/21] x86/resctrl: Make resctrl_arch_rmid_read() return values in bytes


在 2022/8/24 上午1:09, Reinette Chatre 写道:
> Hi,
>
> On 7/3/2022 8:54 AM, Xin Hao wrote:
>> Hi  james,
>>
>> I have a review all of the patches, it looks goot to me, but i also test them once again, i have a little confusion with my test.
>>
>> # mkdir p1
>>
>> # echo "L3:0=001;1=001" > p1/schemata
>>
>> # [root@iZbp1bu26qv0j3ddyusot3Z p1]# cat schemata
>>     MB:0=100;1=100
>>     L3:0=001;1=001
>>
>> # memhog -r1000000 1000m > /mnt/log &
>>
>> [1] 53023
>> [root@iZbp1bu26qv0j3ddyusot3Z p1]# echo 53023 > tasks
>> [
>> [root@iZbp1bu26qv0j3ddyusot3Z p1]# cat mon_data/mon_L3_00/llc_occupancy
>> 3833856
>> [root@iZbp1bu26qv0j3ddyusot3Z p1]# cat mon_data/mon_L3_00/llc_occupancy
>> 3620864
>> [root@iZbp1bu26qv0j3ddyusot3Z p1]# cat mon_data/mon_L3_00/llc_occupancy
>> 3727360
>> [root@iZbp1bu26qv0j3ddyusot3Z p1]# cat size
>>     MB:0=100;1=100
>>     L3:0=3407872;1=3407872
>>
>> Obviously, the value has been overflowed,  Can you explain why?
> Are you seeing different behavior before and after you apply this
> series?
No,they have the same test result。
>
> I do not think the conclusion should immediately be that there is an
> overflow issue. Have you perhaps run into the scenario "Notes on
> cache occupancy monitoring and control" described in
> Documentation/x86/resctrl.rst?
>
> When "memhog" starts it can allocate to the entire L3 for a while
> before it is moved to the constrained resource group. It's cache
> lines are not evicted as part of this move so it is not unusual for
> it to have more lines in L3 than it is allowed to allocate into.

Yes as you said, the mon_data/mon_L3_00/llc_occupancy does not
immediately become the value small than the set by schemata,  it may
takes a few minutes to reduce to the set value.

I don't quite understand why it takes so long to see the llc_occupancy
degrage.

>
> Understanding the occupancy values require understanding of the workload
> as well as the system environment.
>
> Depending on the workload's data usage (for example if it keeps loading
> new data - note that if the workload keeps loading the same data and the
> data is already present in an area of cache that the workload cannot
> allocate into then the data read would still result in a cache hit for the
> workload, the data would not be moved to the area the
> workload can allocate into) and other workloads on the system (there is
> other load present also that evicts the lines owned by the workload) the
> L3 occupancy rate should go down after a while to match the space it
> can allocate into.
>
> Reinette

2022-09-08 17:18:23

by James Morse

[permalink] [raw]
Subject: Re: [PATCH v5 00/21] x86/resctrl: Make resctrl_arch_rmid_read() return values in bytes

Hi Hao Xin,

On 07/09/2022 06:46, haoxin wrote:
> 在 2022/8/24 上午1:09, Reinette Chatre 写道:
>> On 7/3/2022 8:54 AM, Xin Hao wrote:
>>> [root@iZbp1bu26qv0j3ddyusot3Z p1]# cat mon_data/mon_L3_00/llc_occupancy
>>> 3833856
>>> [root@iZbp1bu26qv0j3ddyusot3Z p1]# cat mon_data/mon_L3_00/llc_occupancy
>>> 3620864
>>> [root@iZbp1bu26qv0j3ddyusot3Z p1]# cat mon_data/mon_L3_00/llc_occupancy
>>> 3727360
>>> [root@iZbp1bu26qv0j3ddyusot3Z p1]# cat size
>>>      MB:0=100;1=100
>>>      L3:0=3407872;1=3407872
>>>
>>> Obviously, the value has been overflowed,  Can you explain why?

>> I do not think the conclusion should immediately be that there is an
>> overflow issue. Have you perhaps run into the scenario "Notes on
>> cache occupancy monitoring and control" described in
>> Documentation/x86/resctrl.rst?
>>
>> When "memhog" starts it can allocate to the entire L3 for a while
>> before it is moved to the constrained resource group. It's cache
>> lines are not evicted as part of this move so it is not unusual for
>> it to have more lines in L3 than it is allowed to allocate into.
>
> Yes as you said, the mon_data/mon_L3_00/llc_occupancy does not immediately become the
> value small than the set by schemata,  it may takes a few minutes to reduce to the set value.
>
> I don't quite understand why it takes so long to see the llc_occupancy degrage.

Do you have workloads in other control groups causing cache allocations?

One of the ways this stuff can be built is for the cache to use the policy to choose which
lines to evict. The cache may already have some LRU or line-state preferences when it
comes to eviction, so it may not apply the RDT policy as the first choice.

If there is no cache pressure from outside the control group - does it matter how quickly
it takes to apply?


Thanks,

James