2016-10-07 23:45:00

by Fenghua Yu

Subject: [PATCH v3 00/18] Intel Cache Allocation Technology

From: Fenghua Yu <[email protected]>

[I know we are in the middle of the merge window ... just putting
these out there now so that if anyone has a boring meeting or plane
ride they can have something interesting to do]

Dave: Got rid of the atomic ops in fork() ... in fact there is no hook
at all now. Membership in a resource group is indicated by the
value of the current->closid field. So fork overhead is just
the extra time to copy a task_struct that is one "int" bigger.
We also got rid of some memory accesses in intel_rdt_sched_in()
(which is also now fully inlined, so no function call). Maybe
current->closid could be re-positioned inside the task_struct
to get it into a warmer cache location.

Shaohua: Sorry, no "procs" file yet. It can be added as another patch
(hint: if you write it, it will happen faster :-)

PeterZ: Using bitfields for the CPUID field extraction.

Nilay: Using DEFINE_STATIC_KEY_FALSE now.

Boris: Using init_scattered_cpuid_features()

Thomas: So many comments ... I hope we got them all covered.

Totally revamped patch series to make review easier. Here's a
guide to what is in each patch, and what you can expect to see
on a running system with patches up to each point applied. Some
code got copied ... so Vikas deserves some Author credit, but I
didn't track which bits ended up in which new piece.

0001-Documentation-ABI-Add-a-document-entry-for-cache-id.patch
0002-cacheinfo-Introduce-cache-id.patch
0003-x86-intel_cacheinfo-Enable-cache-id-in-x86.patch

These three define an "id" for each cache ... we need a "name"
for a cache so we can say what restrictions to apply to each
cache in the system. All you will see at this point is an
extra "id" file in each /sys/devices/system/cpu/cpu*/cache/index*/
directory.
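
A quick check (the value shown is illustrative; actual ids depend
on your cache topology):
$ cat /sys/devices/system/cpu/cpu0/cache/index3/id
0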

0004-x86-intel_rdt-Feature-discovery.patch

Look at CPUID for the features related to cache allocation.
At this point /proc/cpuinfo shows extra flags for the features
found on your system.
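
E.g. (the flag spellings here follow the cpufeatures.h additions in
this series; check against your own kernel):
$ grep -o 'rdt_a\|cat_l3\|cdp_l3' /proc/cpuinfo | sort -u
cat_l3
cdp_l3
rdt_a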

0005-Documentation-x86-Documentation-for-Intel-resource-a.patch

The documentation patch could be anywhere in this sequence. We
put it in early so you can read it to see how to use the
interface.

0006-x86-intel_rdt-Add-CONFIG-Makefile-and-basic-initiali.patch

Add CONFIG_INTEL_RDT (default "n" ... you'll have to set it for
this patch, and all the following ones, to do anything).
Template driver here just checks for features and spams
the console with one line for each.

0007-x86-intel_rdt-Add-Haswell-feature-discovery.patch

There are some Haswell systems that support cache allocation,
but they were made before the CPUID bits were fully defined.
So we check by probing the PQR_ASSOC MSR to see if CLOSID
bits stick. Unless you have one of these Haswells, you won't
see any difference here.

0008-x86-intel_rdt-Pick-up-L3-RDT-parameters-from-CPUID.patch

This is all new code, not seen in the previous versions of this
patch series. L3 cache allocation is just the first of several
resource control features. Define rdt_resource structure that
contains all the useful things we need to know about a resource.
Pick up the parameters for the resource from CPUID. The console
spam strings change format here.

0009-x86-cqm-Move-PQR_ASSOC-management-code-into-generic-.patch

The PQR_ASSOC MSR has a field for the CLOSID (which we need to
select which allocation rules are in effect). But it also
contains the RMID (used by CQM and MBM perf monitoring).
The perf code got here first and defined structures that make
it easy for the two systems to co-exist without stomping on
each other. This patch moves the relevant parts into a common
header file and changes their scope from "static" to global so
we can access them. No visible change.

0010-x86-intel_rdt-Build-structures-for-each-resource-bas.patch

For each enabled resource, we build a list of "rdt_domains" based
on hotplug cpu notifications. Since we only have L3 at this point,
this is just a list of L3 caches (named by the "id" established
in the first three patches). As each cache is found we initialize
the array of CBMs (cache bit masks). No visible change here.

0011-x86-intel_rdt-Add-basic-resctrl-filesystem-support.patch

Our interface is a kernfs-backed file system. Establish the
mount point, and provide mount/unmount functionality.
At this point "/sys/fs/resctrl" appears. You can mount and
unmount the resctrl file system (if your system supports
code/data prioritization, you can use the "cdp" mount option).
The file system is empty and doesn't allow creation of any
files or subdirectories.
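
E.g. (the "resctrl" source string is arbitrary; the "cdp" option
only works on CDP-capable systems):
# mount -t resctrl resctrl /sys/fs/resctrl
# umount /sys/fs/resctrl
# mount -t resctrl -o cdp resctrl /sys/fs/resctrl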

0012-x86-intel_rdt-Add-info-files-to-resctrl-file-system.patch

Parameters for each resource are buried in CPUID leaf 0x10.
This isn't very user friendly for scripts and applications
that want to configure resource allocation. Create an
"info" directory, with a subdirectory for each resource
containing a couple of useful parameters. Visible change:
$ ls -l /sys/fs/resctrl/info/L3
total 0
-r--r--r-- 1 root root 0 Oct 7 11:20 cbm_val
-r--r--r-- 1 root root 0 Oct 7 11:20 num_closid

0013-x86-intel_rdt-Add-mkdir-to-resctrl-file-system.patch

Each resource group is represented by a directory in the
resctrl file system. The root directory is the default group.
Use "mkdir" to create new groups and "rmdir" to remove them.
The maximum number of groups is defined by the effective
number of CLOSIDs.
Visible change: If you have CDP (and enable it with the "cdp"
mount option) you will find that you can only create half
as many groups as without (e.g. 8 vs. 16 on Broadwell; the
default group uses one ... so actually 7 or 15).
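
E.g. (group names are arbitrary):
# cd /sys/fs/resctrl
# mkdir p0 p1
# rmdir p1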

0014-x86-intel_rdt-Add-cpus-file.patch

One of the control mechanisms for a resource group is the
logical CPU. Initially all CPUs are assigned to the default
group. They can be reassigned to other groups by writing
a cpumask to the "cpus" file. See the documentation for what
this means.
Visible change: "cpus" file in the root, and automatically
in each created subdirectory. You can "echo" masks to these
files and watch as CPUs added to one group are removed from
whatever group they previously belonged to. Removing a directory
will give all CPUs owned by it back to the default (root)
group.
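
E.g. (mask values illustrative, for an 8-CPU system):
# echo f0 > p0/cpus
# cat p0/cpus
f0
# cat cpus
0f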

0015-x86-intel_rdt-Add-tasks-files.patch

Tasks can be assigned to resource groups by writing their PID
to a "tasks" file (which removes the task from its previous
group). Forked/cloned tasks inherit the group from their
parent. You cannot remove a group (directory) that has any
tasks assigned.
Visible change: "tasks" files appear. E.g. (we see two tasks
in the group, our shell, and the "cat" that it spawned).
# echo $$ > p0/tasks; cat p0/tasks
268890
268914



0016-x86-intel_rdt-Add-schemata-file.patch

The "schemata" file in each group/directory defines what
access tasks controlled by this resource are permitted.
One line per resource type. Fields for each instance of
the resource. You redefine the access by wrting to the
file in the same format.
Visible change: "schemata" file which starts out with maximum
allowed resources. E.g.
$ cat schemata
L3:0=fffff;1=fffff
Now restrict this group to just 20% of L3 on the first cache, but
allow 50% on the second:
# echo L3:0=f;1=3ff > schemata

0017-x86-intel_rdt-Add-scheduler-hook.patch

When context switching we check if we are changing resource
groups for the new process, and update the PQR_ASSOC MSR with
the new CLOSID if needed.
Visible change: Everything should be working now. Tasks run with
the permitted access to L3 cache.

0018-MAINTAINERS-Add-maintainer-for-Intel-RDT-resource-al.patch

New files ... need a maintainer. Fenghua has the job.

Test note: The patches haven't been tested on 32-bit yet.


Fenghua Yu (15):
Documentation, ABI: Add a document entry for cache id
cacheinfo: Introduce cache id
x86, intel_cacheinfo: Enable cache id in x86
x86/intel_rdt: Feature discovery
Documentation, x86: Documentation for Intel resource allocation user
interface
x86/intel_rdt: Add CONFIG, Makefile, and basic initialization
x86/intel_rdt: Add Haswell feature discovery
x86/intel_rdt: Pick up L3 RDT parameters from CPUID
x86/cqm: Move PQR_ASSOC management code into generic code used by both
CQM and CAT
x86/intel_rdt: Add basic resctrl filesystem support
x86/intel_rdt: Add "info" files to resctrl file system
x86/intel_rdt: Add mkdir to resctrl file system
x86/intel_rdt: Add tasks files
x86/intel_rdt: Add scheduler hook
MAINTAINERS: Add maintainer for Intel RDT resource allocation

Tony Luck (3):
x86/intel_rdt: Build structures for each resource based on cache
topology
x86/intel_rdt: Add cpus file
x86/intel_rdt: Add schemata file

Documentation/ABI/testing/sysfs-devices-system-cpu | 16 +
Documentation/x86/intel_rdt_ui.txt | 162 ++++
MAINTAINERS | 8 +
arch/x86/Kconfig | 12 +
arch/x86/events/intel/cqm.c | 23 +-
arch/x86/include/asm/cpufeatures.h | 5 +
arch/x86/include/asm/intel_rdt.h | 205 +++++
arch/x86/include/asm/intel_rdt_common.h | 27 +
arch/x86/kernel/cpu/Makefile | 2 +
arch/x86/kernel/cpu/intel_cacheinfo.c | 20 +
arch/x86/kernel/cpu/intel_rdt.c | 294 +++++++
arch/x86/kernel/cpu/intel_rdt_rdtgroup.c | 939 +++++++++++++++++++++
arch/x86/kernel/cpu/intel_rdt_schemata.c | 260 ++++++
arch/x86/kernel/cpu/scattered.c | 3 +
arch/x86/kernel/process_32.c | 4 +
arch/x86/kernel/process_64.c | 4 +
drivers/base/cacheinfo.c | 5 +
include/linux/cacheinfo.h | 3 +
include/linux/sched.h | 3 +
include/uapi/linux/magic.h | 1 +
20 files changed, 1975 insertions(+), 21 deletions(-)
create mode 100644 Documentation/x86/intel_rdt_ui.txt
create mode 100644 arch/x86/include/asm/intel_rdt.h
create mode 100644 arch/x86/include/asm/intel_rdt_common.h
create mode 100644 arch/x86/kernel/cpu/intel_rdt.c
create mode 100644 arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
create mode 100644 arch/x86/kernel/cpu/intel_rdt_schemata.c

--
2.5.0


2016-10-07 23:44:48

by Fenghua Yu

Subject: [PATCH v3 01/18] Documentation, ABI: Add a document entry for cache id

From: Fenghua Yu <[email protected]>

Add an ABI document entry for /sys/devices/system/cpu/cpu*/cache/index*/id.

Signed-off-by: Fenghua Yu <[email protected]>
Signed-off-by: Tony Luck <[email protected]>
---
Documentation/ABI/testing/sysfs-devices-system-cpu | 16 ++++++++++++++++
1 file changed, 16 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu
index 4987417..b1c3d69 100644
--- a/Documentation/ABI/testing/sysfs-devices-system-cpu
+++ b/Documentation/ABI/testing/sysfs-devices-system-cpu
@@ -272,6 +272,22 @@ Description: Parameters for the CPU cache attributes
the modified cache line is written to main
memory only when it is replaced

+
+What: /sys/devices/system/cpu/cpu*/cache/index*/id
+Date: September 2016
+Contact: Linux kernel mailing list <[email protected]>
+Description: Cache id
+
+ The id provides a unique name for a specific instance of
+ a cache of a particular type. E.g. there may be a level
+ 3 unified cache on each socket in a server and we may
+ assign them ids 0, 1, 2, ...
+
+ Note that id values may not be contiguous. E.g. level 1
+ caches typically exist per core, but there may not be a
+ power of two cores on a socket, so these caches may be
+ numbered 0, 1, 2, 3, 4, 5, 8, 9, 10, ...
+
What: /sys/devices/system/cpu/cpuX/cpufreq/throttle_stats
/sys/devices/system/cpu/cpuX/cpufreq/throttle_stats/turbo_stat
/sys/devices/system/cpu/cpuX/cpufreq/throttle_stats/sub_turbo_stat
--
2.5.0

2016-10-07 23:44:52

by Fenghua Yu

Subject: [PATCH v3 02/18] cacheinfo: Introduce cache id

From: Fenghua Yu <[email protected]>

Cache management software needs a name for each instance of a cache of
a particular type.

The current cacheinfo structure does not provide any information about
the underlying hardware so there is no way to expose it.

Hardware with cache management features provides means (cpuid, enumeration
etc.) to retrieve the hardware id of a particular cache instance. Cache
instances which share hardware have the same hardware id.

Add an 'id' field to struct cacheinfo to store this information. Expose
this information under the /sys/devices/system/cpu/cpu*/cache/index*/
directory as well.
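
A minimal sketch of how a driver would populate the new field
(set_cache_id() is a hypothetical helper; the real x86 hookup is in
the next patch):

	static void set_cache_id(struct cacheinfo *this_leaf, unsigned int hw_id)
	{
		/* record the hardware-provided id and mark it valid */
		this_leaf->id = hw_id;
		this_leaf->attributes |= CACHE_ID;
	}

Only leaves that set CACHE_ID in ->attributes get the sysfs "id"
file.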

Signed-off-by: Fenghua Yu <[email protected]>
Signed-off-by: Tony Luck <[email protected]>
---
drivers/base/cacheinfo.c | 5 +++++
include/linux/cacheinfo.h | 3 +++
2 files changed, 8 insertions(+)

diff --git a/drivers/base/cacheinfo.c b/drivers/base/cacheinfo.c
index e9fd32e..2a21c15 100644
--- a/drivers/base/cacheinfo.c
+++ b/drivers/base/cacheinfo.c
@@ -233,6 +233,7 @@ static ssize_t file_name##_show(struct device *dev, \
return sprintf(buf, "%u\n", this_leaf->object); \
}

+show_one(id, id);
show_one(level, level);
show_one(coherency_line_size, coherency_line_size);
show_one(number_of_sets, number_of_sets);
@@ -314,6 +315,7 @@ static ssize_t write_policy_show(struct device *dev,
return n;
}

+static DEVICE_ATTR_RO(id);
static DEVICE_ATTR_RO(level);
static DEVICE_ATTR_RO(type);
static DEVICE_ATTR_RO(coherency_line_size);
@@ -327,6 +329,7 @@ static DEVICE_ATTR_RO(shared_cpu_list);
static DEVICE_ATTR_RO(physical_line_partition);

static struct attribute *cache_default_attrs[] = {
+ &dev_attr_id.attr,
&dev_attr_type.attr,
&dev_attr_level.attr,
&dev_attr_shared_cpu_map.attr,
@@ -350,6 +353,8 @@ cache_default_attrs_is_visible(struct kobject *kobj,
const struct cpumask *mask = &this_leaf->shared_cpu_map;
umode_t mode = attr->mode;

+ if ((attr == &dev_attr_id.attr) && this_leaf->attributes & CACHE_ID)
+ return mode;
if ((attr == &dev_attr_type.attr) && this_leaf->type)
return mode;
if ((attr == &dev_attr_level.attr) && this_leaf->level)
diff --git a/include/linux/cacheinfo.h b/include/linux/cacheinfo.h
index 2189935..0bcbb67 100644
--- a/include/linux/cacheinfo.h
+++ b/include/linux/cacheinfo.h
@@ -18,6 +18,7 @@ enum cache_type {

/**
* struct cacheinfo - represent a cache leaf node
+ * @id: This cache's id. It is unique among caches with the same (type, level).
* @type: type of the cache - data, inst or unified
* @level: represents the hierarchy in the multi-level cache
* @coherency_line_size: size of each cache line usually representing
@@ -44,6 +45,7 @@ enum cache_type {
* keeping, the remaining members form the core properties of the cache
*/
struct cacheinfo {
+ unsigned int id;
enum cache_type type;
unsigned int level;
unsigned int coherency_line_size;
@@ -61,6 +63,7 @@ struct cacheinfo {
#define CACHE_WRITE_ALLOCATE BIT(3)
#define CACHE_ALLOCATE_POLICY_MASK \
(CACHE_READ_ALLOCATE | CACHE_WRITE_ALLOCATE)
+#define CACHE_ID BIT(4)

struct device_node *of_node;
bool disable_sysfs;
--
2.5.0

2016-10-07 23:45:11

by Fenghua Yu

Subject: [PATCH v3 09/18] x86/cqm: Move PQR_ASSOC management code into generic code used by both CQM and CAT

From: Fenghua Yu <[email protected]>

A CLOSID goes into effect when written to the PQR_ASSOC MSR. But this MSR
also contains the RMID used for performance monitoring of cache occupancy
and memory bandwidth. Move the management code for this MSR out of
arch/x86/events/intel/cqm.c and into generic RDT code so we can coordinate
updates to the MSR.
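
With the state shared, an update can be coordinated along these
lines (sketch; this is essentially what the scheduler hook added
later in the series does):

	struct intel_pqr_state *state = this_cpu_ptr(&pqr_state);

	/* skip the (expensive) MSR write if the CLOSID is unchanged */
	if (state->closid != closid) {
		state->closid = closid;
		wrmsr(MSR_IA32_PQR_ASSOC, state->rmid, closid);
	}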

Signed-off-by: Fenghua Yu <[email protected]>
Signed-off-by: Tony Luck <[email protected]>
---
arch/x86/events/intel/cqm.c | 21 +--------------------
arch/x86/include/asm/intel_rdt_common.h | 21 +++++++++++++++++++++
2 files changed, 22 insertions(+), 20 deletions(-)

diff --git a/arch/x86/events/intel/cqm.c b/arch/x86/events/intel/cqm.c
index df86874..0c45cc8 100644
--- a/arch/x86/events/intel/cqm.c
+++ b/arch/x86/events/intel/cqm.c
@@ -24,32 +24,13 @@ static unsigned int cqm_l3_scale; /* supposedly cacheline size */
static bool cqm_enabled, mbm_enabled;
unsigned int mbm_socket_max;

-/**
- * struct intel_pqr_state - State cache for the PQR MSR
- * @rmid: The cached Resource Monitoring ID
- * @closid: The cached Class Of Service ID
- * @rmid_usecnt: The usage counter for rmid
- *
- * The upper 32 bits of MSR_IA32_PQR_ASSOC contain closid and the
- * lower 10 bits rmid. The update to MSR_IA32_PQR_ASSOC always
- * contains both parts, so we need to cache them.
- *
- * The cache also helps to avoid pointless updates if the value does
- * not change.
- */
-struct intel_pqr_state {
- u32 rmid;
- u32 closid;
- int rmid_usecnt;
-};
-
/*
* The cached intel_pqr_state is strictly per CPU and can never be
* updated from a remote CPU. Both functions which modify the state
* (intel_cqm_event_start and intel_cqm_event_stop) are called with
* interrupts disabled, which is sufficient for the protection.
*/
-static DEFINE_PER_CPU(struct intel_pqr_state, pqr_state);
+DEFINE_PER_CPU(struct intel_pqr_state, pqr_state);
static struct hrtimer *mbm_timers;
/**
* struct sample - mbm event's (local or total) data
diff --git a/arch/x86/include/asm/intel_rdt_common.h b/arch/x86/include/asm/intel_rdt_common.h
index e6e15cf..b31081b 100644
--- a/arch/x86/include/asm/intel_rdt_common.h
+++ b/arch/x86/include/asm/intel_rdt_common.h
@@ -3,4 +3,25 @@

#define MSR_IA32_PQR_ASSOC 0x0c8f

+/**
+ * struct intel_pqr_state - State cache for the PQR MSR
+ * @rmid: The cached Resource Monitoring ID
+ * @closid: The cached Class Of Service ID
+ * @rmid_usecnt: The usage counter for rmid
+ *
+ * The upper 32 bits of MSR_IA32_PQR_ASSOC contain closid and the
+ * lower 10 bits rmid. The update to MSR_IA32_PQR_ASSOC always
+ * contains both parts, so we need to cache them.
+ *
+ * The cache also helps to avoid pointless updates if the value does
+ * not change.
+ */
+struct intel_pqr_state {
+ u32 rmid;
+ u32 closid;
+ int rmid_usecnt;
+};
+
+DECLARE_PER_CPU(struct intel_pqr_state, pqr_state);
+
#endif /* _ASM_X86_INTEL_RDT_COMMON_H */
--
2.5.0

2016-10-07 23:45:22

by Fenghua Yu

Subject: [PATCH v3 08/18] x86/intel_rdt: Pick up L3 RDT parameters from CPUID

From: Fenghua Yu <[email protected]>

Define struct rdt_resource to hold all the parameterized
values for an RDT resource. Fill in some of those values
from CPUID leaf 0x10 (on Haswell we hard code them).

Signed-off-by: Fenghua Yu <[email protected]>
Signed-off-by: Tony Luck <[email protected]>
---
arch/x86/include/asm/intel_rdt.h | 67 ++++++++++++++++++++++++++++++++++++++++
arch/x86/kernel/cpu/intel_rdt.c | 54 +++++++++++++++++++++++++++++---
2 files changed, 117 insertions(+), 4 deletions(-)
create mode 100644 arch/x86/include/asm/intel_rdt.h

diff --git a/arch/x86/include/asm/intel_rdt.h b/arch/x86/include/asm/intel_rdt.h
new file mode 100644
index 0000000..251ac2a
--- /dev/null
+++ b/arch/x86/include/asm/intel_rdt.h
@@ -0,0 +1,67 @@
+#ifndef _ASM_X86_INTEL_RDT_H
+#define _ASM_X86_INTEL_RDT_H
+
+/**
+ * struct rdt_resource - attributes of an RDT resource
+ * @enabled: Is this feature enabled on this machine
+ * @name: Name to use in "schemata" file
+ * @max_closid: Maximum number of CLOSIDs supported
+ * @num_closid: Current number of CLOSIDs available
+ * @max_cbm: Largest Cache Bit Mask allowed
+ * @min_cbm_bits: Minimum number of bits to be set in a cache
+ * bit mask
+ * @domains: All domains for this resource
+ * @num_domains: Number of domains active
+ * @msr_base: Base MSR address for CBMs
+ * @cdp_capable: Code/Data Prioritization available
+ * @cdp_enabled: Code/Data Prioritization enabled
+ * @tmp_cbms: Scratch space when updating schemata
+ * @cache_level: Which cache level defines scope of this domain
+ */
+struct rdt_resource {
+ bool enabled;
+ char *name;
+ int max_closid;
+ int num_closid;
+ int cbm_len;
+ int min_cbm_bits;
+ u32 max_cbm;
+ struct list_head domains;
+ int num_domains;
+ int msr_base;
+ bool cdp_capable;
+ bool cdp_enabled;
+ u32 *tmp_cbms;
+ int cache_level;
+};
+
+#define for_each_rdt_resource(r) \
+ for (r = rdt_resources_all; r->name; r++) \
+ if (r->enabled)
+
+#define IA32_L3_CBM_BASE 0xc90
+extern struct rdt_resource rdt_resources_all[];
+
+enum {
+ RDT_RESOURCE_L3,
+};
+
+/* Maximum CLOSID allowed across all enabled resources */
+extern int rdt_max_closid;
+
+/* CPUID.(EAX=10H, ECX=ResID=1).EAX */
+union cpuid_0x10_1_eax {
+ struct {
+ unsigned int cbm_len:5;
+ } split;
+ unsigned int full;
+};
+
+/* CPUID.(EAX=10H, ECX=ResID=1).EDX */
+union cpuid_0x10_1_edx {
+ struct {
+ unsigned int cos_max:16;
+ } split;
+ unsigned int full;
+};
+#endif /* _ASM_X86_INTEL_RDT_H */
diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
index bc7f10b..b9b1e9e 100644
--- a/arch/x86/kernel/cpu/intel_rdt.c
+++ b/arch/x86/kernel/cpu/intel_rdt.c
@@ -28,6 +28,24 @@
#include <linux/err.h>
#include <asm/intel_rdt_common.h>
#include <asm/intel-family.h>
+#include <asm/intel_rdt.h>
+
+int rdt_max_closid;
+
+#define domain_init(name) LIST_HEAD_INIT(rdt_resources_all[name].domains)
+
+struct rdt_resource rdt_resources_all[] = {
+ {
+ .name = "L3",
+ .domains = domain_init(RDT_RESOURCE_L3),
+ .msr_base = IA32_L3_CBM_BASE,
+ .min_cbm_bits = 1,
+ .cache_level = 3
+ },
+ {
+ /* NULL terminated */
+ }
+};

/*
* cache_alloc_hsw_probe() - Have to probe for Intel haswell server CPUs
@@ -41,6 +59,7 @@
static inline bool cache_alloc_hsw_probe(void)
{
u32 l, h_old, h_new, h_tmp;
+ struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3];

if (rdmsr_safe(MSR_IA32_PQR_ASSOC, &l, &h_old))
return false;
@@ -58,11 +77,19 @@ static inline bool cache_alloc_hsw_probe(void)

wrmsr(MSR_IA32_PQR_ASSOC, l, h_old);

+ r->max_closid = 4;
+ r->num_closid = r->max_closid;
+ r->cbm_len = 20;
+ r->max_cbm = BIT_MASK(20) - 1;
+ r->min_cbm_bits = 2;
+ r->enabled = true;
+
return true;
}

static inline bool get_rdt_resources(struct cpuinfo_x86 *c)
{
+ struct rdt_resource *r;
bool ret = false;

if (c->x86_vendor == X86_VENDOR_INTEL && c->x86 == 6 &&
@@ -71,8 +98,23 @@ static inline bool get_rdt_resources(struct cpuinfo_x86 *c)

if (!cpu_has(c, X86_FEATURE_RDT_A))
return false;
- if (cpu_has(c, X86_FEATURE_CAT_L3))
+ if (cpu_has(c, X86_FEATURE_CAT_L3)) {
+ union cpuid_0x10_1_eax eax;
+ union cpuid_0x10_1_edx edx;
+ u32 ebx, ecx;
+
+ r = &rdt_resources_all[RDT_RESOURCE_L3];
+ cpuid_count(0x00000010, 1, &eax.full, &ebx, &ecx, &edx.full);
+ r->max_closid = edx.split.cos_max + 1;
+ r->num_closid = r->max_closid;
+ r->cbm_len = eax.split.cbm_len + 1;
+ r->max_cbm = BIT_MASK(eax.split.cbm_len + 1) - 1;
+ if (cpu_has(c, X86_FEATURE_CDP_L3))
+ r->cdp_capable = true;
+ r->enabled = true;
+
ret = true;
+ }

return ret;
}
@@ -80,13 +122,17 @@ static inline bool get_rdt_resources(struct cpuinfo_x86 *c)
static int __init intel_rdt_late_init(void)
{
struct cpuinfo_x86 *c = &boot_cpu_data;
+ struct rdt_resource *r;

if (!get_rdt_resources(c))
return -ENODEV;

- pr_info("Intel cache allocation detected\n");
- if (cpu_has(c, X86_FEATURE_CDP_L3))
- pr_info("Intel code data prioritization detected\n");
+ for_each_rdt_resource(r)
+ rdt_max_closid = max(rdt_max_closid, r->max_closid);
+
+ for_each_rdt_resource(r)
+ pr_info("Intel %s allocation %s detected\n", r->name,
+ r->cdp_capable ? " (with CDP)" : "");

return 0;
}
--
2.5.0

2016-10-07 23:45:32

by Fenghua Yu

Subject: [PATCH v3 10/18] x86/intel_rdt: Build structures for each resource based on cache topology

From: Tony Luck <[email protected]>

We use the cpu hotplug notifier to catch each cpu in turn and look at
its cache topology w.r.t. each of the resources. As we discover new
cache domains, we initialize the bitmask array for each to the default
(full access) value.

Signed-off-by: Tony Luck <[email protected]>
Signed-off-by: Fenghua Yu <[email protected]>
---
arch/x86/include/asm/intel_rdt.h | 35 ++++++++++
arch/x86/kernel/cpu/intel_rdt.c | 135 +++++++++++++++++++++++++++++++++++++++
2 files changed, 170 insertions(+)

diff --git a/arch/x86/include/asm/intel_rdt.h b/arch/x86/include/asm/intel_rdt.h
index 251ac2a..bad8dc7 100644
--- a/arch/x86/include/asm/intel_rdt.h
+++ b/arch/x86/include/asm/intel_rdt.h
@@ -40,6 +40,39 @@ struct rdt_resource {
if (r->enabled)

#define IA32_L3_CBM_BASE 0xc90
+
+/**
+ * struct rdt_domain - group of cpus sharing an RDT resource
+ * @list: all instances of this resource
+ * @id: unique id for this instance
+ * @cpu_mask: which cpus share this resource
+ * @cbm: array of cache bit masks (indexed by CLOSID)
+ */
+struct rdt_domain {
+ struct list_head list;
+ int id;
+ struct cpumask cpu_mask;
+ u32 *cbm;
+};
+
+/**
+ * struct msr_param - set a range of MSRs from a domain
+ * @res: The resource to use
+ * @low: Beginning index from base MSR
+ * @high: End index
+ */
+struct msr_param {
+ struct rdt_resource *res;
+ int low;
+ int high;
+};
+
+extern struct static_key_false rdt_enable_key;
+extern struct mutex rdtgroup_mutex;
+
+int __init rdtgroup_init(void);
+
+extern struct rdtgroup *rdtgroup_default;
extern struct rdt_resource rdt_resources_all[];

enum {
@@ -64,4 +97,6 @@ union cpuid_0x10_1_edx {
} split;
unsigned int full;
};
+
+void rdt_cbm_update(void *arg);
#endif /* _ASM_X86_INTEL_RDT_H */
diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
index b9b1e9e..76b7476 100644
--- a/arch/x86/kernel/cpu/intel_rdt.c
+++ b/arch/x86/kernel/cpu/intel_rdt.c
@@ -24,12 +24,18 @@

#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

+#include <linux/cacheinfo.h>
#include <linux/slab.h>
#include <linux/err.h>
+#include <linux/cpuhotplug.h>
+
#include <asm/intel_rdt_common.h>
#include <asm/intel-family.h>
#include <asm/intel_rdt.h>

+/* Mutex to protect rdtgroup access. */
+DEFINE_MUTEX(rdtgroup_mutex);
+
int rdt_max_closid;

#define domain_init(name) LIST_HEAD_INIT(rdt_resources_all[name].domains)
@@ -119,14 +125,143 @@ static inline bool get_rdt_resources(struct cpuinfo_x86 *c)
return ret;
}

+static int get_cache_id(int cpu, int level)
+{
+ struct cpu_cacheinfo *ci = get_cpu_cacheinfo(cpu);
+ int i;
+
+ for (i = 0; i < ci->num_leaves; i++)
+ if (ci->info_list[i].level == level)
+ return ci->info_list[i].id;
+ return -1;
+}
+
+void rdt_cbm_update(void *arg)
+{
+ struct msr_param *m = (struct msr_param *)arg;
+ struct rdt_resource *r = m->res;
+ struct rdt_domain *d;
+ struct list_head *l;
+ int i, cpu = smp_processor_id();
+
+ list_for_each(l, &r->domains) {
+ d = list_entry(l, struct rdt_domain, list);
+ if (cpumask_test_cpu(cpu, &d->cpu_mask))
+ goto found;
+ }
+ pr_info_once("cpu %d not found in any domain for resource %s\n",
+ cpu, r->name);
+
+found:
+ for (i = m->low; i < m->high; i++)
+ wrmsrl(r->msr_base + i, d->cbm[i]);
+}
+
+static void update_domain(int cpu, struct rdt_resource *r, int add)
+{
+ struct list_head *l;
+ struct rdt_domain *d;
+ int i, cache_id;
+
+ cache_id = get_cache_id(cpu, r->cache_level);
+
+ if (cache_id == -1) {
+ pr_info_once("Could't find cache id for cpu %d\n", cpu);
+ return;
+ }
+ list_for_each(l, &r->domains) {
+ d = list_entry(l, struct rdt_domain, list);
+ if (cache_id == d->id)
+ goto found;
+ if (cache_id < d->id)
+ break;
+ }
+ if (!add) {
+ pr_info_once("removed unknown cpu %d\n", cpu);
+ return;
+ }
+ d = kzalloc(sizeof(*d), GFP_KERNEL);
+ if (!d)
+ return;
+
+ d->id = cache_id;
+ d->cbm = kmalloc_array(r->max_closid, sizeof(*d->cbm), GFP_KERNEL);
+ if (!d->cbm) {
+ pr_info("Failed to alloc CBM array for cpu %d\n", cpu);
+ kfree(d);
+ return;
+ }
+ cpumask_set_cpu(cpu, &d->cpu_mask);
+ for (i = 0; i < r->max_closid; i++) {
+ d->cbm[i] = r->max_cbm;
+ wrmsrl(r->msr_base + i, d->cbm[i]);
+ }
+ list_add_tail(&d->list, l);
+ r->num_domains++;
+ return;
+
+found:
+ if (add) {
+ cpumask_set_cpu(cpu, &d->cpu_mask);
+ } else {
+ cpumask_clear_cpu(cpu, &d->cpu_mask);
+ if (cpumask_empty(&d->cpu_mask)) {
+ r->num_domains--;
+ kfree(d->cbm);
+ list_del(&d->list);
+ kfree(d);
+ }
+ }
+}
+
+static void rdt_reset_pqr_assoc_closid(void *v)
+{
+ struct intel_pqr_state *state = this_cpu_ptr(&pqr_state);
+
+ state->closid = 0;
+ wrmsr(MSR_IA32_PQR_ASSOC, state->rmid, 0);
+}
+
+static int intel_rdt_online_cpu(unsigned int cpu)
+{
+ struct rdt_resource *r;
+
+ mutex_lock(&rdtgroup_mutex);
+ for_each_rdt_resource(r)
+ update_domain(cpu, r, 1);
+ smp_call_function_single(cpu, rdt_reset_pqr_assoc_closid, NULL, 1);
+ mutex_unlock(&rdtgroup_mutex);
+
+ return 0;
+}
+
+static int intel_rdt_offline_cpu(unsigned int cpu)
+{
+ struct rdt_resource *r;
+
+ mutex_lock(&rdtgroup_mutex);
+ for_each_rdt_resource(r)
+ update_domain(cpu, r, 0);
+ mutex_unlock(&rdtgroup_mutex);
+
+ return 0;
+}
+
static int __init intel_rdt_late_init(void)
{
struct cpuinfo_x86 *c = &boot_cpu_data;
struct rdt_resource *r;
+ int ret;

if (!get_rdt_resources(c))
return -ENODEV;

+ ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN,
+ "AP_INTEL_RDT_ONLINE",
+ intel_rdt_online_cpu, intel_rdt_offline_cpu);
+ if (ret < 0)
+ return ret;
+
for_each_rdt_resource(r)
rdt_max_closid = max(rdt_max_closid, r->max_closid);

--
2.5.0

2016-10-07 23:45:42

by Fenghua Yu

Subject: [PATCH v3 07/18] x86/intel_rdt: Add Haswell feature discovery

From: Fenghua Yu <[email protected]>

Some Haswell generation CPUs support RDT, but they don't enumerate this
using CPUID. Use rdmsr_safe() and wrmsr_safe() to probe the MSRs on
cpu model 63 (INTEL_FAM6_HASWELL_X).

Signed-off-by: Fenghua Yu <[email protected]>
Signed-off-by: Tony Luck <[email protected]>
---
arch/x86/events/intel/cqm.c | 2 +-
arch/x86/include/asm/intel_rdt_common.h | 6 ++++++
arch/x86/kernel/cpu/intel_rdt.c | 38 +++++++++++++++++++++++++++++++++
3 files changed, 45 insertions(+), 1 deletion(-)
create mode 100644 arch/x86/include/asm/intel_rdt_common.h

diff --git a/arch/x86/events/intel/cqm.c b/arch/x86/events/intel/cqm.c
index 8f82b02..df86874 100644
--- a/arch/x86/events/intel/cqm.c
+++ b/arch/x86/events/intel/cqm.c
@@ -7,9 +7,9 @@
#include <linux/perf_event.h>
#include <linux/slab.h>
#include <asm/cpu_device_id.h>
+#include <asm/intel_rdt_common.h>
#include "../perf_event.h"

-#define MSR_IA32_PQR_ASSOC 0x0c8f
#define MSR_IA32_QM_CTR 0x0c8e
#define MSR_IA32_QM_EVTSEL 0x0c8d

diff --git a/arch/x86/include/asm/intel_rdt_common.h b/arch/x86/include/asm/intel_rdt_common.h
new file mode 100644
index 0000000..e6e15cf
--- /dev/null
+++ b/arch/x86/include/asm/intel_rdt_common.h
@@ -0,0 +1,6 @@
+#ifndef _ASM_X86_INTEL_RDT_COMMON_H
+#define _ASM_X86_INTEL_RDT_COMMON_H
+
+#define MSR_IA32_PQR_ASSOC 0x0c8f
+
+#endif /* _ASM_X86_INTEL_RDT_COMMON_H */
diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
index ebe8dae..bc7f10b 100644
--- a/arch/x86/kernel/cpu/intel_rdt.c
+++ b/arch/x86/kernel/cpu/intel_rdt.c
@@ -26,11 +26,49 @@

#include <linux/slab.h>
#include <linux/err.h>
+#include <asm/intel_rdt_common.h>
+#include <asm/intel-family.h>
+
+/*
+ * cache_alloc_hsw_probe() - Have to probe for Intel Haswell server CPUs,
+ * as they do not have CPUID enumeration support for cache allocation.
+ *
+ * Probe by writing to the high 32 bits (CLOSID) of the IA32_PQR_ASSOC MSR
+ * and testing whether the bits stick. The max CLOSID is always 4 and the
+ * max cbm length is always 20 on HSW server parts. The minimum cache
+ * bitmask length allowed for HSW server is always 2 bits. Hardcode all
+ * of them.
+ */
+static inline bool cache_alloc_hsw_probe(void)
+{
+ u32 l, h_old, h_new, h_tmp;
+
+ if (rdmsr_safe(MSR_IA32_PQR_ASSOC, &l, &h_old))
+ return false;
+
+ /*
+ * Default value is always 0 if feature is present.
+ */
+ h_tmp = h_old ^ 0x1U;
+ if (wrmsr_safe(MSR_IA32_PQR_ASSOC, l, h_tmp))
+ return false;
+ rdmsr(MSR_IA32_PQR_ASSOC, l, h_new);
+
+ if (h_tmp != h_new)
+ return false;
+
+ wrmsr(MSR_IA32_PQR_ASSOC, l, h_old);
+
+ return true;
+}

static inline bool get_rdt_resources(struct cpuinfo_x86 *c)
{
bool ret = false;

+ if (c->x86_vendor == X86_VENDOR_INTEL && c->x86 == 6 &&
+ c->x86_model == INTEL_FAM6_HASWELL_X)
+ return cache_alloc_hsw_probe();
+
if (!cpu_has(c, X86_FEATURE_RDT_A))
return false;
if (cpu_has(c, X86_FEATURE_CAT_L3))
--
2.5.0

2016-10-07 23:45:54

by Fenghua Yu

Subject: [PATCH v3 17/18] x86/intel_rdt: Add scheduler hook

From: Fenghua Yu <[email protected]>

Hook the x86 scheduler code to update closid based on whether the current
task is assigned to a specific closid or running on a CPU assigned to a
specific closid.

Signed-off-by: Fenghua Yu <[email protected]>
Signed-off-by: Tony Luck <[email protected]>
---
arch/x86/include/asm/intel_rdt.h | 41 ++++++++++++++++++++++++++++++++
arch/x86/kernel/cpu/intel_rdt.c | 1 -
arch/x86/kernel/cpu/intel_rdt_rdtgroup.c | 3 +++
arch/x86/kernel/process_32.c | 4 ++++
arch/x86/kernel/process_64.c | 4 ++++
5 files changed, 52 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/intel_rdt.h b/arch/x86/include/asm/intel_rdt.h
index c052652..aefa3a6 100644
--- a/arch/x86/include/asm/intel_rdt.h
+++ b/arch/x86/include/asm/intel_rdt.h
@@ -1,6 +1,9 @@
#ifndef _ASM_X86_INTEL_RDT_H
#define _ASM_X86_INTEL_RDT_H

+#ifdef CONFIG_INTEL_RDT
+#include <asm/intel_rdt_common.h>
+
/**
* struct rdtgroup - store rdtgroup's data in resctrl file system.
* @kn: kernfs node
@@ -161,4 +164,42 @@ ssize_t rdtgroup_schemata_write(struct kernfs_open_file *of,
char *buf, size_t nbytes, loff_t off);
int rdtgroup_schemata_show(struct kernfs_open_file *of,
struct seq_file *s, void *v);
+
+/*
+ * intel_rdt_sched_in() - Writes the task's CLOSid to IA32_PQR_MSR
+ *
+ * Following considerations are made so that this has minimal impact
+ * on scheduler hot path:
+ * - This will stay as no-op unless we are running on an Intel SKU
+ * which supports resource control and we enable by mounting the
+ * resctrl file system.
+ * - Caches the per cpu CLOSid values and does the MSR write only
+ * when a task with a different CLOSid is scheduled in.
+ */
+static inline void intel_rdt_sched_in(void)
+{
+ if (static_branch_likely(&rdt_enable_key)) {
+ struct intel_pqr_state *state = this_cpu_ptr(&pqr_state);
+ int closid;
+
+ /*
+ * If this task has a closid assigned, use it.
+ * Else use the closid assigned to this cpu.
+ */
+ closid = current->closid;
+ if (closid == 0)
+ closid = this_cpu_read(cpu_closid);
+
+ if (closid != state->closid) {
+ state->closid = closid;
+ wrmsr(MSR_IA32_PQR_ASSOC, state->rmid, closid);
+ }
+ }
+}
+
+#else
+
+static inline void intel_rdt_sched_in(void) {}
+
+#endif /* CONFIG_INTEL_RDT */
#endif /* _ASM_X86_INTEL_RDT_H */
diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
index 0b076d0..16a982e7 100644
--- a/arch/x86/kernel/cpu/intel_rdt.c
+++ b/arch/x86/kernel/cpu/intel_rdt.c
@@ -29,7 +29,6 @@
#include <linux/err.h>
#include <linux/cpuhotplug.h>

-#include <asm/intel_rdt_common.h>
#include <asm/intel-family.h>
#include <asm/intel_rdt.h>

diff --git a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
index 4740b7e..b8c6e6e 100644
--- a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
+++ b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
@@ -282,6 +282,9 @@ static void move_myself(struct callback_head *head)
kfree(rdtgrp);
}

+ /* update PQR_ASSOC MSR to make resource group go into effect */
+ intel_rdt_sched_in();
+
kfree(callback);
}

diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index d86be29..5495527 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -54,6 +54,7 @@
#include <asm/debugreg.h>
#include <asm/switch_to.h>
#include <asm/vm86.h>
+#include <asm/intel_rdt.h>

asmlinkage void ret_from_fork(void) __asm__("ret_from_fork");
asmlinkage void ret_from_kernel_thread(void) __asm__("ret_from_kernel_thread");
@@ -314,5 +315,8 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)

this_cpu_write(current_task, next_p);

+ /* Load the Intel cache allocation PQR MSR. */
+ intel_rdt_sched_in();
+
return prev_p;
}
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 63236d8..cdea7e3 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -49,6 +49,7 @@
#include <asm/debugreg.h>
#include <asm/switch_to.h>
#include <asm/xen/hypervisor.h>
+#include <asm/intel_rdt.h>

asmlinkage extern void ret_from_fork(void);

@@ -472,6 +473,9 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
loadsegment(ss, __KERNEL_DS);
}

+ /* Load the Intel cache allocation PQR MSR. */
+ intel_rdt_sched_in();
+
return prev_p;
}

--
2.5.0

2016-10-07 23:46:06

by Fenghua Yu

Subject: [PATCH v3 18/18] MAINTAINERS: Add maintainer for Intel RDT resource allocation

From: Fenghua Yu <[email protected]>

We create five new files for Intel RDT resource allocation:
arch/x86/kernel/cpu/intel_rdt.c
arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
arch/x86/kernel/cpu/intel_rdt_schemata.c
arch/x86/include/asm/intel_rdt.h
Documentation/x86/intel_rdt_ui.txt

Fenghua Yu will maintain this code.

Signed-off-by: Fenghua Yu <[email protected]>
Signed-off-by: Tony Luck <[email protected]>
---
MAINTAINERS | 8 ++++++++
1 file changed, 8 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index f593300..bcb75e9 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -9845,6 +9845,14 @@ L: [email protected]
S: Supported
F: drivers/infiniband/sw/rdmavt

+RDT - RESOURCE ALLOCATION
+M: Fenghua Yu <[email protected]>
+L: [email protected]
+S: Supported
+F: arch/x86/kernel/cpu/intel_rdt*
+F: arch/x86/include/asm/intel_rdt*
+F: Documentation/x86/intel_rdt*
+
READ-COPY UPDATE (RCU)
M: "Paul E. McKenney" <[email protected]>
M: Josh Triplett <[email protected]>
--
2.5.0

2016-10-07 23:46:18

by Fenghua Yu

Subject: [PATCH v3 15/18] x86/intel_rdt: Add tasks files

From: Fenghua Yu <[email protected]>

The root directory and all subdirectories are automatically populated
with a read/write (mode 0644) file named "tasks". When read it will
show all the task IDs assigned to the resource group. Tasks can be
added (one at a time) to a group by writing the task ID to the file
(see the example below).

Membership in a resource group is indicated by a new field in the
task_struct "int closid" which holds the CLOSID for each task. The
default resource group uses CLOSID=0 which means that all existing
tasks when the resctrl file system is mounted belong to the default
group.

A resource group cannot be removed while there are tasks assigned
to it.
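
E.g. (PID and group name are illustrative):
# echo 1234 > /sys/fs/resctrl/p1/tasks
# cat /sys/fs/resctrl/p1/tasks
1234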

Signed-off-by: Fenghua Yu <[email protected]>
Signed-off-by: Tony Luck <[email protected]>
---
arch/x86/kernel/cpu/intel_rdt_rdtgroup.c | 173 +++++++++++++++++++++++++++++++
include/linux/sched.h | 3 +
2 files changed, 176 insertions(+)

diff --git a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
index 2a4e78a..e9e834b 100644
--- a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
+++ b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
@@ -27,6 +27,7 @@
#include <linux/seq_file.h>
#include <linux/sched.h>
#include <linux/slab.h>
+#include <linux/task_work.h>

#include <uapi/linux/magic.h>

@@ -261,6 +262,152 @@ unlock:
return ret ?: nbytes;
}

+struct task_move_callback {
+ struct callback_head work;
+ struct rdtgroup *rdtgrp;
+};
+
+static void move_myself(struct callback_head *head)
+{
+ struct task_move_callback *callback;
+ struct rdtgroup *rdtgrp;
+
+ callback = container_of(head, struct task_move_callback, work);
+ rdtgrp = callback->rdtgrp;
+
+ /* Resource group might have been deleted before process runs */
+ if (atomic_dec_and_test(&rdtgrp->waitcount) &&
+ (rdtgrp->flags & RDT_DELETED)) {
+ current->closid = 0;
+ kfree(rdtgrp);
+ }
+
+ kfree(callback);
+}
+
+static int __rdtgroup_move_task(struct task_struct *tsk,
+ struct rdtgroup *rdtgrp)
+{
+ struct task_move_callback *callback;
+ int ret;
+
+ callback = kzalloc(sizeof(*callback), GFP_KERNEL);
+ if (!callback)
+ return -ENOMEM;
+ callback->work.func = move_myself;
+ callback->rdtgrp = rdtgrp;
+ atomic_inc(&rdtgrp->waitcount);
+ ret = task_work_add(tsk, &callback->work, true);
+ if (ret) {
+ atomic_dec(&rdtgrp->waitcount);
+ kfree(callback);
+ } else {
+ tsk->closid = rdtgrp->closid;
+ }
+ return ret;
+}
+
+static int rdtgroup_task_write_permission(struct task_struct *task,
+ struct kernfs_open_file *of)
+{
+ const struct cred *cred = current_cred();
+ const struct cred *tcred = get_task_cred(task);
+ int ret = 0;
+
+ /*
+ * even if we're attaching all tasks in the thread group, we only
+ * need to check permissions on one of them.
+ */
+ if (!uid_eq(cred->euid, GLOBAL_ROOT_UID) &&
+ !uid_eq(cred->euid, tcred->uid) &&
+ !uid_eq(cred->euid, tcred->suid))
+ ret = -EPERM;
+
+ put_cred(tcred);
+ return ret;
+}
+
+static int rdtgroup_move_task(pid_t pid, struct rdtgroup *rdtgrp,
+ struct kernfs_open_file *of)
+{
+ struct task_struct *tsk;
+ int ret;
+
+ rcu_read_lock();
+ if (pid) {
+ tsk = find_task_by_vpid(pid);
+ if (!tsk) {
+ ret = -ESRCH;
+ goto out_unlock_rcu;
+ }
+ } else {
+ tsk = current;
+ }
+
+ get_task_struct(tsk);
+ rcu_read_unlock();
+
+ ret = rdtgroup_task_write_permission(tsk, of);
+ if (!ret)
+ ret = __rdtgroup_move_task(tsk, rdtgrp);
+
+ put_task_struct(tsk);
+ return ret;
+
+out_unlock_rcu:
+ rcu_read_unlock();
+ return ret;
+}
+
+static ssize_t rdtgroup_tasks_write(struct kernfs_open_file *of,
+ char *buf, size_t nbytes, loff_t off)
+{
+ struct rdtgroup *rdtgrp;
+ pid_t pid;
+ int ret = 0;
+
+ if (kstrtoint(strstrip(buf), 0, &pid) || pid < 0)
+ return -EINVAL;
+ rdtgrp = rdtgroup_kn_lock_live(of->kn);
+
+ if (rdtgrp)
+ ret = rdtgroup_move_task(pid, rdtgrp, of);
+ else
+ ret = -ENOENT;
+
+ rdtgroup_kn_unlock(of->kn);
+
+ return ret ?: nbytes;
+}
+
+static void show_rdt_tasks(struct rdtgroup *r, struct seq_file *s)
+{
+ struct task_struct *p;
+
+ rcu_read_lock();
+ for_each_process(p) {
+ if (p->closid == r->closid)
+ seq_printf(s, "%d\n", p->pid);
+ }
+ rcu_read_unlock();
+}
+
+static int rdtgroup_tasks_show(struct kernfs_open_file *of,
+ struct seq_file *s, void *v)
+{
+ struct rdtgroup *rdtgrp;
+ int ret = 0;
+
+ rdtgrp = rdtgroup_kn_lock_live(of->kn);
+ if (rdtgrp)
+ show_rdt_tasks(rdtgrp, s);
+ else
+ ret = -ENOENT;
+ rdtgroup_kn_unlock(of->kn);
+
+ return ret;
+}
+
/* Files in each rdtgroup */
static struct rftype rdtgroup_base_files[] = {
{
@@ -271,6 +418,13 @@ static struct rftype rdtgroup_base_files[] = {
.seq_show = rdtgroup_cpus_show,
},
{
+ .name = "tasks",
+ .mode = 0644,
+ .kf_ops = &rdtgroup_kf_single_ops,
+ .write = rdtgroup_tasks_write,
+ .seq_show = rdtgroup_tasks_show,
+ },
+ {
/* NULL terminated */
}
};
@@ -532,6 +686,13 @@ static void rmdir_all_sub(void)
{
struct rdtgroup *rdtgrp;
struct list_head *l, *next;
+ struct task_struct *p;
+
+ /* move all tasks to default resource group */
+ read_lock(&tasklist_lock);
+ for_each_process(p)
+ p->closid = 0;
+ read_unlock(&tasklist_lock);

list_for_each_safe(l, next, &rdt_all_groups) {
rdtgrp = list_entry(l, struct rdtgroup, rdtgroup_list);
@@ -647,6 +808,7 @@ static int rdtgroup_rmdir(struct kernfs_node *kn)
{
struct rdtgroup *rdtgrp;
int ret = 0;
+ struct task_struct *p;

rdtgrp = rdtgroup_kn_lock_live(kn);
if (!rdtgrp) {
@@ -664,6 +826,17 @@ static int rdtgroup_rmdir(struct kernfs_node *kn)
return -EPERM;
}

+ /* Don't allow if there are processes in this group */
+ read_lock(&tasklist_lock);
+ for_each_process(p) {
+ if (p->closid == rdtgrp->closid) {
+ read_unlock(&tasklist_lock);
+ rdtgroup_kn_unlock(kn);
+ return -EBUSY;
+ }
+ }
+ read_unlock(&tasklist_lock);
+
/* Give any CPUs back to the default group */
cpumask_or(&rdtgroup_default.cpu_mask,
&rdtgroup_default.cpu_mask, &rdtgrp->cpu_mask);
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 62c68e5..a0f3eb7 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1766,6 +1766,9 @@ struct task_struct {
/* cg_list protected by css_set_lock and tsk->alloc_lock */
struct list_head cg_list;
#endif
+#ifdef CONFIG_INTEL_RDT
+ int closid;
+#endif
#ifdef CONFIG_FUTEX
struct robust_list_head __user *robust_list;
#ifdef CONFIG_COMPAT
--
2.5.0

2016-10-07 23:46:15

by Fenghua Yu

Subject: [PATCH v3 16/18] x86/intel_rdt: Add schemata file

From: Tony Luck <[email protected]>

Last of the per-resource-group files, also mode 0644. This one shows
the resources available to the group. Syntax depends on whether the
"cdp" mount option was given. With code/data prioritization disabled
it is simply a list of masks for each cache domain. Initial value
allows access to all of the L3 cache on all domains. E.g. on a 2 socket
Broadwell:
L3:0=fffff;1=fffff
With CDP enabled, separate masks for data and instructions are provided:
L3:0=fffff,fffff;1=fffff,fffff
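
E.g. writing it back (mask values illustrative for a 20-bit CBM; with
CDP each domain takes two comma-separated masks):
# echo 'L3:0=f;1=3ff' > schemata
# echo 'L3:0=ff,f00;1=3ff,3ff' > schemata      (CDP mounted)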

Signed-off-by: Tony Luck <[email protected]>
Signed-off-by: Fenghua Yu <[email protected]>
---
arch/x86/include/asm/intel_rdt.h | 6 +
arch/x86/kernel/cpu/Makefile | 2 +-
arch/x86/kernel/cpu/intel_rdt_rdtgroup.c | 7 +
arch/x86/kernel/cpu/intel_rdt_schemata.c | 260 +++++++++++++++++++++++++++++++
4 files changed, 274 insertions(+), 1 deletion(-)
create mode 100644 arch/x86/kernel/cpu/intel_rdt_schemata.c

diff --git a/arch/x86/include/asm/intel_rdt.h b/arch/x86/include/asm/intel_rdt.h
index 7c74272..c052652 100644
--- a/arch/x86/include/asm/intel_rdt.h
+++ b/arch/x86/include/asm/intel_rdt.h
@@ -67,6 +67,7 @@ struct rftype {
* @cdp_capable: Code/Data Prioritization available
* @cdp_enabled: Code/Data Prioritization enabled
* @tmp_cbms: Scratch space when updating schemata
+ * @num_cbms: Number of CBMs in tmp_cbms
* @cache_level: Which cache level defines scope of this domain
*/
struct rdt_resource {
@@ -83,6 +84,7 @@ struct rdt_resource {
bool cdp_capable;
bool cdp_enabled;
u32 *tmp_cbms;
+ int num_cbms;
int cache_level;
};

@@ -155,4 +157,8 @@ DECLARE_PER_CPU_READ_MOSTLY(int, cpu_closid);
void rdt_cbm_update(void *arg);
struct rdtgroup *rdtgroup_kn_lock_live(struct kernfs_node *kn);
void rdtgroup_kn_unlock(struct kernfs_node *kn);
+ssize_t rdtgroup_schemata_write(struct kernfs_open_file *of,
+ char *buf, size_t nbytes, loff_t off);
+int rdtgroup_schemata_show(struct kernfs_open_file *of,
+ struct seq_file *s, void *v);
#endif /* _ASM_X86_INTEL_RDT_H */
diff --git a/arch/x86/kernel/cpu/Makefile b/arch/x86/kernel/cpu/Makefile
index 963c54a..4d7e0e5 100644
--- a/arch/x86/kernel/cpu/Makefile
+++ b/arch/x86/kernel/cpu/Makefile
@@ -34,7 +34,7 @@ obj-$(CONFIG_CPU_SUP_CENTAUR) += centaur.o
obj-$(CONFIG_CPU_SUP_TRANSMETA_32) += transmeta.o
obj-$(CONFIG_CPU_SUP_UMC_32) += umc.o

-obj-$(CONFIG_INTEL_RDT) += intel_rdt.o intel_rdt_rdtgroup.o
+obj-$(CONFIG_INTEL_RDT) += intel_rdt.o intel_rdt_rdtgroup.o intel_rdt_schemata.o

obj-$(CONFIG_X86_MCE) += mcheck/
obj-$(CONFIG_MTRR) += mtrr/
diff --git a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
index e9e834b..4740b7e 100644
--- a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
+++ b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
@@ -425,6 +425,13 @@ static struct rftype rdtgroup_base_files[] = {
.seq_show = rdtgroup_tasks_show,
},
{
+ .name = "schemata",
+ .mode = 0644,
+ .kf_ops = &rdtgroup_kf_single_ops,
+ .write = rdtgroup_schemata_write,
+ .seq_show = rdtgroup_schemata_show,
+ },
+ {
/* NULL terminated */
}
};
diff --git a/arch/x86/kernel/cpu/intel_rdt_schemata.c b/arch/x86/kernel/cpu/intel_rdt_schemata.c
new file mode 100644
index 0000000..d666854
--- /dev/null
+++ b/arch/x86/kernel/cpu/intel_rdt_schemata.c
@@ -0,0 +1,260 @@
+/*
+ * Resource Director Technology(RDT)
+ * - Cache Allocation code.
+ *
+ * Copyright (C) 2016 Intel Corporation
+ *
+ * Authors:
+ * Fenghua Yu <[email protected]>
+ * Tony Luck <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * More information about RDT can be found in the Intel (R) x86 Architecture
+ * Software Developer Manual June 2016, volume 3, section 17.17.
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/kernfs.h>
+#include <linux/seq_file.h>
+#include <linux/slab.h>
+#include <asm/intel_rdt.h>
+
+/*
+ * Check whether a cache bit mask is valid. The SDM says:
+ * Please note that all (and only) contiguous '1' combinations
+ * are allowed (e.g. FFFFH, 0FF0H, 003CH, etc.).
+ * Additionally Haswell requires at least two bits set.
+ */
+static bool cbm_validate(unsigned long var, struct rdt_resource *r)
+{
+ unsigned long first_bit, zero_bit;
+
+ if (var == 0 || var > r->max_cbm)
+ return false;
+
+ first_bit = find_first_bit(&var, r->cbm_len);
+ zero_bit = find_next_zero_bit(&var, r->cbm_len, first_bit);
+
+ if (find_next_bit(&var, r->cbm_len, zero_bit) < r->cbm_len)
+ return false;
+
+ if ((zero_bit - first_bit) < r->min_cbm_bits)
+ return false;
+ return true;
+}
+
+/*
+ * Read one cache bit mask (hex). Check that it is valid for the current
+ * resource type.
+ */
+static int parse_cbm_token(char *tok, struct rdt_resource *r)
+{
+ unsigned long data;
+ int ret;
+
+ ret = kstrtoul(tok, 16, &data);
+ if (ret)
+ return ret;
+ if (!cbm_validate(data, r))
+ return -EINVAL;
+ r->tmp_cbms[r->num_cbms++] = data;
+ return 0;
+}
+
+/*
+ * If code/data prioritization is enabled for this resource we need
+ * two bit masks separated by a ",". Otherwise a single bit mask.
+ */
+static int parse_cbm(char *buf, struct rdt_resource *r)
+{
+ char *cbm1 = buf;
+ int ret;
+
+ if (r->cdp_enabled)
+ cbm1 = strsep(&buf, ",");
+ if (!cbm1 || !buf)
+ return 1;
+ ret = parse_cbm_token(cbm1, r);
+ if (ret)
+ return ret;
+ if (r->cdp_enabled)
+ return parse_cbm_token(buf, r);
+ return 0;
+}
+
+/*
+ * For each domain in this resource we expect to find a series of:
+ * id=mask[,mask]
+ * separated by ";". The "id" is in decimal, and must appear in the
+ * right order.
+ */
+static int parse_line(char *line, struct rdt_resource *r)
+{
+ struct list_head *l;
+ struct rdt_domain *d;
+ char *dom = NULL, *id;
+ unsigned long dom_id;
+
+ list_for_each(l, &r->domains) {
+ d = list_entry(l, struct rdt_domain, list);
+ dom = strsep(&line, ";");
+ if (!dom)
+ return -EINVAL;
+ id = strsep(&dom, "=");
+ if (kstrtoul(id, 10, &dom_id) || dom_id != d->id)
+ return -EINVAL;
+ if (parse_cbm(dom, r))
+ return -EINVAL;
+ }
+
+ /* Any garbage at the end of the line? */
+ if (line && line[0])
+ return -EINVAL;
+ return 0;
+}
+
+static void update_domains(struct rdt_resource *r, int closid)
+{
+ int idx = 0;
+ struct list_head *l;
+ struct rdt_domain *d;
+ struct msr_param msr_param;
+ struct cpumask cpu_mask;
+
+ cpumask_clear(&cpu_mask);
+ msr_param.low = closid << r->cdp_enabled;
+ msr_param.high = msr_param.low + 1 + r->cdp_enabled;
+ msr_param.res = r;
+
+ list_for_each(l, &r->domains) {
+ d = list_entry(l, struct rdt_domain, list);
+ cpumask_set_cpu(cpumask_any(&d->cpu_mask), &cpu_mask);
+ d->cbm[msr_param.low] = r->tmp_cbms[idx++];
+ if (r->cdp_enabled)
+ d->cbm[msr_param.low + 1] = r->tmp_cbms[idx++];
+ }
+ smp_call_function_many(&cpu_mask, rdt_cbm_update, &msr_param, 1);
+}
+
+ssize_t rdtgroup_schemata_write(struct kernfs_open_file *of,
+ char *buf, size_t nbytes, loff_t off)
+{
+ char *tok, *resname;
+ struct rdtgroup *rdtgrp;
+ struct rdt_resource *r;
+ int closid, ret = 0;
+ u32 *l3_cbms = NULL;
+
+ /* Legal input requires a trailing newline */
+ if (nbytes == 0 || buf[nbytes - 1] != '\n')
+ return -EINVAL;
+ buf[nbytes - 1] = '\0';
+
+ rdtgrp = rdtgroup_kn_lock_live(of->kn);
+ if (!rdtgrp) {
+ rdtgroup_kn_unlock(of->kn);
+ return -ENOENT;
+ }
+
+ closid = rdtgrp->closid;
+
+ /* get scratch space to save all the masks while we validate input */
+ for_each_rdt_resource(r) {
+ r->tmp_cbms = kcalloc(r->num_domains << r->cdp_enabled,
+ sizeof(*l3_cbms), GFP_KERNEL);
+ if (!r->tmp_cbms) {
+ ret = -ENOMEM;
+ goto fail;
+ }
+ r->num_cbms = 0;
+ }
+
+ while ((tok = strsep(&buf, "\n")) != NULL) {
+ resname = strsep(&tok, ":");
+ if (!tok) {
+ ret = -EINVAL;
+ goto fail;
+ }
+ for_each_rdt_resource(r) {
+ if (!strcmp(resname, r->name) &&
+ closid < r->num_closid) {
+ ret = parse_line(tok, r);
+ if (ret)
+ goto fail;
+ break;
+ }
+ }
+ if (!r->name) {
+ ret = -EINVAL;
+ goto fail;
+ }
+ }
+
+ /* Did the parser find all the masks we need? */
+ for_each_rdt_resource(r) {
+ if (r->num_cbms != r->num_domains << r->cdp_enabled) {
+ ret = -EINVAL;
+ goto fail;
+ }
+ }
+
+ for_each_rdt_resource(r)
+ update_domains(r, closid);
+
+fail:
+ rdtgroup_kn_unlock(of->kn);
+ for_each_rdt_resource(r) {
+ kfree(r->tmp_cbms);
+ r->tmp_cbms = NULL;
+ }
+ return ret ?: nbytes;
+}
+
+static void show_doms(struct seq_file *s, struct rdt_resource *r, int closid)
+{
+ struct list_head *l;
+ struct rdt_domain *dom;
+ int idx = closid << r->cdp_enabled;
+ bool sep = false;
+
+ seq_printf(s, "%s:", r->name);
+ list_for_each(l, &r->domains) {
+ dom = list_entry(l, struct rdt_domain, list);
+ if (sep)
+ seq_puts(s, ";");
+ seq_printf(s, "%d=%x", dom->id, dom->cbm[idx]);
+ if (r->cdp_enabled)
+ seq_printf(s, ",%x", dom->cbm[idx + 1]);
+ sep = true;
+ }
+ seq_puts(s, "\n");
+}
+
+int rdtgroup_schemata_show(struct kernfs_open_file *of,
+ struct seq_file *s, void *v)
+{
+ struct rdtgroup *rdtgrp;
+ struct rdt_resource *r;
+ int closid, ret = 0;
+
+ rdtgrp = rdtgroup_kn_lock_live(of->kn);
+ if (rdtgrp) {
+ closid = rdtgrp->closid;
+ for_each_rdt_resource(r)
+ if (closid < r->num_closid)
+ show_doms(s, r, closid);
+ } else {
+ ret = -ENOENT;
+ }
+ rdtgroup_kn_unlock(of->kn);
+ return ret;
+}
--
2.5.0

2016-10-07 23:47:06

by Fenghua Yu

Subject: [PATCH v3 14/18] x86/intel_rdt: Add cpus file

From: Tony Luck <[email protected]>

Now we populate each directory with a read/write (mode 0644) file
named "cpus". This is used to over-ride the resources available
to processes in the default resource group when running on specific
CPUs. Each "cpus" file reads as a cpumask showing which CPUs belong
to this resource group. Initially all online CPUs are assigned to
the default group. They can be added to other groups by writing a
cpumask to the "cpus" file in the directory for the resource group
(which will remove them from the previous group to which they were
assigned). CPU online/offline operations will delete CPUs that go
offline from whatever group they are in and add new CPUs to the
default group.

If there are CPUs assigned to a group when the directory is removed,
they are returned to the default group.
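
E.g. (mask value illustrative):
# echo 3 > /sys/fs/resctrl/p0/cpus      (CPUs 0-1 move into p0)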

Signed-off-by: Fenghua Yu <[email protected]>
Signed-off-by: Tony Luck <[email protected]>
---
arch/x86/include/asm/intel_rdt.h | 5 ++
arch/x86/kernel/cpu/intel_rdt.c | 12 +++
arch/x86/kernel/cpu/intel_rdt_rdtgroup.c | 127 +++++++++++++++++++++++++++++++
3 files changed, 144 insertions(+)

diff --git a/arch/x86/include/asm/intel_rdt.h b/arch/x86/include/asm/intel_rdt.h
index 99d8d14..7c74272 100644
--- a/arch/x86/include/asm/intel_rdt.h
+++ b/arch/x86/include/asm/intel_rdt.h
@@ -6,13 +6,16 @@
* @kn: kernfs node
* @rdtgroup_list: linked list for all rdtgroups
* @closid: closid for this rdtgroup
+ * @cpu_mask: CPUs assigned to this rdtgroup
* @flags: status bits
* @waitcount: how many cpus expect to find this
+ * group when they acquire rdtgroup_mutex
*/
struct rdtgroup {
struct kernfs_node *kn;
struct list_head rdtgroup_list;
int closid;
+ struct cpumask cpu_mask;
int flags;
atomic_t waitcount;
};
@@ -147,6 +150,8 @@ union cpuid_0x10_1_edx {
unsigned int full;
};

+DECLARE_PER_CPU_READ_MOSTLY(int, cpu_closid);
+
void rdt_cbm_update(void *arg);
struct rdtgroup *rdtgroup_kn_lock_live(struct kernfs_node *kn);
void rdtgroup_kn_unlock(struct kernfs_node *kn);
diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
index f6caabd..0b076d0 100644
--- a/arch/x86/kernel/cpu/intel_rdt.c
+++ b/arch/x86/kernel/cpu/intel_rdt.c
@@ -38,6 +38,8 @@ DEFINE_MUTEX(rdtgroup_mutex);

int rdt_max_closid;

+DEFINE_PER_CPU_READ_MOSTLY(int, cpu_closid);
+
#define domain_init(name) LIST_HEAD_INIT(rdt_resources_all[name].domains)

struct rdt_resource rdt_resources_all[] = {
@@ -227,8 +229,11 @@ static int intel_rdt_online_cpu(unsigned int cpu)
struct rdt_resource *r;

mutex_lock(&rdtgroup_mutex);
+ per_cpu(cpu_closid, cpu) = 0;
for_each_rdt_resource(r)
update_domain(cpu, r, 1);
+ /* The cpu is set in default rdtgroup after online. */
+ cpumask_set_cpu(cpu, &rdtgroup_default.cpu_mask);
smp_call_function_single(cpu, rdt_reset_pqr_assoc_closid, NULL, 1);
mutex_unlock(&rdtgroup_mutex);

@@ -238,10 +243,17 @@ static int intel_rdt_online_cpu(unsigned int cpu)
static int intel_rdt_offline_cpu(unsigned int cpu)
{
struct rdt_resource *r;
+ struct rdtgroup *rdtgrp;
+ struct list_head *l;

mutex_lock(&rdtgroup_mutex);
for_each_rdt_resource(r)
update_domain(cpu, r, 0);
+ list_for_each(l, &rdt_all_groups) {
+ rdtgrp = list_entry(l, struct rdtgroup, rdtgroup_list);
+ if (cpumask_test_and_clear_cpu(cpu, &rdtgrp->cpu_mask))
+ break;
+ }
mutex_unlock(&rdtgroup_mutex);

return 0;
diff --git a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
index efcbfe7..2a4e78a 100644
--- a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
+++ b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
@@ -20,6 +20,7 @@

#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

+#include <linux/cpu.h>
#include <linux/fs.h>
#include <linux/sysfs.h>
#include <linux/kernfs.h>
@@ -163,6 +164,117 @@ static struct kernfs_ops rdtgroup_kf_single_ops = {
.seq_show = rdtgroup_seqfile_show,
};

+static int rdtgroup_cpus_show(struct kernfs_open_file *of,
+ struct seq_file *s, void *v)
+{
+ struct rdtgroup *rdtgrp;
+ int ret = 0;
+
+ rdtgrp = rdtgroup_kn_lock_live(of->kn);
+
+ if (rdtgrp)
+ seq_printf(s, "%*pb\n", cpumask_pr_args(&rdtgrp->cpu_mask));
+ else
+ ret = -ENOENT;
+ rdtgroup_kn_unlock(of->kn);
+
+ return ret;
+}
+
+static ssize_t rdtgroup_cpus_write(struct kernfs_open_file *of,
+ char *buf, size_t nbytes, loff_t off)
+{
+ struct rdtgroup *rdtgrp, *r;
+ cpumask_var_t tmpmask, newmask;
+ int ret, cpu;
+
+ if (!buf)
+ return -EINVAL;
+
+ if (!alloc_cpumask_var(&tmpmask, GFP_KERNEL))
+ return -ENOMEM;
+ if (!alloc_cpumask_var(&newmask, GFP_KERNEL)) {
+ free_cpumask_var(tmpmask);
+ return -ENOMEM;
+ }
+ rdtgrp = rdtgroup_kn_lock_live(of->kn);
+ if (!rdtgrp) {
+ ret = -ENOENT;
+ goto unlock;
+ }
+
+ ret = cpumask_parse(buf, newmask);
+ if (ret)
+ goto unlock;
+
+ get_online_cpus();
+ /* check that user didn't specify any offline cpus */
+ cpumask_andnot(tmpmask, newmask, cpu_online_mask);
+ if (cpumask_weight(tmpmask)) {
+ ret = -EINVAL;
+ goto end;
+ }
+
+ /* Are we trying to drop some cpus from this group? */
+ cpumask_andnot(tmpmask, &rdtgrp->cpu_mask, newmask);
+ if (cpumask_weight(tmpmask)) {
+ /* Can't drop from default group */
+ if (rdtgrp == &rdtgroup_default) {
+ ret = -EINVAL;
+ goto end;
+ }
+ /* Give any dropped cpus to rdtgroup_default */
+ cpumask_or(&rdtgroup_default.cpu_mask,
+ &rdtgroup_default.cpu_mask, tmpmask);
+ for_each_cpu(cpu, tmpmask)
+ per_cpu(cpu_closid, cpu) = 0;
+ }
+
+ /*
+ * If we added cpus, remove them from the previous group that owned
+ * them and update the per-cpu closid to point to us
+ */
+ cpumask_andnot(tmpmask, newmask, &rdtgrp->cpu_mask);
+ if (cpumask_weight(tmpmask)) {
+ struct list_head *l;
+
+ list_for_each(l, &rdt_all_groups) {
+ r = list_entry(l, struct rdtgroup, rdtgroup_list);
+ if (r == rdtgrp)
+ continue;
+ cpumask_andnot(&r->cpu_mask, &r->cpu_mask, tmpmask);
+ }
+ for_each_cpu(cpu, tmpmask)
+ per_cpu(cpu_closid, cpu) = rdtgrp->closid;
+ }
+
+ /* Done pushing/pulling - update this group with new mask */
+ cpumask_copy(&rdtgrp->cpu_mask, newmask);
+
+end:
+ put_online_cpus();
+unlock:
+ rdtgroup_kn_unlock(of->kn);
+ free_cpumask_var(tmpmask);
+ free_cpumask_var(newmask);
+
+ return ret ?: nbytes;
+}
+
+/* Files in each rdtgroup */
+static struct rftype rdtgroup_base_files[] = {
+ {
+ .name = "cpus",
+ .mode = 0644,
+ .kf_ops = &rdtgroup_kf_single_ops,
+ .write = rdtgroup_cpus_write,
+ .seq_show = rdtgroup_cpus_show,
+ },
+ {
+ /* NULL terminated */
+ }
+};
+
static int rdt_num_closid_show(struct kernfs_open_file *of,
struct seq_file *seq, void *v)
{
@@ -511,6 +623,10 @@ static int rdtgroup_mkdir(struct kernfs_node *parent_kn, const char *name,
if (ret)
goto out_destroy;

+ ret = rdtgroup_add_files(kn, rdtgroup_base_files);
+ if (ret)
+ goto out_destroy;
+
kernfs_activate(kn);

ret = 0;
@@ -548,6 +664,10 @@ static int rdtgroup_rmdir(struct kernfs_node *kn)
return -EPERM;
}

+ /* Give any CPUs in this group back to the default group */
+ cpumask_or(&rdtgroup_default.cpu_mask,
+ &rdtgroup_default.cpu_mask, &rdtgrp->cpu_mask);
+
rdtgrp->flags = RDT_DELETED;
closid_free(rdtgrp->closid);
list_del(&rdtgrp->rdtgroup_list);
@@ -584,11 +704,18 @@ static int __init rdtgroup_setup_root(void)
rdtgroup_default.closid = 0;
list_add(&rdtgroup_default.rdtgroup_list, &rdt_all_groups);

+ ret = rdtgroup_add_files(rdt_root->kn, rdtgroup_base_files);
+ if (ret) {
+ kernfs_destroy_root(rdt_root);
+ goto out;
+ }
+
rdtgroup_default.kn = rdt_root->kn;
ret = rdtgroup_create_info_dir(rdtgroup_default.kn);
if (!ret)
kernfs_activate(rdtgroup_default.kn);

+out:
mutex_unlock(&rdtgroup_mutex);

return ret;
--
2.5.0

2016-10-07 23:47:16

by Fenghua Yu

[permalink] [raw]
Subject: [PATCH v3 13/18] x86/intel_rdt: Add mkdir to resctrl file system

From: Fenghua Yu <[email protected]>

Resource control groups are represented as directories in the resctrl
file system. The root directory describes the default resources available
to tasks that have not been assigned specific resources. Other directories
can be created at the root level to make new resource groups. It is not
permitted to make directories within other directories.

Hardware uses a CLOSID (Class of service ID) to determine which resource
limits are currently in effect. The exact number available is enumerated
by CPUID leaf 0x10, but on current implementations it is a small number.
We implement a simple bitmask allocator for CLOSIDs.

Each resource control group uses one CLOSID, which limits the total number
of directories that can be created.

Resource groups can be removed using rmdir.
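
For example (assuming the file system is mounted at /sys/fs/resctrl as
described in the documentation patch):

  # mkdir /sys/fs/resctrl/p0      # allocates a CLOSID for group "p0"
  # mkdir /sys/fs/resctrl/p0/q0   # nested groups are rejected
  mkdir: cannot create directory '/sys/fs/resctrl/p0/q0': Operation not permitted
  # rmdir /sys/fs/resctrl/p0      # frees the CLOSID again

Once all CLOSIDs are in use, further mkdir calls fail with -ENOSPC.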

Signed-off-by: Fenghua Yu <[email protected]>
Signed-off-by: Tony Luck <[email protected]>
---
arch/x86/include/asm/intel_rdt.h | 9 ++
arch/x86/kernel/cpu/intel_rdt_rdtgroup.c | 222 +++++++++++++++++++++++++++++++
2 files changed, 231 insertions(+)

diff --git a/arch/x86/include/asm/intel_rdt.h b/arch/x86/include/asm/intel_rdt.h
index 8e3fafc..99d8d14 100644
--- a/arch/x86/include/asm/intel_rdt.h
+++ b/arch/x86/include/asm/intel_rdt.h
@@ -6,13 +6,20 @@
* @kn: kernfs node
* @rdtgroup_list: linked list for all rdtgroups
* @closid: closid for this rdtgroup
+ * @flags: status bits
+ * @waitcount: how many cpus expect to find this
*/
struct rdtgroup {
struct kernfs_node *kn;
struct list_head rdtgroup_list;
int closid;
+ int flags;
+ atomic_t waitcount;
};

+/* rdtgroup.flags */
+#define RDT_DELETED 1
+
/* List of all resource groups */
extern struct list_head rdt_all_groups;

@@ -141,4 +148,6 @@ union cpuid_0x10_1_edx {
};

void rdt_cbm_update(void *arg);
+struct rdtgroup *rdtgroup_kn_lock_live(struct kernfs_node *kn);
+void rdtgroup_kn_unlock(struct kernfs_node *kn);
#endif /* _ASM_X86_INTEL_RDT_H */
diff --git a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
index 28df92e..efcbfe7 100644
--- a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
+++ b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
@@ -36,6 +36,52 @@ struct kernfs_root *rdt_root;
struct rdtgroup rdtgroup_default;
LIST_HEAD(rdt_all_groups);

+/*
+ * Trivial allocator for CLOSIDs. Since h/w only supports
+ * a small number, we can keep a bitmap of free CLOSIDs in
+ * a single integer.
+ */
+static int closid_free_map;
+
+static void closid_init(void)
+{
+ closid_free_map = BIT_MASK(rdt_max_closid) - 1;
+
+ /* CLOSID 0 is always reserved for the default group */
+ closid_free_map &= ~1;
+}
+
+int closid_alloc(void)
+{
+ int closid = ffs(closid_free_map);
+
+ if (closid == 0)
+ return -ENOSPC;
+ closid--;
+ closid_free_map &= ~(1 << closid);
+
+ return closid;
+}
+
+static void closid_free(int closid)
+{
+ closid_free_map |= 1 << closid;
+}
+
+static struct rdtgroup *rdtgroup_alloc(void)
+{
+ struct rdtgroup *rdtgrp;
+
+ rdtgrp = kzalloc(sizeof(*rdtgrp), GFP_KERNEL);
+
+ return rdtgrp;
+}
+
+static void rdtgroup_free(struct rdtgroup *rdtgroup)
+{
+ kfree(rdtgroup);
+}
+
/* set uid and gid of rdtgroup dirs and files to that of the creator */
static int rdtgroup_kn_set_ugid(struct kernfs_node *kn)
{
@@ -240,6 +286,54 @@ static int parse_rdtgroupfs_options(char *data, struct rdt_resource *r)
return 0;
}

+/*
+ * We don't allow rdtgroup directories to be created anywhere
+ * except the root directory. Thus when looking for the rdtgroup
+ * structure for a kernfs node we are either looking at a directory,
+ * in which case the rdtgroup structure is pointed to by the "priv"
+ * field, or at a file, in which case we need only look at the parent
+ * directory to find the rdtgroup.
+ */
+static struct rdtgroup *kernfs_to_rdtgroup(struct kernfs_node *kn)
+{
+ if (kernfs_type(kn) == KERNFS_DIR)
+ return kn->priv;
+ else
+ return kn->parent->priv;
+}
+
+struct rdtgroup *rdtgroup_kn_lock_live(struct kernfs_node *kn)
+{
+ struct rdtgroup *rdtgrp = kernfs_to_rdtgroup(kn);
+
+ atomic_inc(&rdtgrp->waitcount);
+ kernfs_break_active_protection(kn);
+
+ mutex_lock(&rdtgroup_mutex);
+
+ /* Was this group deleted while we waited? */
+ if (rdtgrp->flags & RDT_DELETED)
+ return NULL;
+
+ return rdtgrp;
+}
+
+void rdtgroup_kn_unlock(struct kernfs_node *kn)
+{
+ struct rdtgroup *rdtgrp = kernfs_to_rdtgroup(kn);
+
+ mutex_unlock(&rdtgroup_mutex);
+
+ if (atomic_dec_and_test(&rdtgrp->waitcount) &&
+ (rdtgrp->flags & RDT_DELETED)) {
+ kernfs_unbreak_active_protection(kn);
+ kernfs_put(kn);
+ kfree(rdtgrp);
+ } else {
+ kernfs_unbreak_active_protection(kn);
+ }
+}
+
static struct dentry *rdt_mount(struct file_system_type *fs_type,
int flags, const char *unused_dev_name,
void *data)
@@ -275,6 +369,7 @@ static struct dentry *rdt_mount(struct file_system_type *fs_type,
rdt_max_closid = max(rdt_max_closid, r->num_closid);
if (rdt_max_closid > 32)
rdt_max_closid = 32;
+ closid_init();

dentry = kernfs_mount(fs_type, flags, rdt_root,
RDTGROUP_SUPER_MAGIC, &new_sb);
@@ -318,6 +413,24 @@ static void reset_all_cbms(struct rdt_resource *r)
smp_call_function_many(&cpu_mask, rdt_cbm_update, &msr_param, 1);
}

+/*
+ * Forcibly remove all subdirectories under the root.
+ */
+static void rmdir_all_sub(void)
+{
+ struct rdtgroup *rdtgrp;
+ struct list_head *l, *next;
+
+ list_for_each_safe(l, next, &rdt_all_groups) {
+ rdtgrp = list_entry(l, struct rdtgroup, rdtgroup_list);
+ if (rdtgrp == &rdtgroup_default)
+ continue;
+ kernfs_remove(rdtgrp->kn);
+ list_del(&rdtgrp->rdtgroup_list);
+ rdtgroup_free(rdtgrp);
+ }
+}
+
static void rdt_kill_sb(struct super_block *sb)
{
struct rdt_resource *r;
@@ -333,6 +446,7 @@ static void rdt_kill_sb(struct super_block *sb)
set_l3_qos_cfg(r);
}

+ rmdir_all_sub();
static_branch_disable(&rdt_enable_key);
kernfs_kill_sb(sb);
mutex_unlock(&rdtgroup_mutex);
@@ -344,7 +458,115 @@ static struct file_system_type rdt_fs_type = {
.kill_sb = rdt_kill_sb,
};

+static int rdtgroup_mkdir(struct kernfs_node *parent_kn, const char *name,
+ umode_t mode)
+{
+ struct rdtgroup *parent, *rdtgrp;
+ struct kernfs_node *kn;
+ int ret, closid;
+
+ /* Only allow mkdir in the root directory */
+ if (parent_kn != rdtgroup_default.kn)
+ return -EPERM;
+
+ /* Do not accept '\n' to avoid unparsable situation. */
+ if (strchr(name, '\n'))
+ return -EINVAL;
+
+ parent = rdtgroup_kn_lock_live(parent_kn);
+ if (!parent) {
+ ret = -ENODEV;
+ goto out_unlock;
+ }
+
+ ret = closid_alloc();
+ if (ret < 0)
+ goto out_unlock;
+ closid = ret;
+
+ /* allocate the rdtgroup. */
+ rdtgrp = rdtgroup_alloc();
+ if (!rdtgrp) {
+ ret = -ENOMEM;
+ goto out_closid_free;
+ }
+ rdtgrp->closid = closid;
+ list_add(&rdtgrp->rdtgroup_list, &rdt_all_groups);
+
+ /* kernfs creates the directory for rdtgrp */
+ kn = kernfs_create_dir(parent->kn, name, mode, rdtgrp);
+ if (IS_ERR(kn)) {
+ ret = PTR_ERR(kn);
+ goto out_cancel_ref;
+ }
+ rdtgrp->kn = kn;
+
+ /*
+ * This extra ref will be put in kernfs_remove() and guarantees
+ * that @rdtgrp->kn is always accessible.
+ */
+ kernfs_get(kn);
+
+ ret = rdtgroup_kn_set_ugid(kn);
+ if (ret)
+ goto out_destroy;
+
+ kernfs_activate(kn);
+
+ ret = 0;
+ goto out_unlock;
+
+out_destroy:
+ kernfs_remove(rdtgrp->kn);
+out_cancel_ref:
+ rdtgroup_free(rdtgrp);
+out_closid_free:
+ closid_free(closid);
+out_unlock:
+ rdtgroup_kn_unlock(parent_kn);
+ return ret;
+}
+
+static int rdtgroup_rmdir(struct kernfs_node *kn)
+{
+ struct rdtgroup *rdtgrp;
+ int ret = 0;
+
+ rdtgrp = rdtgroup_kn_lock_live(kn);
+ /*
+ * rmdir is only for deleting resource groups. If there is no
+ * live rdtgroup behind this kernfs node (e.g. "info" and its
+ * subdirectories, or a group already being deleted), refuse
+ * the operation.
+ */
+ if (!rdtgrp) {
+ rdtgroup_kn_unlock(kn);
+ return -EPERM;
+ }
+
+ rdtgrp->flags = RDT_DELETED;
+ closid_free(rdtgrp->closid);
+ list_del(&rdtgrp->rdtgroup_list);
+
+ /*
+ * one extra hold on this, will drop when we kfree(rdtgrp)
+ * in rdtgroup_kn_unlock()
+ */
+ kernfs_get(kn);
+ kernfs_remove(rdtgrp->kn);
+
+ rdtgroup_kn_unlock(kn);
+
+ return ret;
+}
+
static struct kernfs_syscall_ops rdtgroup_kf_syscall_ops = {
+ .mkdir = rdtgroup_mkdir,
+ .rmdir = rdtgroup_rmdir,
};

static int __init rdtgroup_setup_root(void)
--
2.5.0

2016-10-07 23:47:25

by Fenghua Yu

[permalink] [raw]
Subject: [PATCH v3 12/18] x86/intel_rdt: Add "info" files to resctrl file system

From: Fenghua Yu <[email protected]>

For the convenience of applications we make the decoded values of some
of the CPUID leaves available in read-only (0444) files.
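
For example, on a hypothetical system with 16 CLOSIDs and a 20-bit
cache bit mask:

  # cat /sys/fs/resctrl/info/L3/num_closid
  16
  # cat /sys/fs/resctrl/info/L3/cbm_val
  fffff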

Signed-off-by: Fenghua Yu <[email protected]>
Signed-off-by: Tony Luck <[email protected]>
---
arch/x86/include/asm/intel_rdt.h | 24 +++++
arch/x86/kernel/cpu/intel_rdt_rdtgroup.c | 173 ++++++++++++++++++++++++++++++-
2 files changed, 196 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/intel_rdt.h b/arch/x86/include/asm/intel_rdt.h
index f63815c..8e3fafc 100644
--- a/arch/x86/include/asm/intel_rdt.h
+++ b/arch/x86/include/asm/intel_rdt.h
@@ -19,6 +19,30 @@ extern struct list_head rdt_all_groups;
int __init rdtgroup_init(void);

/**
+ * struct rftype - describe each file in the resctrl file system
+ * @name: file name
+ * @mode: access mode
+ * @kf_ops: operations
+ * @seq_show: show content of the file
+ * @write: write to the file
+ */
+struct rftype {
+ char *name;
+ umode_t mode;
+ struct kernfs_ops *kf_ops;
+
+ int (*seq_show)(struct kernfs_open_file *of,
+ struct seq_file *sf, void *v);
+ /*
+ * write() is the generic write callback which maps directly to
+ * kernfs write operation and overrides all other operations.
+ * Maximum write size is determined by ->max_write_len.
+ */
+ ssize_t (*write)(struct kernfs_open_file *of,
+ char *buf, size_t nbytes, loff_t off);
+};
+
+/**
* struct rdt_resource - attributes of an RDT resource
* @enabled: Is this feature enabled on this machine
* @name: Name to use in "schemata" file
diff --git a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
index c99d3a0..28df92e 100644
--- a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
+++ b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
@@ -23,6 +23,8 @@
#include <linux/fs.h>
#include <linux/sysfs.h>
#include <linux/kernfs.h>
+#include <linux/seq_file.h>
+#include <linux/sched.h>
#include <linux/slab.h>

#include <uapi/linux/magic.h>
@@ -34,6 +36,173 @@ struct kernfs_root *rdt_root;
struct rdtgroup rdtgroup_default;
LIST_HEAD(rdt_all_groups);

+/* set uid and gid of rdtgroup dirs and files to that of the creator */
+static int rdtgroup_kn_set_ugid(struct kernfs_node *kn)
+{
+ struct iattr iattr = { .ia_valid = ATTR_UID | ATTR_GID,
+ .ia_uid = current_fsuid(),
+ .ia_gid = current_fsgid(), };
+
+ if (uid_eq(iattr.ia_uid, GLOBAL_ROOT_UID) &&
+ gid_eq(iattr.ia_gid, GLOBAL_ROOT_GID))
+ return 0;
+
+ return kernfs_setattr(kn, &iattr);
+}
+
+static int rdtgroup_add_file(struct kernfs_node *parent_kn, struct rftype *rft)
+{
+ struct kernfs_node *kn;
+ int ret;
+
+ kn = __kernfs_create_file(parent_kn, rft->name, rft->mode,
+ 0, rft->kf_ops, rft, NULL, NULL);
+ if (IS_ERR(kn))
+ return PTR_ERR(kn);
+
+ ret = rdtgroup_kn_set_ugid(kn);
+ if (ret) {
+ kernfs_remove(kn);
+ return ret;
+ }
+
+ return 0;
+}
+
+static int rdtgroup_add_files(struct kernfs_node *kn, struct rftype *rfts)
+{
+ struct rftype *rft;
+ int ret;
+
+ lockdep_assert_held(&rdtgroup_mutex);
+
+ for (rft = rfts; rft->name; rft++) {
+ ret = rdtgroup_add_file(kn, rft);
+ if (ret)
+ goto error;
+ }
+
+ return 0;
+error:
+ pr_warn("%s: failed to add %s, err=%d\n", __func__, rft->name, ret);
+ while (--rft >= rfts)
+ kernfs_remove_by_name(kn, rft->name);
+ return ret;
+}
+
+static int rdtgroup_seqfile_show(struct seq_file *m, void *arg)
+{
+ struct kernfs_open_file *of = m->private;
+ struct rftype *rft = of->kn->priv;
+
+ if (rft->seq_show)
+ return rft->seq_show(of, m, arg);
+ return 0;
+}
+
+static ssize_t rdtgroup_file_write(struct kernfs_open_file *of, char *buf,
+ size_t nbytes, loff_t off)
+{
+ struct rftype *rft = of->kn->priv;
+
+ if (rft->write)
+ return rft->write(of, buf, nbytes, off);
+
+ return -EINVAL;
+}
+
+static struct kernfs_ops rdtgroup_kf_single_ops = {
+ .atomic_write_len = PAGE_SIZE,
+ .write = rdtgroup_file_write,
+ .seq_show = rdtgroup_seqfile_show,
+};
+
+static int rdt_num_closid_show(struct kernfs_open_file *of,
+ struct seq_file *seq, void *v)
+{
+ struct rdt_resource *r = of->kn->parent->priv;
+
+ seq_printf(seq, "%d\n", r->num_closid);
+
+ return 0;
+}
+
+static int rdt_cbm_val_show(struct kernfs_open_file *of,
+ struct seq_file *seq, void *v)
+{
+ struct rdt_resource *r = of->kn->parent->priv;
+
+ seq_printf(seq, "%x\n", r->max_cbm);
+
+ return 0;
+}
+
+/* rdtgroup information files for one cache resource. */
+static struct rftype res_info_files[] = {
+ {
+ .name = "num_closid",
+ .mode = 0444,
+ .kf_ops = &rdtgroup_kf_single_ops,
+ .seq_show = rdt_num_closid_show,
+ },
+ {
+ .name = "cbm_val",
+ .mode = 0444,
+ .kf_ops = &rdtgroup_kf_single_ops,
+ .seq_show = rdt_cbm_val_show,
+ },
+ {
+ /* NULL terminated */
+ }
+};
+
+static int rdtgroup_create_info_dir(struct kernfs_node *parent_kn)
+{
+ struct kernfs_node *kn, *kn_subdir;
+ struct rdt_resource *r;
+ int ret;
+
+ /* create the directory */
+ kn = kernfs_create_dir(parent_kn, "info", parent_kn->mode, NULL);
+ if (IS_ERR(kn))
+ return PTR_ERR(kn);
+ kernfs_get(kn);
+
+ for_each_rdt_resource(r) {
+ kn_subdir = kernfs_create_dir(kn, r->name, kn->mode, r);
+ if (IS_ERR(kn_subdir)) {
+ ret = PTR_ERR(kn_subdir);
+ goto out_destroy;
+ }
+ kernfs_get(kn_subdir);
+ ret = rdtgroup_kn_set_ugid(kn_subdir);
+ if (ret)
+ goto out_destroy;
+ ret = rdtgroup_add_files(kn_subdir, res_info_files);
+ if (ret)
+ goto out_destroy;
+ kernfs_activate(kn_subdir);
+ }
+
+ /*
+ * This extra ref will be put in kernfs_remove() and guarantees
+ * that @kn is always accessible.
+ */
+ kernfs_get(kn);
+
+ ret = rdtgroup_kn_set_ugid(kn);
+ if (ret)
+ goto out_destroy;
+
+ kernfs_activate(kn);
+
+ return 0;
+
+out_destroy:
+ kernfs_remove(kn);
+ return ret;
+}
+
static void l3_qos_cfg_update(void *arg)
{
struct rdt_resource *r = arg;
@@ -194,7 +363,9 @@ static int __init rdtgroup_setup_root(void)
list_add(&rdtgroup_default.rdtgroup_list, &rdt_all_groups);

rdtgroup_default.kn = rdt_root->kn;
- kernfs_activate(rdtgroup_default.kn);
+ ret = rdtgroup_create_info_dir(rdtgroup_default.kn);
+ if (!ret)
+ kernfs_activate(rdtgroup_default.kn);

mutex_unlock(&rdtgroup_mutex);

--
2.5.0

2016-10-07 23:47:36

by Fenghua Yu

[permalink] [raw]
Subject: [PATCH v3 11/18] x86/intel_rdt: Add basic resctrl filesystem support

From: Fenghua Yu <[email protected]>

Use kernfs as basis for our user interface filesystem. This patch
supports mount/umount, and one mount parameter "cdp" to enable code/data
prioritization (though all we do at this point is ensure that the system
can support CDP). The file system is not populated yet in this patch.
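
Example usage (the /sys/fs/resctrl mount point is created by this
patch):

  # mount -t resctrl resctrl /sys/fs/resctrl
  # umount /sys/fs/resctrl
  # mount -t resctrl -o cdp resctrl /sys/fs/resctrl

A second concurrent mount attempt fails with -EBUSY because the file
system can only be mounted once.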

Signed-off-by: Fenghua Yu <[email protected]>
Signed-off-by: Tony Luck <[email protected]>
---
arch/x86/include/asm/intel_rdt.h | 20 ++-
arch/x86/kernel/cpu/Makefile | 2 +-
arch/x86/kernel/cpu/intel_rdt.c | 8 ++
arch/x86/kernel/cpu/intel_rdt_rdtgroup.c | 236 +++++++++++++++++++++++++++++++
include/uapi/linux/magic.h | 1 +
5 files changed, 265 insertions(+), 2 deletions(-)
create mode 100644 arch/x86/kernel/cpu/intel_rdt_rdtgroup.c

diff --git a/arch/x86/include/asm/intel_rdt.h b/arch/x86/include/asm/intel_rdt.h
index bad8dc7..f63815c 100644
--- a/arch/x86/include/asm/intel_rdt.h
+++ b/arch/x86/include/asm/intel_rdt.h
@@ -2,6 +2,23 @@
#define _ASM_X86_INTEL_RDT_H

/**
+ * struct rdtgroup - store rdtgroup's data in resctrl file system.
+ * @kn: kernfs node
+ * @rdtgroup_list: linked list for all rdtgroups
+ * @closid: closid for this rdtgroup
+ */
+struct rdtgroup {
+ struct kernfs_node *kn;
+ struct list_head rdtgroup_list;
+ int closid;
+};
+
+/* List of all resource groups */
+extern struct list_head rdt_all_groups;
+
+int __init rdtgroup_init(void);
+
+/**
* struct rdt_resource - attributes of an RDT resource
* @enabled: Is this feature enabled on this machine
* @name: Name to use in "schemata" file
@@ -39,6 +56,7 @@ struct rdt_resource {
for (r = rdt_resources_all; r->name; r++) \
if (r->enabled)

+#define IA32_L3_QOS_CFG 0xc81
#define IA32_L3_CBM_BASE 0xc90

/**
@@ -72,7 +90,7 @@ extern struct mutex rdtgroup_mutex;

int __init rdtgroup_init(void);

-extern struct rdtgroup *rdtgroup_default;
+extern struct rdtgroup rdtgroup_default;
extern struct rdt_resource rdt_resources_all[];

enum {
diff --git a/arch/x86/kernel/cpu/Makefile b/arch/x86/kernel/cpu/Makefile
index 5a791c8..963c54a 100644
--- a/arch/x86/kernel/cpu/Makefile
+++ b/arch/x86/kernel/cpu/Makefile
@@ -34,7 +34,7 @@ obj-$(CONFIG_CPU_SUP_CENTAUR) += centaur.o
obj-$(CONFIG_CPU_SUP_TRANSMETA_32) += transmeta.o
obj-$(CONFIG_CPU_SUP_UMC_32) += umc.o

-obj-$(CONFIG_INTEL_RDT) += intel_rdt.o
+obj-$(CONFIG_INTEL_RDT) += intel_rdt.o intel_rdt_rdtgroup.o

obj-$(CONFIG_X86_MCE) += mcheck/
obj-$(CONFIG_MTRR) += mtrr/
diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
index 76b7476..f6caabd 100644
--- a/arch/x86/kernel/cpu/intel_rdt.c
+++ b/arch/x86/kernel/cpu/intel_rdt.c
@@ -265,6 +265,14 @@ static int __init intel_rdt_late_init(void)
for_each_rdt_resource(r)
rdt_max_closid = max(rdt_max_closid, r->max_closid);

+ /* limitation of our allocator, but h/w is more limited */
+ if (rdt_max_closid > 32) {
+ pr_warn("Only using 32/%d CLOSIDs\n", rdt_max_closid);
+ rdt_max_closid = 32;
+ }
+
+ rdtgroup_init();
+
for_each_rdt_resource(r)
pr_info("Intel %s allocation %s detected\n", r->name,
r->cdp_capable ? " (with CDP)" : "");
diff --git a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
new file mode 100644
index 0000000..c99d3a0
--- /dev/null
+++ b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
@@ -0,0 +1,236 @@
+/*
+ * User interface for Resource Allocation in Resource Director Technology (RDT)
+ *
+ * Copyright (C) 2016 Intel Corporation
+ *
+ * Author: Fenghua Yu <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * More information about RDT can be found in the Intel (R) x86 Architecture
+ * Software Developer Manual.
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/fs.h>
+#include <linux/sysfs.h>
+#include <linux/kernfs.h>
+#include <linux/slab.h>
+
+#include <uapi/linux/magic.h>
+
+#include <asm/intel_rdt.h>
+
+DEFINE_STATIC_KEY_FALSE(rdt_enable_key);
+struct kernfs_root *rdt_root;
+struct rdtgroup rdtgroup_default;
+LIST_HEAD(rdt_all_groups);
+
+static void l3_qos_cfg_update(void *arg)
+{
+ struct rdt_resource *r = arg;
+
+ wrmsrl(IA32_L3_QOS_CFG, r->cdp_enabled);
+}
+
+static void set_l3_qos_cfg(struct rdt_resource *r)
+{
+ struct list_head *l;
+ struct rdt_domain *d;
+ struct cpumask cpu_mask;
+
+ cpumask_clear(&cpu_mask);
+ list_for_each(l, &r->domains) {
+ d = list_entry(l, struct rdt_domain, list);
+ cpumask_set_cpu(cpumask_any(&d->cpu_mask), &cpu_mask);
+ }
+ smp_call_function_many(&cpu_mask, l3_qos_cfg_update, r, 1);
+}
+
+static int parse_rdtgroupfs_options(char *data, struct rdt_resource *r)
+{
+ char *token, *o = data;
+
+ while ((token = strsep(&o, ",")) != NULL) {
+ if (!*token)
+ return -EINVAL;
+
+ if (!strcmp(token, "cdp"))
+ if (r->enabled && r->cdp_capable)
+ r->cdp_enabled = true;
+ }
+
+ return 0;
+}
+
+static struct dentry *rdt_mount(struct file_system_type *fs_type,
+ int flags, const char *unused_dev_name,
+ void *data)
+{
+ struct dentry *dentry;
+ int ret;
+ bool new_sb;
+ struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3];
+
+ mutex_lock(&rdtgroup_mutex);
+ /*
+ * resctrl file system can only be mounted once.
+ */
+ if (static_branch_unlikely(&rdt_enable_key)) {
+ dentry = ERR_PTR(-EBUSY);
+ goto out;
+ }
+
+ r->cdp_enabled = false;
+ ret = parse_rdtgroupfs_options(data, r);
+ if (ret) {
+ dentry = ERR_PTR(ret);
+ goto out;
+ }
+ if (r->cdp_enabled)
+ r->num_closid = r->max_closid / 2;
+ else
+ r->num_closid = r->max_closid;
+
+ /* Recompute rdt_max_closid because CDP may have changed things. */
+ rdt_max_closid = 0;
+ for_each_rdt_resource(r)
+ rdt_max_closid = max(rdt_max_closid, r->num_closid);
+ if (rdt_max_closid > 32)
+ rdt_max_closid = 32;
+
+ dentry = kernfs_mount(fs_type, flags, rdt_root,
+ RDTGROUP_SUPER_MAGIC, &new_sb);
+ if (IS_ERR(dentry))
+ goto out;
+ if (!new_sb) {
+ dentry = ERR_PTR(-EINVAL);
+ goto out;
+ }
+ r = &rdt_resources_all[RDT_RESOURCE_L3];
+ if (r->cdp_capable)
+ set_l3_qos_cfg(r);
+ static_branch_enable(&rdt_enable_key);
+
+out:
+ mutex_unlock(&rdtgroup_mutex);
+
+ return dentry;
+}
+
+static void reset_all_cbms(struct rdt_resource *r)
+{
+ struct list_head *l;
+ struct rdt_domain *d;
+ struct msr_param msr_param;
+ struct cpumask cpu_mask;
+ int i;
+
+ cpumask_clear(&cpu_mask);
+ msr_param.res = r;
+ msr_param.low = 0;
+ msr_param.high = r->max_closid;
+
+ list_for_each(l, &r->domains) {
+ d = list_entry(l, struct rdt_domain, list);
+ cpumask_set_cpu(cpumask_any(&d->cpu_mask), &cpu_mask);
+
+ for (i = 0; i < r->max_closid; i++)
+ d->cbm[i] = r->max_cbm;
+ }
+ smp_call_function_many(&cpu_mask, rdt_cbm_update, &msr_param, 1);
+}
+
+static void rdt_kill_sb(struct super_block *sb)
+{
+ struct rdt_resource *r;
+
+ mutex_lock(&rdtgroup_mutex);
+
+ /* Put everything back to default values. */
+ for_each_rdt_resource(r)
+ reset_all_cbms(r);
+ r = &rdt_resources_all[RDT_RESOURCE_L3];
+ if (r->cdp_capable) {
+ r->cdp_enabled = 0;
+ set_l3_qos_cfg(r);
+ }
+
+ static_branch_disable(&rdt_enable_key);
+ kernfs_kill_sb(sb);
+ mutex_unlock(&rdtgroup_mutex);
+}
+
+static struct file_system_type rdt_fs_type = {
+ .name = "resctrl",
+ .mount = rdt_mount,
+ .kill_sb = rdt_kill_sb,
+};
+
+static struct kernfs_syscall_ops rdtgroup_kf_syscall_ops = {
+};
+
+static int __init rdtgroup_setup_root(void)
+{
+ int ret = 0;
+
+ rdt_root = kernfs_create_root(&rdtgroup_kf_syscall_ops,
+ KERNFS_ROOT_CREATE_DEACTIVATED,
+ &rdtgroup_default);
+ if (IS_ERR(rdt_root))
+ return PTR_ERR(rdt_root);
+
+ mutex_lock(&rdtgroup_mutex);
+
+ rdtgroup_default.closid = 0;
+ list_add(&rdtgroup_default.rdtgroup_list, &rdt_all_groups);
+
+ rdtgroup_default.kn = rdt_root->kn;
+ kernfs_activate(rdtgroup_default.kn);
+
+ mutex_unlock(&rdtgroup_mutex);
+
+ return ret;
+}
+
+/*
+ * rdtgroup_init - rdtgroup initialization
+ *
+ * Setup resctrl file system including set up root, create mount point,
+ * register rdtgroup filesystem, and initialize files under root directory.
+ *
+ * Return: 0 on success or -errno
+ */
+int __init rdtgroup_init(void)
+{
+ int ret = 0;
+
+ ret = rdtgroup_setup_root();
+ if (ret)
+ return ret;
+
+ ret = sysfs_create_mount_point(fs_kobj, "resctrl");
+ if (ret)
+ goto cleanup_root;
+
+ ret = register_filesystem(&rdt_fs_type);
+ if (ret)
+ goto cleanup_mountpoint;
+
+ return 0;
+
+cleanup_mountpoint:
+ sysfs_remove_mount_point(fs_kobj, "resctrl");
+cleanup_root:
+ kernfs_destroy_root(rdt_root);
+
+ return ret;
+}
diff --git a/include/uapi/linux/magic.h b/include/uapi/linux/magic.h
index e398bea..27ef03d 100644
--- a/include/uapi/linux/magic.h
+++ b/include/uapi/linux/magic.h
@@ -57,6 +57,7 @@
#define CGROUP_SUPER_MAGIC 0x27e0eb
#define CGROUP2_SUPER_MAGIC 0x63677270

+#define RDTGROUP_SUPER_MAGIC 0x7655821

#define STACK_END_MAGIC 0x57AC6E9D

--
2.5.0

2016-10-07 23:47:44

by Fenghua Yu

[permalink] [raw]
Subject: [PATCH v3 06/18] x86/intel_rdt: Add CONFIG, Makefile, and basic initialization

From: Fenghua Yu <[email protected]>

Introduce CONFIG_INTEL_RDT (default: no, dependent on X86 and
CPU_SUP_INTEL) to control inclusion of Resource Director Technology in
the build.

Simple init() routine just checks which features are present and, if
any are, prints a one-line pr_info() summary for each.
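
With the pr_fmt prefix this produces console output like the following
(assuming both features are present):

  intel_rdt: Intel cache allocation detected
  intel_rdt: Intel code data prioritization detected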

Signed-off-by: Fenghua Yu <[email protected]>
Signed-off-by: Tony Luck <[email protected]>
---
arch/x86/Kconfig | 12 +++++++++
arch/x86/kernel/cpu/Makefile | 2 ++
arch/x86/kernel/cpu/intel_rdt.c | 56 +++++++++++++++++++++++++++++++++++++++++
3 files changed, 70 insertions(+)
create mode 100644 arch/x86/kernel/cpu/intel_rdt.c

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 2a1f0ce..cfca613 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -406,6 +406,18 @@ config GOLDFISH
def_bool y
depends on X86_GOLDFISH

+config INTEL_RDT
+ bool "Intel Resource Director Technology support"
+ default n
+ depends on X86 && CPU_SUP_INTEL
+ help
+ Select to enable resource allocation which is a sub-feature of
+ Intel Resource Director Technology(RDT). More information about
+ RDT can be found in the Intel x86 Architecture Software
+ Developer Manual June 2016, volume 3, section 17.17.
+
+ Say N if unsure.
+
if X86_32
config X86_EXTENDED_PLATFORM
bool "Support for extended (non-PC) x86 platforms"
diff --git a/arch/x86/kernel/cpu/Makefile b/arch/x86/kernel/cpu/Makefile
index 4a8697f..5a791c8 100644
--- a/arch/x86/kernel/cpu/Makefile
+++ b/arch/x86/kernel/cpu/Makefile
@@ -34,6 +34,8 @@ obj-$(CONFIG_CPU_SUP_CENTAUR) += centaur.o
obj-$(CONFIG_CPU_SUP_TRANSMETA_32) += transmeta.o
obj-$(CONFIG_CPU_SUP_UMC_32) += umc.o

+obj-$(CONFIG_INTEL_RDT) += intel_rdt.o
+
obj-$(CONFIG_X86_MCE) += mcheck/
obj-$(CONFIG_MTRR) += mtrr/
obj-$(CONFIG_MICROCODE) += microcode/
diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
new file mode 100644
index 0000000..ebe8dae
--- /dev/null
+++ b/arch/x86/kernel/cpu/intel_rdt.c
@@ -0,0 +1,56 @@
+/*
+ * Resource Director Technology(RDT)
+ * - Cache Allocation code.
+ *
+ * Copyright (C) 2016 Intel Corporation
+ *
+ * Authors:
+ * Fenghua Yu <[email protected]>
+ * Tony Luck <[email protected]>
+ * Vikas Shivappa <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * More information about RDT can be found in the Intel (R) x86 Architecture
+ * Software Developer Manual June 2016, volume 3, section 17.17.
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/slab.h>
+#include <linux/err.h>
+
+static inline bool get_rdt_resources(struct cpuinfo_x86 *c)
+{
+ bool ret = false;
+
+ if (!cpu_has(c, X86_FEATURE_RDT_A))
+ return false;
+ if (cpu_has(c, X86_FEATURE_CAT_L3))
+ ret = true;
+
+ return ret;
+}
+
+static int __init intel_rdt_late_init(void)
+{
+ struct cpuinfo_x86 *c = &boot_cpu_data;
+
+ if (!get_rdt_resources(c))
+ return -ENODEV;
+
+ pr_info("Intel cache allocation detected\n");
+ if (cpu_has(c, X86_FEATURE_CDP_L3))
+ pr_info("Intel code data prioritization detected\n");
+
+ return 0;
+}
+
+late_initcall(intel_rdt_late_init);
--
2.5.0

2016-10-07 23:48:19

by Fenghua Yu

[permalink] [raw]
Subject: [PATCH v3 05/18] Documentation, x86: Documentation for Intel resource allocation user interface

From: Fenghua Yu <[email protected]>

The documentation describes the user interface for allocating resources
with Intel RDT.

Please note that the documentation covers the generic user interface.
The current patch set only implements CAT L3. CAT L2 code will be sent
later.

Signed-off-by: Fenghua Yu <[email protected]>
Signed-off-by: Tony Luck <[email protected]>
---
Documentation/x86/intel_rdt_ui.txt | 162 +++++++++++++++++++++++++++++++++++++
1 file changed, 162 insertions(+)
create mode 100644 Documentation/x86/intel_rdt_ui.txt

diff --git a/Documentation/x86/intel_rdt_ui.txt b/Documentation/x86/intel_rdt_ui.txt
new file mode 100644
index 0000000..ad51455
--- /dev/null
+++ b/Documentation/x86/intel_rdt_ui.txt
@@ -0,0 +1,162 @@
+User Interface for Resource Allocation in Intel Resource Director Technology
+
+Copyright (C) 2016 Intel Corporation
+
+Fenghua Yu <[email protected]>
+Tony Luck <[email protected]>
+
+This feature is enabled by the CONFIG_INTEL_RDT Kconfig and the
+X86 /proc/cpuinfo flag bits "rdt", "cat_l3" and "cdp_l3".
+
+To use the feature mount the file system:
+
+ # mount -t resctrl resctrl [-o cdp] /sys/fs/resctrl
+
+mount options are:
+
+"cdp": Enable code/data prioritization in L3 cache allocations.
+
+
+Resource groups
+---------------
+Resource groups are represented as directories in the resctrl file
+system. The default group is the root directory. Other groups may be
+created as desired by the system administrator using the "mkdir(1)"
+command, and removed using "rmdir(1)".
+
+There are three files associated with each group:
+
+"tasks": A list of tasks that belongs to this group. Tasks can be
+ added to a group by writing the task ID to the "tasks" file
+ (which will automatically remove them from the previous
+ group to which they belonged). New tasks created by fork(2)
+ and clone(2) are added to the same group as their parent.
+ If a pid is not in any sub partition, it is in root partition
+ (i.e. default partition).
+
+"cpus": A bitmask of logical CPUs assigned to this group. Writing
+ a new mask can add/remove CPUs from this group. Added CPUs
+ are removed from their previous group. Removed ones are
+ given to the default (root) group. You cannot remove CPUs
+ from the default group.
+
+"schemata": A list of all the resources available to this group.
+ Each resource has its own line and format - see below for
+ details.
+
+When a task is running the following rules define which resources
+are available to it:
+
+1) If the task is a member of a non-default group, then the schemata
+for that group is used.
+
+2) Else if the task belongs to the default group, but is running on a
+CPU that is assigned to some specific group, then the schemata for
+the CPU's group is used.
+
+3) Otherwise the schemata for the default group is used.
+
+
+Schemata files - general concepts
+---------------------------------
+Each line in the file describes one resource. The line starts with
+the name of the resource, followed by specific values to be applied
+in each of the instances of that resource on the system.
+
+Cache IDs
+---------
+On current generation systems there is one L3 cache per socket and L2
+caches are generally just shared by the hyperthreads on a core, but this
+isn't an architectural requirement. We could have multiple separate L3
+caches on a socket, or multiple cores could share an L2 cache. So instead
+of using "socket" or "core" to define the set of logical cpus sharing
+a resource we use a "Cache ID". At a given cache level this will be a
+unique number across the whole system (but it isn't guaranteed to be a
+contiguous sequence, there may be gaps). To find the ID for each logical
+CPU look in /sys/devices/system/cpu/cpu*/cache/index*/id
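+
+For example (illustrative output, the actual ids depend on the
+machine):
+
+  # grep . /sys/devices/system/cpu/cpu0/cache/index*/id
+  /sys/devices/system/cpu/cpu0/cache/index0/id:0
+  /sys/devices/system/cpu/cpu0/cache/index1/id:0
+  /sys/devices/system/cpu/cpu0/cache/index2/id:0
+  /sys/devices/system/cpu/cpu0/cache/index3/id:0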
+
+Cache Bit Masks (CBM)
+---------------------
+For cache resources we describe the portion of the cache that is available
+for allocation using a bitmask. The maximum value of the mask is defined
+by each cpu model (and may be different for different cache levels). It
+is found using CPUID, but is also provided in the "info" directory of
+the resctrl file system in "info/{resource}/cbm_val". X86 hardware
+requires that these masks have all the '1' bits in a contiguous block. So
+0x3, 0x6 and 0xC are legal 4-bit masks with two bits set, but 0x5, 0x9
+and 0xA are not. On a system with a 20-bit mask each bit represents 5%
+of the capacity of the cache. You could partition the cache into four
+equal parts with masks: 0x1f, 0x3e0, 0x7c00, 0xf8000.
+
+
+L3 details (code and data prioritization disabled)
+--------------------------------------------------
+With CDP disabled the L3 schemata format is:
+
+ L3:<cache_id0>=<cbm>;<cache_id1>=<cbm>;...
+
+L3 details (CDP enabled via mount option to resctrl)
+----------------------------------------------------
+When CDP is enabled, you need to specify separate cache bit masks for
+code and data access. The generic format is:
+
+ L3:<cache_id0>=<d_cbm>,<i_cbm>;<cache_id1>=<d_cbm>,<i_cbm>;...
+
+where the d_cbm masks are for data access, and the i_cbm masks for code.
+
+
+Example 1
+---------
+On a two socket machine (one L3 cache per socket) with just four bits
+for cache bit masks
+
+# mount -t resctrl resctrl /sys/fs/resctrl
+# cd /sys/fs/resctrl
+# mkdir p0 p1
+# echo "L3:0=3;1=c" > /sys/fs/resctrl/p0/schemata
+# echo "L3:0=3;1=3" > /sys/fs/resctrl/p1/schemata
+
+The default resource group is unmodified, so we have access to all parts
+of all caches (its schemata file reads "L3:0=f;1=f").
+
+Tasks that are under the control of group "p0" may only allocate from the
+"lower" 50% on cache ID 0, and the "upper" 50% of cache ID 1.
+Tasks in group "p1" use the "lower" 50% of cache on both sockets.
+
+Example 2
+---------
+Again two sockets, but this time with a more realistic 20-bit mask.
+
+Consider two real-time tasks: pid=1234 running on processor 0 and pid=5678
+running on processor 1 of socket 0, on a two-socket, dual-core machine. To
+avoid noisy neighbors, each of the two real-time tasks exclusively occupies
+one quarter of the L3 cache on socket 0.
+
+# mount -t resctrl resctrl /sys/fs/resctrl
+# cd /sys/fs/resctrl
+
+First we reset the schemata for the default group so that the "upper"
+50% of the L3 cache on socket 0 cannot be used by ordinary tasks:
+
+# echo "L3:0=3ff;1=fffff" > schemata
+
+Next we make a resource group for our first real time task and give
+it access to the "top" 25% of the cache on socket 0.
+
+# mkdir p0
+# echo "L3:0=f8000;1=fffff" > p0/schemata
+
+Finally we move our first real time task into this resource group. We
+also use taskset(1) to ensure the task always runs on a dedicated CPU
+on socket 0. Most uses of resource groups will also constrain which
+processors tasks run on.
+
+# echo 1234 > p0/tasks
+# taskset -cp 1 1234
+
+Ditto for the second real time task (with the remaining 25% of cache):
+
+# mkdir p1
+# echo "L3:0=7c00;1=fffff" > p1/schemata
+# echo 5678 > p1/tasks
+# taskset -cp 2 5678
--
2.5.0

2016-10-07 23:48:29

by Fenghua Yu

[permalink] [raw]
Subject: [PATCH v3 04/18] x86/intel_rdt: Feature discovery

From: Fenghua Yu <[email protected]>

Check CPUID leaves for all the Resource Director Technology (RDT)
Cache Allocation Technology (CAT) bits.

Prescence of allocation features:
CPUID.(EAX=7H, ECX=0):EBX[bit 15] X86_FEATURE_RDT_A

L2 and L3 caches are each separately enabled:
CPUID.(EAX=10H, ECX=0):EBX[bit 1] X86_FEATURE_CAT_L3
CPUID.(EAX=10H, ECX=0):EBX[bit 2] X86_FEATURE_CAT_L2

L3 cache may support independent control of allocation for
code and data (CDP = Code/Data Prioritization):
CPUID.(EAX=10H, ECX=1):ECX[bit 2] X86_FEATURE_CDP_L3
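
For reference, the same bits can be inspected from user space with a
small sketch like this (illustrative only; the kernel itself goes
through init_scattered_cpuid_features() and cpu_has() as in the patch
below):

  #include <stdio.h>
  #include <cpuid.h>

  int main(void)
  {
          unsigned int eax, ebx, ecx, edx;

          __cpuid_count(7, 0, eax, ebx, ecx, edx);    /* leaf 7, subleaf 0 */
          printf("RDT_A:  %u\n", (ebx >> 15) & 1);

          __cpuid_count(0x10, 0, eax, ebx, ecx, edx); /* leaf 0x10, subleaf 0 */
          printf("CAT_L3: %u\n", (ebx >> 1) & 1);
          printf("CAT_L2: %u\n", (ebx >> 2) & 1);

          __cpuid_count(0x10, 1, eax, ebx, ecx, edx); /* leaf 0x10, subleaf 1 */
          printf("CDP_L3: %u\n", (ecx >> 2) & 1);
          return 0;
  }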

Signed-off-by: Fenghua Yu <[email protected]>
Signed-off-by: Tony Luck <[email protected]>
---
arch/x86/include/asm/cpufeatures.h | 5 +++++
arch/x86/kernel/cpu/scattered.c | 3 +++
2 files changed, 8 insertions(+)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 92a8308..64dd8274 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -196,6 +196,10 @@

#define X86_FEATURE_INTEL_PT ( 7*32+15) /* Intel Processor Trace */

+#define X86_FEATURE_CAT_L3 ( 7*32+16) /* Cache Allocation Technology L3 */
+#define X86_FEATURE_CAT_L2 ( 7*32+17) /* Cache Allocation Technology L2 */
+#define X86_FEATURE_CDP_L3 ( 7*32+18) /* Code and Data Prioritization L3 */
+
/* Virtualization flags: Linux defined, word 8 */
#define X86_FEATURE_TPR_SHADOW ( 8*32+ 0) /* Intel TPR Shadow */
#define X86_FEATURE_VNMI ( 8*32+ 1) /* Intel Virtual NMI */
@@ -220,6 +224,7 @@
#define X86_FEATURE_RTM ( 9*32+11) /* Restricted Transactional Memory */
#define X86_FEATURE_CQM ( 9*32+12) /* Cache QoS Monitoring */
#define X86_FEATURE_MPX ( 9*32+14) /* Memory Protection Extension */
+#define X86_FEATURE_RDT_A ( 9*32+15) /* Resource Director Technology Allocation */
#define X86_FEATURE_AVX512F ( 9*32+16) /* AVX-512 Foundation */
#define X86_FEATURE_AVX512DQ ( 9*32+17) /* AVX-512 DQ (Double/Quad granular) Instructions */
#define X86_FEATURE_RDSEED ( 9*32+18) /* The RDSEED instruction */
diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
index 8cb57df..11f39a2 100644
--- a/arch/x86/kernel/cpu/scattered.c
+++ b/arch/x86/kernel/cpu/scattered.c
@@ -34,6 +34,9 @@ void init_scattered_cpuid_features(struct cpuinfo_x86 *c)
{ X86_FEATURE_INTEL_PT, CR_EBX,25, 0x00000007, 0 },
{ X86_FEATURE_APERFMPERF, CR_ECX, 0, 0x00000006, 0 },
{ X86_FEATURE_EPB, CR_ECX, 3, 0x00000006, 0 },
+ { X86_FEATURE_CAT_L3, CR_EBX, 1, 0x00000010, 0 },
+ { X86_FEATURE_CAT_L2, CR_EBX, 2, 0x00000010, 0 },
+ { X86_FEATURE_CDP_L3, CR_ECX, 2, 0x00000010, 1 },
{ X86_FEATURE_HW_PSTATE, CR_EDX, 7, 0x80000007, 0 },
{ X86_FEATURE_CPB, CR_EDX, 9, 0x80000007, 0 },
{ X86_FEATURE_PROC_FEEDBACK, CR_EDX,11, 0x80000007, 0 },
--
2.5.0

2016-10-07 23:48:38

by Fenghua Yu

[permalink] [raw]
Subject: [PATCH v3 03/18] x86, intel_cacheinfo: Enable cache id in x86

From: Fenghua Yu <[email protected]>

Cache id is retrieved from APIC ID and CPUID leaf 4 on x86.

For more details see the section on "Cache ID Extraction Parameters" in
"Intel 64 Architecture Processor Topology Enumeration" at
https://software.intel.com/sites/default/files/63/1a/Kuo_CpuTopology_rc1.rh1.final.pdf

Also "Intel 64 and IA-32 Architectures Software Developer's Manual" volume 2,
table 3-18 "information Returned by CPUID Instruction" at
http://www.intel.com/sdm
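
As a worked example of the id computation (a standalone sketch with
made-up input values, mirroring the get_cache_id() added below):

  #include <stdio.h>

  /* minimal stand-in for the kernel's get_count_order() */
  static int count_order(unsigned int n)
  {
          int order = 0;

          while ((1u << order) < n)
                  order++;
          return order;
  }

  int main(void)
  {
          unsigned int max_shared = 11;      /* CPUID.4:EAX[25:14], example */
          unsigned int apicid = 0x23;        /* example APIC ID */
          unsigned int num = max_shared + 1; /* 12 threads share the cache */

          /* 0x23 >> 4 == 2: all 12 siblings map to the same cache id */
          printf("cache id = %u\n", apicid >> count_order(num));
          return 0;
  }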

Signed-off-by: Fenghua Yu <[email protected]>
Signed-off-by: Tony Luck <[email protected]>
---
arch/x86/kernel/cpu/intel_cacheinfo.c | 20 ++++++++++++++++++++
1 file changed, 20 insertions(+)

diff --git a/arch/x86/kernel/cpu/intel_cacheinfo.c b/arch/x86/kernel/cpu/intel_cacheinfo.c
index de6626c..8dc5720 100644
--- a/arch/x86/kernel/cpu/intel_cacheinfo.c
+++ b/arch/x86/kernel/cpu/intel_cacheinfo.c
@@ -153,6 +153,7 @@ struct _cpuid4_info_regs {
union _cpuid4_leaf_eax eax;
union _cpuid4_leaf_ebx ebx;
union _cpuid4_leaf_ecx ecx;
+ unsigned int id;
unsigned long size;
struct amd_northbridge *nb;
};
@@ -894,6 +895,8 @@ static void __cache_cpumap_setup(unsigned int cpu, int index,
static void ci_leaf_init(struct cacheinfo *this_leaf,
struct _cpuid4_info_regs *base)
{
+ this_leaf->id = base->id;
+ this_leaf->attributes = CACHE_ID;
this_leaf->level = base->eax.split.level;
this_leaf->type = cache_type_map[base->eax.split.type];
this_leaf->coherency_line_size =
@@ -920,6 +923,22 @@ static int __init_cache_level(unsigned int cpu)
return 0;
}

+/*
+ * The maximum number of sharing threads comes from CPUID.4:EAX[25:14] with
+ * the cache index as input in ECX. Right shifting the APIC ID by the order
+ * of that number yields the cache id for this cache node.
+ */
+static void get_cache_id(int cpu, struct _cpuid4_info_regs *id4_regs)
+{
+ struct cpuinfo_x86 *c = &cpu_data(cpu);
+ unsigned long num_threads_sharing;
+ int index_msb;
+
+ num_threads_sharing = 1 + id4_regs->eax.split.num_threads_sharing;
+ index_msb = get_count_order(num_threads_sharing);
+ id4_regs->id = c->apicid >> index_msb;
+}
+
static int __populate_cache_leaves(unsigned int cpu)
{
unsigned int idx, ret;
@@ -931,6 +950,7 @@ static int __populate_cache_leaves(unsigned int cpu)
ret = cpuid4_cache_lookup_regs(idx, &id4_regs);
if (ret)
return ret;
+ get_cache_id(cpu, &id4_regs);
ci_leaf_init(this_leaf++, &id4_regs);
__cache_cpumap_setup(cpu, idx, &id4_regs);
}
--
2.5.0

2016-10-07 23:54:24

by Luck, Tony

[permalink] [raw]
Subject: [RFC PATCH 19/18] x86/intel_rdt: Add support for L2 cache allocation

Just in case you think that part 0010 is a bit complex, setting
up all that infrastructure for just the L3 cache and using
for_each_rdt_resource() all over the place to loop over just one
thing: here's the payoff.

Untested because I don't have a machine handy that supports L2 ... but this
should be all that needs to change for L2 cache allocation (both as an
alternate to L3, or in combination with L3).

Next feature after L2 is not a cache - so rdt_resource will have to grow
some new flags, or perhaps some "->resource_ops()" functions to handle
the pieces that work a bit differently, but the fundamentals should still
apply.

Not-signed-off-yet: Tony Luck
---
arch/x86/include/asm/intel_rdt.h | 2 ++
arch/x86/kernel/cpu/intel_rdt.c | 23 +++++++++++++++++++++++
2 files changed, 25 insertions(+)

diff --git a/arch/x86/include/asm/intel_rdt.h b/arch/x86/include/asm/intel_rdt.h
index aefa3a655408..641d32db9d74 100644
--- a/arch/x86/include/asm/intel_rdt.h
+++ b/arch/x86/include/asm/intel_rdt.h
@@ -97,6 +97,7 @@ struct rdt_resource {

#define IA32_L3_QOS_CFG 0xc81
#define IA32_L3_CBM_BASE 0xc90
+#define IA32_L2_CBM_BASE 0xd10

/**
* struct rdt_domain - group of cpus sharing an RDT resource
@@ -134,6 +135,7 @@ extern struct rdt_resource rdt_resources_all[];

enum {
RDT_RESOURCE_L3,
+ RDT_RESOURCE_L2,
};

/* Maximum CLOSID allowed across all enabled resources */
diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
index 1afca46e080d..c457641c49b6 100644
--- a/arch/x86/kernel/cpu/intel_rdt.c
+++ b/arch/x86/kernel/cpu/intel_rdt.c
@@ -50,6 +50,13 @@ struct rdt_resource rdt_resources_all[] = {
.cache_level = 3
},
{
+ .name = "L2",
+ .domains = domain_init(RDT_RESOURCE_L2),
+ .msr_base = IA32_L2_CBM_BASE,
+ .min_cbm_bits = 1,
+ .cache_level = 2
+ },
+ {
/* NULL terminated */
}
};
@@ -122,6 +129,22 @@ static inline bool get_rdt_resources(struct cpuinfo_x86 *c)

ret = true;
}
+ if (cpu_has(c, X86_FEATURE_CAT_L2)) {
+ union cpuid_0x10_1_eax eax;
+ union cpuid_0x10_1_edx edx;
+ u32 ebx, ecx;
+
+ /* CPUID 0x10.2 fields are the same format as 0x10.1 */
+ r = &rdt_resources_all[RDT_RESOURCE_L2];
+ cpuid_count(0x00000010, 2, &eax.full, &ebx, &ecx, &edx.full);
+ r->max_closid = edx.split.cos_max + 1;
+ r->num_closid = r->max_closid;
+ r->cbm_len = eax.split.cbm_len + 1;
+ r->max_cbm = BIT_MASK(eax.split.cbm_len + 1) - 1;
+ r->enabled = true;
+
+ ret = true;
+ }

return ret;
}
--
2.5.0

2016-10-08 17:12:55

by Nilay Vaish

[permalink] [raw]
Subject: Re: [PATCH v3 01/18] Documentation, ABI: Add a document entry for cache id

On 7 October 2016 at 21:45, Fenghua Yu <[email protected]> wrote:
> From: Fenghua Yu <[email protected]>
>
> Add an ABI document entry for /sys/devices/system/cpu/cpu*/cache/index*/id.
>
> Signed-off-by: Fenghua Yu <[email protected]>
> Signed-off-by: Tony Luck <[email protected]>
> ---
> Documentation/ABI/testing/sysfs-devices-system-cpu | 16 ++++++++++++++++
> 1 file changed, 16 insertions(+)
>
> diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu
> index 4987417..b1c3d69 100644
> --- a/Documentation/ABI/testing/sysfs-devices-system-cpu
> +++ b/Documentation/ABI/testing/sysfs-devices-system-cpu
> @@ -272,6 +272,22 @@ Description: Parameters for the CPU cache attributes
> the modified cache line is written to main
> memory only when it is replaced
>
> +
> +What: /sys/devices/system/cpu/cpu*/cache/index*/id
> +Date: September 2016
> +Contact: Linux kernel mailing list <[email protected]>
> +Description: Cache id
> +
> + The id provides a unique name for a specific instance of
> + a cache of a particular type. E.g. there may be a level
> + 3 unified cache on each socket in a server and we may
> + assign them ids 0, 1, 2, ...
> +
> + Note that id value may not be contiguous. E.g. level 1
> + caches typically exist per core, but there may not be a
> + power of two cores on a socket, so these caches may be
> + numbered 0, 1, 2, 3, 4, 5, 8, 9, 10, ...
> +

While it is ok that the caches are not numbered contiguously, it is
unclear how this is related to the number of cores on a socket being a
power of 2 or not.

--
Nilay

2016-10-08 17:13:06

by Nilay Vaish

[permalink] [raw]
Subject: Re: [PATCH v3 04/18] x86/intel_rdt: Feature discovery

On 7 October 2016 at 21:45, Fenghua Yu <[email protected]> wrote:
> From: Fenghua Yu <[email protected]>
>
> Check CPUID leaves for all the Resource Director Technology (RDT)
> Cache Allocation Technology (CAT) bits.
>
> Prescence of allocation features:

Presence

> diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
> index 92a8308..64dd8274 100644
> --- a/arch/x86/include/asm/cpufeatures.h
> +++ b/arch/x86/include/asm/cpufeatures.h
> @@ -196,6 +196,10 @@
>
> #define X86_FEATURE_INTEL_PT ( 7*32+15) /* Intel Processor Trace */
>
> +#define X86_FEATURE_CAT_L3 ( 7*32+16) /* Cache Allocation Technology L3 */
> +#define X86_FEATURE_CAT_L2 ( 7*32+17) /* Cache Allocation Technology L2 */
> +#define X86_FEATURE_CDP_L3 ( 7*32+18) /* Code and Data Prioritization L3 */
> +
> /* Virtualization flags: Linux defined, word 8 */
> #define X86_FEATURE_TPR_SHADOW ( 8*32+ 0) /* Intel TPR Shadow */
> #define X86_FEATURE_VNMI ( 8*32+ 1) /* Intel Virtual NMI */
> @@ -220,6 +224,7 @@
> #define X86_FEATURE_RTM ( 9*32+11) /* Restricted Transactional Memory */
> #define X86_FEATURE_CQM ( 9*32+12) /* Cache QoS Monitoring */
> #define X86_FEATURE_MPX ( 9*32+14) /* Memory Protection Extension */
> +#define X86_FEATURE_RDT_A ( 9*32+15) /* Resource Director Technology Allocation */
> #define X86_FEATURE_AVX512F ( 9*32+16) /* AVX-512 Foundation */
> #define X86_FEATURE_AVX512DQ ( 9*32+17) /* AVX-512 DQ (Double/Quad granular) Instructions */
> #define X86_FEATURE_RDSEED ( 9*32+18) /* The RDSEED instruction */

I think these #defines are specific to Intel. I would prefer if we
have _INTEL_ somewhere in them.

--
Nilay

2016-10-08 17:13:15

by Nilay Vaish

[permalink] [raw]
Subject: Re: [PATCH v3 05/18] Documentation, x86: Documentation for Intel resource allocation user interface

On 7 October 2016 at 21:45, Fenghua Yu <[email protected]> wrote:
> From: Fenghua Yu <[email protected]>
>
> +L3 details (code and data prioritization disabled)
> +--------------------------------------------------
> +With CDP disabled the L3 schemata format is:
> +
> + L3:<cache_id0>=<cbm>;<cache_id1>=<cbm>;...
> +
> +L3 details (CDP enabled via mount option to resctrl)
> +----------------------------------------------------
> +When CDP is enabled, you need to specify separate cache bit masks for
> +code and data access. The generic format is:
> +
> + L3:<cache_id0>=<d_cbm>,<i_cbm>;<cache_id1>=<d_cbm>,<i_cbm>;...

Can we drop L3 here and instead say:
L<level>:<cache_id0>=<d_cbm>,<i_cbm>;<cache_id1>=<d_cbm>,<i_cbm>;...

and similarly for without CDP as well.

--
Nilay

2016-10-08 17:27:34

by Nilay Vaish

[permalink] [raw]
Subject: Re: [PATCH v3 02/18] cacheinfo: Introduce cache id

On 7 October 2016 at 21:45, Fenghua Yu <[email protected]> wrote:
> From: Fenghua Yu <[email protected]>
>
> diff --git a/drivers/base/cacheinfo.c b/drivers/base/cacheinfo.c
> index e9fd32e..2a21c15 100644
> --- a/drivers/base/cacheinfo.c
> +++ b/drivers/base/cacheinfo.c
> @@ -233,6 +233,7 @@ static ssize_t file_name##_show(struct device *dev, \
> return sprintf(buf, "%u\n", this_leaf->object); \
> }
>
> +show_one(id, id);
> show_one(level, level);
> show_one(coherency_line_size, coherency_line_size);
> show_one(number_of_sets, number_of_sets);
> @@ -314,6 +315,7 @@ static ssize_t write_policy_show(struct device *dev,
> return n;
> }
>
> +static DEVICE_ATTR_RO(id);
> static DEVICE_ATTR_RO(level);
> static DEVICE_ATTR_RO(type);
> static DEVICE_ATTR_RO(coherency_line_size);
> @@ -327,6 +329,7 @@ static DEVICE_ATTR_RO(shared_cpu_list);
> static DEVICE_ATTR_RO(physical_line_partition);
>
> static struct attribute *cache_default_attrs[] = {
> + &dev_attr_id.attr,
> &dev_attr_type.attr,
> &dev_attr_level.attr,
> &dev_attr_shared_cpu_map.attr,
> @@ -350,6 +353,8 @@ cache_default_attrs_is_visible(struct kobject *kobj,
> const struct cpumask *mask = &this_leaf->shared_cpu_map;
> umode_t mode = attr->mode;
>
> + if ((attr == &dev_attr_id.attr) && this_leaf->attributes & CACHE_ID)

Can you put parentheses around 'this_leaf->attributes & CACHE_ID'?
Whenever I look at expressions using bitwise operators without
parentheses, I have to lookup the precedence table to make sure
everything is fine.

--
Nilay

2016-10-08 17:30:04

by Fenghua Yu

[permalink] [raw]
Subject: Re: [PATCH v3 05/18] Documentation, x86: Documentation for Intel resource allocation user interface

On Sat, Oct 08, 2016 at 12:12:07PM -0500, Nilay Vaish wrote:
> On 7 October 2016 at 21:45, Fenghua Yu <[email protected]> wrote:
> > From: Fenghua Yu <[email protected]>
> >
> > +L3 details (code and data prioritization disabled)
> > +--------------------------------------------------
> > +With CDP disabled the L3 schemata format is:
> > +
> > + L3:<cache_id0>=<cbm>;<cache_id1>=<cbm>;...
> > +
> > +L3 details (CDP enabled via mount option to resctrl)
> > +----------------------------------------------------
> > +When CDP is enabled, you need to specify separate cache bit masks for
> > +code and data access. The generic format is:
> > +
> > + L3:<cache_id0>=<d_cbm>,<i_cbm>;<cache_id1>=<d_cbm>,<i_cbm>;...
>
> Can we drop L3 here and instead say:
> L<level>:<cache_id0>=<d_cbm>,<i_cbm>;<cache_id1>=<d_cbm>,<i_cbm>;...
>
> and similarly for without CDP as well.

L3 and L2 are similar but different. L2 doesn't have the CDP feature. It
would be better to describe them separately here.

Thanks.

-Fenghua

2016-10-08 17:51:47

by Fenghua Yu

[permalink] [raw]
Subject: Re: [PATCH v3 04/18] x86/intel_rdt: Feature discovery

On Sat, Oct 08, 2016 at 12:11:22PM -0500, Nilay Vaish wrote:
> On 7 October 2016 at 21:45, Fenghua Yu <[email protected]> wrote:
> > From: Fenghua Yu <[email protected]>
> >
> > Check CPUID leaves for all the Resource Director Technology (RDT)
> > Cache Allocation Technology (CAT) bits.
> >
> > Prescence of allocation features:
>
> Presence
>
> > diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
> > index 92a8308..64dd8274 100644
> > --- a/arch/x86/include/asm/cpufeatures.h
> > +++ b/arch/x86/include/asm/cpufeatures.h
> > @@ -196,6 +196,10 @@
> >
> > #define X86_FEATURE_INTEL_PT ( 7*32+15) /* Intel Processor Trace */
> >
> > +#define X86_FEATURE_CAT_L3 ( 7*32+16) /* Cache Allocation Technology L3 */
> > +#define X86_FEATURE_CAT_L2 ( 7*32+17) /* Cache Allocation Technology L2 */
> > +#define X86_FEATURE_CDP_L3 ( 7*32+18) /* Code and Data Prioritization L3 */
> > +
> > /* Virtualization flags: Linux defined, word 8 */
> > #define X86_FEATURE_TPR_SHADOW ( 8*32+ 0) /* Intel TPR Shadow */
> > #define X86_FEATURE_VNMI ( 8*32+ 1) /* Intel Virtual NMI */
> > @@ -220,6 +224,7 @@
> > #define X86_FEATURE_RTM ( 9*32+11) /* Restricted Transactional Memory */
> > #define X86_FEATURE_CQM ( 9*32+12) /* Cache QoS Monitoring */
> > #define X86_FEATURE_MPX ( 9*32+14) /* Memory Protection Extension */
> > +#define X86_FEATURE_RDT_A ( 9*32+15) /* Resource Director Technology Allocation */
> > #define X86_FEATURE_AVX512F ( 9*32+16) /* AVX-512 Foundation */
> > #define X86_FEATURE_AVX512DQ ( 9*32+17) /* AVX-512 DQ (Double/Quad granular) Instructions */
> > #define X86_FEATURE_RDSEED ( 9*32+18) /* The RDSEED instruction */
>
> I think these #defines are specific to Intel. I would prefer if we
> have _INTEL_ somewhere in them.

Is adding "Intel" in comment good?

+#define X86_FEATURE_CAT_L3 ( 7*32+16) /* Intel Cache Allocation Technology L3 */
+#define X86_FEATURE_CAT_L2 ( 7*32+17) /* Intel Cache Allocation Technology L2 */
+#define X86_FEATURE_CDP_L3 ( 7*32+18) /* Intel Code and Data Prioritization L3 */

Thanks.

-Fenghua

2016-10-08 19:53:02

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v3 04/18] x86/intel_rdt: Feature discovery

On Sat, Oct 08, 2016 at 01:54:54PM -0700, Fenghua Yu wrote:
> > I think these #defines are specific to Intel. I would prefer if we
> > have _INTEL_ somewhere in them.

We don't generally add vendor names to those defines. Even more so if
the 0x0... leaf range is Intel-specific anyway.

> Is adding "Intel" in comment good?

I don't see any need if the leaf has already this heading:

/* Intel-defined CPU features, CPUID level 0x00000007:0 (ebx), word 9 */

--
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
--

2016-10-08 20:57:38

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v3 06/18] x86/intel_rdt: Add CONFIG, Makefile, and basic initialization

On Fri, Oct 07, 2016 at 07:45:51PM -0700, Fenghua Yu wrote:
> From: Fenghua Yu <[email protected]>
>
> Introduce CONFIG_INTEL_RDT (default: no, dependent on X86 and
> CPU_SUP_INTEL) to control inclusion of Resource Director Technology in
> the build.
>
> Simple init() routine just checks which features are present. If they are,
> pr_info() a one line summary for each feature.
>
> Signed-off-by: Fenghua Yu <[email protected]>
> Signed-off-by: Tony Luck <[email protected]>
> ---
> arch/x86/Kconfig | 12 +++++++++
> arch/x86/kernel/cpu/Makefile | 2 ++
> arch/x86/kernel/cpu/intel_rdt.c | 56 +++++++++++++++++++++++++++++++++++++++++
> 3 files changed, 70 insertions(+)
> create mode 100644 arch/x86/kernel/cpu/intel_rdt.c

...

> +static inline bool get_rdt_resources(struct cpuinfo_x86 *c)
> +{
> + bool ret = false;
> +
> + if (!cpu_has(c, X86_FEATURE_RDT_A))
> + return false;
> + if (cpu_has(c, X86_FEATURE_CAT_L3))
> + ret = true;
> +
> + return ret;
> +}
> +
> +static int __init intel_rdt_late_init(void)
> +{
> + struct cpuinfo_x86 *c = &boot_cpu_data;

No need for that - you can do directly

if (boot_cpu_has(...))

in get_rdt_resources() and here.
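
IOW, something like this (untested sketch, just to illustrate the shape):

static inline bool get_rdt_resources(void)
{
	if (!boot_cpu_has(X86_FEATURE_RDT_A))
		return false;

	/* only L3 CAT is wired up at this point in the series */
	return boot_cpu_has(X86_FEATURE_CAT_L3);
}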

> +
> + if (!get_rdt_resources(c))
> + return -ENODEV;
> +
> + pr_info("Intel cache allocation detected\n");

So this reads stupid: Intel cache allocation? Reads like Intel can do
cache allocation. But it does already, even before that RDT stuff.

So you need to think of a more sensible name for it: "Intel cache
partitioning support detected" or some descriptive name which actually
denotes what the technology is.

Everybody can allocate lines in the cache :)
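
I.e., something like this, just to make the suggestion concrete:

	pr_info("Intel cache partitioning support detected\n");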

> + if (cpu_has(c, X86_FEATURE_CDP_L3))
> + pr_info("Intel code data prioritization detected\n");

Ditto.

--
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
--

2016-10-09 11:41:25

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v3 07/18] x86/intel_rdt: Add Haswell feature discovery

On Fri, Oct 07, 2016 at 07:45:52PM -0700, Fenghua Yu wrote:
> From: Fenghua Yu <[email protected]>
>
> Some Haswell generation CPUs support RDT, but they don't enumerate this
> using CPUID. Use rdmsr_safe() and wrmsr_safe() to probe the MSRs on
> cpu model 63 (INTEL_FAM6_HASWELL_X)
>
> Signed-off-by: Fenghua Yu <[email protected]>
> Signed-off-by: Tony Luck <[email protected]>
> ---
> arch/x86/events/intel/cqm.c | 2 +-
> arch/x86/include/asm/intel_rdt_common.h | 6 ++++++
> arch/x86/kernel/cpu/intel_rdt.c | 38 +++++++++++++++++++++++++++++++++
> 3 files changed, 45 insertions(+), 1 deletion(-)
> create mode 100644 arch/x86/include/asm/intel_rdt_common.h

...

> +static inline bool cache_alloc_hsw_probe(void)
> +{
> + u32 l, h_old, h_new, h_tmp;
> +
> + if (rdmsr_safe(MSR_IA32_PQR_ASSOC, &l, &h_old))
> + return false;
> +
> + /*
> + * Default value is always 0 if feature is present.
> + */
> + h_tmp = h_old ^ 0x1U;
> + if (wrmsr_safe(MSR_IA32_PQR_ASSOC, l, h_tmp))

I don't understand - you do the family/model check below and yet still
use the _safe() variants. Isn't the presence of that MSR guaranteed on
those machines?

> + return false;
> + rdmsr(MSR_IA32_PQR_ASSOC, l, h_new);
> +
> + if (h_tmp != h_new)
> + return false;
> +
> + wrmsr(MSR_IA32_PQR_ASSOC, l, h_old);
> +
> + return true;
> +}
>
> static inline bool get_rdt_resources(struct cpuinfo_x86 *c)
> {
> bool ret = false;
>
> + if (c->x86_vendor == X86_VENDOR_INTEL && c->x86 == 6 &&
> + c->x86_model == INTEL_FAM6_HASWELL_X)
> + return cache_alloc_hsw_probe();
> +
> if (!cpu_has(c, X86_FEATURE_RDT_A))
> return false;
> if (cpu_has(c, X86_FEATURE_CAT_L3))
> --
> 2.5.0
>

--
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
--

2016-10-09 14:06:32

by Fenghua Yu

[permalink] [raw]
Subject: Re: [PATCH v3 07/18] x86/intel_rdt: Add Haswell feature discovery

On Sun, Oct 09, 2016 at 01:41:16PM +0200, Borislav Petkov wrote:
> On Fri, Oct 07, 2016 at 07:45:52PM -0700, Fenghua Yu wrote:
> > From: Fenghua Yu <[email protected]>
> >
> > Some Haswell generation CPUs support RDT, but they don't enumerate this
> > using CPUID. Use rdmsr_safe() and wrmsr_safe() to probe the MSRs on
> > cpu model 63 (INTEL_FAM6_HASWELL_X)
> >
> > Signed-off-by: Fenghua Yu <[email protected]>
> > Signed-off-by: Tony Luck <[email protected]>
> > ---
> > arch/x86/events/intel/cqm.c | 2 +-
> > arch/x86/include/asm/intel_rdt_common.h | 6 ++++++
> > arch/x86/kernel/cpu/intel_rdt.c | 38 +++++++++++++++++++++++++++++++++
> > 3 files changed, 45 insertions(+), 1 deletion(-)
> > create mode 100644 arch/x86/include/asm/intel_rdt_common.h
>
> ...
>
> > +static inline bool cache_alloc_hsw_probe(void)
> > +{
> > + u32 l, h_old, h_new, h_tmp;
> > +
> > + if (rdmsr_safe(MSR_IA32_PQR_ASSOC, &l, &h_old))
> > + return false;
> > +
> > + /*
> > + * Default value is always 0 if feature is present.
> > + */
> > + h_tmp = h_old ^ 0x1U;
> > + if (wrmsr_safe(MSR_IA32_PQR_ASSOC, l, h_tmp))
>
> I don't understand - you do the family/model check below and yet still
> use the _safe() variants. Isn't the presence of that MSR guaranteed on
> those machines?

The MSR is not guaranteed to be present on every stepping of that family and
model because some parts may have the MSR fused off, some bits in the MSR
may not be implemented on some parts, and in a KVM or other guest the MSR
may not be implemented at all. Those are the reasons why we use
wrmsr_safe()/rdmsr_safe() in the Haswell probe.

Thanks.

-Fenghua

2016-10-09 16:28:32

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v3 07/18] x86/intel_rdt: Add Haswell feature discovery

On Sun, Oct 09, 2016 at 10:09:37AM -0700, Fenghua Yu wrote:
> The MSR is not guaranteed to be present on every stepping of that family and
> model because some parts may have the MSR fused off, some bits in the MSR
> may not be implemented on some parts, and in a KVM or other guest the MSR
> may not be implemented at all. Those are the reasons why we use
> wrmsr_safe()/rdmsr_safe() in the Haswell probe.

Please add that info in a comment somewhere there as we'll all forget
about it otherwise.

Thanks.

--
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
--

2016-10-09 21:59:28

by Nilay Vaish

[permalink] [raw]
Subject: Re: [PATCH v3 10/18] x86/intel_rdt: Build structures for each resource based on cache topology

On 7 October 2016 at 21:45, Fenghua Yu <[email protected]> wrote:
> From: Tony Luck <[email protected]>
>
> diff --git a/arch/x86/include/asm/intel_rdt.h b/arch/x86/include/asm/intel_rdt.h
> index 251ac2a..bad8dc7 100644
> --- a/arch/x86/include/asm/intel_rdt.h
> +++ b/arch/x86/include/asm/intel_rdt.h
> @@ -40,6 +40,39 @@ struct rdt_resource {
> if (r->enabled)
>
> #define IA32_L3_CBM_BASE 0xc90
> +
> +/**
> + * struct rdt_domain - group of cpus sharing an RDT resource
> + * @list: all instances of this resource
> + * @id: unique id for this instance
> + * @cpu_mask: which cpus share this resource
> + * @cbm: array of cache bit masks (indexed by CLOSID)
> + */
> +struct rdt_domain {
> + struct list_head list;
> + int id;
> + struct cpumask cpu_mask;
> + u32 *cbm;
> +};
> +
> +/**
> + * struct msr_param - set a range of MSRs from a domain
> + * @res: The resource to use
> + * @low: Beginning index from base MSR
> + * @high: End index
> + */
> +struct msr_param {
> + struct rdt_resource *res;
> + int low;
> + int high;
> +};
> +
> +extern struct static_key_false rdt_enable_key;
> +extern struct mutex rdtgroup_mutex;
> +
> +int __init rdtgroup_init(void);
> +
> +extern struct rdtgroup *rdtgroup_default;

struct rdtgroup has not been defined yet. The statement above should
be part of the next patch.

--
Nilay

2016-10-09 22:33:02

by Nilay Vaish

[permalink] [raw]
Subject: Re: [PATCH v3 11/18] x86/intel_rdt: Add basic resctrl filesystem support

On 7 October 2016 at 21:45, Fenghua Yu <[email protected]> wrote:
> From: Fenghua Yu <[email protected]>
>
> diff --git a/arch/x86/include/asm/intel_rdt.h b/arch/x86/include/asm/intel_rdt.h
> index bad8dc7..f63815c 100644
> --- a/arch/x86/include/asm/intel_rdt.h
> +++ b/arch/x86/include/asm/intel_rdt.h
> @@ -2,6 +2,23 @@
> #define _ASM_X86_INTEL_RDT_H
>
> /**
> + * struct rdtgroup - store rdtgroup's data in resctrl file system.
> + * @kn: kernfs node
> + * @rdtgroup_list: linked list for all rdtgroups
> + * @closid: closid for this rdtgroup
> + */
> +struct rdtgroup {
> + struct kernfs_node *kn;
> + struct list_head rdtgroup_list;
> + int closid;
> +};
> +
> +/* List of all resource groups */
> +extern struct list_head rdt_all_groups;
> +
> +int __init rdtgroup_init(void);
> +
> +/**
> * struct rdt_resource - attributes of an RDT resource
> * @enabled: Is this feature enabled on this machine
> * @name: Name to use in "schemata" file
> @@ -39,6 +56,7 @@ struct rdt_resource {
> for (r = rdt_resources_all; r->name; r++) \
> if (r->enabled)
>
> +#define IA32_L3_QOS_CFG 0xc81
> #define IA32_L3_CBM_BASE 0xc90
>
> /**
> @@ -72,7 +90,7 @@ extern struct mutex rdtgroup_mutex;
>
> int __init rdtgroup_init(void);

This statement appears about twenty lines above as well.


> diff --git a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
> new file mode 100644
> index 0000000..c99d3a0
> --- /dev/null
> +++ b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
> @@ -0,0 +1,236 @@
> +/*
> + * User interface for Resource Allocation in Resource Director Technology (RDT)
> + *
> + * Copyright (C) 2016 Intel Corporation
> + *
> + * Author: Fenghua Yu <[email protected]>
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
> + * more details.
> + *
> + * More information about RDT can be found in the Intel (R) x86 Architecture
> + * Software Developer Manual.
> + */
> +
> +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> +
> +#include <linux/fs.h>
> +#include <linux/sysfs.h>
> +#include <linux/kernfs.h>
> +#include <linux/slab.h>
> +
> +#include <uapi/linux/magic.h>
> +
> +#include <asm/intel_rdt.h>
> +
> +DEFINE_STATIC_KEY_FALSE(rdt_enable_key);
> +struct kernfs_root *rdt_root;
> +struct rdtgroup rdtgroup_default;
> +LIST_HEAD(rdt_all_groups);
> +
> +static void l3_qos_cfg_update(void *arg)
> +{
> + struct rdt_resource *r = arg;
> +
> + wrmsrl(IA32_L3_QOS_CFG, r->cdp_enabled);
> +}
> +
> +static void set_l3_qos_cfg(struct rdt_resource *r)
> +{
> + struct list_head *l;
> + struct rdt_domain *d;
> + struct cpumask cpu_mask;
> +
> + cpumask_clear(&cpu_mask);
> + list_for_each(l, &r->domains) {
> + d = list_entry(l, struct rdt_domain, list);
> + cpumask_set_cpu(cpumask_any(&d->cpu_mask), &cpu_mask);
> + }
> + smp_call_function_many(&cpu_mask, l3_qos_cfg_update, r, 1);
> +}
> +
> +static int parse_rdtgroupfs_options(char *data, struct rdt_resource *r)
> +{
> + char *token, *o = data;
> +
> + while ((token = strsep(&o, ",")) != NULL) {
> + if (!*token)
> + return -EINVAL;
> +
> + if (!strcmp(token, "cdp"))
> + if (r->enabled && r->cdp_capable)
> + r->cdp_enabled = true;
> + }
> +
> + return 0;
> +}
> +
> +static struct dentry *rdt_mount(struct file_system_type *fs_type,
> + int flags, const char *unused_dev_name,
> + void *data)
> +{
> + struct dentry *dentry;
> + int ret;
> + bool new_sb;
> + struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3];
> +
> + mutex_lock(&rdtgroup_mutex);
> + /*
> + * resctrl file system can only be mounted once.
> + */
> + if (static_branch_unlikely(&rdt_enable_key)) {
> + dentry = ERR_PTR(-EBUSY);
> + goto out;
> + }
> +
> + r->cdp_enabled = false;
> + ret = parse_rdtgroupfs_options(data, r);
> + if (ret) {
> + dentry = ERR_PTR(ret);
> + goto out;
> + }
> + if (r->cdp_enabled)
> + r->num_closid = r->max_closid / 2;
> + else
> + r->num_closid = r->max_closid;
> +
> + /* Recompute rdt_max_closid because CDP may have changed things. */
> + rdt_max_closid = 0;
> + for_each_rdt_resource(r)
> + rdt_max_closid = max(rdt_max_closid, r->num_closid);
> + if (rdt_max_closid > 32)
> + rdt_max_closid = 32;
> +
> + dentry = kernfs_mount(fs_type, flags, rdt_root,
> + RDTGROUP_SUPER_MAGIC, &new_sb);
> + if (IS_ERR(dentry))
> + goto out;
> + if (!new_sb) {
> + dentry = ERR_PTR(-EINVAL);
> + goto out;
> + }
> + r = &rdt_resources_all[RDT_RESOURCE_L3];
> + if (r->cdp_capable)
> + set_l3_qos_cfg(r);
> + static_branch_enable(&rdt_enable_key);
> +
> +out:
> + mutex_unlock(&rdtgroup_mutex);
> +
> + return dentry;
> +}

I am not too happy with the function above. I would have expected
it to execute the common code and also make calls to per-resource
functions. The way it is written now, it seems to mix things up.

> +
> +static void reset_all_cbms(struct rdt_resource *r)
> +{
> + struct list_head *l;
> + struct rdt_domain *d;
> + struct msr_param msr_param;
> + struct cpumask cpu_mask;
> + int i;
> +
> + cpumask_clear(&cpu_mask);
> + msr_param.res = r;
> + msr_param.low = 0;
> + msr_param.high = r->max_closid;
> +
> + list_for_each(l, &r->domains) {
> + d = list_entry(l, struct rdt_domain, list);
> + cpumask_set_cpu(cpumask_any(&d->cpu_mask), &cpu_mask);
> +
> + for (i = 0; i < r->max_closid; i++)
> + d->cbm[i] = r->max_cbm;
> + }
> + smp_call_function_many(&cpu_mask, rdt_cbm_update, &msr_param, 1);
> +}
> +
> +static void rdt_kill_sb(struct super_block *sb)
> +{
> + struct rdt_resource *r;
> +
> + mutex_lock(&rdtgroup_mutex);
> +
> + /* Put everything back to default values. */
> + for_each_rdt_resource(r)
> + reset_all_cbms(r);
> + r = &rdt_resources_all[RDT_RESOURCE_L3];
> + if (r->cdp_capable) {
> + r->cdp_enabled = 0;
> + set_l3_qos_cfg(r);
> + }
> +
> + static_branch_disable(&rdt_enable_key);
> + kernfs_kill_sb(sb);
> + mutex_unlock(&rdtgroup_mutex);
> +}
> +
> +static struct file_system_type rdt_fs_type = {
> + .name = "resctrl",
> + .mount = rdt_mount,
> + .kill_sb = rdt_kill_sb,
> +};
> +
> +static struct kernfs_syscall_ops rdtgroup_kf_syscall_ops = {
> +};
> +
> +static int __init rdtgroup_setup_root(void)
> +{
> + int ret;
> +
> + rdt_root = kernfs_create_root(&rdtgroup_kf_syscall_ops,
> + KERNFS_ROOT_CREATE_DEACTIVATED,
> + &rdtgroup_default);
> + if (IS_ERR(rdt_root))
> + return PTR_ERR(rdt_root);
> +
> + mutex_lock(&rdtgroup_mutex);
> +
> + rdtgroup_default.closid = 0;
> + list_add(&rdtgroup_default.rdtgroup_list, &rdt_all_groups);
> +
> + rdtgroup_default.kn = rdt_root->kn;
> + kernfs_activate(rdtgroup_default.kn);
> +
> + mutex_unlock(&rdtgroup_mutex);
> +
> + return ret;

You don't set ret anywhere.


--
Nilay

2016-10-10 16:01:57

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH v3 04/18] x86/intel_rdt: Feature discovery

On 10/08/2016 10:11 AM, Nilay Vaish wrote:
>> > #define X86_FEATURE_RTM ( 9*32+11) /* Restricted Transactional Memory */
>> > #define X86_FEATURE_CQM ( 9*32+12) /* Cache QoS Monitoring */
>> > #define X86_FEATURE_MPX ( 9*32+14) /* Memory Protection Extension */
>> > +#define X86_FEATURE_RDT_A ( 9*32+15) /* Resource Director Technology Allocation */
>> > #define X86_FEATURE_AVX512F ( 9*32+16) /* AVX-512 Foundation */
>> > #define X86_FEATURE_AVX512DQ ( 9*32+17) /* AVX-512 DQ (Double/Quad granular) Instructions */
>> > #define X86_FEATURE_RDSEED ( 9*32+18) /* The RDSEED instruction */
> I think these #defines are specific to Intel. I would prefer if we
> have _INTEL_ somewhere in them.

While that isn't a horrible idea, it's also not something that we've
enforced at *all* in the past. Would you suggest that we do this only
for features in the generic CPUID leaves, or all features?

2016-10-10 16:19:21

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v3 04/18] x86/intel_rdt: Feature discovery

On Mon, Oct 10, 2016 at 09:01:44AM -0700, Dave Hansen wrote:
> While that isn't a horrible idea, it's also not something that we've
> enforced at *all* in the past. Would you suggest that we do this only
> for features in the generic CPUID leaves, or all features?

There's no reason to do anything really: the 0x0... CPUID leaf is owned
by Intel and 0x8... by AMD. So far, if AMD implements a feature in the
Intel leaf, it sets the respective CPUID bit. So a feature can be
present in multiple vendors. Thus you don't need the vendor name in the
define.

--
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
--

2016-10-10 16:46:22

by Luck, Tony

[permalink] [raw]
Subject: Re: [PATCH v3 01/18] Documentation, ABI: Add a document entry for cache id

On Sat, Oct 08, 2016 at 12:11:08PM -0500, Nilay Vaish wrote:
> On 7 October 2016 at 21:45, Fenghua Yu <[email protected]> wrote:
> > From: Fenghua Yu <[email protected]>

> > + caches typically exist per core, but there may not be a
> > + power of two cores on a socket, so these caches may be
> > + numbered 0, 1, 2, 3, 4, 5, 8, 9, 10, ...
> > +
>
> While it is ok that the caches are not numbered contiguously, it is
> unclear how this is related to number of cores on a socket being a
> power of 2 or not.

That's a side effect of the x86 algorithm to generate the unique ID
which uses a shift to put the socket number in some upper bits while
leaving the "id within a socket" in the low bits.

I don't think it's worth documenting here, but I noticed that we don't
keep the IDs within a socket contiguous either. On my 24 core Broadwell
they are not 0 ... 23 followed by a gap from 24 to 31. I actually have on
socket 0:

0, 1, 2, 3, 4, 5
8, 9, 10, 11, 12, 13
16, 17, 18, 19, 20, 21
24, 25, 26, 27, 28, 29

-Tony

2016-10-10 17:20:04

by Luck, Tony

[permalink] [raw]
Subject: Re: [PATCH v3 05/18] Documentation, x86: Documentation for Intel resource allocation user interface

On Sat, Oct 08, 2016 at 01:33:06PM -0700, Fenghua Yu wrote:
> On Sat, Oct 08, 2016 at 12:12:07PM -0500, Nilay Vaish wrote:
> > On 7 October 2016 at 21:45, Fenghua Yu <[email protected]> wrote:
> > > From: Fenghua Yu <[email protected]>
> > >
> > > +L3 details (code and data prioritization disabled)
> > > +--------------------------------------------------
> > > +With CDP disabled the L3 schemata format is:
> > > +
> > > + L3:<cache_id0>=<cbm>;<cache_id1>=<cbm>;...
> > > +
> > > +L3 details (CDP enabled via mount option to resctrl)
> > > +----------------------------------------------------
> > > +When CDP is enabled, you need to specify separate cache bit masks for
> > > +code and data access. The generic format is:
> > > +
> > > + L3:<cache_id0>=<d_cbm>,<i_cbm>;<cache_id1>=<d_cbm>,<i_cbm>;...
> >
> > Can we drop L3 here and instead say:
> > L<level>:<cache_id0>=<d_cbm>,<i_cbm>;<cache_id1>=<d_cbm>,<i_cbm>;...
> >
> > and similarly for without CDP as well.
>
> L3 and L2 are similar but different: L2 doesn't have the CDP feature. It
> would be better to describe them separately here.
>

Perhaps we should document the general form, and then show examples
for L3 (and later L2 and other resources as they are added). Note
in particular that other resources are not cache-like ... so the
pattern you see between L3 and L2 is short lived. Future resources
that can be controlled are not caches.

General form of a resource line is:

{resource type}:[{resource instanceM}={resource value};]*N

where "M" iterates over all online instances of this resource,
and "N" is the total number. [and the ";" is a separator, not a terminator,
but I'm not sure how to write that].

For the "L3" resource there are two formats for the value.
With CDP enabled:
{d-cache bit mask},{i-cache bit mask}
with CDP disabled:
{cache bit mask}

"L2" resource doesn't support CDP, so the only format is {cache bit mask}


The next resource coming will have values that are simple ranges {0 .. max}

-Tony

2016-10-10 17:53:25

by Nilay Vaish

[permalink] [raw]
Subject: Re: [PATCH v3 13/18] x86/intel_rdt: Add mkdir to resctrl file system

On 7 October 2016 at 21:45, Fenghua Yu <[email protected]> wrote:
> From: Fenghua Yu <[email protected]>
> diff --git a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
> index 28df92e..efcbfe7 100644
> --- a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
> +++ b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
> @@ -36,6 +36,52 @@ struct kernfs_root *rdt_root;
> struct rdtgroup rdtgroup_default;
> LIST_HEAD(rdt_all_groups);
>
> +/*
> + * Trivial allocator for CLOSIDs. Since h/w only supports
> + * a small number, we can keep a bitmap of free CLOSIDs in
> + * a single integer.
> + */
> +static int closid_free_map;
> +
> +static void closid_init(void)
> +{
> + closid_free_map = BIT_MASK(rdt_max_closid) - 1;
> +
> + /* CLOSID 0 is always reserved for the default group */
> + closid_free_map &= ~1;
> +}
> +
> +int closid_alloc(void)
> +{
> + int closid = ffs(closid_free_map);
> +
> + if (closid == 0)
> + return -ENOSPC;
> + closid--;
> + closid_free_map &= ~(1 << closid);
> +
> + return closid;
> +}
> +
> +static void closid_free(int closid)
> +{
> + closid_free_map |= 1 << closid;
> +}
> +
> +static struct rdtgroup *rdtgroup_alloc(void)
> +{
> + struct rdtgroup *rdtgrp;
> +
> + rdtgrp = kzalloc(sizeof(*rdtgrp), GFP_KERNEL);
> +
> + return rdtgrp;
> +}
> +
> +static void rdtgroup_free(struct rdtgroup *rdtgroup)
> +{
> + kfree(rdtgroup);
> +}

Any reason why you defined the above two functions? Why not call
kzalloc() and kfree() directly?

--
Nilay

2016-10-10 18:55:48

by Luck, Tony

[permalink] [raw]
Subject: Re: [PATCH v3 07/18] x86/intel_rdt: Add Haswell feature discovery

On Sun, Oct 09, 2016 at 06:28:23PM +0200, Borislav Petkov wrote:
> On Sun, Oct 09, 2016 at 10:09:37AM -0700, Fenghua Yu wrote:
> > The MSR is not guaranteed to be present on every stepping of that family and
> > model because some parts may have the MSR fused off, some bits in the MSR
> > may not be implemented on some parts, and in a KVM or other guest the MSR
> > may not be implemented at all. Those are the reasons why we use
> > wrmsr_safe()/rdmsr_safe() in the Haswell probe.
>
> Please add that info in a comment somewhere there as we'll all forget
> about it otherwise.

How about this? (This diff is on top of the current series, but obviously
we'll fold it into part 07.)


commit cdb05159fb91ed1f85c950c0f2c6de25f143961d
Author: Tony Luck <[email protected]>
Date: Mon Oct 10 11:48:42 2016 -0700

Update the HSW probe code - better comments, and use IA32_L3_CBM_BASE
as the probe MSR instead of PQR_ASSOC (at the suggestion of the h/w architect).

diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
index 4903e21d660d..e3c397306f1a 100644
--- a/arch/x86/kernel/cpu/intel_rdt.c
+++ b/arch/x86/kernel/cpu/intel_rdt.c
@@ -56,39 +56,39 @@ struct rdt_resource rdt_resources_all[] = {

/*
* cache_alloc_hsw_probe() - Have to probe for Intel haswell server CPUs
- * as it does not have CPUID enumeration support for Cache allocation.
+ * as they do not have CPUID enumeration support for Cache allocation.
+ * The check for Vendor/Family/Model is not enough to guarantee that
+ * the MSRs won't #GP fault because only the following SKUs support
+ * CAT:
+ * Intel(R) Xeon(R) CPU E5-2658 v3 @ 2.20GHz
+ * Intel(R) Xeon(R) CPU E5-2648L v3 @ 1.80GHz
+ * Intel(R) Xeon(R) CPU E5-2628L v3 @ 2.00GHz
+ * Intel(R) Xeon(R) CPU E5-2618L v3 @ 2.30GHz
+ * Intel(R) Xeon(R) CPU E5-2608L v3 @ 2.00GHz
*
- * Probes by writing to the high 32 bits(CLOSid) of the IA32_PQR_MSR and
- * testing if the bits stick. Max CLOSids is always 4 and max cbm length
+ * Probe by trying to write the first of the L3 cache mask registers
+ * and checking that the bits stick. Max CLOSids is always 4 and max cbm length
* is always 20 on hsw server parts. The minimum cache bitmask length
* allowed for HSW server is always 2 bits. Hardcode all of them.
*/
static inline bool cache_alloc_hsw_probe(void)
{
- u32 l, h_old, h_new, h_tmp;
+ u32 l, h;
struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3];

- if (rdmsr_safe(MSR_IA32_PQR_ASSOC, &l, &h_old))
- return false;
-
- /*
- * Default value is always 0 if feature is present.
- */
- h_tmp = h_old ^ 0x1U;
- if (wrmsr_safe(MSR_IA32_PQR_ASSOC, l, h_tmp))
- return false;
- rdmsr(MSR_IA32_PQR_ASSOC, l, h_new);
-
- if (h_tmp != h_new)
- return false;
-
- wrmsr(MSR_IA32_PQR_ASSOC, l, h_old);
-
r->max_closid = 4;
r->num_closid = r->max_closid;
r->cbm_len = 20;
r->max_cbm = BIT_MASK(20) - 1;
r->min_cbm_bits = 2;
+
+ if (wrmsr_safe(IA32_L3_CBM_BASE, r->max_cbm, 0))
+ return false;
+ rdmsr(IA32_L3_CBM_BASE, l, h);
+
+ if (l != r->max_cbm)
+ return false;
+
r->enabled = true;

return true;

2016-10-10 23:47:58

by Luck, Tony

[permalink] [raw]
Subject: Re: [PATCH v3 11/18] x86/intel_rdt: Add basic resctrl filesystem support

On Sun, Oct 09, 2016 at 05:31:25PM -0500, Nilay Vaish wrote:
> > +static struct dentry *rdt_mount(struct file_system_type *fs_type,
> > + int flags, const char *unused_dev_name,
> > + void *data)
> > +{
> > + struct dentry *dentry;
> > + int ret;
> > + bool new_sb;
> > + struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3];
> > +
> > + mutex_lock(&rdtgroup_mutex);
> > + /*
> > + * resctrl file system can only be mounted once.
> > + */
> > + if (static_branch_unlikely(&rdt_enable_key)) {
> > + dentry = ERR_PTR(-EBUSY);
> > + goto out;
> > + }
> > +
> > + r->cdp_enabled = false;
> > + ret = parse_rdtgroupfs_options(data, r);
> > + if (ret) {
> > + dentry = ERR_PTR(ret);
> > + goto out;
> > + }
> > + if (r->cdp_enabled)
> > + r->num_closid = r->max_closid / 2;
> > + else
> > + r->num_closid = r->max_closid;
> > +
> > + /* Recompute rdt_max_closid because CDP may have changed things. */
> > + rdt_max_closid = 0;
> > + for_each_rdt_resource(r)
> > + rdt_max_closid = max(rdt_max_closid, r->num_closid);
> > + if (rdt_max_closid > 32)
> > + rdt_max_closid = 32;
> > +
> > + dentry = kernfs_mount(fs_type, flags, rdt_root,
> > + RDTGROUP_SUPER_MAGIC, &new_sb);
> > + if (IS_ERR(dentry))
> > + goto out;
> > + if (!new_sb) {
> > + dentry = ERR_PTR(-EINVAL);
> > + goto out;
> > + }
> > + r = &rdt_resources_all[RDT_RESOURCE_L3];
> > + if (r->cdp_capable)
> > + set_l3_qos_cfg(r);
> > + static_branch_enable(&rdt_enable_key);
> > +
> > +out:
> > + mutex_unlock(&rdtgroup_mutex);
> > +
> > + return dentry;
> > +}
>
> I am not too happy with the function above. I would have expected
> it to execute the common code and also make calls to per-resource
> functions. The way it is written now, it seems to mix things up.

It is a bit muddled. The patch below is on top of the series, but can
be folded back into the appropriate parts.

1) Moves the r->cdp_enabled = false into parse_rdtgroupfs_options()

2) Move the rdt_max_closid re-computation out of mount() and into closid_init()
[not in above as it is introduced later].

3) Bonus ... gets rid of global "rdt_max_closid" as really nothing
outside of closid_init() needs to know what it is.

Better?

-Tony

commit 15a8ef34e2f849cdcb3e7faa72226cf2ecd82780
Author: Tony Luck <[email protected]>
Date: Mon Oct 10 16:18:31 2016 -0700

Respond to Nilay's comment that the "mount" code is mixed up.

diff --git a/arch/x86/include/asm/intel_rdt.h b/arch/x86/include/asm/intel_rdt.h
index aefa3a655408..1104b214b972 100644
--- a/arch/x86/include/asm/intel_rdt.h
+++ b/arch/x86/include/asm/intel_rdt.h
@@ -136,9 +136,6 @@ enum {
RDT_RESOURCE_L3,
};

-/* Maximum CLOSID allowed across all enabled resoources */
-extern int rdt_max_closid;
-
/* CPUID.(EAX=10H, ECX=ResID=1).EAX */
union cpuid_0x10_1_eax {
struct {
diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
index e3c397306f1a..5c504abc9239 100644
--- a/arch/x86/kernel/cpu/intel_rdt.c
+++ b/arch/x86/kernel/cpu/intel_rdt.c
@@ -35,8 +35,6 @@
/* Mutex to protect rdtgroup access. */
DEFINE_MUTEX(rdtgroup_mutex);

-int rdt_max_closid;
-
DEFINE_PER_CPU_READ_MOSTLY(int, cpu_closid);

#define domain_init(name) LIST_HEAD_INIT(rdt_resources_all[name].domains)
@@ -267,15 +265,6 @@ static int __init intel_rdt_late_init(void)
if (ret < 0)
return ret;

- for_each_rdt_resource(r)
- rdt_max_closid = max(rdt_max_closid, r->max_closid);
-
- /* limitation of our allocator, but h/w is more limited */
- if (rdt_max_closid > 32) {
- pr_warn("Only using 32/%d CLOSIDs\n", rdt_max_closid);
- rdt_max_closid = 32;
- }
-
rdtgroup_init();

for_each_rdt_resource(r)
diff --git a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
index 82f53ad935ca..5978a3742e5e 100644
--- a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
+++ b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
@@ -47,6 +47,23 @@ static int closid_free_map;

static void closid_init(void)
{
+ struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3];
+ int rdt_max_closid;
+
+ /* Enabling L3 CDP halves the number of CLOSIDs */
+ if (r->cdp_enabled)
+ r->num_closid = r->max_closid / 2;
+ else
+ r->num_closid = r->max_closid;
+
+ /* Compute rdt_max_closid across all resources */
+ rdt_max_closid = 0;
+ for_each_rdt_resource(r)
+ rdt_max_closid = max(rdt_max_closid, r->num_closid);
+ if (rdt_max_closid > 32) {
+ pr_warn_once("Only using 32/%d CLOSIDs\n", rdt_max_closid);
+ rdt_max_closid = 32;
+ }
closid_free_map = BIT_MASK(rdt_max_closid) - 1;

/* CLOSID 0 is always reserved for the default group */
@@ -546,10 +563,12 @@ static void set_l3_qos_cfg(struct rdt_resource *r)
smp_call_function_many(&cpu_mask, l3_qos_cfg_update, r, 1);
}

-static int parse_rdtgroupfs_options(char *data, struct rdt_resource *r)
+static int parse_rdtgroupfs_options(char *data)
{
char *token, *o = data;
+ struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3];

+ r->cdp_enabled = false;
while ((token = strsep(&o, ",")) != NULL) {
if (!*token)
return -EINVAL;
@@ -617,7 +636,6 @@ static struct dentry *rdt_mount(struct file_system_type *fs_type,
struct dentry *dentry;
int ret;
bool new_sb;
- struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3];

mutex_lock(&rdtgroup_mutex);
/*
@@ -628,23 +646,11 @@ static struct dentry *rdt_mount(struct file_system_type *fs_type,
goto out;
}

- r->cdp_enabled = false;
- ret = parse_rdtgroupfs_options(data, r);
+ ret = parse_rdtgroupfs_options(data);
if (ret) {
dentry = ERR_PTR(ret);
goto out;
}
- if (r->cdp_enabled)
- r->num_closid = r->max_closid / 2;
- else
- r->num_closid = r->max_closid;
-
- /* Recompute rdt_max_closid because CDP may have changed things. */
- rdt_max_closid = 0;
- for_each_rdt_resource(r)
- rdt_max_closid = max(rdt_max_closid, r->num_closid);
- if (rdt_max_closid > 32)
- rdt_max_closid = 32;
closid_init();

dentry = kernfs_mount(fs_type, flags, rdt_root,
@@ -655,9 +661,8 @@ static struct dentry *rdt_mount(struct file_system_type *fs_type,
dentry = ERR_PTR(-EINVAL);
goto out;
}
- r = &rdt_resources_all[RDT_RESOURCE_L3];
- if (r->cdp_capable)
- set_l3_qos_cfg(r);
+ if (rdt_resources_all[RDT_RESOURCE_L3].cdp_capable)
+ set_l3_qos_cfg(&rdt_resources_all[RDT_RESOURCE_L3]);
static_branch_enable(&rdt_enable_key);

out:

2016-10-11 11:18:34

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v3 07/18] x86/intel_rdt: Add Haswell feature discovery

On Mon, Oct 10, 2016 at 11:55:45AM -0700, Luck, Tony wrote:
> How about this? (This diff is on top of the current series, but obviously
> we'll fold it into part 07.)
>
>
> commit cdb05159fb91ed1f85c950c0f2c6de25f143961d
> Author: Tony Luck <[email protected]>
> Date: Mon Oct 10 11:48:42 2016 -0700
>
> Update the HSW probe code - better comments, and use IA32_L3_CBM_BASE
> as the probe MSR instead of PQR_ASSOC (at the suggestion of the h/w architect).
>
> diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
> index 4903e21d660d..e3c397306f1a 100644
> --- a/arch/x86/kernel/cpu/intel_rdt.c
> +++ b/arch/x86/kernel/cpu/intel_rdt.c
> @@ -56,39 +56,39 @@ struct rdt_resource rdt_resources_all[] = {
>
> /*
> * cache_alloc_hsw_probe() - Have to probe for Intel haswell server CPUs
> - * as it does not have CPUID enumeration support for Cache allocation.
> + * as they do not have CPUID enumeration support for Cache allocation.
> + * The check for Vendor/Family/Model is not enough to guarantee that
> + * the MSRs won't #GP fault because only the following SKUs support
> + * CAT:
> + * Intel(R) Xeon(R) CPU E5-2658 v3 @ 2.20GHz
> + * Intel(R) Xeon(R) CPU E5-2648L v3 @ 1.80GHz
> + * Intel(R) Xeon(R) CPU E5-2628L v3 @ 2.00GHz
> + * Intel(R) Xeon(R) CPU E5-2618L v3 @ 2.30GHz
> + * Intel(R) Xeon(R) CPU E5-2608L v3 @ 2.00GHz
> *
> - * Probes by writing to the high 32 bits(CLOSid) of the IA32_PQR_MSR and
> - * testing if the bits stick. Max CLOSids is always 4 and max cbm length
> + * Probe by trying to write the first of the L3 cache mask registers
> + * and checking that the bits stick. Max CLOSids is always 4 and max cbm length

I wonder what's worse - comparing SKU strings - we know that from the MCE
recovery experience - or poking at maybe nonexistent MSRs? :-)

I guess the latter is cleaner so let's try it.

Thanks for the writeup in the comments - this is exactly what I was
thinking about!

--
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
--

2016-10-11 14:51:31

by Luck, Tony

[permalink] [raw]
Subject: RE: [PATCH v3 07/18] x86/intel_rdt: Add Haswell feature discovery

> I wonder what's worse - comparing SKU strings - we know that from the MCE
> recovery experience - or poking at maybe nonexistent MSRs? :-)
>
> I guess the latter is cleaner so let's try it.

Vikas got beat up for comparing SKU strings, so the probe method
was offered as an alternative. It's definitely more robust. E.g. my
list turns out to be incomplete, somehow I forgot to include:

Intel(R) Xeon(R) processor E5-2658A v3

-Tony


2016-10-11 16:49:39

by Nilay Vaish

[permalink] [raw]
Subject: Re: [PATCH v3 01/18] Documentation, ABI: Add a document entry for cache id

On 10 October 2016 at 11:45, Luck, Tony <[email protected]> wrote:
> On Sat, Oct 08, 2016 at 12:11:08PM -0500, Nilay Vaish wrote:
>> On 7 October 2016 at 21:45, Fenghua Yu <[email protected]> wrote:
>> > From: Fenghua Yu <[email protected]>
>
>> > + caches typically exist per core, but there may not be a
>> > + power of two cores on a socket, so these caches may be
>> > + numbered 0, 1, 2, 3, 4, 5, 8, 9, 10, ...
>> > +
>>
>> While it is ok that the caches are not numbered contiguously, it is
>> unclear how this is related to number of cores on a socket being a
>> power of 2 or not.
>
> That's a side effect of the x86 algorithm to generate the unique ID
> which uses a shift to put the socket number in some upper bits while
> leaving the "id within a socket" in the low bits.
>
> I don't think it's worth documenting here, but I noticed that we don't
> keep the IDs within a socket contiguous either. On my 24 core Broadwell
> they are not 0 ... 23 followed by a gap from 24 to 31. I actually have on
> socket 0:
>
> 0, 1, 2, 3, 4, 5
> 8, 9, 10, 11, 12, 13
> 16, 17, 18, 19, 20, 21
> 24, 25, 26, 27, 28, 29
>

Thanks for the info.

--
Nilay

2016-10-11 16:59:40

by Nilay Vaish

[permalink] [raw]
Subject: Re: [PATCH v3 04/18] x86/intel_rdt: Feature discovery

On 8 October 2016 at 14:52, Borislav Petkov <[email protected]> wrote:
> On Sat, Oct 08, 2016 at 01:54:54PM -0700, Fenghua Yu wrote:
>> > I think these #defines are specific to Intel. I would prefer if we
>> > have _INTEL_ somewhere in them.
>
> We don't generally add vendor names to those defines. Even more so if
> the 0x0... leaf range is Intel-specific anyway.
>
>> Is adding "Intel" in the comments good?
>
> I don't see any need if the leaf has already this heading:
>
> /* Intel-defined CPU features, CPUID level 0x00000007:0 (ebx), word 9 */
>

I think we should go with Fenghua's suggestion on this. Reading the
code around the edits from this patch, it seems word 7 is not owned by
anyone. Both AMD and Intel seem to be using it.

--
Nilay

2016-10-11 17:08:39

by Nilay Vaish

[permalink] [raw]
Subject: Re: [PATCH v3 05/18] Documentation, x86: Documentation for Intel resource allocation user interface

On 10 October 2016 at 12:19, Luck, Tony <[email protected]> wrote:
> On Sat, Oct 08, 2016 at 01:33:06PM -0700, Fenghua Yu wrote:
>> On Sat, Oct 08, 2016 at 12:12:07PM -0500, Nilay Vaish wrote:
>> > On 7 October 2016 at 21:45, Fenghua Yu <[email protected]> wrote:
>> > > From: Fenghua Yu <[email protected]>
>> > >
>> > > +L3 details (code and data prioritization disabled)
>> > > +--------------------------------------------------
>> > > +With CDP disabled the L3 schemata format is:
>> > > +
>> > > + L3:<cache_id0>=<cbm>;<cache_id1>=<cbm>;...
>> > > +
>> > > +L3 details (CDP enabled via mount option to resctrl)
>> > > +----------------------------------------------------
>> > > +When CDP is enabled, you need to specify separate cache bit masks for
>> > > +code and data access. The generic format is:
>> > > +
>> > > + L3:<cache_id0>=<d_cbm>,<i_cbm>;<cache_id1>=<d_cbm>,<i_cbm>;...
>> >
>> > Can we drop L3 here and instead say:
>> > L<level>:<cache_id0>=<d_cbm>,<i_cbm>;<cache_id1>=<d_cbm>,<i_cbm>;...
>> >
>> > and similarly for without CDP as well.
>>
>> L3 and L2 are similar but different: L2 doesn't have the CDP feature. It
>> would be better to describe them separately here.
>>
>
> Perhaps we should document the general form, and then show examples
> for L3 (and later L2 and other resources as they are added). Note
> in particular that other resources are not cache-like ... so the
> pattern you see between L3 and L2 is short lived. Future resources
> that can be controlled are not caches.
>
> General form of a resource line is:
>
> {resource type}:[{resource instanceM}={resource value};]*N
>
> where "M" iterates over all online instances of this resource,
> and "N" is the total number. [and the ";" is a separator, not a terminator,
> but I'm not sure how to write that].
>
> For the "L3" resource there are two formats for the value.
> With CDP enabled:
> {d-cache bit mask},{i-cache bit mask}
> with CDP disabled:
> {cache bit mask}


I am in favour of documenting the general form, but Fenghua and you
can take a call on this.

>
> "L2" resource doesn't support CDP, so the only format is {cache bit mask}
>
>
> The next resource coming will have values that are simple ranges {0 .. max}
>

Regarding the addition of more resources, I was wondering if one common
definition of struct rdt_resource, used for every resource, is the way
to go. As you point out, L2 caches will not support CDP, but the cdp
fields would still be part of the variable describing the L2 cache. Should
we have separate structs for each resource?

--
Nilay

2016-10-11 17:08:50

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v3 04/18] x86/intel_rdt: Feature discovery

On Tue, Oct 11, 2016 at 11:57:52AM -0500, Nilay Vaish wrote:
> I think we should go with Fenghua' suggestion on this. Reading the
> code around the edits from this patch, it seems word 7 is not owned by
> anyone. Both AMD and Intel seem to be using it.

No, this is wrong. The 0x0 CPUID leaf is owned by Intel. Let me quote my
reply to Dave from yesterday:

"There's no reason to do anything really: the 0x0... CPUID leaf is owned
by Intel and 0x8... by AMD. So far, if AMD implements a feature in the
Intel leaf, it sets the respective CPUID bit. So a feature can be
present in multiple vendors. Thus you don't need the vendor name in the
define."

--
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
--

2016-10-11 18:04:11

by Luck, Tony

[permalink] [raw]
Subject: Re: [PATCH v3 05/18] Documentation, x86: Documentation for Intel resource allocation user interface

On Tue, Oct 11, 2016 at 12:07:33PM -0500, Nilay Vaish wrote:
> On 10 October 2016 at 12:19, Luck, Tony <[email protected]> wrote:
> > The next resource coming will have values that are simple ranges {0 .. max}
> >
>
> Regarding the addition of more resources, I was wondering if one common
> definition of struct rdt_resource, used for every resource, is the way
> to go. As you point out, L2 caches will not support CDP, but the cdp
> fields would still be part of the variable describing the L2 cache. Should
> we have separate structs for each resource?

Yes and no.

CDP is currently the only real difference between L2 and L3, and
the current code will just never set the @cdp_capable field to true.
It seems less of a burden to carry two extra "bool" fields in the
resource that won't use them, than to architect some complex scheme
that allows different resources to have different structures.

The structure essentially has to be the same for each resource
so that we can do:

for_each_rdt_resource(r) {
something()
}

all over the code (from bringing cpus online and taking them off again
to deciding which lines to print in the schemata file). There are over
a dozen places where this happens, and it is the key to making it easy
to add support for new resources.

However, I expect the structure to morph as new resources require a
different "something()". E.g. validating a new resource limit value in
the schemata file. Right now the function that does that has the weird
rules about consecutive bits in the bitmask hard-wired into it. When
we get a resource that allows a simple range, we'll want a simpler
function. Spoiler: that resource also expects "0" as the default value,
which doesn't restrict access, with bigger numbers limiting access more.
So we may also want an "initial_value" or "default_value" field so the
discovery and cleanup code can set the per-resource MSR array correctly.

So we'll add some per-resource flags. Perhaps the validation function
will be a function pointer in the structure. If there are many such
differences, we might end up with all the common fields at the front
of the structure, and a union of structures with the per resource type
bits. But we can make that call as we add new resources.
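
To make that concrete, here is a hypothetical shape it could morph into
(the parse_value/default_value names are invented, just to show the
direction):

struct rdt_resource {
	bool		enabled;
	bool		cdp_capable;
	bool		cdp_enabled;
	char		*name;
	int		num_closid;
	/* per-resource behavior, e.g. validate one value from "schemata" */
	int		(*parse_value)(char *buf, struct rdt_resource *r);
	/* what discovery/cleanup writes into the per-resource MSR array */
	u32		default_value;
};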

-Tony