This series adds the support for Assignable Bandwidth Monitoring Counters
(ABMC). It is also called QoS RMID Pinning feature
The feature details are documented in the APM listed below [1].
[1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
Monitoring (ABMC). The documentation is available at
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
The patches are based on top of commit
commit 573c35550b2b ("Merge branch into tip/master: 'x86/percpu'")
# Introduction
Users can create as many monitor groups as RMIDs supported by the hardware.
However, bandwidth monitoring feature on AMD system only guarantees that
RMIDs currently assigned to a processor will be tracked by hardware.
The counters of any other RMIDs which are no longer being tracked will be
reset to zero. The MBM event counters return "Unavailable" for the RMIDs
that are not tracked by hardware. So, there can be only limited number of
groups that can give guaranteed monitoring numbers. With ever changing
configurations there is no way to definitely know which of these groups
are being tracked for certain point of time. Users do not have the option
to monitor a group or set of groups for certain period of time without
worrying about RMID being reset in between.
The ABMC feature provides an option to the user to assign a hardware
counter to an RMID and monitor the bandwidth as long as it is assigned.
The assigned RMID will be tracked by the hardware until the user unassigns
it manually. There is no need to worry about counters being reset during
this period. Additionally, the user can specify a bitmask identifying the
specific bandwidth types from the given source to track with the counter.
Without ABMC enabled, monitoring will work in current mode without
assignment option.
# Linux Implementation
Linux resctrl subsystem provides the interface to count maximum of two
memory bandwidth events per group, from a combination of available total
and local events. Keeping the current interface, users can enable a maximum
of 2 ABMC counters per group. User will also have the option to enable only
one counter to the group. If the system runs out of assignable ABMC
counters, kernel will display an error. Users need to disable an already
enabled counter to make space for new assignments.
# Examples
a. Check if ABMC support is available
#mount -t resctrl resctrl /sys/fs/resctrl/
#cat /sys/fs/resctrl/info/L3_MON/mbm_assign
[abmc]
mbm_legacy
Linux kernel detected ABMC feature and it is enabled.
b. Check how many ABMC counters are available.
#cat /sys/fs/resctrl/info/L3_MON/num_cntrs
32
c. Create few resctrl groups.
# mkdir /sys/fs/resctrl/mon_groups/child_default_mon_grp
# mkdir /sys/fs/resctrl/non_default_ctrl_mon_grp
# mkdir /sys/fs/resctrl/non_default_ctrl_mon_grp/mon_groups/child_non_default_mon_grp
d. This series adds a new interface file /sys/fs/resctrl/info/L3_MON/mbm_assign_control
to list and modify the group's assignment states.
The list follows the following format:
"<CTRL_MON group>/<MON group>/<domain_id>=<assignment_flags>"
Format for specific type of groups:
* Default CTRL_MON group:
"//<domain_id>=<assignment_flags>"
* Non-default CTRL_MON group:
"<CTRL_MON group>//<domain_id>=<assignment_flags>"
* Child MON group of default CTRL_MON group:
"/<MON group>/<domain_id>=<assignment_flags>"
* Child MON group of non-default CTRL_MON group:
"<CTRL_MON group>/<MON group>/<domain_id>=<assignment_flags>"
Assignment flags can be one of the following:
t MBM total event is enabled
l MBM local event is enabled
tl Both total and local MBM events are enabled
_ None of the MBM events are enabled
Examples:
# cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
non_default_ctrl_mon_grp//0=tl;1=tl;
non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
//0=tl;1=tl;
/child_default_mon_grp/0=tl;1=tl;
There are four groups and all the groups have local and total
event enabled on domain 0 and 1.
=tl means both total and local events are enabled.
"//" - This is a default CONTROL MON group
"non_default_ctrl_mon_grp//" - This is non default CONTROL MON group
"/child_default_mon_grp/" - This is Child MON group of the defult group
"non_default_ctrl_mon_grp/child_non_default_mon_grp/" - This is child
MON group of the non default group
e. Update the group assignment states using the interface file /sys/fs/resctrl/info/L3_MON/mbm_assign_control.
The write format is similar to the above list format with addition of
op-code for the assignment operation.
* Default CTRL_MON group:
"//<domain_id><op-code><assignment_flags>"
* Non-default CTRL_MON group:
"<CTRL_MON group>//<domain_id><op-code><assignment_flags>"
* Child MON group of default CTRL_MON group:
"/<MON group>/<domain_id><op-code><assignment_flags>"
* Child MON group of non-default CTRL_MON group:
"<CTRL_MON group>/<MON group>/<domain_id><op-code><assignment_flags>"
Op-code can be one of the following:
= Update the assignment to match the flags
+ Assign a new state
- Unassign a new state
Initial group status:
# cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
non_default_ctrl_mon_grp//0=tl;1=tl;
non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
//0=tl;1=tl;
/child_default_mon_grp/0=tl;1=tl;
To update the default group to enable only total event on domain 0:
# echo "//0=t" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control
Assignment status after the update:
# cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
non_default_ctrl_mon_grp//0=tl;1=tl;
non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
//0=t;1=tl;
/child_default_mon_grp/0=tl;1=tl;
To update the MON group child_default_mon_grp to remove total event on domain 1:
# echo "/child_default_mon_grp/1-t" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control
Assignment status after the update:
$ cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
non_default_ctrl_mon_grp//0=tl;1=tl;
non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
//0=t;1=l;
/child_default_mon_grp/0=t;1=tl;
To update the MON group non_default_ctrl_mon_grp/child_non_default_mon_grp to
remove both local and total events on domain 1:
# echo "non_default_ctrl_mon_grp/child_non_default_mon_grp/1=_" >
/sys/fs/resctrl/info/L3_MON/mbm_assign_control
Assignment status after the update:
# cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
non_default_ctrl_mon_grp//0=tl;1=tl;
non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=_;
//0=t;1=l;
/child_default_mon_grp/0=t;1=tl;
To update the default group to add a total event domain 1.
# echo "//1+t" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control
Assignment status after the update:
# cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
non_default_ctrl_mon_grp//0=tl;1=tl;
non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=_;
//0=t;1=tl;
/child_default_mon_grp/0=t;1=tl;
f. Read the event mbm_total_bytes and mbm_local_bytes of the default group.
There is no change in reading the evetns with ABMC. If the event is unassigned
when reading, then the read will come back as Unavailable.
# cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
779247936
# cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
765207488
g. Users will have the option to go back to legacy_mbm mode if required.
This can be done using the following command.
# echo "legacy_mbm" > /sys/fs/resctrl/info/L3_MON/mbm_assign
# cat /sys/fs/resctrl/info/L3_MON/mbm_assign
abmc
[mbm_legacy]
h. Check the bandwidth configuration for the group. Note that bandwidth
configuration has a domain scope. Total event defaults to 0x7F (to
count all the events) and local event defaults to 0x15 (to count all
the local numa events). The event bitmap decoding is available at
https://www.kernel.org/doc/Documentation/x86/resctrl.rst
in section "mbm_total_bytes_config", "mbm_local_bytes_config":
#cat /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
0=0x7f;1=0x7f
#cat /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config
0=0x15;1=0x15
j. Change the bandwidth source for domain 0 for the total event to count only reads.
Note that this change effects total events on the domain 0.
#echo 0=0x33 > /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
#cat /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
0=0x33;1=0x7F
k. Now read the total event again. The mbm_total_bytes should display
only the read events.
#cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
314101
l. Unmount the resctrl
#umount /sys/fs/resctrl/
---
v4:
Main change is domain specific event assignment.
Kept the ABMC feature as a default.
Dynamcic switching between ABMC and mbm_legacy is still allowed.
We are still not clear about mount option.
Moved the monitoring related data in resctrl_mon structure from rdt_resource.
Fixed the display of legacy and ABMC mode.
Used bimap APIs when possible.
Removed event configuration read from MSRs. We can use the
internal saved data.(patch 12)
Added more comments about L3_QOS_ABMC_CFG MSR.
Added IPIs to read the assignment status for each domain (patch 18 and 19)
More details in each patch.
v3:
This series adds the support for global assignment mode discussed in
the thread. https://lore.kernel.org/lkml/[email protected]/
Removed the individual assignment mode and included the global assignment interface.
Added following interface files.
a. /sys/fs/resctrl/info/L3_MON/mbm_assign
Used for displaying the current assignment mode and switch between
ABMC and legacy mode.
b. /sys/fs/resctrl/info/L3_MON/mbm_assign_control
Used for lising the groups assignment mode and modify the assignment states.
c. Most of the changes are related to the new interface.
d. Addressed the comments from Reinette, James and Peter.
e. Hope I have addressed most of the major feedbacks discussed. If I missed
something then it is not intentional. Please feel free to comment.
f. Sending this as an RFC as per Reinette's comment. So, this is still open
for discussion.
v2:
a. Major change is the way ABMC is enabled. Earlier, user needed to remount
with -o abmc to enable ABMC feature. Removed that option now.
Now users can enable ABMC by "$echo 1 to /sys/fs/resctrl/info/L3_MON/mbm_assign_enable".
b. Added new word 21 to x86/cpufeatures.h.
c. Display unsupported if user attempts to read the events when ABMC is enabled
and event is not assigned.
d. Display monitor_state as "Unsupported" when ABMC is disabled.
e. Text updates and rebase to latest tip tree (as of Jan 18).
f. This series is still work in progress. I am yet to hear from ARM developers.
v4:
https://lore.kernel.org/lkml/[email protected]/
v2:
https://lore.kernel.org/lkml/[email protected]/
v1 :
https://lore.kernel.org/lkml/[email protected]/
Babu Moger (17):
x86/resctrl: Add support for Assignable Bandwidth Monitoring Counters
(ABMC)
x86/resctrl: Add ABMC feature in the command line options
x86/resctrl: Detect Assignable Bandwidth Monitoring feature details
x86/resctrl: Introduce resctrl_file_fflags_init
x86/resctrl: Introduce the interface to display the assignment state
x86/resctrl: Introduce interface to display number of ABMC counters
x86/resctrl: Add support to enable/disable ABMC feature
x86/resctrl: Initialize assignable counters bitmap
x86/resctrl: Introduce assign state for the mon group
x86/resctrl: Add data structures for ABMC assignment
x86/resctrl: Introduce mbm_total_cfg and mbm_local_cfg
x86/resctrl: Add the functionality to assign the RMID
x86/resctrl: Add the functionality to unassign the RMID
x86/resctrl: Enable ABMC by default on resctrl mount
x86/resctrl: Introduce the interface switch between ABMC and
legacy_mbm
x86/resctrl: Introduce interface to list assignment states of all the
groups
x86/resctrl: Introduce interface to modify assignment states of the
groups
.../admin-guide/kernel-parameters.txt | 2 +-
Documentation/arch/x86/resctrl.rst | 144 ++++
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/msr-index.h | 2 +
arch/x86/kernel/cpu/cpuid-deps.c | 3 +
arch/x86/kernel/cpu/resctrl/core.c | 25 +-
arch/x86/kernel/cpu/resctrl/internal.h | 56 +-
arch/x86/kernel/cpu/resctrl/monitor.c | 24 +-
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 714 +++++++++++++++++-
arch/x86/kernel/cpu/scattered.c | 1 +
include/linux/resctrl.h | 12 +
11 files changed, 964 insertions(+), 20 deletions(-)
--
2.34.1
Babu Moger (17):
x86/resctrl: Add support for Assignable Bandwidth Monitoring Counters
(ABMC)
x86/resctrl: Add ABMC feature in the command line options
x86/resctrl: Detect Assignable Bandwidth Monitoring feature details
x86/resctrl: Introduce resctrl_file_fflags_init
x86/resctrl: Introduce the interface to display the assignment state
x86/resctrl: Introduce interface to display number of ABMC counters
x86/resctrl: Add support to enable/disable ABMC feature
x86/resctrl: Initialize assignable counters bitmap
x86/resctrl: Introduce assign state for the mon group
x86/resctrl: Add data structures for ABMC assignment
x86/resctrl: Introduce mbm_total_cfg and mbm_local_cfg
x86/resctrl: Add the functionality to assign the RMID
x86/resctrl: Add the functionality to unassign the RMID
x86/resctrl: Enable ABMC by default on resctrl mount
x86/resctrl: Introduce the interface switch between ABMC and
legacy_mbm
x86/resctrl: Introduce interface to list assignment states of all the
groups
x86/resctrl: Introduce interface to modify assignment states of the
groups
.../admin-guide/kernel-parameters.txt | 2 +-
Documentation/arch/x86/resctrl.rst | 144 ++++
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/msr-index.h | 2 +
arch/x86/kernel/cpu/cpuid-deps.c | 3 +
arch/x86/kernel/cpu/resctrl/core.c | 25 +-
arch/x86/kernel/cpu/resctrl/internal.h | 56 +-
arch/x86/kernel/cpu/resctrl/monitor.c | 24 +-
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 714 +++++++++++++++++-
arch/x86/kernel/cpu/scattered.c | 1 +
include/linux/resctrl.h | 12 +
11 files changed, 964 insertions(+), 20 deletions(-)
--
2.34.1
Babu Moger (19):
x86/resctrl: Add support for Assignable Bandwidth Monitoring Counters
(ABMC)
x86/resctrl: Add ABMC feature in the command line options
x86/resctrl: Consolidate monitoring related data from rdt_resource
x86/resctrl: Detect Assignable Bandwidth Monitoring feature details
x86/resctrl: Introduce resctrl_file_fflags_init to initialize fflags
x86/resctrl: Introduce interface to display number of ABMC counters
x86/resctrl: Add support to enable/disable ABMC feature
x86/resctrl: Introduce the interface to display monitor mode
x86/resctrl: Initialize ABMC counters bitmap
x86/resctrl: Introduce ABMC state for the monitor group
x86/resctrl: Introduce mbm_total_cfg and mbm_local_cfg
x86/resctrl: Remove MSR reading of event configuration value
x86/resctrl: Add data structures for ABMC assignment
x86/resctrl: Add the interface to assign ABMC counter
x86/resctrl: Add the interface to unassign ABMC counter
x86/resctrl: Enable ABMC by default on resctrl mount
x86/resctrl: Introduce the interface switch between ABMC and
mbm_legacy
x86/resctrl: Introduce interface to list monitor states of all the
groups
x86/resctrl: Introduce interface to modify assignment states of the
groups
.../admin-guide/kernel-parameters.txt | 2 +-
Documentation/arch/x86/resctrl.rst | 161 ++++
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/msr-index.h | 3 +
arch/x86/kernel/cpu/cpuid-deps.c | 3 +
arch/x86/kernel/cpu/resctrl/core.c | 10 +-
arch/x86/kernel/cpu/resctrl/internal.h | 61 +-
arch/x86/kernel/cpu/resctrl/monitor.c | 60 +-
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 830 +++++++++++++++++-
arch/x86/kernel/cpu/scattered.c | 1 +
include/linux/resctrl.h | 20 +-
11 files changed, 1086 insertions(+), 66 deletions(-)
--
2.34.1
Users can create as many monitor groups as RMIDs supported by the hardware.
However, bandwidth monitoring feature on AMD system only guarantees that
RMIDs currently assigned to a processor will be tracked by hardware.
The counters of any other RMIDs which are no longer being tracked will be
reset to zero. The MBM event counters return "Unavailable" for the RMIDs
that are not tracked by hardware. So, there can be only limited number of
groups that can give guaranteed monitoring numbers. With ever changing
configurations there is no way to definitely know which of these groups
are being tracked for certain point of time. Users do not have the option
to monitor a group or set of groups for certain period of time without
worrying about RMID being reset in between.
The ABMC feature provides an option to the user to assign a hardware
counter to an RMID and monitor the bandwidth as long as it is assigned.
The assigned RMID will be tracked by the hardware until the user unassigns
it manually. There is no need to worry about counters being reset during
this period. Additionally, the user can specify a bitmask identifying the
specific bandwidth types from the given source to track with the counter.
Without ABMC enabled, monitoring will work in current mode without
assignment option.
Linux resctrl subsystem provides the interface to count maximum of two
memory bandwidth events per group, from a combination of available total
and local events. Keeping the current interface, users can enable a maximum
of 2 ABMC counters per group. User will also have the option to enable only
one counter to the group. If the system runs out of assignable ABMC
counters, kernel will display an error. Users need to disable an already
enabled counter to make space for new assignments.
The feature can be detected via CPUID_Fn80000020_EBX_x00 bit 5.
Bits Description
5 ABMC (Assignable Bandwidth Monitoring Counters)
The feature details are documented in APM listed below [1].
[1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
Monitoring (ABMC).
Note: Checkpatch checks/warnings are ignored to maintain coding style.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
Signed-off-by: Babu Moger <[email protected]>
---
v4: Changes because of rebase. Feature word 21 has few more additions now.
Changed the text to "tracked by hardware" instead of active.
v3: Change because of rebase. Actual patch did not change.
v2: Added dependency on X86_FEATURE_BMEC.
---
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/kernel/cpu/cpuid-deps.c | 3 +++
arch/x86/kernel/cpu/scattered.c | 1 +
3 files changed, 5 insertions(+)
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 3c7434329661..fbe118fa7f88 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -470,6 +470,7 @@
#define X86_FEATURE_BHI_CTRL (21*32+ 2) /* "" BHI_DIS_S HW control available */
#define X86_FEATURE_CLEAR_BHB_HW (21*32+ 3) /* "" BHI_DIS_S HW control enabled */
#define X86_FEATURE_CLEAR_BHB_LOOP_ON_VMEXIT (21*32+ 4) /* "" Clear branch history at vmexit using SW loop */
+#define X86_FEATURE_ABMC (21*32+ 5) /* "" Assignable Bandwidth Monitoring Counters */
/*
* BUG word(s)
diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
index b7d9f530ae16..5227a6232e9e 100644
--- a/arch/x86/kernel/cpu/cpuid-deps.c
+++ b/arch/x86/kernel/cpu/cpuid-deps.c
@@ -70,6 +70,9 @@ static const struct cpuid_dep cpuid_deps[] = {
{ X86_FEATURE_CQM_MBM_LOCAL, X86_FEATURE_CQM_LLC },
{ X86_FEATURE_BMEC, X86_FEATURE_CQM_MBM_TOTAL },
{ X86_FEATURE_BMEC, X86_FEATURE_CQM_MBM_LOCAL },
+ { X86_FEATURE_ABMC, X86_FEATURE_CQM_MBM_TOTAL },
+ { X86_FEATURE_ABMC, X86_FEATURE_CQM_MBM_LOCAL },
+ { X86_FEATURE_ABMC, X86_FEATURE_BMEC },
{ X86_FEATURE_AVX512_BF16, X86_FEATURE_AVX512VL },
{ X86_FEATURE_AVX512_FP16, X86_FEATURE_AVX512BW },
{ X86_FEATURE_ENQCMD, X86_FEATURE_XSAVES },
diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
index af5aa2c754c2..411b18c962bb 100644
--- a/arch/x86/kernel/cpu/scattered.c
+++ b/arch/x86/kernel/cpu/scattered.c
@@ -48,6 +48,7 @@ static const struct cpuid_bit cpuid_bits[] = {
{ X86_FEATURE_MBA, CPUID_EBX, 6, 0x80000008, 0 },
{ X86_FEATURE_SMBA, CPUID_EBX, 2, 0x80000020, 0 },
{ X86_FEATURE_BMEC, CPUID_EBX, 3, 0x80000020, 0 },
+ { X86_FEATURE_ABMC, CPUID_EBX, 5, 0x80000020, 0 },
{ X86_FEATURE_PERFMON_V2, CPUID_EAX, 0, 0x80000022, 0 },
{ X86_FEATURE_AMD_LBR_V2, CPUID_EAX, 1, 0x80000022, 0 },
{ X86_FEATURE_AMD_LBR_PMC_FREEZE, CPUID_EAX, 2, 0x80000022, 0 },
--
2.34.1
Add the command line option to enable or disable the new resctrl feature
ABMC (Assignable Bandwidth Monitoring Counters).
Signed-off-by: Babu Moger <[email protected]>
---
v4: No changes
v3: No changes
v2: No changes
---
Documentation/admin-guide/kernel-parameters.txt | 2 +-
Documentation/arch/x86/resctrl.rst | 1 +
arch/x86/kernel/cpu/resctrl/core.c | 2 ++
3 files changed, 4 insertions(+), 1 deletion(-)
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 500cfa776225..66a1be9d0ea6 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -5602,7 +5602,7 @@
rdt= [HW,X86,RDT]
Turn on/off individual RDT features. List is:
cmt, mbmtotal, mbmlocal, l3cat, l3cdp, l2cat, l2cdp,
- mba, smba, bmec.
+ mba, smba, bmec, abmc.
E.g. to turn on cmt and turn off mba use:
rdt=cmt,!mba
diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
index 627e23869bca..02790efaabcc 100644
--- a/Documentation/arch/x86/resctrl.rst
+++ b/Documentation/arch/x86/resctrl.rst
@@ -26,6 +26,7 @@ MBM (Memory Bandwidth Monitoring) "cqm_mbm_total", "cqm_mbm_local"
MBA (Memory Bandwidth Allocation) "mba"
SMBA (Slow Memory Bandwidth Allocation) ""
BMEC (Bandwidth Monitoring Event Configuration) ""
+ABMC (Assignable Bandwidth Monitoring Counters) ""
=============================================== ================================
Historically, new features were made visible by default in /proc/cpuinfo. This
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index a113d9aba553..df1b43fd9f0e 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -651,6 +651,7 @@ enum {
RDT_FLAG_MBA,
RDT_FLAG_SMBA,
RDT_FLAG_BMEC,
+ RDT_FLAG_ABMC,
};
#define RDT_OPT(idx, n, f) \
@@ -676,6 +677,7 @@ static struct rdt_options rdt_options[] __initdata = {
RDT_OPT(RDT_FLAG_MBA, "mba", X86_FEATURE_MBA),
RDT_OPT(RDT_FLAG_SMBA, "smba", X86_FEATURE_SMBA),
RDT_OPT(RDT_FLAG_BMEC, "bmec", X86_FEATURE_BMEC),
+ RDT_OPT(RDT_FLAG_ABMC, "abmc", X86_FEATURE_ABMC),
};
#define NUM_RDT_OPTIONS ARRAY_SIZE(rdt_options)
--
2.34.1
Consolidate all the data related to monitoring into separate structure
for better readability.
Suggested-by: Reinette Chatre <[email protected]>
Signed-off-by: Babu Moger <[email protected]>
---
v4: New patch.
---
arch/x86/kernel/cpu/resctrl/core.c | 2 +-
arch/x86/kernel/cpu/resctrl/monitor.c | 16 ++++++++--------
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 8 ++++----
include/linux/resctrl.h | 16 ++++++++++++----
4 files changed, 25 insertions(+), 17 deletions(-)
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index df1b43fd9f0e..6cf3c887c693 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -535,7 +535,7 @@ static void domain_add_cpu(int cpu, struct rdt_resource *r)
return;
}
- if (r->mon_capable && arch_domain_mbm_alloc(r->num_rmid, hw_dom)) {
+ if (r->mon_capable && arch_domain_mbm_alloc(r->mon.num_rmid, hw_dom)) {
domain_free(hw_dom);
return;
}
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 2345e6836593..b35d04fc761b 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -254,11 +254,11 @@ void resctrl_arch_reset_rmid_all(struct rdt_resource *r, struct rdt_domain *d)
if (is_mbm_total_enabled())
memset(hw_dom->arch_mbm_total, 0,
- sizeof(*hw_dom->arch_mbm_total) * r->num_rmid);
+ sizeof(*hw_dom->arch_mbm_total) * r->mon.num_rmid);
if (is_mbm_local_enabled())
memset(hw_dom->arch_mbm_local, 0,
- sizeof(*hw_dom->arch_mbm_local) * r->num_rmid);
+ sizeof(*hw_dom->arch_mbm_local) * r->mon.num_rmid);
}
static u64 mbm_overflow_count(u64 prev_msr, u64 cur_msr, unsigned int width)
@@ -1004,14 +1004,14 @@ static struct mon_evt mbm_local_event = {
*/
static void l3_mon_evt_init(struct rdt_resource *r)
{
- INIT_LIST_HEAD(&r->evt_list);
+ INIT_LIST_HEAD(&r->mon.evt_list);
if (is_llc_occupancy_enabled())
- list_add_tail(&llc_occupancy_event.list, &r->evt_list);
+ list_add_tail(&llc_occupancy_event.list, &r->mon.evt_list);
if (is_mbm_total_enabled())
- list_add_tail(&mbm_total_event.list, &r->evt_list);
+ list_add_tail(&mbm_total_event.list, &r->mon.evt_list);
if (is_mbm_local_enabled())
- list_add_tail(&mbm_local_event.list, &r->evt_list);
+ list_add_tail(&mbm_local_event.list, &r->mon.evt_list);
}
int __init rdt_get_mon_l3_config(struct rdt_resource *r)
@@ -1023,7 +1023,7 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
resctrl_rmid_realloc_limit = boot_cpu_data.x86_cache_size * 1024;
hw_res->mon_scale = boot_cpu_data.x86_cache_occ_scale;
- r->num_rmid = boot_cpu_data.x86_cache_max_rmid + 1;
+ r->mon.num_rmid = boot_cpu_data.x86_cache_max_rmid + 1;
hw_res->mbm_width = MBM_CNTR_WIDTH_BASE;
if (mbm_offset > 0 && mbm_offset <= MBM_CNTR_WIDTH_OFFSET_MAX)
@@ -1038,7 +1038,7 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
*
* For a 35MB LLC and 56 RMIDs, this is ~1.8% of the LLC.
*/
- threshold = resctrl_rmid_realloc_limit / r->num_rmid;
+ threshold = resctrl_rmid_realloc_limit / r->mon.num_rmid;
/*
* Because num_rmid may not be a power of two, round the value
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 02f213f1c51c..7114d58ef1e3 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1098,7 +1098,7 @@ static int rdt_num_rmids_show(struct kernfs_open_file *of,
{
struct rdt_resource *r = of->kn->parent->priv;
- seq_printf(seq, "%d\n", r->num_rmid);
+ seq_printf(seq, "%d\n", r->mon.num_rmid);
return 0;
}
@@ -1109,7 +1109,7 @@ static int rdt_mon_features_show(struct kernfs_open_file *of,
struct rdt_resource *r = of->kn->parent->priv;
struct mon_evt *mevt;
- list_for_each_entry(mevt, &r->evt_list, list) {
+ list_for_each_entry(mevt, &r->mon.evt_list, list) {
seq_printf(seq, "%s\n", mevt->name);
if (mevt->configurable)
seq_printf(seq, "%s_config\n", mevt->name);
@@ -3042,14 +3042,14 @@ static int mkdir_mondata_subdir(struct kernfs_node *parent_kn,
if (ret)
goto out_destroy;
- if (WARN_ON(list_empty(&r->evt_list))) {
+ if (WARN_ON(list_empty(&r->mon.evt_list))) {
ret = -EPERM;
goto out_destroy;
}
priv.u.rid = r->rid;
priv.u.domid = d->id;
- list_for_each_entry(mevt, &r->evt_list, list) {
+ list_for_each_entry(mevt, &r->mon.evt_list, list) {
priv.u.evtid = mevt->evtid;
ret = mon_addfile(kn, mevt->name, priv.priv);
if (ret)
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index a365f67131ec..bf99eb9c6ce4 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -150,22 +150,31 @@ struct resctrl_membw {
struct rdt_parse_data;
struct resctrl_schema;
+/**
+ * struct resctrl_mon - Monitoring related data
+ * @num_rmid: Number of RMIDs available
+ * @evt_list: List of monitoring events
+ */
+struct resctrl_mon {
+ int num_rmid;
+ struct list_head evt_list;
+};
+
/**
* struct rdt_resource - attributes of a resctrl resource
* @rid: The index of the resource
* @alloc_capable: Is allocation available on this machine
* @mon_capable: Is monitor feature available on this machine
- * @num_rmid: Number of RMIDs available
* @cache_level: Which cache level defines scope of this resource
* @cache: Cache allocation related data
* @membw: If the component has bandwidth controls, their properties.
+ * @mon: Monitoring related data.
* @domains: RCU list of all domains for this resource
* @name: Name to use in "schemata" file.
* @data_width: Character width of data when displaying
* @default_ctrl: Specifies default cache cbm or memory B/W percent.
* @format_str: Per resource format string to show domain value
* @parse_ctrlval: Per resource function pointer to parse control values
- * @evt_list: List of monitoring events
* @fflags: flags to choose base and info files
* @cdp_capable: Is the CDP feature available on this resource
*/
@@ -173,10 +182,10 @@ struct rdt_resource {
int rid;
bool alloc_capable;
bool mon_capable;
- int num_rmid;
int cache_level;
struct resctrl_cache cache;
struct resctrl_membw membw;
+ struct resctrl_mon mon;
struct list_head domains;
char *name;
int data_width;
@@ -185,7 +194,6 @@ struct rdt_resource {
int (*parse_ctrlval)(struct rdt_parse_data *data,
struct resctrl_schema *s,
struct rdt_domain *d);
- struct list_head evt_list;
unsigned long fflags;
bool cdp_capable;
};
--
2.34.1
ABMC feature details are reported via CPUID Fn8000_0020_EBX_x5.
Bits Description
15:0 MAX_ABMC Maximum Supported Assignable Bandwidth
Monitoring Counter ID + 1
The feature details are documented in APM listed below [1].
[1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
Monitoring (ABMC).
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
Signed-off-by: Babu Moger <[email protected]>
---
v4: Removed resctrl_arch_has_abmc(). Added all the code inline. We dont
need to separate this as arch code.
v3: Removed changes related to mon_features.
Moved rdt_cpu_has to core.c and added new function resctrl_arch_has_abmc.
Also moved the fields mbm_assign_capable and mbm_assign_cntrs to
rdt_resource. (James)
v2: Changed the field name to mbm_assign_capable from abmc_capable.
---
arch/x86/kernel/cpu/resctrl/monitor.c | 14 ++++++++++++++
include/linux/resctrl.h | 4 ++++
2 files changed, 18 insertions(+)
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index b35d04fc761b..1602b58ba23d 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -1066,6 +1066,20 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
mbm_local_event.configurable = true;
mbm_config_rftype_init("mbm_local_bytes_config");
}
+
+ if (rdt_cpu_has(X86_FEATURE_ABMC)) {
+ r->abmc_capable = true;
+ /*
+ * Query CPUID_Fn80000020_EBX_x05 for number of
+ * ABMC counters
+ */
+ cpuid_count(0x80000020, 5, &eax, &ebx, &ecx, &edx);
+ r->mon.num_cntrs = (ebx & 0xFFFF) + 1;
+ if (r->mon.num_cntrs > 64) {
+ WARN(1, "Cannot support more than 64 ABMC counters\n");
+ r->mon.num_cntrs = 64;
+ }
+ }
}
l3_mon_evt_init(r);
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index bf99eb9c6ce4..24087e6efbb6 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -153,10 +153,12 @@ struct resctrl_schema;
/**
* struct resctrl_mon - Monitoring related data
* @num_rmid: Number of RMIDs available
+ * @num_cntrs: Maximum number of abmc counters
* @evt_list: List of monitoring events
*/
struct resctrl_mon {
int num_rmid;
+ int num_cntrs;
struct list_head evt_list;
};
@@ -177,6 +179,7 @@ struct resctrl_mon {
* @parse_ctrlval: Per resource function pointer to parse control values
* @fflags: flags to choose base and info files
* @cdp_capable: Is the CDP feature available on this resource
+ * @abmc_capable: Is system capable of supporting monitor assignment?
*/
struct rdt_resource {
int rid;
@@ -196,6 +199,7 @@ struct rdt_resource {
struct rdt_domain *d);
unsigned long fflags;
bool cdp_capable;
+ bool abmc_capable;
};
/**
--
2.34.1
The functions thread_throttle_mode_init() and mbm_config_rftype_init()
both initialize fflags for resctrl files. Adding the new files will
involve adding another function to initialize the fflags. All of this
can be simplified by adding a new function resctrl_file_fflags_init()
and passing the file name and flags to be initialized.
Consolidate multiple fflags initialization into resctrl_file_fflags_init()
and remove thread_throttle_mode_init() and mbm_config_rftype_init().
Signed-off-by: Babu Moger <[email protected]>
---
v4: Commit message update.
v3: New patch to display ABMC capability.
---
arch/x86/kernel/cpu/resctrl/core.c | 4 +++-
arch/x86/kernel/cpu/resctrl/internal.h | 4 ++--
arch/x86/kernel/cpu/resctrl/monitor.c | 6 ++++--
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 16 +++-------------
4 files changed, 12 insertions(+), 18 deletions(-)
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 6cf3c887c693..ec93f6a50308 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -224,7 +224,9 @@ static bool __get_mem_config_intel(struct rdt_resource *r)
r->membw.throttle_mode = THREAD_THROTTLE_PER_THREAD;
else
r->membw.throttle_mode = THREAD_THROTTLE_MAX;
- thread_throttle_mode_init();
+
+ resctrl_file_fflags_init("thread_throttle_mode",
+ RFTYPE_CTRL_INFO | RFTYPE_RES_MB);
r->alloc_capable = true;
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index f1d926832ec8..d566251094b2 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -603,8 +603,8 @@ void cqm_handle_limbo(struct work_struct *work);
bool has_busy_rmid(struct rdt_domain *d);
void __check_limbo(struct rdt_domain *d, bool force_free);
void rdt_domain_reconfigure_cdp(struct rdt_resource *r);
-void __init thread_throttle_mode_init(void);
-void __init mbm_config_rftype_init(const char *config);
+void __init resctrl_file_fflags_init(const char *config,
+ unsigned long fflags);
void rdt_staged_configs_clear(void);
bool closid_allocated(unsigned int closid);
int resctrl_find_cleanest_closid(void);
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 1602b58ba23d..0db9f12debb9 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -1060,11 +1060,13 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
if (rdt_cpu_has(X86_FEATURE_CQM_MBM_TOTAL)) {
mbm_total_event.configurable = true;
- mbm_config_rftype_init("mbm_total_bytes_config");
+ resctrl_file_fflags_init("mbm_total_bytes_config",
+ RFTYPE_MON_INFO | RFTYPE_RES_CACHE);
}
if (rdt_cpu_has(X86_FEATURE_CQM_MBM_LOCAL)) {
mbm_local_event.configurable = true;
- mbm_config_rftype_init("mbm_local_bytes_config");
+ resctrl_file_fflags_init("mbm_local_bytes_config",
+ RFTYPE_MON_INFO | RFTYPE_RES_CACHE);
}
if (rdt_cpu_has(X86_FEATURE_ABMC)) {
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 7114d58ef1e3..aa3eb6ea059a 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -2022,24 +2022,14 @@ static struct rftype *rdtgroup_get_rftype_by_name(const char *name)
return NULL;
}
-void __init thread_throttle_mode_init(void)
-{
- struct rftype *rft;
-
- rft = rdtgroup_get_rftype_by_name("thread_throttle_mode");
- if (!rft)
- return;
-
- rft->fflags = RFTYPE_CTRL_INFO | RFTYPE_RES_MB;
-}
-
-void __init mbm_config_rftype_init(const char *config)
+void __init resctrl_file_fflags_init(const char *config,
+ unsigned long fflags)
{
struct rftype *rft;
rft = rdtgroup_get_rftype_by_name(config);
if (rft)
- rft->fflags = RFTYPE_MON_INFO | RFTYPE_RES_CACHE;
+ rft->fflags = fflags;
}
/**
--
2.34.1
The ABMC feature provides an option to the user to assign a hardware
counter to an RMID and monitor the bandwidth as long as the counter
is assigned. Number of assignments depend on number of ABMC counters
available.
Provide the interface to display the number of ABMC counters supported.
Signed-off-by: Babu Moger <[email protected]>
---
v4: Changed the counter name to num_cntrs. And few text changes.
v3: Changed the field name to mbm_assign_cntrs.
v2: Changed the field name to mbm_assignable_counters from abmc_counters.
---
Documentation/arch/x86/resctrl.rst | 4 ++++
arch/x86/kernel/cpu/resctrl/monitor.c | 1 +
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 16 ++++++++++++++++
3 files changed, 21 insertions(+)
diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
index 02790efaabcc..7ab8172ef208 100644
--- a/Documentation/arch/x86/resctrl.rst
+++ b/Documentation/arch/x86/resctrl.rst
@@ -257,6 +257,10 @@ with the following files:
# cat /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config
0=0x30;1=0x30;3=0x15;4=0x15
+"num_cntrs":
+ Available when ABMC feature is supported. The number of ABMC counters
+ available for configuration.
+
"max_threshold_occupancy":
Read/write file provides the largest value (in
bytes) at which a previously used LLC_occupancy
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 0db9f12debb9..e75a6146068b 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -1081,6 +1081,7 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
WARN(1, "Cannot support more than 64 ABMC counters\n");
r->mon.num_cntrs = 64;
}
+ resctrl_file_fflags_init("num_cntrs", RFTYPE_MON_INFO);
}
}
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index aa3eb6ea059a..ca692712b393 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -846,6 +846,16 @@ static int rdtgroup_rmid_show(struct kernfs_open_file *of,
return ret;
}
+static int rdtgroup_num_cntrs_show(struct kernfs_open_file *of,
+ struct seq_file *s, void *v)
+{
+ struct rdt_resource *r = of->kn->parent->priv;
+
+ seq_printf(s, "%d\n", r->mon.num_cntrs);
+
+ return 0;
+}
+
#ifdef CONFIG_PROC_CPU_RESCTRL
/*
@@ -1911,6 +1921,12 @@ static struct rftype res_common_files[] = {
.seq_show = rdtgroup_cpus_show,
.fflags = RFTYPE_BASE,
},
+ {
+ .name = "num_cntrs",
+ .mode = 0444,
+ .kf_ops = &rdtgroup_kf_single_ops,
+ .seq_show = rdtgroup_num_cntrs_show,
+ },
{
.name = "cpus_list",
.mode = 0644,
--
2.34.1
Add the functionality to enable/disable ABMC feature.
ABMC feature is enabled by setting enabled bit(0) in MSR L3_QOS_EXT_CFG.
When the state of ABMC is changed, the MSR needs to be updated on all
the logical processors in the QOS Domain.
Hardware counters will reset when ABMC state is changed. Kernel internal
counters need to be reset to avoid overflow condition in the next update.
The ABMC feature details are documented in APM listed below [1].
[1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
Monitoring (ABMC).
Signed-off-by: Babu Moger <[email protected]>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
---
v4: Removed resctrl_arch_set_abmc_enabled and resctrl_arch_get_abmc_enabled.
Directly calling resctrl_abmc_enable and resctrl_abmc_disable.
Renamed couple of functions.
resctrl_abmc_msrwrite() -> resctrl_abmc_set_one()
resctrl_abmc_setup() -> resctrl_abmc_set_all()
Added rdtgroup_mutex lockdep asserts.
Updated commit log and code comments.
v3: No changes.
v2: Few text changes in commit message.
---
arch/x86/include/asm/msr-index.h | 1 +
arch/x86/kernel/cpu/resctrl/internal.h | 8 ++++
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 62 ++++++++++++++++++++++++++
3 files changed, 71 insertions(+)
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index e022e6eb766c..5f9a0139e98c 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -1171,6 +1171,7 @@
#define MSR_IA32_MBA_BW_BASE 0xc0000200
#define MSR_IA32_SMBA_BW_BASE 0xc0000280
#define MSR_IA32_EVT_CFG_BASE 0xc0000400
+#define MSR_IA32_L3_QOS_EXT_CFG 0xc00003ff
/* MSR_IA32_VMX_MISC bits */
#define MSR_IA32_VMX_MISC_INTEL_PT (1ULL << 14)
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index d566251094b2..fabe40304798 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -97,6 +97,9 @@ cpumask_any_housekeeping(const struct cpumask *mask, int exclude_cpu)
return cpu;
}
+/* Setting bit 0 in L3_QOS_EXT_CFG enables the ABMC feature */
+#define ABMC_ENABLE BIT(0)
+
struct rdt_fs_context {
struct kernfs_fs_context kfc;
bool enable_cdpl2;
@@ -436,6 +439,7 @@ struct rdt_parse_data {
* @mbm_cfg_mask: Bandwidth sources that can be tracked when Bandwidth
* Monitoring Event Configuration (BMEC) is supported.
* @cdp_enabled: CDP state of this resource
+ * @abmc_enabled: ABMC feature is enabled
*
* Members of this structure are either private to the architecture
* e.g. mbm_width, or accessed via helpers that provide abstraction. e.g.
@@ -450,6 +454,7 @@ struct rdt_hw_resource {
unsigned int mbm_width;
unsigned int mbm_cfg_mask;
bool cdp_enabled;
+ bool abmc_enabled;
};
static inline struct rdt_hw_resource *resctrl_to_arch_res(struct rdt_resource *r)
@@ -493,6 +498,9 @@ static inline bool resctrl_arch_get_cdp_enabled(enum resctrl_res_level l)
int resctrl_arch_set_cdp_enabled(enum resctrl_res_level l, bool enable);
+int resctrl_abmc_enable(enum resctrl_res_level l);
+void resctrl_abmc_disable(enum resctrl_res_level l);
+
/*
* To return the common struct rdt_resource, which is contained in struct
* rdt_hw_resource, walk the resctrl member of struct rdt_hw_resource.
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index ca692712b393..9148d1234ede 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -2416,6 +2416,68 @@ int resctrl_arch_set_cdp_enabled(enum resctrl_res_level l, bool enable)
return 0;
}
+static void resctrl_abmc_set_one(void *arg)
+{
+ bool *enable = arg;
+ u64 msrval;
+
+ rdmsrl(MSR_IA32_L3_QOS_EXT_CFG, msrval);
+
+ if (*enable)
+ msrval |= ABMC_ENABLE;
+ else
+ msrval &= ~ABMC_ENABLE;
+
+ wrmsrl(MSR_IA32_L3_QOS_EXT_CFG, msrval);
+}
+
+static int resctrl_abmc_set_all(enum resctrl_res_level l, bool enable)
+{
+ struct rdt_resource *r = &rdt_resources_all[l].r_resctrl;
+ struct rdt_domain *d;
+
+ /*
+ * Update QOS_CFG MSR on all the CPUs associated with the resource
+ * Hardware counters will reset after switching the monotor mode.
+ * Reset the internal counters so that it is not considered as
+ * an overflow in next update.
+ */
+ list_for_each_entry(d, &r->domains, list) {
+ on_each_cpu_mask(&d->cpu_mask, resctrl_abmc_set_one, &enable, 1);
+ resctrl_arch_reset_rmid_all(r, d);
+ }
+
+ return 0;
+}
+
+int resctrl_abmc_enable(enum resctrl_res_level l)
+{
+ struct rdt_hw_resource *hw_res = &rdt_resources_all[l];
+ int ret = 0;
+
+ lockdep_assert_held(&rdtgroup_mutex);
+
+ if (!hw_res->abmc_enabled) {
+ ret = resctrl_abmc_set_all(l, true);
+ if (!ret)
+ hw_res->abmc_enabled = true;
+ }
+
+ return ret;
+}
+
+void resctrl_abmc_disable(enum resctrl_res_level l)
+{
+ struct rdt_hw_resource *hw_res = &rdt_resources_all[l];
+
+ lockdep_assert_held(&rdtgroup_mutex);
+
+ if (hw_res->abmc_enabled) {
+ resctrl_abmc_set_all(l, false);
+ hw_res->abmc_enabled = false;
+ }
+}
+
/*
* We don't allow rdtgroup directories to be created anywhere
* except the root directory. Thus when looking for the rdtgroup
--
2.34.1
The ABMC feature provides an option to the user to assign a hardware
counter to an RMID and monitor the bandwidth as long as it is assigned.
ABMC mode is enabled by default when supported. System can be one mode
at a time (Legacy monitor mode or ABMC mode).
Provide an interface to display the monitor mode on the system.
$cat /sys/fs/resctrl/info/L3_MON/mbm_assign
[abmc]
legacy
Signed-off-by: Babu Moger <[email protected]>
---
v4: Fixed the checks for legacy and abmc mode. Default it ABMC.
v3: New patch to display ABMC capability.
---
Documentation/arch/x86/resctrl.rst | 10 ++++++++++
arch/x86/kernel/cpu/resctrl/monitor.c | 1 +
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 23 +++++++++++++++++++++++
3 files changed, 34 insertions(+)
diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
index 7ab8172ef208..ab3cde61a124 100644
--- a/Documentation/arch/x86/resctrl.rst
+++ b/Documentation/arch/x86/resctrl.rst
@@ -261,6 +261,16 @@ with the following files:
Available when ABMC feature is supported. The number of ABMC counters
available for configuration.
+"mbm_assign":
+ Available when ABMC feature is supported. Reports the list of assignable
+ monitoring features supported. The enclosed brackets indicate which
+ feature is enabled.
+ ::
+
+ cat /sys/fs/resctrl/info/L3_MON/mbm_assign
+ [abmc]
+ mbm_legacy
+
"max_threshold_occupancy":
Read/write file provides the largest value (in
bytes) at which a previously used LLC_occupancy
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index e75a6146068b..b1d002e5e93d 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -1071,6 +1071,7 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
if (rdt_cpu_has(X86_FEATURE_ABMC)) {
r->abmc_capable = true;
+ resctrl_file_fflags_init("mbm_assign", RFTYPE_MON_INFO);
/*
* Query CPUID_Fn80000020_EBX_x05 for number of
* ABMC counters
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 9148d1234ede..3071bbb7a15e 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -856,6 +856,23 @@ static int rdtgroup_num_cntrs_show(struct kernfs_open_file *of,
return 0;
}
+static int rdtgroup_mbm_assign_show(struct kernfs_open_file *of,
+ struct seq_file *s, void *v)
+{
+ struct rdt_resource *r = of->kn->parent->priv;
+ struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
+
+ if (hw_res->abmc_enabled) {
+ seq_puts(s, "[abmc]\n");
+ seq_puts(s, "mbm_legacy\n");
+ } else {
+ seq_puts(s, "abmc\n");
+ seq_puts(s, "[mbm_legacy]\n");
+ }
+
+ return 0;
+}
+
#ifdef CONFIG_PROC_CPU_RESCTRL
/*
@@ -1913,6 +1930,12 @@ static struct rftype res_common_files[] = {
.seq_show = mbm_local_bytes_config_show,
.write = mbm_local_bytes_config_write,
},
+ {
+ .name = "mbm_assign",
+ .mode = 0444,
+ .kf_ops = &rdtgroup_kf_single_ops,
+ .seq_show = rdtgroup_mbm_assign_show,
+ },
{
.name = "cpus",
.mode = 0644,
--
2.34.1
The ABMC feature provides an option to the user to assign a hardware
counter to an RMID and monitor the bandwidth as long as it is assigned.
There are two events per resctrl group, mbm total and mbm local.
User have the option to assign or unassign each event individually.
Add a new field mon_state in mongroup data structure to represent the
assignment state of the group. Reset the states when switching the
monitor mode.
Signed-off-by: Babu Moger <[email protected]>
---
v4: Changed ASSIGN_TOTAL and ASSIGN_LOCAL to use QOS_L3_MBM_TOTAL_EVENT_ID
and QOS_L3_MBM_LOCAL_EVENT_ID.
Few more commit text changes.
v3: Changed the field name to mon_state. Also thie state is not visible to
users directly as part of out global assign approach.
v2: Added check to display "Unsupported" when user tries to access
monitor state when ABMC is not enabled.
---
arch/x86/kernel/cpu/resctrl/internal.h | 9 +++++++++
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 8 ++++++++
2 files changed, 17 insertions(+)
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index fabe40304798..5e7e76cd512f 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -100,6 +100,13 @@ cpumask_any_housekeeping(const struct cpumask *mask, int exclude_cpu)
/* Setting bit 0 in L3_QOS_EXT_CFG enables the ABMC feature */
#define ABMC_ENABLE BIT(0)
+/*
+ * monitor group's state when ABMC is supported
+ */
+#define ASSIGN_NONE 0
+#define ASSIGN_TOTAL BIT(QOS_L3_MBM_TOTAL_EVENT_ID)
+#define ASSIGN_LOCAL BIT(QOS_L3_MBM_LOCAL_EVENT_ID)
+
struct rdt_fs_context {
struct kernfs_fs_context kfc;
bool enable_cdpl2;
@@ -203,12 +210,14 @@ enum rdtgrp_mode {
* @parent: parent rdtgrp
* @crdtgrp_list: child rdtgroup node list
* @rmid: rmid for this rdtgroup
+ * @mon_state: Assignment state of the group
*/
struct mongroup {
struct kernfs_node *mon_data_kn;
struct rdtgroup *parent;
struct list_head crdtgrp_list;
u32 rmid;
+ u32 mon_state;
};
/**
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 400ae405e10e..c176bacf7ba1 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -2474,6 +2474,7 @@ static void resctrl_abmc_set_one(void *arg)
static int resctrl_abmc_set_all(enum resctrl_res_level l, bool enable)
{
struct rdt_resource *r = &rdt_resources_all[l].r_resctrl;
+ struct rdtgroup *prgrp, *crgrp;
struct rdt_domain *d;
/*
@@ -2493,6 +2494,13 @@ static int resctrl_abmc_set_all(enum resctrl_res_level l, bool enable)
resctrl_arch_reset_rmid_all(r, d);
}
+ /* Reset assign state for all the monitor groups */
+ list_for_each_entry(prgrp, &rdt_all_groups, rdtgroup_list) {
+ prgrp->mon.mon_state = ASSIGN_NONE;
+ list_for_each_entry(crgrp, &prgrp->mon.crdtgrp_list, mon.crdtgrp_list)
+ crgrp->mon.mon_state = ASSIGN_NONE;
+ }
+
return 0;
}
--
2.34.1
If the BMEC (Bandwidth Monitoring Event Configuration) feature is
supported, the bandwidth events can be configured to track specific
events. The event configuration is domain specific. ABMC (Assignable
Bandwidth Monitoring Counters) feature needs event configuration
information to assign hardware counter to an RMID. Event configurations
are not stored in resctrl but instead always read from or written to
hardware directly when prompted by user space.
Read the event configuration from the hardware during the domain
initialization. Save the configuration information in the rdt_hw_domain,
so it can be used for counter assignment.
Signed-off-by: Babu Moger <[email protected]>
---
v4: Read the configuration information from the hardware to initialize.
Added few commit messages.
Fixed the tab spaces.
v3: Minor changes related to rebase in mbm_config_write_domain.
v2: No changes.
---
arch/x86/kernel/cpu/resctrl/core.c | 2 ++
arch/x86/kernel/cpu/resctrl/internal.h | 5 +++++
arch/x86/kernel/cpu/resctrl/monitor.c | 21 +++++++++++++++++++++
3 files changed, 28 insertions(+)
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index ec93f6a50308..856c46d12177 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -542,6 +542,8 @@ static void domain_add_cpu(int cpu, struct rdt_resource *r)
return;
}
+ arch_domain_mbm_evt_config(hw_dom);
+
list_add_tail_rcu(&d->list, add_pos);
err = resctrl_online_domain(r, d);
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 5e7e76cd512f..60a1ca0a11a7 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -373,6 +373,8 @@ struct arch_mbm_state {
* @ctrl_val: array of cache or mem ctrl values (indexed by CLOSID)
* @arch_mbm_total: arch private state for MBM total bandwidth
* @arch_mbm_local: arch private state for MBM local bandwidth
+ * @mbm_total_cfg: MBM total bandwidth configuration
+ * @mbm_local_cfg: MBM local bandwidth configuration
*
* Members of this structure are accessed via helpers that provide abstraction.
*/
@@ -381,6 +383,8 @@ struct rdt_hw_domain {
u32 *ctrl_val;
struct arch_mbm_state *arch_mbm_total;
struct arch_mbm_state *arch_mbm_local;
+ u32 mbm_total_cfg;
+ u32 mbm_local_cfg;
};
static inline struct rdt_hw_domain *resctrl_to_arch_dom(struct rdt_domain *r)
@@ -622,6 +626,7 @@ void __check_limbo(struct rdt_domain *d, bool force_free);
void rdt_domain_reconfigure_cdp(struct rdt_resource *r);
void __init resctrl_file_fflags_init(const char *config,
unsigned long fflags);
+void arch_domain_mbm_evt_config(struct rdt_hw_domain *hw_dom);
void rdt_staged_configs_clear(void);
bool closid_allocated(unsigned int closid);
int resctrl_find_cleanest_closid(void);
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index b1d002e5e93d..ab0f4bb49bd9 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -1093,6 +1093,27 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
return 0;
}
+void arch_domain_mbm_evt_config(struct rdt_hw_domain *hw_dom)
+{
+ u64 msrval;
+
+ /*
+ * Read the configuration registers QOS_EVT_CFG_n, where <n> is
+ * the BMEC event number (EvtID).
+ * 0 for evtid == QOS_L3_MBM_TOTAL_EVENT_ID
+ * 1 for evtid == QOS_L3_MBM_LOCAL_EVENT_ID
+ */
+ if (mbm_total_event.configurable) {
+ rdmsrl(MSR_IA32_EVT_CFG_BASE, msrval);
+ hw_dom->mbm_total_cfg = msrval & MAX_EVT_CONFIG_BITS;
+ }
+
+ if (mbm_local_event.configurable) {
+ rdmsrl(MSR_IA32_EVT_CFG_BASE + 1, msrval);
+ hw_dom->mbm_local_cfg = msrval & MAX_EVT_CONFIG_BITS;
+ }
+}
+
void __exit rdt_put_mon_l3_config(void)
{
dom_data_exit();
--
2.34.1
The event configuration is domain specific and initialized during domain
initialization. It is not required to read the configuration register
every time user asks for it. Use the value stored in rdt_hw_domain instead.
Signed-off-by: Babu Moger <[email protected]>
---
v4: New patch.
---
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 55 ++++++++++++--------------
1 file changed, 25 insertions(+), 30 deletions(-)
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index c176bacf7ba1..0e425c91fa46 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1618,46 +1618,30 @@ static inline unsigned int mon_event_config_index_get(u32 evtid)
}
}
-static void mon_event_config_read(void *info)
-{
- struct mon_config_info *mon_info = info;
- unsigned int index;
- u64 msrval;
-
- index = mon_event_config_index_get(mon_info->evtid);
- if (index == INVALID_CONFIG_INDEX) {
- pr_warn_once("Invalid event id %d\n", mon_info->evtid);
- return;
- }
- rdmsrl(MSR_IA32_EVT_CFG_BASE + index, msrval);
-
- /* Report only the valid event configuration bits */
- mon_info->mon_config = msrval & MAX_EVT_CONFIG_BITS;
-}
-
-static void mondata_config_read(struct rdt_domain *d, struct mon_config_info *mon_info)
-{
- smp_call_function_any(&d->cpu_mask, mon_event_config_read, mon_info, 1);
-}
-
static int mbm_config_show(struct seq_file *s, struct rdt_resource *r, u32 evtid)
{
- struct mon_config_info mon_info = {0};
struct rdt_domain *dom;
bool sep = false;
+ u32 val;
cpus_read_lock();
mutex_lock(&rdtgroup_mutex);
list_for_each_entry(dom, &r->domains, list) {
+ struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(dom);
if (sep)
seq_puts(s, ";");
- memset(&mon_info, 0, sizeof(struct mon_config_info));
- mon_info.evtid = evtid;
- mondata_config_read(dom, &mon_info);
+ if (evtid == QOS_L3_MBM_TOTAL_EVENT_ID) {
+ val = hw_dom->mbm_total_cfg;
+ } else if (evtid == QOS_L3_MBM_LOCAL_EVENT_ID) {
+ val = hw_dom->mbm_local_cfg;
+ } else {
+ pr_warn_once("Invalid event id %d\n", evtid);
+ break;
+ }
- seq_printf(s, "%d=0x%02x", dom->id, mon_info.mon_config);
+ seq_printf(s, "%d=0x%02x", dom->id, val);
sep = true;
}
seq_puts(s, "\n");
@@ -1704,17 +1688,28 @@ static void mon_event_config_write(void *info)
static void mbm_config_write_domain(struct rdt_resource *r,
struct rdt_domain *d, u32 evtid, u32 val)
{
+ struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(d);
struct mon_config_info mon_info = {0};
/*
* Read the current config value first. If both are the same then
* no need to write it again.
*/
- mon_info.evtid = evtid;
- mondata_config_read(d, &mon_info);
- if (mon_info.mon_config == val)
+ switch (evtid) {
+ case QOS_L3_MBM_TOTAL_EVENT_ID:
+ if (val == hw_dom->mbm_total_cfg)
+ return;
+ break;
+ case QOS_L3_MBM_LOCAL_EVENT_ID:
+ if (val == hw_dom->mbm_local_cfg)
+ return;
+ break;
+ default:
+ pr_warn_once("Invalid event id %d\n", evtid);
return;
+ }
+ mon_info.evtid = evtid;
mon_info.mon_config = val;
/*
--
2.34.1
ABMC feature requires to users to assign a hardware counter to an RMID
to monitor the events. Provide the interfaces to assign a counter.
Individual counters are configured by writing to L3_QOS_ABMC_CFG MSR
and specifying the counter id, bandwidth source, and bandwidth types.
The feature details are documented in the APM listed below [1].
[1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
Monitoring (ABMC).
Signed-off-by: Babu Moger <[email protected]>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
---
v4: Commit message update.
User bitmap APIs where applicable.
Changed the interfaces considering MPAM(arm).
Added domain specific assignment.
v3: Removed the static from the prototype of rdtgroup_assign_abmc.
The function is not called directly from user anymore. These
changes are related to global assignment interface.
v2: Minor text changes in commit message.
---
arch/x86/kernel/cpu/resctrl/internal.h | 3 +
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 101 +++++++++++++++++++++++++
2 files changed, 104 insertions(+)
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 45ed33f4f0ff..a88c8fc5e4df 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -657,6 +657,9 @@ void rdt_domain_reconfigure_cdp(struct rdt_resource *r);
void __init resctrl_file_fflags_init(const char *config,
unsigned long fflags);
void arch_domain_mbm_evt_config(struct rdt_hw_domain *hw_dom);
+int resctrl_arch_assign(struct rdt_domain *d, u32 evtid, u32 rmid,
+ u32 ctr_id, u32 closid, bool enable);
+int resctrl_grp_assign(struct rdtgroup *rdtgrp, u32 evtid);
void rdt_staged_configs_clear(void);
bool closid_allocated(unsigned int closid);
int resctrl_find_cleanest_closid(void);
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 0e425c91fa46..48df76499a04 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -203,6 +203,19 @@ static void num_cntrs_init(void)
num_cntrs_free_map_len = r->mon.num_cntrs;
}
+static int assign_cntrs_alloc(void)
+{
+ u32 ctr_id = find_first_bit(&num_cntrs_free_map,
+ num_cntrs_free_map_len);
+
+ if (ctr_id >= num_cntrs_free_map_len)
+ return -ENOSPC;
+
+ __clear_bit(ctr_id, &num_cntrs_free_map);
+
+ return ctr_id;
+}
+
/**
* rdtgroup_mode_by_closid - Return mode of resource group with closid
* @closid: closid if the resource group
@@ -1830,6 +1843,94 @@ static ssize_t mbm_local_bytes_config_write(struct kernfs_open_file *of,
return ret ?: nbytes;
}
+static void rdtgroup_abmc_cfg(void *info)
+{
+ u64 *msrval = info;
+
+ wrmsrl(MSR_IA32_L3_QOS_ABMC_CFG, *msrval);
+}
+
+int resctrl_arch_assign(struct rdt_domain *d, u32 evtid, u32 rmid,
+ u32 ctr_id, u32 closid, bool enable)
+{
+ struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(d);
+ union l3_qos_abmc_cfg abmc_cfg = { 0 };
+ struct arch_mbm_state *arch_mbm;
+
+ abmc_cfg.split.cfg_en = 1;
+ abmc_cfg.split.ctr_en = enable ? 1 : 0;
+ abmc_cfg.split.ctr_id = ctr_id;
+ abmc_cfg.split.bw_src = rmid;
+
+ /*
+ * Read the event configuration from the domain and pass it as
+ * bw_type.
+ */
+ if (evtid == QOS_L3_MBM_TOTAL_EVENT_ID) {
+ abmc_cfg.split.bw_type = hw_dom->mbm_total_cfg;
+ arch_mbm = &hw_dom->arch_mbm_total[rmid];
+ } else {
+ abmc_cfg.split.bw_type = hw_dom->mbm_local_cfg;
+ arch_mbm = &hw_dom->arch_mbm_local[rmid];
+ }
+
+ smp_call_function_any(&d->cpu_mask, rdtgroup_abmc_cfg, &abmc_cfg, 1);
+
+ /* Reset the internal counters */
+ if (arch_mbm)
+ memset(arch_mbm, 0, sizeof(struct arch_mbm_state));
+
+ return 0;
+}
+
+int resctrl_grp_assign(struct rdtgroup *rdtgrp, u32 evtid)
+{
+ struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
+ int ctr_id = 0, index;
+ struct rdt_domain *d;
+ u32 mon_state;
+
+ index = mon_event_config_index_get(evtid);
+ if (index == INVALID_CONFIG_INDEX) {
+ rdt_last_cmd_puts("Invalid event id\n");
+ return -EINVAL;
+ }
+
+ if (evtid == QOS_L3_MBM_TOTAL_EVENT_ID) {
+ mon_state = ASSIGN_TOTAL;
+ } else if (evtid == QOS_L3_MBM_LOCAL_EVENT_ID) {
+ mon_state = ASSIGN_LOCAL;
+ } else {
+ rdt_last_cmd_puts("Invalid event id\n");
+ return -EINVAL;
+ }
+
+ /* Nothing to do if event has been assigned already */
+ if (rdtgrp->mon.mon_state & mon_state) {
+ rdt_last_cmd_puts("ABMC counter is assigned already\n");
+ return 0;
+ }
+
+ /*
+ * Allocate a new counter and update domains
+ */
+ ctr_id = assign_cntrs_alloc();
+ if (ctr_id < 0) {
+ rdt_last_cmd_puts("Out of ABMC counters\n");
+ return -ENOSPC;
+ }
+
+ rdtgrp->mon.ctr_id[index] = ctr_id;
+
+ list_for_each_entry(d, &r->domains, list)
+ resctrl_arch_assign(d, evtid, rdtgrp->mon.rmid, ctr_id,
+ rdtgrp->closid, 1);
+
+ rdtgrp->mon.mon_state |= mon_state;
+
+ return 0;
+}
+
/* rdtgroup information files for one cache resource. */
static struct rftype res_common_files[] = {
{
--
2.34.1
Hardware provides a limited number of ABMC counters. Once all the
counters are exhausted, counters need to be freed for new assignments.
Provide the interface to unassign the counter.
The feature details are documented in the APM listed below [1].
[1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
Monitoring (ABMC).
Signed-off-by: Babu Moger <[email protected]>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
---
v4: Added domain specific unassign feature.
Few name changes.
v3: Removed the static from the prototype of rdtgroup_unassign_abmc.
The function is not called directly from user anymore. These
changes are related to global assignment interface.
v2: No changes.
---
arch/x86/kernel/cpu/resctrl/internal.h | 2 ++
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 42 ++++++++++++++++++++++++++
2 files changed, 44 insertions(+)
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index a88c8fc5e4df..e16244895350 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -660,6 +660,8 @@ void arch_domain_mbm_evt_config(struct rdt_hw_domain *hw_dom);
int resctrl_arch_assign(struct rdt_domain *d, u32 evtid, u32 rmid,
u32 ctr_id, u32 closid, bool enable);
int resctrl_grp_assign(struct rdtgroup *rdtgrp, u32 evtid);
+int resctrl_grp_unassign(struct rdtgroup *rdtgrp, u32 evtid);
+void num_cntrs_free(u32 ctr_id);
void rdt_staged_configs_clear(void);
bool closid_allocated(unsigned int closid);
int resctrl_find_cleanest_closid(void);
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 48df76499a04..5ea1e58c7201 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -216,6 +216,11 @@ static int assign_cntrs_alloc(void)
return ctr_id;
}
+void num_cntrs_free(u32 ctr_id)
+{
+ __set_bit(ctr_id, &num_cntrs_free_map);
+}
+
/**
* rdtgroup_mode_by_closid - Return mode of resource group with closid
* @closid: closid if the resource group
@@ -1931,6 +1936,43 @@ int resctrl_grp_assign(struct rdtgroup *rdtgrp, u32 evtid)
return 0;
}
+int resctrl_grp_unassign(struct rdtgroup *rdtgrp, u32 evtid)
+{
+ struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
+ struct rdt_domain *d;
+ u32 mon_state;
+ int index;
+
+ index = mon_event_config_index_get(evtid);
+ if (index == INVALID_CONFIG_INDEX) {
+ pr_warn_once("Invalid event id %d\n", evtid);
+ return -EINVAL;
+ }
+
+ if (evtid == QOS_L3_MBM_TOTAL_EVENT_ID) {
+ mon_state = ASSIGN_TOTAL;
+ } else if (evtid == QOS_L3_MBM_LOCAL_EVENT_ID) {
+ mon_state = ASSIGN_LOCAL;
+ } else {
+ rdt_last_cmd_puts("Invalid event id\n");
+ return -EINVAL;
+ }
+
+ if (rdtgrp->mon.mon_state & mon_state) {
+ list_for_each_entry(d, &r->domains, list)
+ resctrl_arch_assign(d, evtid, rdtgrp->mon.rmid,
+ rdtgrp->mon.ctr_id[index],
+ rdtgrp->closid, 0);
+
+ /* Update the counter bitmap */
+ num_cntrs_free(rdtgrp->mon.ctr_id[index]);
+ }
+
+ rdtgrp->mon.mon_state &= ~mon_state;
+
+ return 0;
+}
+
/* rdtgroup information files for one cache resource. */
static struct rftype res_common_files[] = {
{
--
2.34.1
Hardware provides a set of counters when the ABMC feature is supported.
These counters are used for enabling the events in resctrl group when
the feature is supported.
Introduce num_cntrs_free_map bitmap to track available and free counters.
Signed-off-by: Babu Moger <[email protected]>
---
v4: Changed the name to num_cntrs where applicable.
Used bitmap apis.
Added more comments for the globals.
v3: Changed the bitmap name to assign_cntrs_free_map. Removed abmc
from the name.
v2: Changed the bitmap name to assignable_counter_free_map from
abmc_counter_free_map.
---
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 23 +++++++++++++++++++++++
1 file changed, 23 insertions(+)
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 3071bbb7a15e..400ae405e10e 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -186,6 +186,23 @@ bool closid_allocated(unsigned int closid)
return !test_bit(closid, &closid_free_map);
}
+/*
+ * ABMC Counter bitmap and length for tracking available counters.
+ * ABMC feature provides set of hardware counters for enabling events.
+ * Each event takes one hardware counter. Kernel needs to keep track
+ * of number of available counters.
+ */
+static unsigned long num_cntrs_free_map;
+static u32 num_cntrs_free_map_len;
+
+static void num_cntrs_init(void)
+{
+ struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
+
+ bitmap_fill(&num_cntrs_free_map, r->mon.num_cntrs);
+ num_cntrs_free_map_len = r->mon.num_cntrs;
+}
+
/**
* rdtgroup_mode_by_closid - Return mode of resource group with closid
* @closid: closid if the resource group
@@ -2459,6 +2476,12 @@ static int resctrl_abmc_set_all(enum resctrl_res_level l, bool enable)
struct rdt_resource *r = &rdt_resources_all[l].r_resctrl;
struct rdt_domain *d;
+ /*
+ * The bitmap num_cntrs_free_map needs to be reset when switching
+ * the monitor mode.
+ */
+ num_cntrs_init();
+
/*
* Update QOS_CFG MSR on all the CPUs associated with the resource
* Hardware counters will reset after switching the monotor mode.
--
2.34.1
Enable ABMC by default when feature is supported.
Counters are assigned automatically on group creation. If the counters
are exhausted, report the warnings and continue. It is not required to
fail group creation for assignment failures. Users will have the option
to modify the assignments.
Signed-off-by: Babu Moger <[email protected]>
---
v4: Few name changes based on the upstream discussion.
Commit message update.
v3: This is a new patch. Patch addresses the upstream comment to enable
ABMC feature by default if the feature is available.
---
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 69 ++++++++++++++++++++++++++
1 file changed, 69 insertions(+)
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 5ea1e58c7201..f452b6d9bb99 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -2756,6 +2756,7 @@ static void rdt_disable_ctx(void)
{
resctrl_arch_set_cdp_enabled(RDT_RESOURCE_L3, false);
resctrl_arch_set_cdp_enabled(RDT_RESOURCE_L2, false);
+ resctrl_abmc_disable(RDT_RESOURCE_L3);
set_mba_sc(false);
resctrl_debug = false;
@@ -2786,6 +2787,8 @@ static int rdt_enable_ctx(struct rdt_fs_context *ctx)
if (ctx->enable_debug)
resctrl_debug = true;
+ resctrl_abmc_enable(RDT_RESOURCE_L3);
+
return 0;
out_cdpl3:
@@ -2882,6 +2885,41 @@ static void schemata_list_destroy(void)
}
}
+static int resctrl_assign_events(struct rdtgroup *rdtgrp)
+{
+ struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
+ struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
+ int ret = 0;
+
+ if (!hw_res->abmc_enabled)
+ return 0;
+
+ if (is_mbm_total_enabled())
+ ret = resctrl_grp_assign(rdtgrp, QOS_L3_MBM_TOTAL_EVENT_ID);
+
+ if (!ret && is_mbm_local_enabled())
+ ret = resctrl_grp_assign(rdtgrp, QOS_L3_MBM_LOCAL_EVENT_ID);
+
+ return ret;
+}
+
+static int resctrl_unassign_events(struct rdtgroup *rdtgrp)
+{
+ struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
+ struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
+ int ret = 0;
+
+ if (!hw_res->abmc_enabled)
+ return 0;
+
+ if (is_mbm_total_enabled())
+ ret = resctrl_grp_unassign(rdtgrp, QOS_L3_MBM_TOTAL_EVENT_ID);
+
+ if (!ret && is_mbm_local_enabled())
+ ret = resctrl_grp_unassign(rdtgrp, QOS_L3_MBM_LOCAL_EVENT_ID);
+ return ret;
+}
+
static int rdt_get_tree(struct fs_context *fc)
{
struct rdt_fs_context *ctx = rdt_fc2context(fc);
@@ -2941,6 +2979,14 @@ static int rdt_get_tree(struct fs_context *fc)
if (ret < 0)
goto out_mongrp;
rdtgroup_default.mon.mon_data_kn = kn_mondata;
+
+ /*
+ * Assign the monitor counters if it is available. If it fails,
+ * report the warnings and continue. It is not nessaccery to
+ * fail here.
+ */
+ if (resctrl_assign_events(&rdtgroup_default) < 0)
+ rdt_last_cmd_puts("Monitor assignment failed\n");
}
ret = rdt_pseudo_lock_init();
@@ -3214,6 +3260,8 @@ static void rdt_kill_sb(struct super_block *sb)
cpus_read_lock();
mutex_lock(&rdtgroup_mutex);
+ resctrl_unassign_events(&rdtgroup_default);
+
rdt_disable_ctx();
/*Put everything back to default values. */
@@ -3752,6 +3800,14 @@ static int rdtgroup_mkdir_mon(struct kernfs_node *parent_kn,
goto out_unlock;
}
+ /*
+ * Assign the monitor counters if it is available. If it fails,
+ * report the warnings and continue. It is not nessaccery to
+ * fail here.
+ */
+ if (resctrl_assign_events(rdtgrp) < 0)
+ rdt_last_cmd_puts("Monitor assignment failed\n");
+
kernfs_activate(rdtgrp->kn);
/*
@@ -3796,6 +3852,14 @@ static int rdtgroup_mkdir_ctrl_mon(struct kernfs_node *parent_kn,
if (ret)
goto out_closid_free;
+ /*
+ * Assign the monitor counters if it is available. If it fails,
+ * report the warnings and continue. It is not nessaccery to
+ * fail here.
+ */
+ if (resctrl_assign_events(rdtgrp) < 0)
+ rdt_last_cmd_puts("Monitor assignment failed\n");
+
kernfs_activate(rdtgrp->kn);
ret = rdtgroup_init_alloc(rdtgrp);
@@ -3891,6 +3955,9 @@ static int rdtgroup_rmdir_mon(struct rdtgroup *rdtgrp, cpumask_var_t tmpmask)
update_closid_rmid(tmpmask, NULL);
rdtgrp->flags = RDT_DELETED;
+
+ resctrl_unassign_events(rdtgrp);
+
free_rmid(rdtgrp->closid, rdtgrp->mon.rmid);
/*
@@ -3937,6 +4004,8 @@ static int rdtgroup_rmdir_ctrl(struct rdtgroup *rdtgrp, cpumask_var_t tmpmask)
cpumask_or(tmpmask, tmpmask, &rdtgrp->cpu_mask);
update_closid_rmid(tmpmask, NULL);
+ resctrl_unassign_events(rdtgrp);
+
free_rmid(rdtgrp->closid, rdtgrp->mon.rmid);
closid_free(rdtgrp->closid);
--
2.34.1
Introduce interface to switch between ABMC and legacy_mbm modes.
By default ABMC is enabled on resctrl mount if the feature is available.
However, user will have the option to go back to legacy_mbm if required.
$ cat /sys/fs/resctrl/info/L3_MON/mbm_assign
[abmc]
mbm_legacy
To enable the legacy monitoring feature:
$ echo "mbm_legacy" > /sys/fs/resctrl/info/L3_MON/mbm_assign
Signed-off-by: Babu Moger <[email protected]>
---
v4: Minor commit text changes. Keep the default to ABMC when supported.
v3: New patch to address the review comments from upstream.
---
Documentation/arch/x86/resctrl.rst | 10 +++++++
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 40 +++++++++++++++++++++++++-
2 files changed, 49 insertions(+), 1 deletion(-)
diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
index ab3cde61a124..fd050d4d22cd 100644
--- a/Documentation/arch/x86/resctrl.rst
+++ b/Documentation/arch/x86/resctrl.rst
@@ -271,6 +271,16 @@ with the following files:
[abmc]
mbm_legacy
+ * To enable ABMC feature:
+ ::
+
+ # echo "abmc" > /sys/fs/resctrl/info/L3_MON/mbm_assign
+
+ * To enable the legacy monitoring feature:
+ ::
+
+ # echo "mbm_legacy" > /sys/fs/resctrl/info/L3_MON/mbm_assign
+
"max_threshold_occupancy":
Read/write file provides the largest value (in
bytes) at which a previously used LLC_occupancy
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index f452b6d9bb99..d77ff059269a 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -908,6 +908,43 @@ static int rdtgroup_mbm_assign_show(struct kernfs_open_file *of,
return 0;
}
+/*
+ * rdtgroup_mode_write - Modify the resource group's mode
+ */
+static ssize_t rdtgroup_mbm_assign_write(struct kernfs_open_file *of,
+ char *buf, size_t nbytes,
+ loff_t off)
+{
+ struct rdt_resource *r = of->kn->parent->priv;
+ int ret = 0;
+
+ if (!r->abmc_capable)
+ return -EINVAL;
+
+ /* Valid input requires a trailing newline */
+ if (nbytes == 0 || buf[nbytes - 1] != '\n')
+ return -EINVAL;
+
+ buf[nbytes - 1] = '\0';
+
+ cpus_read_lock();
+ mutex_lock(&rdtgroup_mutex);
+
+ rdt_last_cmd_clear();
+
+ if (!strcmp(buf, "mbm_legacy"))
+ resctrl_abmc_disable(RDT_RESOURCE_L3);
+ else if (!strcmp(buf, "abmc"))
+ ret = resctrl_abmc_enable(RDT_RESOURCE_L3);
+ else
+ ret = -EINVAL;
+
+ mutex_unlock(&rdtgroup_mutex);
+ cpus_read_unlock();
+
+ return ret ?: nbytes;
+}
+
#ifdef CONFIG_PROC_CPU_RESCTRL
/*
@@ -2087,9 +2124,10 @@ static struct rftype res_common_files[] = {
},
{
.name = "mbm_assign",
- .mode = 0444,
+ .mode = 0644,
.kf_ops = &rdtgroup_kf_single_ops,
.seq_show = rdtgroup_mbm_assign_show,
+ .write = rdtgroup_mbm_assign_write,
},
{
.name = "cpus",
--
2.34.1
Introduce the interface to list the monitor states of all the resctrl groups.
Example:
$cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
List follows the following format:
"<CTRL_MON group>/<MON group>/<domain_id>=<assignment_flags>"
Format for specific type of groups:
- Default CTRL_MON group:
"//<domain_id>=<assignment_flags>"
- Non-default CTRL_MON group:
"<CTRL_MON group>//<domain_id>=<assignment_flags>"
- Child MON group of default CTRL_MON group:
"/<MON group>/<domain_id>=<assignment_flags>"
- Child MON group of non-default CTRL_MON group:
"<CTRL_MON group>/<MON group>/<domain_id>=<assignment_flags>"
Assignment flags can be one of the following:
t MBM total event is enabled
l MBM local event is enabled
tl Both total and local MBM events are enabled
_ None of the MBM events are enabled
Signed-off-by: Babu Moger <[email protected]>
---
v4: Added functionality to query domain specific assigment in.
rdtgroup_abmc_dom_state().
v3: New patch.
Addresses the feedback to provide the global assignment interface.
https://lore.kernel.org/lkml/[email protected]/
---
Documentation/arch/x86/resctrl.rst | 55 +++++++++++
arch/x86/kernel/cpu/resctrl/monitor.c | 1 +
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 129 +++++++++++++++++++++++++
3 files changed, 185 insertions(+)
diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
index fd050d4d22cd..255087de568d 100644
--- a/Documentation/arch/x86/resctrl.rst
+++ b/Documentation/arch/x86/resctrl.rst
@@ -281,6 +281,61 @@ with the following files:
# echo "mbm_legacy" > /sys/fs/resctrl/info/L3_MON/mbm_assign
+"mbm_assign_control":
+ Available when ABMC features are supported.
+ Reports the resctrl group and monitor status of each group.
+
+ List follows the following format:
+ "<CTRL_MON group>/<MON group>/<domain_id>=<assignment_flags>"
+
+ Format for specific type of grpups:
+
+ * Default CTRL_MON group:
+ "//<domain_id>=<assignment_flags>"
+
+ * Non-default CTRL_MON group:
+ "<CTRL_MON group>//<domain_id>=<assignment_flags>"
+
+ * Child MON group of default CTRL_MON group:
+ "/<MON group>/<domain_id>=<assignment_flags>"
+
+ * Child MON group of non-default CTRL_MON group:
+ "<CTRL_MON group>/<MON group>/<domain_id>=<assignment_flags>"
+
+ Assignment flags can be one of the following:
+ ::
+
+ t MBM total event is enabled
+ l MBM local event is enabled
+ tl Both total and local MBM events are enabled
+ _ None of the MBM events are enabled
+
+ Examples:
+ ::
+
+ # mkdir /sys/fs/resctrl/mon_groups/child_default_mon_grp
+ # mkdir /sys/fs/resctrl/non_default_ctrl_mon_grp
+ # mkdir /sys/fs/resctrl/non_default_ctrl_mon_grp/mon_groups/child_non_default_mon_grp
+
+ # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
+ non_default_ctrl_mon_grp//0=tl;1=tl;
+ non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
+ //0=tl;1=tl;
+ /child_default_mon_grp/0=tl;1=tl;
+
+ There are four resctrl groups. All the groups have total and local events are
+ enabled on domain 0 and 1.
+
+ non_default_ctrl_mon_grp// - This is a non default control mon group.
+
+ non_default_ctrl_mon_grp/child_non_default_mon_grp/ - This is a child monitor
+ group of the non default control mon group.
+
+ // - This is a default control mon group.
+
+ /child_default_mon_grp/ - This is a child monitor group of the default control
+ mon group.
+
"max_threshold_occupancy":
Read/write file provides the largest value (in
bytes) at which a previously used LLC_occupancy
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index ab0f4bb49bd9..f4dba02db9c0 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -1083,6 +1083,7 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
r->mon.num_cntrs = 64;
}
resctrl_file_fflags_init("num_cntrs", RFTYPE_MON_INFO);
+ resctrl_file_fflags_init("mbm_assign_control", RFTYPE_MON_INFO);
}
}
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index d77ff059269a..bd3a54405402 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -945,6 +945,129 @@ static ssize_t rdtgroup_mbm_assign_write(struct kernfs_open_file *of,
return ret ?: nbytes;
}
+static void rdtgroup_abmc_dom_cfg(void *info)
+{
+ u64 *msrval = info;
+
+ wrmsrl(MSR_IA32_L3_QOS_ABMC_CFG, *msrval);
+ rdmsrl(MSR_IA32_L3_QOS_ABMC_DSC, *msrval);
+}
+
+/*
+ * Writing the counter id with CfgEn=0 on L3_QOS_ABMC_CFG and reading
+ * L3_QOS_ABMC_DSC back will return configuration of the counter
+ * specified.
+ */
+static int rdtgroup_abmc_dom_state(struct rdt_domain *d, u32 ctr_id, u32 rmid)
+{
+ union l3_qos_abmc_cfg abmc_cfg = { 0 };
+
+ abmc_cfg.split.cfg_en = 0;
+ abmc_cfg.split.ctr_id = ctr_id;
+
+ smp_call_function_any(&d->cpu_mask, rdtgroup_abmc_dom_cfg,
+ &abmc_cfg, 1);
+
+ if (abmc_cfg.split.ctr_en && abmc_cfg.split.bw_src == rmid)
+ return 0;
+ else
+ return -1;
+}
+
+static char *mon_state_to_str(struct rdtgroup *rdtgrp,
+ struct rdt_domain *d, char *str)
+{
+ char *tmp = str;
+ int dom_state = ASSIGN_NONE;
+
+ /*
+ * Query the monitor state for the domain.
+ * Index 0 for evtid == QOS_L3_MBM_TOTAL_EVENT_ID
+ * Index 1 for evtid == QOS_L3_MBM_LOCAL_EVENT_ID
+ */
+ if (rdtgrp->mon.mon_state & ASSIGN_TOTAL)
+ if (!rdtgroup_abmc_dom_state(d, rdtgrp->mon.ctr_id[0], rdtgrp->mon.rmid))
+ dom_state |= ASSIGN_TOTAL;
+
+ if (rdtgrp->mon.mon_state & ASSIGN_LOCAL)
+ if (!rdtgroup_abmc_dom_state(d, rdtgrp->mon.ctr_id[1], rdtgrp->mon.rmid))
+ dom_state |= ASSIGN_LOCAL;
+
+ switch (dom_state) {
+ case ASSIGN_NONE:
+ *tmp++ = '_';
+ break;
+ case (ASSIGN_TOTAL | ASSIGN_LOCAL):
+ *tmp++ = 't';
+ *tmp++ = 'l';
+ break;
+ case ASSIGN_TOTAL:
+ *tmp++ = 't';
+ break;
+ case ASSIGN_LOCAL:
+ *tmp++ = 'l';
+ break;
+ default:
+ break;
+ }
+
+ *tmp = '\0';
+ return str;
+}
+
+static int rdtgroup_mbm_assign_control_show(struct kernfs_open_file *of,
+ struct seq_file *s, void *v)
+{
+ struct rdt_resource *r = of->kn->parent->priv;
+ struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
+ struct rdt_domain *dom;
+ struct rdtgroup *rdtg;
+ int grp_default = 0;
+ char str[10];
+
+ if (!hw_res->abmc_enabled) {
+ rdt_last_cmd_puts("ABMC feature is not enabled\n");
+ return -EINVAL;
+ }
+
+ mutex_lock(&rdtgroup_mutex);
+
+ list_for_each_entry(rdtg, &rdt_all_groups, rdtgroup_list) {
+ struct rdtgroup *crg;
+
+ if (rdtg == &rdtgroup_default) {
+ grp_default = 1;
+ seq_puts(s, "//");
+ } else {
+ grp_default = 0;
+ seq_printf(s, "%s//", rdtg->kn->name);
+ }
+
+ list_for_each_entry(dom, &r->domains, list)
+ seq_printf(s, "%d=%s;", dom->id,
+ mon_state_to_str(rdtg, dom, str));
+ seq_putc(s, '\n');
+
+ list_for_each_entry(crg, &rdtg->mon.crdtgrp_list,
+ mon.crdtgrp_list) {
+ if (grp_default)
+ seq_printf(s, "/%s/", crg->kn->name);
+ else
+ seq_printf(s, "%s/%s/", rdtg->kn->name,
+ crg->kn->name);
+
+ list_for_each_entry(dom, &r->domains, list)
+ seq_printf(s, "%d=%s;", dom->id,
+ mon_state_to_str(crg, dom, str));
+ seq_putc(s, '\n');
+ }
+ }
+
+ mutex_unlock(&rdtgroup_mutex);
+
+ return 0;
+}
+
#ifdef CONFIG_PROC_CPU_RESCTRL
/*
@@ -2143,6 +2266,12 @@ static struct rftype res_common_files[] = {
.kf_ops = &rdtgroup_kf_single_ops,
.seq_show = rdtgroup_num_cntrs_show,
},
+ {
+ .name = "mbm_assign_control",
+ .mode = 0444,
+ .kf_ops = &rdtgroup_kf_single_ops,
+ .seq_show = rdtgroup_mbm_assign_control_show,
+ },
{
.name = "cpus_list",
.mode = 0644,
--
2.34.1
Introduce the interface to enable events in ABMC mode.
Events can be enabled or disabled by writing to file
/sys/fs/resctrl/info/L3_MON/mbm_assign_control
Format is similar to the list format with addition of op-code for the
assignment operation.
"<CTRL_MON group>/<MON group>/<op-code><assignment_flags>"
Format for specific type of groups:
* Default CTRL_MON group:
"//<domain_id><op-code><assignment_flags>"
* Non-default CTRL_MON group:
"<CTRL_MON group>//<domain_id><op-code><assignment_flags>"
* Child MON group of default CTRL_MON group:
"/<MON group>/<domain_id><op-code><assignment_flags>"
* Child MON group of non-default CTRL_MON group:
"<CTRL_MON group>/<MON group>/<domain_id><op-code><assignment_flags>"
Op-code can be one of the following:
= Update the assignment to match the flags
+ enable a new state
- disable a new state
Assignment flags can be one of the following:
t MBM total event is enabled
l MBM local event is enabled
tl Both total and local MBM events are enabled
_ None of the MBM events are enabled
Signed-off-by: Babu Moger <[email protected]>
---
v4: Added domain specific assignments. Fixed the opcode parsing.
v3: New patch.
Addresses the feedback to provide the global assignment interface.
https://lore.kernel.org/lkml/[email protected]/
---
Documentation/arch/x86/resctrl.rst | 81 +++++++++
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 242 ++++++++++++++++++++++++-
2 files changed, 322 insertions(+), 1 deletion(-)
diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
index 255087de568d..89af91805b38 100644
--- a/Documentation/arch/x86/resctrl.rst
+++ b/Documentation/arch/x86/resctrl.rst
@@ -336,6 +336,87 @@ with the following files:
/child_default_mon_grp/ - This is a child monitor group of the default control
mon group.
+ Assignment state can be updated by writing to the interface.
+
+ Format is similar to the list format with addition of op-code for the
+ assignment operation.
+
+ "<CTRL_MON group>/<MON group>/<op-code><assignment_flags>"
+
+ Format for each type of groups:
+
+ * Default CTRL_MON group:
+ "//<domain_id><op-code><assignment_flags>"
+
+ * Non-default CTRL_MON group:
+ "<CTRL_MON group>//<domain_id><op-code><assignment_flags>"
+
+ * Child MON group of default CTRL_MON group:
+ "/<MON group>/<domain_id><op-code><assignment_flags>"
+
+ * Child MON group of non-default CTRL_MON group:
+ "<CTRL_MON group>/<MON group>/<domain_id><op-code><assignment_flags>"
+
+ Op-code can be one of the following:
+ ::
+
+ = Update the assignment to match the flags
+ + Add a new state
+ - delete a new state
+
+ Examples:
+ ::
+
+ Initial group status:
+ # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
+ non_default_ctrl_mon_grp//0=tl;1=tl;
+ non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
+ //0=tl;1=tl;
+ /child_default_mon_grp/0=tl;1=tl;
+
+ To update the default group to enable only total event on domain 0:
+ # echo "//0=t" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control
+
+ Assignment status after the update:
+ # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
+ non_default_ctrl_mon_grp//0=tl;1=tl;
+ non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
+ //0=t;1=tl;2=tl;
+ /child_default_mon_grp/0=tl;1=tl;
+
+ To update the MON group child_default_mon_grp to remove total event on domain 1:
+ # echo "/child_default_mon_grp/1-t" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control
+
+ Assignment status after the update:
+ $ cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
+ non_default_ctrl_mon_grp//0=tl;1=tl;
+ non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
+ //0=t;1=l;
+ /child_default_mon_grp/0=t;1=tl;
+
+ To update the MON group non_default_ctrl_mon_grp/child_non_default_mon_grp to
+ remove both local and total events on domain 1:
+ # echo "non_default_ctrl_mon_grp/child_non_default_mon_grp/1=_" >
+ /sys/fs/resctrl/info/L3_MON/mbm_assign_control
+
+ Assignment status after the update:
+ # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
+ non_default_ctrl_mon_grp//0=tl;1=tl;
+ non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=_;
+ //0=t;1=l;
+ /child_default_mon_grp/0=t;1=tl;
+
+ To update the default group to add a total event domain 1.
+ # echo "//1+t" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control
+
+ Assignment status after the update:
+
+ # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
+ non_default_ctrl_mon_grp//0=tl;1=tl;
+ non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=_;
+ //0=t;1=tl;
+ /child_default_mon_grp/0=t;1=tl;
+
"max_threshold_occupancy":
Read/write file provides the largest value (in
bytes) at which a previously used LLC_occupancy
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index bd3a54405402..b74c18927bd2 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1068,6 +1068,245 @@ static int rdtgroup_mbm_assign_control_show(struct kernfs_open_file *of,
return 0;
}
+static int str_to_mon_state(char *flag)
+{
+ int i, mon_state = 0;
+
+ for (i = 0; i < strlen(flag); i++) {
+ switch (*(flag + i)) {
+ case 't':
+ mon_state |= ASSIGN_TOTAL;
+ break;
+ case 'l':
+ mon_state |= ASSIGN_LOCAL;
+ break;
+ case '_':
+ mon_state = ASSIGN_NONE;
+ break;
+ default:
+ mon_state = ASSIGN_NONE;
+ break;
+ }
+ }
+
+ return mon_state;
+}
+
+static struct rdtgroup *resctrl_get_rdtgroup(enum rdt_group_type rtype, char *p_grp, char *c_grp)
+{
+ struct rdtgroup *rdtg, *crg;
+
+ if (rtype == RDTCTRL_GROUP && *p_grp == '\0') {
+ return &rdtgroup_default;
+ } else if (rtype == RDTCTRL_GROUP) {
+ list_for_each_entry(rdtg, &rdt_all_groups, rdtgroup_list)
+ if (!strcmp(p_grp, rdtg->kn->name))
+ return rdtg;
+ } else if (rtype == RDTMON_GROUP) {
+ list_for_each_entry(rdtg, &rdt_all_groups, rdtgroup_list) {
+ if (!strcmp(p_grp, rdtg->kn->name)) {
+ list_for_each_entry(crg, &rdtg->mon.crdtgrp_list,
+ mon.crdtgrp_list) {
+ if (!strcmp(c_grp, crg->kn->name))
+ return crg;
+ }
+ }
+ }
+ }
+
+ return NULL;
+}
+
+static int resctrl_process_flags(enum rdt_group_type rtype, char *p_grp, char *c_grp, char *tok)
+{
+ struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
+ int op, mon_state, assign_state, unassign_state;
+ char *dom_str, *id_str, *op_str;
+ struct rdtgroup *rdtgrp;
+ struct rdt_domain *d;
+ unsigned long dom_id;
+ int ret, found = 0;
+
+ rdtgrp = resctrl_get_rdtgroup(rtype, p_grp, c_grp);
+
+ if (!rdtgrp) {
+ rdt_last_cmd_puts("Not a valid resctrl group\n");
+ return -EINVAL;
+ }
+
+next:
+ if (!tok || tok[0] == '\0')
+ return 0;
+
+ /* Start processing the strings for each domain */
+ dom_str = strim(strsep(&tok, ";"));
+
+ op_str = strpbrk(dom_str, "=+-");
+
+ if (op_str) {
+ op = *op_str;
+ } else {
+ rdt_last_cmd_puts("Missing operation =, +, -, _ character\n");
+ return -EINVAL;
+ }
+
+ id_str = strsep(&dom_str, "=+-");
+
+ if (!id_str || kstrtoul(id_str, 10, &dom_id)) {
+ rdt_last_cmd_puts("Missing domain id\n");
+ return -EINVAL;
+ }
+
+ /* Verify if the dom_id is valid */
+ list_for_each_entry(d, &r->domains, list) {
+ if (d->id == dom_id) {
+ found = 1;
+ break;
+ }
+ }
+ if (!found) {
+ rdt_last_cmd_printf("Invalid domain id %ld\n", dom_id);
+ return -EINVAL;
+ }
+
+ mon_state = str_to_mon_state(dom_str);
+
+ assign_state = 0;
+ unassign_state = 0;
+
+ switch (op) {
+ case '+':
+ assign_state = mon_state;
+ break;
+ case '-':
+ unassign_state = mon_state;
+ break;
+ case '=':
+ assign_state = mon_state;
+ unassign_state = (ASSIGN_TOTAL | ASSIGN_LOCAL) & ~assign_state;
+ break;
+ default:
+ break;
+ }
+
+ if (assign_state & ASSIGN_TOTAL)
+ ret = resctrl_arch_assign(d, QOS_L3_MBM_TOTAL_EVENT_ID,
+ rdtgrp->mon.rmid,
+ rdtgrp->mon.ctr_id[0],
+ rdtgrp->closid, 1);
+ if (ret)
+ goto out_fail;
+
+ if (assign_state & ASSIGN_LOCAL)
+ ret = resctrl_arch_assign(d, QOS_L3_MBM_LOCAL_EVENT_ID,
+ rdtgrp->mon.rmid,
+ rdtgrp->mon.ctr_id[1],
+ rdtgrp->closid, 1);
+
+ if (ret)
+ goto out_fail;
+
+ if (unassign_state & ASSIGN_TOTAL)
+ ret = resctrl_arch_assign(d, QOS_L3_MBM_TOTAL_EVENT_ID,
+ rdtgrp->mon.rmid,
+ rdtgrp->mon.ctr_id[0],
+ rdtgrp->closid, 0);
+
+ if (ret)
+ goto out_fail;
+
+ if (unassign_state & ASSIGN_LOCAL)
+ ret = resctrl_arch_assign(d, QOS_L3_MBM_LOCAL_EVENT_ID,
+ rdtgrp->mon.rmid,
+ rdtgrp->mon.ctr_id[1],
+ rdtgrp->closid, 0);
+ if (ret)
+ goto out_fail;
+
+ goto next;
+
+out_fail:
+
+ return -EINVAL;
+}
+
+static ssize_t rdtgroup_mbm_assign_control_write(struct kernfs_open_file *of,
+ char *buf, size_t nbytes,
+ loff_t off)
+{
+ struct rdt_resource *r = of->kn->parent->priv;
+ char *token, *cmon_grp, *mon_grp;
+ struct rdt_hw_resource *hw_res;
+ int ret;
+
+ hw_res = resctrl_to_arch_res(r);
+ if (!hw_res->abmc_enabled)
+ return -EINVAL;
+
+ /* Valid input requires a trailing newline */
+ if (nbytes == 0 || buf[nbytes - 1] != '\n')
+ return -EINVAL;
+
+ buf[nbytes - 1] = '\0';
+ rdt_last_cmd_clear();
+
+ cpus_read_lock();
+ mutex_lock(&rdtgroup_mutex);
+
+ while ((token = strsep(&buf, "\n")) != NULL) {
+ if (strstr(token, "//")) {
+ /*
+ * The control mon group processing:
+ * default CTRL_MON group: "//<flags>"
+ * non-default CTRL_MON group: "<CTRL_MON group>//flags"
+ * The CTRL_MON group will be empty string if it is a
+ * default group.
+ */
+ cmon_grp = strsep(&token, "//");
+
+ /*
+ * strsep returns empty string for contiguous delimiters.
+ * Make sure check for two consicutive delimiters and
+ * advance the token.
+ */
+ mon_grp = strsep(&token, "//");
+ if (*mon_grp != '\0') {
+ rdt_last_cmd_printf("Invalid CTRL_MON group format %s\n", token);
+ ret = -EINVAL;
+ break;
+ }
+
+ ret = resctrl_process_flags(RDTCTRL_GROUP, cmon_grp, mon_grp, token);
+ if (ret)
+ break;
+ } else if (strstr(token, "/")) {
+ /*
+ * Mon group processing:
+ * MON_GROUP inside default CTRL_MON group: "/<MON group>/<flags>"
+ * MON_GROUP within CTRL_MON group: "<CTRL_MON group>/<MON group>/<flags>"
+ */
+ cmon_grp = strsep(&token, "/");
+
+ /* Extract the MON_GROUP. It cannot be empty string */
+ mon_grp = strsep(&token, "/");
+ if (*mon_grp == '\0') {
+ rdt_last_cmd_printf("Invalid MON_GROUP format %s\n", token);
+ ret = -EINVAL;
+ break;
+ }
+
+ ret = resctrl_process_flags(RDTMON_GROUP, cmon_grp, mon_grp, token);
+ if (ret)
+ break;
+ }
+ }
+
+ mutex_unlock(&rdtgroup_mutex);
+ cpus_read_unlock();
+
+ return ret ?: nbytes;
+}
+
#ifdef CONFIG_PROC_CPU_RESCTRL
/*
@@ -2268,9 +2507,10 @@ static struct rftype res_common_files[] = {
},
{
.name = "mbm_assign_control",
- .mode = 0444,
+ .mode = 0644,
.kf_ops = &rdtgroup_kf_single_ops,
.seq_show = rdtgroup_mbm_assign_control_show,
+ .write = rdtgroup_mbm_assign_control_write,
},
{
.name = "cpus_list",
--
2.34.1
The ABMC feature provides an option to the user to assign a hardware
counter to an RMID and monitor the bandwidth as long as the counter
is assigned. The bandwidth events will be tracked by the hardware until
the user changes the configuration. Each resctrl group can configure
maximum two counters, one for total event and one for local event.
The counters are configured by writing to MSR L3_QOS_ABMC_CFG.
Configuration is done by setting the counter id, bandwidth source (RMID),
bandwidth types. Reading L3_QOS_ABMC_DSC returns the configuration of
the counter id specified in L3_QOS_ABMC_CFG.
Attempts to read or write these MSRs when ABMC is not enabled will result
in a #GP(0) exception.
MSR L3_QOS_ABMC_CFG (0xC000_03FDh) and L3_QOS_ABMC_DSC (0xC000_03FEh)
details.
==========================================================================
Bits Mnemonic Description Access Type Reset Value
==========================================================================
63 CfgEn Configuration Enable R/W 0
62 CtrEn Counter Enable R/W 0
61:53 – Reserved MBZ 0
52:48 CtrID Counter Identifier R/W 0
47 IsCOS BwSrc field is a COS R/W 0
(not an RMID)
46:44 – Reserved MBZ 0
43:32 BwSrc Bandwidth Source R/W 0
(RMID or COS)
31:0 BwType Bandwidth types to R/W 0
track for this counter
==========================================================================
The feature details are documented in the APM listed below [1].
[1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
Monitoring (ABMC).
Signed-off-by: Babu Moger <[email protected]>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
---
v4: Added more descriptions.
Changed the name abmc_ctr_id to ctr_id.
Added L3_QOS_ABMC_DSC. Used for reading the configuration.
v3: No changes.
v2: No changes.
---
arch/x86/include/asm/msr-index.h | 2 ++
arch/x86/kernel/cpu/resctrl/internal.h | 30 ++++++++++++++++++++++++++
2 files changed, 32 insertions(+)
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 5f9a0139e98c..6d2fe39ac68f 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -1172,6 +1172,8 @@
#define MSR_IA32_SMBA_BW_BASE 0xc0000280
#define MSR_IA32_EVT_CFG_BASE 0xc0000400
#define MSR_IA32_L3_QOS_EXT_CFG 0xc00003ff
+#define MSR_IA32_L3_QOS_ABMC_CFG 0xc00003fd
+#define MSR_IA32_L3_QOS_ABMC_DSC 0xc00003fe
/* MSR_IA32_VMX_MISC bits */
#define MSR_IA32_VMX_MISC_INTEL_PT (1ULL << 14)
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 60a1ca0a11a7..45ed33f4f0ff 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -107,6 +107,9 @@ cpumask_any_housekeeping(const struct cpumask *mask, int exclude_cpu)
#define ASSIGN_TOTAL BIT(QOS_L3_MBM_TOTAL_EVENT_ID)
#define ASSIGN_LOCAL BIT(QOS_L3_MBM_LOCAL_EVENT_ID)
+/* Maximum assignable counters per resctrl group */
+#define MAX_CNTRS 2
+
struct rdt_fs_context {
struct kernfs_fs_context kfc;
bool enable_cdpl2;
@@ -211,6 +214,7 @@ enum rdtgrp_mode {
* @crdtgrp_list: child rdtgroup node list
* @rmid: rmid for this rdtgroup
* @mon_state: Assignment state of the group
+ * @ctr_id: ABMC counter ids assigned to this group
*/
struct mongroup {
struct kernfs_node *mon_data_kn;
@@ -218,6 +222,7 @@ struct mongroup {
struct list_head crdtgrp_list;
u32 rmid;
u32 mon_state;
+ u32 ctr_id[MAX_CNTRS];
};
/**
@@ -568,6 +573,31 @@ union cpuid_0x10_x_edx {
unsigned int full;
};
+/*
+ * ABMC counters can be configured by writing to L3_QOS_ABMC_CFG.
+ * @bw_type : Bandwidth types to track for this counter
+ * @bw_src : Bandwidth Source (RMID or CLOSID)
+ * @reserved1 : Reserved
+ * @is_cos : BwSrc field is a COS (not an RMID)
+ * @ctr_id : Counter Identifier
+ * @reserved : Reserved
+ * @ctr_en : Counter Enable bit
+ * @cfg_en : Configuration Enable bit
+ */
+union l3_qos_abmc_cfg {
+ struct {
+ unsigned long bw_type :32,
+ bw_src :12,
+ reserved1: 3,
+ is_cos : 1,
+ ctr_id : 5,
+ reserved : 9,
+ ctr_en : 1,
+ cfg_en : 1;
+ } split;
+ unsigned long full;
+};
+
void rdt_last_cmd_clear(void);
void rdt_last_cmd_puts(const char *s);
__printf(1, 2)
--
2.34.1
Hi Babu,
On 5/24/24 5:23 AM, Babu Moger wrote:
>
>
> d. This series adds a new interface file /sys/fs/resctrl/info/L3_MON/mbm_assign_control
> to list and modify the group's assignment states.
There was a lot of discussion resulting in this centralized file. At first glance this
file appears to be very complicated and I believe any reasonable person would wonder if
all of this is necessary. I recommend that you add a motivation for why this file is needed.
Some items I recall are : it makes it easier for user space to learn how counters are used (no
need to traverse resctrl and open()/close() many files), on the resctrl side it makes
it possible to support counter re-assignment with a single IPI. There may be other motivations
that I am forgetting now.
Also, could the name just be "mbm_control"? What is enabled at this time are "assignable
counters" but in the future we may want to add support for other flags that have nothing to
do with "assignable counters".
>
> The list follows the following format:
>
> "<CTRL_MON group>/<MON group>/<domain_id>=<assignment_flags>"
"assignment_flags" -> "flags" ? (throughout)
>
>
> Format for specific type of groups:
>
> * Default CTRL_MON group:
> "//<domain_id>=<assignment_flags>"
>
> * Non-default CTRL_MON group:
> "<CTRL_MON group>//<domain_id>=<assignment_flags>"
>
> * Child MON group of default CTRL_MON group:
> "/<MON group>/<domain_id>=<assignment_flags>"
>
> * Child MON group of non-default CTRL_MON group:
> "<CTRL_MON group>/<MON group>/<domain_id>=<assignment_flags>"
>
> Assignment flags can be one of the following:
>
> t MBM total event is enabled
> l MBM local event is enabled
> tl Both total and local MBM events are enabled
> _ None of the MBM events are enabled
>
> Examples:
>
> # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
> non_default_ctrl_mon_grp//0=tl;1=tl;
> non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
> //0=tl;1=tl;
> /child_default_mon_grp/0=tl;1=tl;
>
> There are four groups and all the groups have local and total
> event enabled on domain 0 and 1.
>
> =tl means both total and local events are enabled.
>
> "//" - This is a default CONTROL MON group
>
> "non_default_ctrl_mon_grp//" - This is non default CONTROL MON group
Be consistent with "non-default" (vs non default) as well as "CTRL_MON" (vs
CONTROL MON).
>
> "/child_default_mon_grp/" - This is Child MON group of the defult group
"Child" -> "child"
"defult" -> "default"
>
> "non_default_ctrl_mon_grp/child_non_default_mon_grp/" - This is child
> MON group of the non default group
non-default
>
> e. Update the group assignment states using the interface file /sys/fs/resctrl/info/L3_MON/mbm_assign_control.
>
> The write format is similar to the above list format with addition of
> op-code for the assignment operation.
>
> * Default CTRL_MON group:
> "//<domain_id><op-code><assignment_flags>"
>
> * Non-default CTRL_MON group:
> "<CTRL_MON group>//<domain_id><op-code><assignment_flags>"
>
> * Child MON group of default CTRL_MON group:
> "/<MON group>/<domain_id><op-code><assignment_flags>"
>
> * Child MON group of non-default CTRL_MON group:
> "<CTRL_MON group>/<MON group>/<domain_id><op-code><assignment_flags>"
>
> Op-code can be one of the following:
>
> = Update the assignment to match the flags
> + Assign a new state
> - Unassign a new state
Looking here and the implementation it seems that "+_" and "-_" is supported.
I think that should be invalid. Only "=_" seems appropriate to me.
Also please take care to not have a catchall "default" that does an
unassign. Doing something like that will prevent us from ever being
able to add any flags in the future.
>
>
> Initial group status:
>
> # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
> non_default_ctrl_mon_grp//0=tl;1=tl;
> non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
> //0=tl;1=tl;
> /child_default_mon_grp/0=tl;1=tl;
>
> To update the default group to enable only total event on domain 0:
> # echo "//0=t" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>
> Assignment status after the update:
> # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
> non_default_ctrl_mon_grp//0=tl;1=tl;
> non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
> //0=t;1=tl;
> /child_default_mon_grp/0=tl;1=tl;
>
> To update the MON group child_default_mon_grp to remove total event on domain 1:
> # echo "/child_default_mon_grp/1-t" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>
> Assignment status after the update:
> $ cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
> non_default_ctrl_mon_grp//0=tl;1=tl;
> non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=tl;
> //0=t;1=l;
> /child_default_mon_grp/0=t;1=tl;
This does not look right. Why did domain #1 of the default CTRL_MON group change also?
>
> To update the MON group non_default_ctrl_mon_grp/child_non_default_mon_grp to
> remove both local and total events on domain 1:
> # echo "non_default_ctrl_mon_grp/child_non_default_mon_grp/1=_" >
> /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>
> Assignment status after the update:
> # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
> non_default_ctrl_mon_grp//0=tl;1=tl;
> non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=_;
> //0=t;1=l;
> /child_default_mon_grp/0=t;1=tl;
>
> To update the default group to add a total event domain 1.
> # echo "//1+t" > /sys/fs/resctrl/info/L3_MON/mbm_assign_control
>
Unclear where "t" flag was removed.
> Assignment status after the update:
>
> # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_control
> non_default_ctrl_mon_grp//0=tl;1=tl;
> non_default_ctrl_mon_grp/child_non_default_mon_grp/0=tl;1=_;
> //0=t;1=tl;
> /child_default_mon_grp/0=t;1=tl;
>
> f. Read the event mbm_total_bytes and mbm_local_bytes of the default group.
> There is no change in reading the evetns with ABMC. If the event is unassigned
"evetns" -> "events"
> when reading, then the read will come back as Unavailable.
Should this not rather be "Unassigned"? According to the docs the counters
will return "Unavailable" right after reconfigure so it seems that there
are scenarios where an "assigned" counter returns "Unavailable". It seems more
useful to return "Unassigned" that will have a new specific meaning that
overloading existing "Unavailable" that has original meaning of "try again" ....
but in this case trying again will be futile.
>
> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
> 779247936
> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
> 765207488
>
> g. Users will have the option to go back to legacy_mbm mode if required.
> This can be done using the following command.
>
> # echo "legacy_mbm" > /sys/fs/resctrl/info/L3_MON/mbm_assign
> # cat /sys/fs/resctrl/info/L3_MON/mbm_assign
> abmc
> [mbm_legacy]
It is confusing for the value written by user space to be different from
the value displayed: "legacy_mbm" vs "mbm_legacy.
This is still missing information about what happens to the counters/events on
such a switch. Will events just keep counting? Will they be reset? ...?
I also think we should try to find a more generic name for this file.
"mbm_cntr_mode" or "mbm_mode" maybe?
>
> h. Check the bandwidth configuration for the group. Note that bandwidth
> configuration has a domain scope. Total event defaults to 0x7F (to
> count all the events) and local event defaults to 0x15 (to count all
> the local numa events). The event bitmap decoding is available at
> https://www.kernel.org/doc/Documentation/x86/resctrl.rst
> in section "mbm_total_bytes_config", "mbm_local_bytes_config":
>
> #cat /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
> 0=0x7f;1=0x7f
>
> #cat /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config
> 0=0x15;1=0x15
>
> j. Change the bandwidth source for domain 0 for the total event to count only reads.
> Note that this change effects total events on the domain 0.
>
> #echo 0=0x33 > /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
> #cat /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
> 0=0x33;1=0x7F
>
> k. Now read the total event again. The mbm_total_bytes should display
> only the read events.
>
> #cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
> 314101
According to doc, right after a BMEC change the counter will read "Unavailable"
is this not the case here?
>
> l. Unmount the resctrl
>
> #umount /sys/fs/resctrl/
Reinette
Hi Babu,
Could you please change shortlog prefix to be "x86/cpufeatures:" to
make it obvious where this patch is headed?
Reinette
Hi Babu,
On 5/24/24 5:23 AM, Babu Moger wrote:
> Consolidate all the data related to monitoring into separate structure
> for better readability.
This needs a better motivation because "better readability" is subjective.
You can motivate this work by noting that this is re-organization in preparation
for more monitoring specific properties that will clobber the existing resource
struct more. To support this organization the pattern of the cache allocation
and memory bandwidth allocation features that consolidate the feature specific
properties into a struct is followed.
Reinette
Hi Babu,
On 5/24/24 5:23 AM, Babu Moger wrote:
> ABMC feature details are reported via CPUID Fn8000_0020_EBX_x5.
> Bits Description
> 15:0 MAX_ABMC Maximum Supported Assignable Bandwidth
> Monitoring Counter ID + 1
>
> The feature details are documented in APM listed below [1].
> [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
> Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
> Monitoring (ABMC).
>
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
> Signed-off-by: Babu Moger <[email protected]>
> ---
> v4: Removed resctrl_arch_has_abmc(). Added all the code inline. We dont
> need to separate this as arch code.
>
> v3: Removed changes related to mon_features.
> Moved rdt_cpu_has to core.c and added new function resctrl_arch_has_abmc.
> Also moved the fields mbm_assign_capable and mbm_assign_cntrs to
> rdt_resource. (James)
>
> v2: Changed the field name to mbm_assign_capable from abmc_capable.
> ---
> arch/x86/kernel/cpu/resctrl/monitor.c | 14 ++++++++++++++
> include/linux/resctrl.h | 4 ++++
> 2 files changed, 18 insertions(+)
>
> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
> index b35d04fc761b..1602b58ba23d 100644
> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
> @@ -1066,6 +1066,20 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
> mbm_local_event.configurable = true;
> mbm_config_rftype_init("mbm_local_bytes_config");
> }
> +
> + if (rdt_cpu_has(X86_FEATURE_ABMC)) {
> + r->abmc_capable = true;
> + /*
> + * Query CPUID_Fn80000020_EBX_x05 for number of
> + * ABMC counters
> + */
> + cpuid_count(0x80000020, 5, &eax, &ebx, &ecx, &edx);
> + r->mon.num_cntrs = (ebx & 0xFFFF) + 1;
> + if (r->mon.num_cntrs > 64) {
> + WARN(1, "Cannot support more than 64 ABMC counters\n");
if (WARN_ON(...))
> + r->mon.num_cntrs = 64;
> + }
> + }
> }
>
> l3_mon_evt_init(r);
> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
> index bf99eb9c6ce4..24087e6efbb6 100644
> --- a/include/linux/resctrl.h
> +++ b/include/linux/resctrl.h
> @@ -153,10 +153,12 @@ struct resctrl_schema;
> /**
> * struct resctrl_mon - Monitoring related data
> * @num_rmid: Number of RMIDs available
> + * @num_cntrs: Maximum number of abmc counters
Maximum implies some limit and it is not clear what limit this applies to.
Can this just be "Number of monitoring counters".
Thinking ahead ... when switching between different "MBM modes" it
may be that one mode has a different number of counters than the other.
Since "ABMC" is just one mode it does not seem appropriate to
connect the "num_cntrs" property to ABMC.
> * @evt_list: List of monitoring events
> */
> struct resctrl_mon {
> int num_rmid;
> + int num_cntrs;
> struct list_head evt_list;
> };
>
> @@ -177,6 +179,7 @@ struct resctrl_mon {
> * @parse_ctrlval: Per resource function pointer to parse control values
> * @fflags: flags to choose base and info files
> * @cdp_capable: Is the CDP feature available on this resource
> + * @abmc_capable: Is system capable of supporting monitor assignment?
> */
> struct rdt_resource {
> int rid;
> @@ -196,6 +199,7 @@ struct rdt_resource {
> struct rdt_domain *d);
> unsigned long fflags;
> bool cdp_capable;
> + bool abmc_capable;
Shouldn't abmc_capable be a property of the new struct resctrl_mon?
> };
>
> /**
Reinette
Hi Babu,
Shortlog: "Introduce resctrl_file_fflags_init() ..."
On 5/24/24 5:23 AM, Babu Moger wrote:
> The functions thread_throttle_mode_init() and mbm_config_rftype_init()
You can drop "The functions".
> both initialize fflags for resctrl files. Adding the new files will
"Adding the new files" -> "Adding new files"? (this should be in new
paragraph)
> involve adding another function to initialize the fflags. All of this
Solution should be in new paragraph and written in imperative mood.
> can be simplified by adding a new function resctrl_file_fflags_init()
> and passing the file name and flags to be initialized.
>
> Consolidate multiple fflags initialization into resctrl_file_fflags_init()
> and remove thread_throttle_mode_init() and mbm_config_rftype_init().
>
> Signed-off-by: Babu Moger <[email protected]>
> ---
Patch looks good.
Reinette
Hi Babu,
On 5/24/24 5:23 AM, Babu Moger wrote:
> The ABMC feature provides an option to the user to assign a hardware
> counter to an RMID and monitor the bandwidth as long as the counter
> is assigned. Number of assignments depend on number of ABMC counters
> available.
Take care that this interface will not just be used by ABMC. I assumed that
when the user switches to "soft-RMID" then the value of "num_cntrs" will change?
>
> Provide the interface to display the number of ABMC counters supported.
>
> Signed-off-by: Babu Moger <[email protected]>
> ---
> v4: Changed the counter name to num_cntrs. And few text changes.
>
> v3: Changed the field name to mbm_assign_cntrs.
>
> v2: Changed the field name to mbm_assignable_counters from abmc_counters.
> ---
> Documentation/arch/x86/resctrl.rst | 4 ++++
> arch/x86/kernel/cpu/resctrl/monitor.c | 1 +
> arch/x86/kernel/cpu/resctrl/rdtgroup.c | 16 ++++++++++++++++
> 3 files changed, 21 insertions(+)
>
> diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
> index 02790efaabcc..7ab8172ef208 100644
> --- a/Documentation/arch/x86/resctrl.rst
> +++ b/Documentation/arch/x86/resctrl.rst
> @@ -257,6 +257,10 @@ with the following files:
> # cat /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config
> 0=0x30;1=0x30;3=0x15;4=0x15
>
> +"num_cntrs":
> + Available when ABMC feature is supported. The number of ABMC counters
> + available for configuration.
This can only be understood by folks already familiar with AMD's ABMC feature. There is
no information about what "ABMC feature" is, what an "ABMC counter" is and what
"configuration" can be done with it.
Do you think this num_cntrs will only be used by ABMC? What will happen when user
enables "soft-RMID" or some other mode?
Reinette
Hi Babu,
On 5/24/24 5:23 AM, Babu Moger wrote:
> Add the functionality to enable/disable ABMC feature.
>
> ABMC feature is enabled by setting enabled bit(0) in MSR L3_QOS_EXT_CFG.
> When the state of ABMC is changed, the MSR needs to be updated on all
> the logical processors in the QOS Domain.
>
> Hardware counters will reset when ABMC state is changed. Kernel internal
> counters need to be reset to avoid overflow condition in the next update.
Please note that there are two "ABMC" features introduced in this series.
First, there is the "abmc" resctrl fs feature that just happens to have
the same name as AMD's "ABMC" (which may be a good motivation to change
this name). Second, there is the architecture (AMD) specific "ABMC" feature
that is enabled in response to user's request to enable the
resctrl "abmc" feature.
Other architectures need to support resctrl fs "abmc" feature with something
entirely different.
Please consider this distinction during this series because it is often
blurred.
>
> The ABMC feature details are documented in APM listed below [1].
> [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
> Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
> Monitoring (ABMC).
>
> Signed-off-by: Babu Moger <[email protected]>
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
> ---
> v4: Removed resctrl_arch_set_abmc_enabled and resctrl_arch_get_abmc_enabled.
> Directly calling resctrl_abmc_enable and resctrl_abmc_disable.
> Renamed couple of functions.
> resctrl_abmc_msrwrite() -> resctrl_abmc_set_one()
> resctrl_abmc_setup() -> resctrl_abmc_set_all()
> Added rdtgroup_mutex lockdep asserts.
> Updated commit log and code comments.
>
> v3: No changes.
>
> v2: Few text changes in commit message.
> ---
> arch/x86/include/asm/msr-index.h | 1 +
> arch/x86/kernel/cpu/resctrl/internal.h | 8 ++++
> arch/x86/kernel/cpu/resctrl/rdtgroup.c | 62 ++++++++++++++++++++++++++
> 3 files changed, 71 insertions(+)
>
> diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
> index e022e6eb766c..5f9a0139e98c 100644
> --- a/arch/x86/include/asm/msr-index.h
> +++ b/arch/x86/include/asm/msr-index.h
> @@ -1171,6 +1171,7 @@
> #define MSR_IA32_MBA_BW_BASE 0xc0000200
> #define MSR_IA32_SMBA_BW_BASE 0xc0000280
> #define MSR_IA32_EVT_CFG_BASE 0xc0000400
> +#define MSR_IA32_L3_QOS_EXT_CFG 0xc00003ff
>
> /* MSR_IA32_VMX_MISC bits */
> #define MSR_IA32_VMX_MISC_INTEL_PT (1ULL << 14)
> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
> index d566251094b2..fabe40304798 100644
> --- a/arch/x86/kernel/cpu/resctrl/internal.h
> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> @@ -97,6 +97,9 @@ cpumask_any_housekeeping(const struct cpumask *mask, int exclude_cpu)
> return cpu;
> }
>
> +/* Setting bit 0 in L3_QOS_EXT_CFG enables the ABMC feature */
> +#define ABMC_ENABLE BIT(0)
> +
> struct rdt_fs_context {
> struct kernfs_fs_context kfc;
> bool enable_cdpl2;
> @@ -436,6 +439,7 @@ struct rdt_parse_data {
> * @mbm_cfg_mask: Bandwidth sources that can be tracked when Bandwidth
> * Monitoring Event Configuration (BMEC) is supported.
> * @cdp_enabled: CDP state of this resource
> + * @abmc_enabled: ABMC feature is enabled
> *
> * Members of this structure are either private to the architecture
> * e.g. mbm_width, or accessed via helpers that provide abstraction. e.g.
> @@ -450,6 +454,7 @@ struct rdt_hw_resource {
> unsigned int mbm_width;
> unsigned int mbm_cfg_mask;
> bool cdp_enabled;
> + bool abmc_enabled;
> };
>
ok, so here by making "abmc_enabled" a member of struct rdt_hw_resource this is
an architecture specific property. This is reasonable since every architecture
will look different. What is _not_ ok is that this causes the rest of the
series to change resctrl fs to reach into the architecture code. For example,
this work causes mbm_config_show() to now need to peek into struct rdt_hw_resource
to see this value. That is not ok. All of the interactions between this
field and resctrl fs needs to be via arch helpers: resctrl_arch_abmc_enable()/
resctrl_arch_abmc_disable() and resctrl_arch_get_abmc_enabled()/resctrl_arch_set_abmc_enabled().
> static inline struct rdt_hw_resource *resctrl_to_arch_res(struct rdt_resource *r)
> @@ -493,6 +498,9 @@ static inline bool resctrl_arch_get_cdp_enabled(enum resctrl_res_level l)
>
> int resctrl_arch_set_cdp_enabled(enum resctrl_res_level l, bool enable);
>
> +int resctrl_abmc_enable(enum resctrl_res_level l);
> +void resctrl_abmc_disable(enum resctrl_res_level l);
Why do these need enum resctrl_res_level parameter?
> +
> /*
> * To return the common struct rdt_resource, which is contained in struct
> * rdt_hw_resource, walk the resctrl member of struct rdt_hw_resource.
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index ca692712b393..9148d1234ede 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -2416,6 +2416,68 @@ int resctrl_arch_set_cdp_enabled(enum resctrl_res_level l, bool enable)
> return 0;
> }
>
> +static void resctrl_abmc_set_one(void *arg)
> +{
> + bool *enable = arg;
> + u64 msrval;
> +
> + rdmsrl(MSR_IA32_L3_QOS_EXT_CFG, msrval);
> +
> + if (*enable)
> + msrval |= ABMC_ENABLE;
> + else
> + msrval &= ~ABMC_ENABLE;
> +
> + wrmsrl(MSR_IA32_L3_QOS_EXT_CFG, msrval);
> +}
> +
> +static int resctrl_abmc_set_all(enum resctrl_res_level l, bool enable)
Should this function and resctrl_abmc_set_one() perhaps have "amd" in the
name just to enforce that this is not filesystem code at all and specific
and unique to AMD.
> +{
> + struct rdt_resource *r = &rdt_resources_all[l].r_resctrl;
> + struct rdt_domain *d;
> +
> + /*
> + * Update QOS_CFG MSR on all the CPUs associated with the resource
end of sentence needs "."
This comment about the specific register seems more appropriate to
resctrl_abmc_set_one() though. This function is a higher level
enable/disable of the hardware feature.
> + * Hardware counters will reset after switching the monotor mode.
monotor -> monitor
> + * Reset the internal counters so that it is not considered as
> + * an overflow in next update.
For the first time the term "internal counters" is introduced. What does it
mean?
> + */
> + list_for_each_entry(d, &r->domains, list) {
> + on_each_cpu_mask(&d->cpu_mask, resctrl_abmc_set_one, &enable, 1);
> + resctrl_arch_reset_rmid_all(r, d);
> + }
> +
> + return 0;
> +}
> +
> +int resctrl_abmc_enable(enum resctrl_res_level l)
> +{
> + struct rdt_hw_resource *hw_res = &rdt_resources_all[l];
> + int ret = 0;
> +
> + lockdep_assert_held(&rdtgroup_mutex);
> +
> + if (!hw_res->abmc_enabled) {
> + ret = resctrl_abmc_set_all(l, true);
> + if (!ret)
> + hw_res->abmc_enabled = true;
This error handling seems useless since resctrl_abmc_set_all() always returns
0 ... perhaps it should return void instead and this error handling dropped?
With that this function can never fail either and it can just return void,
but this is probably not what we want as the architecture call since other
architectures may fail.
> + }
> +
> + return ret;
> +}
> +
> +void resctrl_abmc_disable(enum resctrl_res_level l)
> +{
> + struct rdt_hw_resource *hw_res = &rdt_resources_all[l];
> +
> + lockdep_assert_held(&rdtgroup_mutex);
> +
> + if (hw_res->abmc_enabled) {
> + resctrl_abmc_set_all(l, false);
> + hw_res->abmc_enabled = false;
> + }
> +}
> +
> /*
> * We don't allow rdtgroup directories to be created anywhere
> * except the root directory. Thus when looking for the rdtgroup
Reinette
Hi Babu,
On 5/24/24 5:23 AM, Babu Moger wrote:
> The ABMC feature provides an option to the user to assign a hardware
> counter to an RMID and monitor the bandwidth as long as it is assigned.
> ABMC mode is enabled by default when supported. System can be one mode
> at a time (Legacy monitor mode or ABMC mode).
>
> Provide an interface to display the monitor mode on the system.
> $cat /sys/fs/resctrl/info/L3_MON/mbm_assign
> [abmc]
> legacy
Output is different from cover letter and what this patch implements.
>
> Signed-off-by: Babu Moger <[email protected]>
> ---
> v4: Fixed the checks for legacy and abmc mode. Default it ABMC.
>
> v3: New patch to display ABMC capability.
> ---
> Documentation/arch/x86/resctrl.rst | 10 ++++++++++
> arch/x86/kernel/cpu/resctrl/monitor.c | 1 +
> arch/x86/kernel/cpu/resctrl/rdtgroup.c | 23 +++++++++++++++++++++++
> 3 files changed, 34 insertions(+)
>
> diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
> index 7ab8172ef208..ab3cde61a124 100644
> --- a/Documentation/arch/x86/resctrl.rst
> +++ b/Documentation/arch/x86/resctrl.rst
> @@ -261,6 +261,16 @@ with the following files:
> Available when ABMC feature is supported. The number of ABMC counters
> available for configuration.
>
> +"mbm_assign":
This name is not ideal but I am having trouble finding a better one ... I have
seen you use "monitor mode" a couple of times (even in shortlog), so maybe that
could be the start of a more generic name? "mbm_mode"?
> + Available when ABMC feature is supported. Reports the list of assignable
Why not always make this file available? At least it will display that
legacy mode is supported and it gives user space an always present file to query to
determine support.
> + monitoring features supported. The enclosed brackets indicate which
> + feature is enabled.
> + ::
> +
> + cat /sys/fs/resctrl/info/L3_MON/mbm_assign
> + [abmc]
> + mbm_legacy
> +
> "max_threshold_occupancy":
> Read/write file provides the largest value (in
> bytes) at which a previously used LLC_occupancy
> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
> index e75a6146068b..b1d002e5e93d 100644
> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
> @@ -1071,6 +1071,7 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
>
> if (rdt_cpu_has(X86_FEATURE_ABMC)) {
> r->abmc_capable = true;
> + resctrl_file_fflags_init("mbm_assign", RFTYPE_MON_INFO);
> /*
> * Query CPUID_Fn80000020_EBX_x05 for number of
> * ABMC counters
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index 9148d1234ede..3071bbb7a15e 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -856,6 +856,23 @@ static int rdtgroup_num_cntrs_show(struct kernfs_open_file *of,
> return 0;
> }
>
> +static int rdtgroup_mbm_assign_show(struct kernfs_open_file *of,
> + struct seq_file *s, void *v)
> +{
> + struct rdt_resource *r = of->kn->parent->priv;
> + struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
Please use arch helper to just get abmc state instead of fs code
digging into arch structures.
> +
> + if (hw_res->abmc_enabled) {
> + seq_puts(s, "[abmc]\n");
> + seq_puts(s, "mbm_legacy\n");
> + } else {
> + seq_puts(s, "abmc\n");
> + seq_puts(s, "[mbm_legacy]\n");
> + }
> +
> + return 0;
> +}
> +
> #ifdef CONFIG_PROC_CPU_RESCTRL
>
> /*
> @@ -1913,6 +1930,12 @@ static struct rftype res_common_files[] = {
> .seq_show = mbm_local_bytes_config_show,
> .write = mbm_local_bytes_config_write,
> },
> + {
> + .name = "mbm_assign",
> + .mode = 0444,
> + .kf_ops = &rdtgroup_kf_single_ops,
> + .seq_show = rdtgroup_mbm_assign_show,
> + },
> {
> .name = "cpus",
> .mode = 0644,
Reinette
Hi Babu,
On 5/24/24 5:23 AM, Babu Moger wrote:
> Hardware provides a set of counters when the ABMC feature is supported.
> These counters are used for enabling the events in resctrl group when
> the feature is supported.
>
> Introduce num_cntrs_free_map bitmap to track available and free counters.
>
> Signed-off-by: Babu Moger <[email protected]>
> ---
> v4: Changed the name to num_cntrs where applicable.
> Used bitmap apis.
> Added more comments for the globals.
>
> v3: Changed the bitmap name to assign_cntrs_free_map. Removed abmc
> from the name.
>
> v2: Changed the bitmap name to assignable_counter_free_map from
> abmc_counter_free_map.
> ---
> arch/x86/kernel/cpu/resctrl/rdtgroup.c | 23 +++++++++++++++++++++++
> 1 file changed, 23 insertions(+)
>
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index 3071bbb7a15e..400ae405e10e 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -186,6 +186,23 @@ bool closid_allocated(unsigned int closid)
> return !test_bit(closid, &closid_free_map);
> }
>
> +/*
> + * ABMC Counter bitmap and length for tracking available counters.
> + * ABMC feature provides set of hardware counters for enabling events.
> + * Each event takes one hardware counter. Kernel needs to keep track
> + * of number of available counters.
> + */
> +static unsigned long num_cntrs_free_map;
Why does variable have "num" in its name? That seems strange. How
about just "mon_cntrs_free_map
> +static u32 num_cntrs_free_map_len;
Same comment about "num" ... also, any special reason why u32 is needed?
> +
> +static void num_cntrs_init(void)
mon_cntrs_init() ?
> +{
> + struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
> +
> + bitmap_fill(&num_cntrs_free_map, r->mon.num_cntrs);
> + num_cntrs_free_map_len = r->mon.num_cntrs;
> +}
> +
> /**
> * rdtgroup_mode_by_closid - Return mode of resource group with closid
> * @closid: closid if the resource group
> @@ -2459,6 +2476,12 @@ static int resctrl_abmc_set_all(enum resctrl_res_level l, bool enable)
resctrl_abmc_set_all() was initially created as a complement of
resctrl_abmc_set_one() ... but with more initialization added to
resctrl_abmc_set_all() this relationship becomes vague.
> struct rdt_resource *r = &rdt_resources_all[l].r_resctrl;
> struct rdt_domain *d;
>
> + /*
> + * The bitmap num_cntrs_free_map needs to be reset when switching
> + * the monitor mode.
This comment puts in words what can be understood by just looking at the code.
What would be useful would be to explain _why_ this needs to be done ...
what happens to the previously assigned counters?
> + */
> + num_cntrs_init();
> +
> /*
> * Update QOS_CFG MSR on all the CPUs associated with the resource
> * Hardware counters will reset after switching the monotor mode.
Reinette
Hi Babu,
On 5/24/24 5:23 AM, Babu Moger wrote:
> If the BMEC (Bandwidth Monitoring Event Configuration) feature is
> supported, the bandwidth events can be configured to track specific
> events. The event configuration is domain specific. ABMC (Assignable
> Bandwidth Monitoring Counters) feature needs event configuration
> information to assign hardware counter to an RMID. Event configurations
> are not stored in resctrl but instead always read from or written to
> hardware directly when prompted by user space.
>
> Read the event configuration from the hardware during the domain
> initialization. Save the configuration information in the rdt_hw_domain,
> so it can be used for counter assignment.
>
> Signed-off-by: Babu Moger <[email protected]>
> ---
> v4: Read the configuration information from the hardware to initialize.
> Added few commit messages.
> Fixed the tab spaces.
>
> v3: Minor changes related to rebase in mbm_config_write_domain.
>
> v2: No changes.
> ---
> arch/x86/kernel/cpu/resctrl/core.c | 2 ++
> arch/x86/kernel/cpu/resctrl/internal.h | 5 +++++
> arch/x86/kernel/cpu/resctrl/monitor.c | 21 +++++++++++++++++++++
> 3 files changed, 28 insertions(+)
>
> diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
> index ec93f6a50308..856c46d12177 100644
> --- a/arch/x86/kernel/cpu/resctrl/core.c
> +++ b/arch/x86/kernel/cpu/resctrl/core.c
> @@ -542,6 +542,8 @@ static void domain_add_cpu(int cpu, struct rdt_resource *r)
> return;
> }
>
> + arch_domain_mbm_evt_config(hw_dom);
> +
> list_add_tail_rcu(&d->list, add_pos);
>
> err = resctrl_online_domain(r, d);
> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
> index 5e7e76cd512f..60a1ca0a11a7 100644
> --- a/arch/x86/kernel/cpu/resctrl/internal.h
> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> @@ -373,6 +373,8 @@ struct arch_mbm_state {
> * @ctrl_val: array of cache or mem ctrl values (indexed by CLOSID)
> * @arch_mbm_total: arch private state for MBM total bandwidth
> * @arch_mbm_local: arch private state for MBM local bandwidth
> + * @mbm_total_cfg: MBM total bandwidth configuration
> + * @mbm_local_cfg: MBM local bandwidth configuration
> *
> * Members of this structure are accessed via helpers that provide abstraction.
> */
> @@ -381,6 +383,8 @@ struct rdt_hw_domain {
> u32 *ctrl_val;
> struct arch_mbm_state *arch_mbm_total;
> struct arch_mbm_state *arch_mbm_local;
> + u32 mbm_total_cfg;
> + u32 mbm_local_cfg;
> };
Similar to the abmc_enabled member of rdt_hw_resource, these new
members of rdt_hw_domain are architecture specific and should never be
touched directly by resctrl fs code, for example, from mbm_config_show().
>
> static inline struct rdt_hw_domain *resctrl_to_arch_dom(struct rdt_domain *r)
> @@ -622,6 +626,7 @@ void __check_limbo(struct rdt_domain *d, bool force_free);
> void rdt_domain_reconfigure_cdp(struct rdt_resource *r);
> void __init resctrl_file_fflags_init(const char *config,
> unsigned long fflags);
> +void arch_domain_mbm_evt_config(struct rdt_hw_domain *hw_dom);
> void rdt_staged_configs_clear(void);
> bool closid_allocated(unsigned int closid);
> int resctrl_find_cleanest_closid(void);
> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
> index b1d002e5e93d..ab0f4bb49bd9 100644
> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
> @@ -1093,6 +1093,27 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
> return 0;
> }
>
> +void arch_domain_mbm_evt_config(struct rdt_hw_domain *hw_dom)
> +{
> + u64 msrval;
> +
> + /*
> + * Read the configuration registers QOS_EVT_CFG_n, where <n> is
> + * the BMEC event number (EvtID).
> + * 0 for evtid == QOS_L3_MBM_TOTAL_EVENT_ID
> + * 1 for evtid == QOS_L3_MBM_LOCAL_EVENT_ID
This is what mon_event_config_index_get() is for, no?
> + */
> + if (mbm_total_event.configurable) {
> + rdmsrl(MSR_IA32_EVT_CFG_BASE, msrval);
> + hw_dom->mbm_total_cfg = msrval & MAX_EVT_CONFIG_BITS;
> + }
> +
> + if (mbm_local_event.configurable) {
> + rdmsrl(MSR_IA32_EVT_CFG_BASE + 1, msrval);
> + hw_dom->mbm_local_cfg = msrval & MAX_EVT_CONFIG_BITS;
> + }
> +}
> +
> void __exit rdt_put_mon_l3_config(void)
> {
> dom_data_exit();
Reinette
Hi Babu,
On 5/24/24 5:23 AM, Babu Moger wrote:
> The ABMC feature provides an option to the user to assign a hardware
> counter to an RMID and monitor the bandwidth as long as the counter
> is assigned. The bandwidth events will be tracked by the hardware until
> the user changes the configuration. Each resctrl group can configure
> maximum two counters, one for total event and one for local event.
>
> The counters are configured by writing to MSR L3_QOS_ABMC_CFG.
> Configuration is done by setting the counter id, bandwidth source (RMID),
> bandwidth types. Reading L3_QOS_ABMC_DSC returns the configuration of
What is a "bandwidth type" here?
> the counter id specified in L3_QOS_ABMC_CFG.
>
> Attempts to read or write these MSRs when ABMC is not enabled will result
> in a #GP(0) exception.
>
> MSR L3_QOS_ABMC_CFG (0xC000_03FDh) and L3_QOS_ABMC_DSC (0xC000_03FEh)
> details.
> ==========================================================================
> Bits Mnemonic Description Access Type Reset Value
> ==========================================================================
> 63 CfgEn Configuration Enable R/W 0
>
> 62 CtrEn Counter Enable R/W 0
>
> 61:53 – Reserved MBZ 0
>
> 52:48 CtrID Counter Identifier R/W 0
>
> 47 IsCOS BwSrc field is a COS R/W 0
> (not an RMID)
>
> 46:44 – Reserved MBZ 0
>
> 43:32 BwSrc Bandwidth Source R/W 0
> (RMID or COS)
>
> 31:0 BwType Bandwidth types to R/W 0
> track for this counter
> ==========================================================================
>
This is copy&paste from AMD spec but needs some translation to map it to
resctrl.
> The feature details are documented in the APM listed below [1].
> [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
> Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
> Monitoring (ABMC).
>
> Signed-off-by: Babu Moger <[email protected]>
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
> ---
> v4: Added more descriptions.
> Changed the name abmc_ctr_id to ctr_id.
> Added L3_QOS_ABMC_DSC. Used for reading the configuration.
>
> v3: No changes.
>
> v2: No changes.
> ---
> arch/x86/include/asm/msr-index.h | 2 ++
> arch/x86/kernel/cpu/resctrl/internal.h | 30 ++++++++++++++++++++++++++
> 2 files changed, 32 insertions(+)
>
> diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
> index 5f9a0139e98c..6d2fe39ac68f 100644
> --- a/arch/x86/include/asm/msr-index.h
> +++ b/arch/x86/include/asm/msr-index.h
> @@ -1172,6 +1172,8 @@
> #define MSR_IA32_SMBA_BW_BASE 0xc0000280
> #define MSR_IA32_EVT_CFG_BASE 0xc0000400
> #define MSR_IA32_L3_QOS_EXT_CFG 0xc00003ff
> +#define MSR_IA32_L3_QOS_ABMC_CFG 0xc00003fd
> +#define MSR_IA32_L3_QOS_ABMC_DSC 0xc00003fe
>
> /* MSR_IA32_VMX_MISC bits */
> #define MSR_IA32_VMX_MISC_INTEL_PT (1ULL << 14)
> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
> index 60a1ca0a11a7..45ed33f4f0ff 100644
> --- a/arch/x86/kernel/cpu/resctrl/internal.h
> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> @@ -107,6 +107,9 @@ cpumask_any_housekeeping(const struct cpumask *mask, int exclude_cpu)
> #define ASSIGN_TOTAL BIT(QOS_L3_MBM_TOTAL_EVENT_ID)
> #define ASSIGN_LOCAL BIT(QOS_L3_MBM_LOCAL_EVENT_ID)
>
> +/* Maximum assignable counters per resctrl group */
> +#define MAX_CNTRS 2
> +
> struct rdt_fs_context {
> struct kernfs_fs_context kfc;
> bool enable_cdpl2;
> @@ -211,6 +214,7 @@ enum rdtgrp_mode {
> * @crdtgrp_list: child rdtgroup node list
> * @rmid: rmid for this rdtgroup
> * @mon_state: Assignment state of the group
> + * @ctr_id: ABMC counter ids assigned to this group
> */
> struct mongroup {
> struct kernfs_node *mon_data_kn;
> @@ -218,6 +222,7 @@ struct mongroup {
> struct list_head crdtgrp_list;
> u32 rmid;
> u32 mon_state;
> + u32 ctr_id[MAX_CNTRS];
To match "num_cntrs" it seems "cntr_id" may be more appropriate?
> };
>
> /**
> @@ -568,6 +573,31 @@ union cpuid_0x10_x_edx {
> unsigned int full;
> };
>
> +/*
> + * ABMC counters can be configured by writing to L3_QOS_ABMC_CFG.
extra space
> + * @bw_type : Bandwidth types to track for this counter
What is a "Bandwidth type"? What are possible values?
> + * @bw_src : Bandwidth Source (RMID or CLOSID)
> + * @reserved1 : Reserved
> + * @is_cos : BwSrc field is a COS (not an RMID)
Please be consistent wrt "CLOS"/"CLOSID" in comment and member name
> + * @ctr_id : Counter Identifier
Please be consistent with "cntr" vs "ctr". Earlier the series
introduces "num_cntrs" so cntr seems appropriate?
> + * @reserved : Reserved
> + * @ctr_en : Counter Enable bit
> + * @cfg_en : Configuration Enable bit
Why are there two enable bits? How are they used?
(please match other kernel doc wrt : usage)
> + */
> +union l3_qos_abmc_cfg {
> + struct {
> + unsigned long bw_type :32,
> + bw_src :12,
> + reserved1: 3,
> + is_cos : 1,
> + ctr_id : 5,
> + reserved : 9,
> + ctr_en : 1,
> + cfg_en : 1;
> + } split;
> + unsigned long full;
> +};
> +
> void rdt_last_cmd_clear(void);
> void rdt_last_cmd_puts(const char *s);
> __printf(1, 2)
Reinette
Hi Babu,
On 5/24/24 5:23 AM, Babu Moger wrote:
> ABMC feature requires to users to assign a hardware counter to an RMID
"requires to users" -> "requires users"?
> to monitor the events. Provide the interfaces to assign a counter.
>
> Individual counters are configured by writing to L3_QOS_ABMC_CFG MSR
> and specifying the counter id, bandwidth source, and bandwidth types.
Where is discription of what this patch does and why?
This and following patches seem to introduce building blocks that
are only used later in series. These building blocks are introduced
without any information about what they do and why and that makes it
difficult to understand how this work will fall into place.
>
> The feature details are documented in the APM listed below [1].
> [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
> Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
> Monitoring (ABMC).
>
> Signed-off-by: Babu Moger <[email protected]>
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
> ---
> v4: Commit message update.
> User bitmap APIs where applicable.
> Changed the interfaces considering MPAM(arm).
> Added domain specific assignment.
>
> v3: Removed the static from the prototype of rdtgroup_assign_abmc.
> The function is not called directly from user anymore. These
> changes are related to global assignment interface.
>
> v2: Minor text changes in commit message.
> ---
> arch/x86/kernel/cpu/resctrl/internal.h | 3 +
> arch/x86/kernel/cpu/resctrl/rdtgroup.c | 101 +++++++++++++++++++++++++
> 2 files changed, 104 insertions(+)
>
> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
> index 45ed33f4f0ff..a88c8fc5e4df 100644
> --- a/arch/x86/kernel/cpu/resctrl/internal.h
> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> @@ -657,6 +657,9 @@ void rdt_domain_reconfigure_cdp(struct rdt_resource *r);
> void __init resctrl_file_fflags_init(const char *config,
> unsigned long fflags);
> void arch_domain_mbm_evt_config(struct rdt_hw_domain *hw_dom);
> +int resctrl_arch_assign(struct rdt_domain *d, u32 evtid, u32 rmid,
> + u32 ctr_id, u32 closid, bool enable);
> +int resctrl_grp_assign(struct rdtgroup *rdtgrp, u32 evtid);
> void rdt_staged_configs_clear(void);
> bool closid_allocated(unsigned int closid);
> int resctrl_find_cleanest_closid(void);
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index 0e425c91fa46..48df76499a04 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -203,6 +203,19 @@ static void num_cntrs_init(void)
> num_cntrs_free_map_len = r->mon.num_cntrs;
> }
>
> +static int assign_cntrs_alloc(void)
Only one counter is allocated and "assign" is unexpected since it
seems to imply how this counter will be used.
It can just be "mon_cntr_alloc()"?
> +{
> + u32 ctr_id = find_first_bit(&num_cntrs_free_map,
> + num_cntrs_free_map_len);
> +
> + if (ctr_id >= num_cntrs_free_map_len)
> + return -ENOSPC;
> +
> + __clear_bit(ctr_id, &num_cntrs_free_map);
> +
> + return ctr_id;
> +}
> +
> /**
> * rdtgroup_mode_by_closid - Return mode of resource group with closid
> * @closid: closid if the resource group
> @@ -1830,6 +1843,94 @@ static ssize_t mbm_local_bytes_config_write(struct kernfs_open_file *of,
> return ret ?: nbytes;
> }
>
> +static void rdtgroup_abmc_cfg(void *info)
> +{
> + u64 *msrval = info;
> +
> + wrmsrl(MSR_IA32_L3_QOS_ABMC_CFG, *msrval);
> +}
> +
Please add comments to these functions to summarize what
they do.
> +int resctrl_arch_assign(struct rdt_domain *d, u32 evtid, u32 rmid,
> + u32 ctr_id, u32 closid, bool enable)
> +{
> + struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(d);
> + union l3_qos_abmc_cfg abmc_cfg = { 0 };
> + struct arch_mbm_state *arch_mbm;
> +
> + abmc_cfg.split.cfg_en = 1;
> + abmc_cfg.split.ctr_en = enable ? 1 : 0;
> + abmc_cfg.split.ctr_id = ctr_id;
> + abmc_cfg.split.bw_src = rmid;
> +
> + /*
> + * Read the event configuration from the domain and pass it as
> + * bw_type.
> + */
> + if (evtid == QOS_L3_MBM_TOTAL_EVENT_ID) {
> + abmc_cfg.split.bw_type = hw_dom->mbm_total_cfg;
> + arch_mbm = &hw_dom->arch_mbm_total[rmid];
> + } else {
> + abmc_cfg.split.bw_type = hw_dom->mbm_local_cfg;
> + arch_mbm = &hw_dom->arch_mbm_local[rmid];
> + }
> +
> + smp_call_function_any(&d->cpu_mask, rdtgroup_abmc_cfg, &abmc_cfg, 1);
> +
> + /* Reset the internal counters */
"internal counters"? This needs a definition ... but since this is not
a new data structure the comment can be more specific about what is done
and why.
> + if (arch_mbm)
> + memset(arch_mbm, 0, sizeof(struct arch_mbm_state));
> +
> + return 0;
If this function always succeeds then it can just be void?
> +}
> +
> +int resctrl_grp_assign(struct rdtgroup *rdtgrp, u32 evtid)
Please use consistent naming. Note how the filename is "rdtgroup.c" and this
function operates on "struct rdtgroup" and if you take a closer look at
the functions in this file and the header where you add this function you
will notice a distinct pattern of "rdtgroup_" used as prefix. There really
is no reason to start a new namespace for this.
> +{
> + struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
> + int ctr_id = 0, index;
> + struct rdt_domain *d;
> + u32 mon_state;
> +
> + index = mon_event_config_index_get(evtid);
> + if (index == INVALID_CONFIG_INDEX) {
> + rdt_last_cmd_puts("Invalid event id\n");
> + return -EINVAL;
> + }
> +
> + if (evtid == QOS_L3_MBM_TOTAL_EVENT_ID) {
> + mon_state = ASSIGN_TOTAL;
> + } else if (evtid == QOS_L3_MBM_LOCAL_EVENT_ID) {
> + mon_state = ASSIGN_LOCAL;
> + } else {
> + rdt_last_cmd_puts("Invalid event id\n");
> + return -EINVAL;
> + }
> +
> + /* Nothing to do if event has been assigned already */
> + if (rdtgrp->mon.mon_state & mon_state) {
Why is this mon_state needed? Is it not redundant when considering that
struct mongroup has an array of counter IDs that can already be used
to see if a counter is assigned or not?
What if there is a new:
#define MON_CNTR_UNSET U32_MAX
When cntr_id[index] == MON_CNTR_UNSET then it is not assigned.
> + rdt_last_cmd_puts("ABMC counter is assigned already\n");
> + return 0;
> + }
> +
> + /*
> + * Allocate a new counter and update domains
> + */
> + ctr_id = assign_cntrs_alloc();
> + if (ctr_id < 0) {
> + rdt_last_cmd_puts("Out of ABMC counters\n");
> + return -ENOSPC;
> + }
> +
> + rdtgrp->mon.ctr_id[index] = ctr_id;
> +
> + list_for_each_entry(d, &r->domains, list)
> + resctrl_arch_assign(d, evtid, rdtgrp->mon.rmid, ctr_id,
> + rdtgrp->closid, 1);
> +
> + rdtgrp->mon.mon_state |= mon_state;
> +
> + return 0;
> +}
> +
> /* rdtgroup information files for one cache resource. */
> static struct rftype res_common_files[] = {
> {
Reinette
Hi Babu,
On 5/24/24 5:23 AM, Babu Moger wrote:
> Hardware provides a limited number of ABMC counters. Once all the
> counters are exhausted, counters need to be freed for new assignments.
>
> Provide the interface to unassign the counter.
Please write a proper changelog. This needs information that explains
what this patch does and why.
>
> The feature details are documented in the APM listed below [1].
> [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
> Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
> Monitoring (ABMC).
>
> Signed-off-by: Babu Moger <[email protected]>
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
> ---
> v4: Added domain specific unassign feature.
> Few name changes.
>
> v3: Removed the static from the prototype of rdtgroup_unassign_abmc.
> The function is not called directly from user anymore. These
> changes are related to global assignment interface.
>
> v2: No changes.
> ---
> arch/x86/kernel/cpu/resctrl/internal.h | 2 ++
> arch/x86/kernel/cpu/resctrl/rdtgroup.c | 42 ++++++++++++++++++++++++++
> 2 files changed, 44 insertions(+)
>
> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
> index a88c8fc5e4df..e16244895350 100644
> --- a/arch/x86/kernel/cpu/resctrl/internal.h
> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> @@ -660,6 +660,8 @@ void arch_domain_mbm_evt_config(struct rdt_hw_domain *hw_dom);
> int resctrl_arch_assign(struct rdt_domain *d, u32 evtid, u32 rmid,
> u32 ctr_id, u32 closid, bool enable);
> int resctrl_grp_assign(struct rdtgroup *rdtgrp, u32 evtid);
> +int resctrl_grp_unassign(struct rdtgroup *rdtgrp, u32 evtid);
> +void num_cntrs_free(u32 ctr_id);
> void rdt_staged_configs_clear(void);
> bool closid_allocated(unsigned int closid);
> int resctrl_find_cleanest_closid(void);
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index 48df76499a04..5ea1e58c7201 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -216,6 +216,11 @@ static int assign_cntrs_alloc(void)
> return ctr_id;
> }
>
> +void num_cntrs_free(u32 ctr_id)
The name does not reflect what it does. It neither frees the "num_cntrs" information
nor does it free "num_cntrs" of counters. How about "mon_cntr_free()"?
> +{
> + __set_bit(ctr_id, &num_cntrs_free_map);
> +}
> +
> /**
> * rdtgroup_mode_by_closid - Return mode of resource group with closid
> * @closid: closid if the resource group
> @@ -1931,6 +1936,43 @@ int resctrl_grp_assign(struct rdtgroup *rdtgrp, u32 evtid)
> return 0;
> }
>
> +int resctrl_grp_unassign(struct rdtgroup *rdtgrp, u32 evtid)
Same comment wrt namespace. Also this function needs a description.
> +{
> + struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
> + struct rdt_domain *d;
> + u32 mon_state;
> + int index;
> +
> + index = mon_event_config_index_get(evtid);
> + if (index == INVALID_CONFIG_INDEX) {
> + pr_warn_once("Invalid event id %d\n", evtid);
> + return -EINVAL;
> + }
> +
> + if (evtid == QOS_L3_MBM_TOTAL_EVENT_ID) {
> + mon_state = ASSIGN_TOTAL;
> + } else if (evtid == QOS_L3_MBM_LOCAL_EVENT_ID) {
> + mon_state = ASSIGN_LOCAL;
> + } else {
> + rdt_last_cmd_puts("Invalid event id\n");
> + return -EINVAL;
> + }
> +
> + if (rdtgrp->mon.mon_state & mon_state) {
> + list_for_each_entry(d, &r->domains, list)
> + resctrl_arch_assign(d, evtid, rdtgrp->mon.rmid,
> + rdtgrp->mon.ctr_id[index],
> + rdtgrp->closid, 0);
> +
> + /* Update the counter bitmap */
> + num_cntrs_free(rdtgrp->mon.ctr_id[index]);
> + }
> +
> + rdtgrp->mon.mon_state &= ~mon_state;
> +
> + return 0;
> +}
> +
> /* rdtgroup information files for one cache resource. */
> static struct rftype res_common_files[] = {
> {
Reinette
Hi Babu,
On 5/24/24 5:23 AM, Babu Moger wrote:
> Enable ABMC by default when feature is supported.
Why enable/disable it on every mount/unmount cycle though? Why not just
enable it at the beginning and only change if requested by user?
>
> Counters are assigned automatically on group creation. If the counters
> are exhausted, report the warnings and continue. It is not required to
> fail group creation for assignment failures. Users will have the option
> to modify the assignments.
>
> Signed-off-by: Babu Moger <[email protected]>
> ---
> v4: Few name changes based on the upstream discussion.
> Commit message update.
>
> v3: This is a new patch. Patch addresses the upstream comment to enable
> ABMC feature by default if the feature is available.
> ---
> arch/x86/kernel/cpu/resctrl/rdtgroup.c | 69 ++++++++++++++++++++++++++
> 1 file changed, 69 insertions(+)
>
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index 5ea1e58c7201..f452b6d9bb99 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -2756,6 +2756,7 @@ static void rdt_disable_ctx(void)
> {
> resctrl_arch_set_cdp_enabled(RDT_RESOURCE_L3, false);
> resctrl_arch_set_cdp_enabled(RDT_RESOURCE_L2, false);
> + resctrl_abmc_disable(RDT_RESOURCE_L3);
> set_mba_sc(false);
>
> resctrl_debug = false;
> @@ -2786,6 +2787,8 @@ static int rdt_enable_ctx(struct rdt_fs_context *ctx)
> if (ctx->enable_debug)
> resctrl_debug = true;
>
> + resctrl_abmc_enable(RDT_RESOURCE_L3);
> +
> return 0;
>
> out_cdpl3:
> @@ -2882,6 +2885,41 @@ static void schemata_list_destroy(void)
> }
> }
>
> +static int resctrl_assign_events(struct rdtgroup *rdtgrp)
> +{
> + struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
> + struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
> + int ret = 0;
> +
> + if (!hw_res->abmc_enabled)
> + return 0;
resctrl fs should not be peeking in arch structure
> +
> + if (is_mbm_total_enabled())
> + ret = resctrl_grp_assign(rdtgrp, QOS_L3_MBM_TOTAL_EVENT_ID);
> +
> + if (!ret && is_mbm_local_enabled())
> + ret = resctrl_grp_assign(rdtgrp, QOS_L3_MBM_LOCAL_EVENT_ID);
> +
This function needs a comment about why it is ok to return with error and
partial changes.
> + return ret;
> +}
> +
> +static int resctrl_unassign_events(struct rdtgroup *rdtgrp)
> +{
> + struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
> + struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
> + int ret = 0;
> +
> + if (!hw_res->abmc_enabled)
> + return 0;
> +
> + if (is_mbm_total_enabled())
> + ret = resctrl_grp_unassign(rdtgrp, QOS_L3_MBM_TOTAL_EVENT_ID);
> +
> + if (!ret && is_mbm_local_enabled())
> + ret = resctrl_grp_unassign(rdtgrp, QOS_L3_MBM_LOCAL_EVENT_ID);
> + return ret;
> +}
> +
> static int rdt_get_tree(struct fs_context *fc)
> {
> struct rdt_fs_context *ctx = rdt_fc2context(fc);
> @@ -2941,6 +2979,14 @@ static int rdt_get_tree(struct fs_context *fc)
> if (ret < 0)
> goto out_mongrp;
> rdtgroup_default.mon.mon_data_kn = kn_mondata;
> +
> + /*
> + * Assign the monitor counters if it is available. If it fails,
> + * report the warnings and continue. It is not nessaccery to
> + * fail here.
> + */
nessaccery -> necessary
Please elaborate why it is not necessary to fail.
> + if (resctrl_assign_events(&rdtgroup_default) < 0)
> + rdt_last_cmd_puts("Monitor assignment failed\n");
How will user know that there is a warning? This does not return an error
so nothing will prompt user to check the status file. Perhaps add a comment
in the docs to help users at least know that this exists. Another helpful
thing will be to have the counter return "Unassigned" if it has not been assigned
since users familiar with old interface may see "Unavailable" without
knowing that user action is required to get a working event.
> }
>
> ret = rdt_pseudo_lock_init();
> @@ -3214,6 +3260,8 @@ static void rdt_kill_sb(struct super_block *sb)
> cpus_read_lock();
> mutex_lock(&rdtgroup_mutex);
>
> + resctrl_unassign_events(&rdtgroup_default);
> +
> rdt_disable_ctx();
>
> /*Put everything back to default values. */
> @@ -3752,6 +3800,14 @@ static int rdtgroup_mkdir_mon(struct kernfs_node *parent_kn,
> goto out_unlock;
> }
>
> + /*
> + * Assign the monitor counters if it is available. If it fails,
> + * report the warnings and continue. It is not nessaccery to
> + * fail here.
> + */
Fix copy&paste here and below.
> + if (resctrl_assign_events(rdtgrp) < 0)
> + rdt_last_cmd_puts("Monitor assignment failed\n");
> +
> kernfs_activate(rdtgrp->kn);
>
> /*
> @@ -3796,6 +3852,14 @@ static int rdtgroup_mkdir_ctrl_mon(struct kernfs_node *parent_kn,
> if (ret)
> goto out_closid_free;
>
> + /*
> + * Assign the monitor counters if it is available. If it fails,
> + * report the warnings and continue. It is not nessaccery to
> + * fail here.
> + */
> + if (resctrl_assign_events(rdtgrp) < 0)
> + rdt_last_cmd_puts("Monitor assignment failed\n");
> +
> kernfs_activate(rdtgrp->kn);
>
> ret = rdtgroup_init_alloc(rdtgrp);
> @@ -3891,6 +3955,9 @@ static int rdtgroup_rmdir_mon(struct rdtgroup *rdtgrp, cpumask_var_t tmpmask)
> update_closid_rmid(tmpmask, NULL);
>
> rdtgrp->flags = RDT_DELETED;
> +
> + resctrl_unassign_events(rdtgrp);
> +
> free_rmid(rdtgrp->closid, rdtgrp->mon.rmid);
>
> /*
> @@ -3937,6 +4004,8 @@ static int rdtgroup_rmdir_ctrl(struct rdtgroup *rdtgrp, cpumask_var_t tmpmask)
> cpumask_or(tmpmask, tmpmask, &rdtgrp->cpu_mask);
> update_closid_rmid(tmpmask, NULL);
>
> + resctrl_unassign_events(rdtgrp);
> +
> free_rmid(rdtgrp->closid, rdtgrp->mon.rmid);
> closid_free(rdtgrp->closid);
>
Reinette
Hi Babu,
On 5/24/24 5:23 AM, Babu Moger wrote:
> Introduce interface to switch between ABMC and legacy_mbm modes.
shortlog and first sentence of changelog do not match: mbm_legacy vs legacy_mbm?
>
> By default ABMC is enabled on resctrl mount if the feature is available.
> However, user will have the option to go back to legacy_mbm if required.
>
> $ cat /sys/fs/resctrl/info/L3_MON/mbm_assign
> [abmc]
> mbm_legacy
>
> To enable the legacy monitoring feature:
> $ echo "mbm_legacy" > /sys/fs/resctrl/info/L3_MON/mbm_assign
Missing information about user visible impact to counters/events
and any mitigations needed in implementation.
>
> Signed-off-by: Babu Moger <[email protected]>
> ---
> v4: Minor commit text changes. Keep the default to ABMC when supported.
>
> v3: New patch to address the review comments from upstream.
> ---
> Documentation/arch/x86/resctrl.rst | 10 +++++++
> arch/x86/kernel/cpu/resctrl/rdtgroup.c | 40 +++++++++++++++++++++++++-
> 2 files changed, 49 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
> index ab3cde61a124..fd050d4d22cd 100644
> --- a/Documentation/arch/x86/resctrl.rst
> +++ b/Documentation/arch/x86/resctrl.rst
> @@ -271,6 +271,16 @@ with the following files:
> [abmc]
> mbm_legacy
>
> + * To enable ABMC feature:
> + ::
> +
> + # echo "abmc" > /sys/fs/resctrl/info/L3_MON/mbm_assign
> +
> + * To enable the legacy monitoring feature:
> + ::
> +
> + # echo "mbm_legacy" > /sys/fs/resctrl/info/L3_MON/mbm_assign
> +
No information about what the features are or what will happen on such a switch.
> "max_threshold_occupancy":
> Read/write file provides the largest value (in
> bytes) at which a previously used LLC_occupancy
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index f452b6d9bb99..d77ff059269a 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -908,6 +908,43 @@ static int rdtgroup_mbm_assign_show(struct kernfs_open_file *of,
> return 0;
> }
>
> +/*
> + * rdtgroup_mode_write - Modify the resource group's mode
Comment does not match function name. Seems like "mode" is the generic term
to use instead of "assign".
> + */
> +static ssize_t rdtgroup_mbm_assign_write(struct kernfs_open_file *of,
> + char *buf, size_t nbytes,
> + loff_t off)
> +{
> + struct rdt_resource *r = of->kn->parent->priv;
> + int ret = 0;
> +
> + if (!r->abmc_capable)
> + return -EINVAL;
> +
> + /* Valid input requires a trailing newline */
> + if (nbytes == 0 || buf[nbytes - 1] != '\n')
> + return -EINVAL;
> +
> + buf[nbytes - 1] = '\0';
> +
> + cpus_read_lock();
> + mutex_lock(&rdtgroup_mutex);
> +
> + rdt_last_cmd_clear();
> +
> + if (!strcmp(buf, "mbm_legacy"))
> + resctrl_abmc_disable(RDT_RESOURCE_L3);
> + else if (!strcmp(buf, "abmc"))
> + ret = resctrl_abmc_enable(RDT_RESOURCE_L3);
> + else
> + ret = -EINVAL;
> +
> + mutex_unlock(&rdtgroup_mutex);
> + cpus_read_unlock();
> +
> + return ret ?: nbytes;
> +}
> +
> #ifdef CONFIG_PROC_CPU_RESCTRL
>
> /*
> @@ -2087,9 +2124,10 @@ static struct rftype res_common_files[] = {
> },
> {
> .name = "mbm_assign",
> - .mode = 0444,
> + .mode = 0644,
> .kf_ops = &rdtgroup_kf_single_ops,
> .seq_show = rdtgroup_mbm_assign_show,
> + .write = rdtgroup_mbm_assign_write,
> },
> {
> .name = "cpus",
Reinette