2023-06-20 04:08:49

by Tony Luck

[permalink] [raw]
Subject: [RFC PATCH 0/2] Resctrl - rewrite (WIP)

Back in April I posted some RFC patches that added a "driver
registration" interface to the core resctrl code so that additional
resource control and monitor features could be added without further
complicating the core code. Link to that discussion:

https://lore.kernel.org/all/[email protected]/

Reinette gave the feedback that it would be better to base the module
registration on the resctrl resource structure. Reinette also pointed
me to work from James Morse, and some additional discussion happened
here:

https://lore.kernel.org/all/ZG%2FMZVrWYrCHm%2Ffr@agluck-desk3/

James provided details on where ARM's MPAM has similarities and
differences from the Intel Resource Director Technology and AMD's
similar implementation. Drew Fustini was also pulled into that
conversation to comment on RISC-V CBQRI.

From those discussions I believed we need a do-over on the core
/sys/fs/resctrl implementation to make it friendlier for architecural
variations. Here's what I have so far.

=========================================================================
| N.B. This is a general direction check. There are many obvious |
| rough edges (e.g. some careful thought needs to happen on locking |
| for the files in /sys/fs/resctrl that are "owned" by modules that |
| can be unloaded). I'm mostly looking for feedback from AMD, ARM and |
| RISCV on whether this is a foundation to build on, whether some small |
| tweaks could make it better, or if this is still going to be really |
| hard for architectures that have radical divergence from the Intel |
| model. |
=========================================================================

First patch is my attempt at architecture neutral code. All mention
of "RDT", "CLOSID" and "RMID" have been expunged. When creating a
new group this code calls arch_alloc_resctrl_ids() to allocate an
opaque "resctrl_ids" value.

Q: I made this a "u64" because that neatly allows storage of both an
x86 CLOSID and RMID (in a handy representation that matches the bit
layout of the Intel IA32_PQR_ASSOC model specific register). If other
architectures need something more complex it could be a "typedef
resctrl_id_t" ... there are a couple of places where we would need
a comparison function.

I broke the code into several source files that handle different
sub-functions of core code to make it easier to navigate. Much of
the code here should look familiar as I did a lot of
s/rdtgroup/resctrl_group/ on functions from the original resctrl
code.

By itself the core code is useless. Cannot even be built as the
controlling Kconfig option "CONFIG_RESCTRL2_FS" must be invoked by
a "select" request from architecture specific code that provides
the necessary "arch_*()" functions to make everything work.

Module registration is handled in fs/resctrl2/resources.c and
can be done before or after mounting /sys/fs/resctrl. Current
code won't let you make any new resource groups until a module
implementing a control function is loaded to supply the information
on how many groups the architecture supports.

Second patch is all the Intel X86 code (with some of the AMD bits
included, but by no means all of them).

I've implemented modules for most of the legacy Intel control
and monitor functions. Many of these share common code (by means
of a symlinked source file ... I couldn't figure out how to make
Kconfig build both rdt_l3_cat.ko and rdt_l3_cdp.ko from the same
source file with a different set of $(CFLAGS)).

Users can pick which features they want by loading modules that
implement the bits they want. E.g. CDP is enabled by loading
that rdt_l3_cdp.ko module instead of rdt_l3_cat.ko (there's some
code to prevent both being loaded together).

I started on the hooks for the "mba_MBps" feedback from MBM driver,
but in this code drop I just have a simple module that reports the
bandwidth for each group instead of the byte count. I just need to
create a module that has both MBA control and MBM monitoring resources
with a periodic comparison of actual bandwidth with desired, that
then tweaks the MBA controls up/down as needed.

I haven't ventured to read all the pseudo-locking code, but it looks
as though providing the driver with a way to tell core code that a
group is exclusive instead of shared (which tells core code not to
allow assignment of tasks or CPUs to the group) may be all the
surgery needed to core code. The x86 module will be more complex
that the toys I've produced so far, but should be able to leverage
much from the existing resctrl implementation.


Tony Luck (2):
resctrl2: Add all the generic code
resctrl2: Arch x86 modules for most of the legacy control/monitor
functions

include/linux/resctrl.h | 107 +++++
include/linux/sched.h | 3 +
arch/x86/include/asm/resctrl.h | 38 ++
fs/resctrl2/arch/x86/rdt.h | 22 +
fs/resctrl2/internal.h | 110 +++++
arch/x86/kernel/cpu/amd.c | 3 +
arch/x86/kernel/cpu/intel.c | 3 +
arch/x86/kernel/process_32.c | 1 +
arch/x86/kernel/process_64.c | 3 +
fs/resctrl2/arch/x86/alloc.c | 119 +++++
fs/resctrl2/arch/x86/rdt_l2_cat.c | 1 +
fs/resctrl2/arch/x86/rdt_l2_cdp.c | 1 +
fs/resctrl2/arch/x86/rdt_l3_cat.c | 349 +++++++++++++++
fs/resctrl2/arch/x86/rdt_l3_cdp.c | 1 +
fs/resctrl2/arch/x86/rdt_l3_mba.c | 251 +++++++++++
fs/resctrl2/arch/x86/rdt_llc_occupancy.c | 100 +++++
fs/resctrl2/arch/x86/rdt_mbm_adjust.c | 91 ++++
fs/resctrl2/arch/x86/rdt_mbm_local_bytes.c | 1 +
fs/resctrl2/arch/x86/rdt_mbm_local_rate.c | 1 +
fs/resctrl2/arch/x86/rdt_mbm_total_bytes.c | 1 +
fs/resctrl2/arch/x86/rdt_mbm_total_rate.c | 1 +
fs/resctrl2/arch/x86/rdt_monitor.c | 491 +++++++++++++++++++++
fs/resctrl2/cpu.c | 315 +++++++++++++
fs/resctrl2/directory.c | 295 +++++++++++++
fs/resctrl2/domain.c | 99 +++++
fs/resctrl2/info.c | 99 +++++
fs/resctrl2/kernfs.c | 58 +++
fs/resctrl2/locking.c | 52 +++
fs/resctrl2/resources.c | 85 ++++
fs/resctrl2/root.c | 173 ++++++++
fs/resctrl2/schemata.c | 110 +++++
fs/resctrl2/tasks.c | 193 ++++++++
arch/x86/Kconfig | 81 +++-
fs/Kconfig | 1 +
fs/Makefile | 1 +
fs/resctrl2/Kconfig | 5 +
fs/resctrl2/Makefile | 14 +
fs/resctrl2/arch/x86/Makefile | 29 ++
38 files changed, 3306 insertions(+), 2 deletions(-)
create mode 100644 fs/resctrl2/arch/x86/rdt.h
create mode 100644 fs/resctrl2/internal.h
create mode 100644 fs/resctrl2/arch/x86/alloc.c
create mode 120000 fs/resctrl2/arch/x86/rdt_l2_cat.c
create mode 120000 fs/resctrl2/arch/x86/rdt_l2_cdp.c
create mode 100644 fs/resctrl2/arch/x86/rdt_l3_cat.c
create mode 120000 fs/resctrl2/arch/x86/rdt_l3_cdp.c
create mode 100644 fs/resctrl2/arch/x86/rdt_l3_mba.c
create mode 100644 fs/resctrl2/arch/x86/rdt_llc_occupancy.c
create mode 100644 fs/resctrl2/arch/x86/rdt_mbm_adjust.c
create mode 120000 fs/resctrl2/arch/x86/rdt_mbm_local_bytes.c
create mode 120000 fs/resctrl2/arch/x86/rdt_mbm_local_rate.c
create mode 120000 fs/resctrl2/arch/x86/rdt_mbm_total_bytes.c
create mode 120000 fs/resctrl2/arch/x86/rdt_mbm_total_rate.c
create mode 100644 fs/resctrl2/arch/x86/rdt_monitor.c
create mode 100644 fs/resctrl2/cpu.c
create mode 100644 fs/resctrl2/directory.c
create mode 100644 fs/resctrl2/domain.c
create mode 100644 fs/resctrl2/info.c
create mode 100644 fs/resctrl2/kernfs.c
create mode 100644 fs/resctrl2/locking.c
create mode 100644 fs/resctrl2/resources.c
create mode 100644 fs/resctrl2/root.c
create mode 100644 fs/resctrl2/schemata.c
create mode 100644 fs/resctrl2/tasks.c
create mode 100644 fs/resctrl2/Kconfig
create mode 100644 fs/resctrl2/Makefile
create mode 100644 fs/resctrl2/arch/x86/Makefile


base-commit: 45a3e24f65e90a047bef86f927ebdc4c710edaa1
--
2.40.1



2023-06-20 04:19:59

by Tony Luck

[permalink] [raw]
Subject: RE: [RFC PATCH 0/2] Resctrl - rewrite (WIP)

> 38 files changed, 3306 insertions(+), 2 deletions(-)

Lest this be too scary, I'll note the original resctrl code looks like:

$ find arch/x86/kernel/cpu/resctrl -type f | xargs wc -l
43 arch/x86/kernel/cpu/resctrl/pseudo_lock_event.h
4 arch/x86/kernel/cpu/resctrl/Makefile
996 arch/x86/kernel/cpu/resctrl/core.c
581 arch/x86/kernel/cpu/resctrl/ctrlmondata.c
560 arch/x86/kernel/cpu/resctrl/internal.h
845 arch/x86/kernel/cpu/resctrl/monitor.c
1600 arch/x86/kernel/cpu/resctrl/pseudo_lock.c
3733 arch/x86/kernel/cpu/resctrl/rdtgroup.c
8362 total

I'm haven't included pseudo_lock ... but I have most of the rest, so
I'm confident this will end up as a net reduction in lines of code.

-Tony

2023-06-27 09:05:24

by Drew Fustini

[permalink] [raw]
Subject: Re: [RFC PATCH 0/2] Resctrl - rewrite (WIP)

On Mon, Jun 19, 2023 at 08:37:00PM -0700, Tony Luck wrote:
> Back in April I posted some RFC patches that added a "driver
> registration" interface to the core resctrl code so that additional
> resource control and monitor features could be added without further
> complicating the core code. Link to that discussion:
>
> https://lore.kernel.org/all/[email protected]/
>
> Reinette gave the feedback that it would be better to base the module
> registration on the resctrl resource structure. Reinette also pointed
> me to work from James Morse, and some additional discussion happened
> here:
>
> https://lore.kernel.org/all/ZG%2FMZVrWYrCHm%2Ffr@agluck-desk3/
>
> James provided details on where ARM's MPAM has similarities and
> differences from the Intel Resource Director Technology and AMD's
> similar implementation. Drew Fustini was also pulled into that
> conversation to comment on RISC-V CBQRI.
>
> >From those discussions I believed we need a do-over on the core
> /sys/fs/resctrl implementation to make it friendlier for architecural
> variations. Here's what I have so far.
>
> =========================================================================
> | N.B. This is a general direction check. There are many obvious |
> | rough edges (e.g. some careful thought needs to happen on locking |
> | for the files in /sys/fs/resctrl that are "owned" by modules that |
> | can be unloaded). I'm mostly looking for feedback from AMD, ARM and |
> | RISCV on whether this is a foundation to build on, whether some small |
> | tweaks could make it better, or if this is still going to be really |
> | hard for architectures that have radical divergence from the Intel |
> | model. |
> =========================================================================
>
> First patch is my attempt at architecture neutral code. All mention
> of "RDT", "CLOSID" and "RMID" have been expunged. When creating a
> new group this code calls arch_alloc_resctrl_ids() to allocate an
> opaque "resctrl_ids" value.
>
> Q: I made this a "u64" because that neatly allows storage of both an
> x86 CLOSID and RMID (in a handy representation that matches the bit
> layout of the Intel IA32_PQR_ASSOC model specific register). If other
> architectures need something more complex it could be a "typedef
> resctrl_id_t" ... there are a couple of places where we would need
> a comparison function.

This works okay for RISC-V. The Ssqosid extension defines a 32-bit
register sqoscfg (see chapter 2 of CBQRI spec [0]). This contains a
12-bit MCID field (similar to an RMID) and 12-bit RCID field (similar to
an CLOSID).

>
> I broke the code into several source files that handle different
> sub-functions of core code to make it easier to navigate. Much of
> the code here should look familiar as I did a lot of
> s/rdtgroup/resctrl_group/ on functions from the original resctrl
> code.
>
> By itself the core code is useless. Cannot even be built as the
> controlling Kconfig option "CONFIG_RESCTRL2_FS" must be invoked by
> a "select" request from architecture specific code that provides
> the necessary "arch_*()" functions to make everything work.

I would like to try to rebase the RISC-V CBQRI resctrl RFC [1] on top of
this patch series instead of the mpam snapshot branch [2].

I had a patch in my RFC that added config option RISCV_ISA_SSQOSID which
selects ARCH_HAS_CPU_RESCTRL and RESCTRL_FS [3]. It seems I would need
to change that to select CONFIG_RESCTRL2_FS ?

A patch [4] in that RFC adds the "arch_*()" functions in
arch/riscv/kernel/qos/qos_resctrl.c

thanks,
drew

[0] https://github.com/riscv-non-isa/riscv-cbqri/blob/main/riscv-cbqri.pdf
[1] https://lore.kernel.org/linux-riscv/[email protected]/
[2] https://gitlab.arm.com/linux-arm/linux-jm/-/tree/mpam/snaphot/20230406
[3] https://lore.kernel.org/linux-riscv/[email protected]/
[4] https://lore.kernel.org/linux-riscv/[email protected]/



2023-06-27 17:35:21

by Tony Luck

[permalink] [raw]
Subject: RE: [RFC PATCH 0/2] Resctrl - rewrite (WIP)

>> By itself the core code is useless. Cannot even be built as the
>> controlling Kconfig option "CONFIG_RESCTRL2_FS" must be invoked by
>> a "select" request from architecture specific code that provides
>> the necessary "arch_*()" functions to make everything work.
>
> I would like to try to rebase the RISC-V CBQRI resctrl RFC [1] on top of
> this patch series instead of the mpam snapshot branch [2].

Thanks. That should help shake out any places where I've left in Intel-isms, or
my abstraction is insufficient to handle your architecture.

I've made some significant changes since I posted those patches. I pushed
the latest version to:

git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux.git resctrl2_v64

> I had a patch in my RFC that added config option RISCV_ISA_SSQOSID which
> selects ARCH_HAS_CPU_RESCTRL and RESCTRL_FS [3]. It seems I would need
> to change that to select CONFIG_RESCTRL2_FS ?

Yes. Just have your architecture CONFIG option select RESCTRL2_FS

> A patch [4] in that RFC adds the "arch_*()" functions in
> arch/riscv/kernel/qos/qos_resctrl.c

Yes. This is an area that may need some tweaking to get the prototypes
for the arch_*() functions right.

I put all the x86 architecture code under fs/resctrl2/arch/x86/ .... mostly
so I can do quick test builds of both the common code and architecture
code with "make fs/resctrl2/" ... maybe in the end-game they should be
under arch/x86 rather than adding arch specific subdirs under generic
top-level directories (though I see a smattering of "x86" directories in
several places.

-Tony

2023-06-28 10:06:39

by Peter Newman

[permalink] [raw]
Subject: Re: [RFC PATCH 0/2] Resctrl - rewrite (WIP)

Hi Tony,

On Tue, Jun 20, 2023 at 5:37 AM Tony Luck <[email protected]> wrote:
>
> Back in April I posted some RFC patches that added a "driver
> registration" interface to the core resctrl code so that additional
> resource control and monitor features could be added without further
> complicating the core code. Link to that discussion:
>
> https://lore.kernel.org/all/[email protected]/
>
> Reinette gave the feedback that it would be better to base the module
> registration on the resctrl resource structure. Reinette also pointed
> me to work from James Morse, and some additional discussion happened
> here:
>
> https://lore.kernel.org/all/ZG%2FMZVrWYrCHm%2Ffr@agluck-desk3/
>
> James provided details on where ARM's MPAM has similarities and
> differences from the Intel Resource Director Technology and AMD's
> similar implementation. Drew Fustini was also pulled into that
> conversation to comment on RISC-V CBQRI.
>
> From those discussions I believed we need a do-over on the core
> /sys/fs/resctrl implementation to make it friendlier for architecural
> variations. Here's what I have so far.
>
> =========================================================================
> | N.B. This is a general direction check. There are many obvious |
> | rough edges (e.g. some careful thought needs to happen on locking |
> | for the files in /sys/fs/resctrl that are "owned" by modules that |
> | can be unloaded). I'm mostly looking for feedback from AMD, ARM and |
> | RISCV on whether this is a foundation to build on, whether some small |
> | tweaks could make it better, or if this is still going to be really |
> | hard for architectures that have radical divergence from the Intel |
> | model. |
> =========================================================================

Thanks for working on this! I played with these changes locally on
some of our machines and they seemed reasonably functional so far and
was happy to see dynamically adding and removing resources working.

I will need to try working with the code to give it a serious
evaluation, though. Would you consider it ready for me to try
re-implementing soft RMIDs on it?

I'm also very interested in James's opinion and what this means for
the ongoing MPAM upstreaming.

Thanks!
-Peter

2023-06-28 16:32:20

by Tony Luck

[permalink] [raw]
Subject: RE: [RFC PATCH 0/2] Resctrl - rewrite (WIP)

> Thanks for working on this! I played with these changes locally on
> some of our machines and they seemed reasonably functional so far and
> was happy to see dynamically adding and removing resources working.

Thanks for taking it for a spin on some additional systems.

> I will need to try working with the code to give it a serious
> evaluation, though. Would you consider it ready for me to try
> re-implementing soft RMIDs on it?

Current code is:
1) Lacking locking for access to files created on behalf of dynamic
loaded modules. So, I expect the system to crash if you unload a
module while simultaneously accessing one of those files.
2) Lacking error checking and cleanup code paths to undo
partial operations when things fail to allocate.
3) The unmount (sb_kill()) code has fallen far behind development
of other features. So expect memory leaks if you unmount and
remount.

It should be OK to play around with this version, but things will go
wrong when the system is under stress. Do not use in production!!

All of the RMID allocation and understanding of hierarchy between
control and monitor groups is localized in fs/resctrl2/arch/x86/rdt_monitor.c
I think I'm mostly done with the functionality I need in that file, so
work you do there for soft RMIDs is unlikely to need refactoring for
other changes that I'm making.

> I'm also very interested in James's opinion and what this means for
> the ongoing MPAM upstreaming.

Me too. I'm hopeful that my code can be a better base than the legacy
resctrl code. But it needs an MPAM expert to really assess that.

-Tony

2023-06-30 00:23:29

by Tony Luck

[permalink] [raw]
Subject: Re: [RFC PATCH 0/2] Resctrl - rewrite (WIP)

On Tue, Jun 27, 2023 at 04:33:52PM +0000, Luck, Tony wrote:
> I've made some significant changes since I posted those patches. I pushed
> the latest version to:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux.git resctrl2_v64

I just pushed one big commit with all the bits I've updated so far
this week. Fixes some serious issues as well as general cleanup.

HEAD is now:

afb7cdd4d640 resctrl2: Many cleanups, fixes, and new functionality

If you've started writing your own architecture specific modules there
are some small interface changes. Most should be found by the compiler
barfing, but the new ".reset()" resource function called during unmount
of /sys/fs/resctrl might be less obvious.

-Tony

2023-07-26 02:52:54

by Drew Fustini

[permalink] [raw]
Subject: Re: [RFC PATCH 0/2] Resctrl - rewrite (WIP)

On Thu, Jun 29, 2023 at 05:06:45PM -0700, Tony Luck wrote:
> On Tue, Jun 27, 2023 at 04:33:52PM +0000, Luck, Tony wrote:
> > I've made some significant changes since I posted those patches. I pushed
> > the latest version to:
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux.git resctrl2_v64
>
> I just pushed one big commit with all the bits I've updated so far
> this week. Fixes some serious issues as well as general cleanup.
>
> HEAD is now:
>
> afb7cdd4d640 resctrl2: Many cleanups, fixes, and new functionality
>
> If you've started writing your own architecture specific modules there
> are some small interface changes. Most should be found by the compiler
> barfing, but the new ".reset()" resource function called during unmount
> of /sys/fs/resctrl might be less obvious.
>
> -Tony

I have access to a Xeon Silver 4310 machine which reports to have
cat_l3, cqm_mbm_local, cqm_mbm_total and mba.

I would like to test resctrl2 on it so I can better understand how it
works. I think that will help me understand how to adapt the RISC-V
CBQRI resctrl proof-of-concept to use resctrl2.

Would you be able to provide an example of how you loaded the necessary
resctrl2 kernel modules?

Also, is resctrl2_v65rc1 the latest to branch to test?

Thank you,
Drew

2023-07-26 14:32:08

by Tony Luck

[permalink] [raw]
Subject: Re: [RFC PATCH 0/2] Resctrl - rewrite (WIP)

On Tue, Jul 25, 2023 at 07:27:25PM -0700, Drew Fustini wrote:
> I have access to a Xeon Silver 4310 machine which reports to have
> cat_l3, cqm_mbm_local, cqm_mbm_total and mba.
>
> I would like to test resctrl2 on it so I can better understand how it
> works. I think that will help me understand how to adapt the RISC-V
> CBQRI resctrl proof-of-concept to use resctrl2.
>
> Would you be able to provide an example of how you loaded the necessary
> resctrl2 kernel modules?

Drew,

Sure. You simply mount the filesystem, and then load modules for
whichever features you'd like to use. This will enable everything
you list above:

# mount -t resctrl resctrl /sys/fs/resctrl
# modprobe rdt_l3_cat
# modprobe rdt_llc_occupancy
# modprobe rdt_mbm_local_bytes
# modprobe rdt_mbm_total_bytes
# modprobe rdt_l3_mba

There are some experimental extras. E.g.

# modprobe rdt_mbm_total_rate
# modprobe rdt_mbm_local_rate

Will each add an extra file to the mon_data directories to
report the data rate in MB/s. The value reported is calculated
by the once-per-second counter roll-over code in the kernel.
So it might be up to one second out of date, but it is very cheap
to read since it doesn't involve MSR access (or cross processor
interrupts if you are reading from a CPU in a different scope).

You can unload modules without unmounting the filesystem and
load different ones to get different data/control. E.g. to
switch from L3CAT to L3CDP (which you don't list as supported,
so this may not work for you:

# rmmod rdt_l3_cat
# modprobe rdt_l3_cdp

Or to switch from the default MBA that uses percentages to
specify throttling to the MBM->MBA feedback code that uses
MB/s in the schemata file:

# rmmod rdt_l3_mba
# modprobe rdt_l3_mba_MBps
>
> Also, is resctrl2_v65rc1 the latest to branch to test?

Yes. That's the latest. There haven't been any updates for a
few days because I'm working on a module to support pseudo-locking.
I'm half-way there (can do most of the bits to set a group into
pseudo-locked mode ... about to work on the cleanup when the
group is removed, the filesystem unmounted, or the module unloaded).

-Tony

2023-08-01 02:02:57

by Tony Luck

[permalink] [raw]
Subject: Re: [RFC PATCH 0/2] Resctrl - rewrite (WIP)

On Wed, Jul 26, 2023 at 06:52:48AM -0700, Tony Luck wrote:
> On Tue, Jul 25, 2023 at 07:27:25PM -0700, Drew Fustini wrote:
> >
> > Also, is resctrl2_v65rc1 the latest to branch to test?
>
> Yes. That's the latest. There haven't been any updates for a
> few days because I'm working on a module to support pseudo-locking.
> I'm half-way there (can do most of the bits to set a group into
> pseudo-locked mode ... about to work on the cleanup when the
> group is removed, the filesystem unmounted, or the module unloaded).

Updated version available at:

git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux.git

Branch: resctrl2_v65rc4

Some minor fixes to core code, also changes to support pseudo-locking
(core code for the "mode" file plus some new functions in the resource
structure to call into modules to support this).

-Tony