2023-03-30 14:10:35

by Peter Newman

[permalink] [raw]
Subject: [PATCH v5 0/3] Subject: x86/resctrl: Implement rename to help move containers' tasks

Hi Reinette, Fenghua,

This patch series implements the solution Reinette suggested in the
earlier RFD thread[1] for the problem of moving a container's tasks to a
different control group on systems that don't provide enough CLOSIDs to
give every container its own control group.

This patch series assumes that a MON group's CLOSID can simply be
changed to that of a new parent CTRL_MON group. This is allowed on Intel
and AMD, but not MPAM implementations. While we (Google) only foresee
needing this functionality on Intel and AMD systems, this series should
hopefully be a good starting point for supporting MPAM.

Thanks!
-Peter

Updates:

v5:
- rebase to v6.3-rc4
- dropped rdt_move_group_tasks() task filter patch
- code/comment clarifications and errno updates requested by Reinette
- added Documentation patch

v4:
- rebase to v6.2
- commit message updates suggested by Reinette
- replace rdt_move_one_task() patch with rdt_move_group_tasks() filter
function patch
- prevent rename on files or renaming to "mon_groups"
- optimize simple rename case
- disallow renaming groups with non-empty cpumask
- ensure source is a proper MON group directory
- fix missing rdtgrp->closid update
- add more last_command_status output

v3: use revised task CLOSID/RMID update IPI sync method from [3]
v2: reworded change logs based on what I've learned from review comments
in another patch series[2]

[v1] https://lore.kernel.org/lkml/[email protected]/
[v2] https://lore.kernel.org/lkml/[email protected]/
[v3] https://lore.kernel.org/lkml/[email protected]/
[v4] https://lore.kernel.org/lkml/[email protected]/

[1] https://lore.kernel.org/lkml/[email protected]/
[2] https://lore.kernel.org/lkml/[email protected]/
[3] https://lore.kernel.org/lkml/[email protected]/

Peter Newman (3):
x86/resctrl: Factor rdtgroup lock for multi-file ops
x86/resctrl: Implement rename op for mon groups
Documentation/x86: Documentation for MON group move feature

Documentation/x86/resctrl.rst | 7 ++
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 163 +++++++++++++++++++++++--
2 files changed, 157 insertions(+), 13 deletions(-)


base-commit: 197b6b60ae7bc51dd0814953c562833143b292aa
--
2.40.0.348.gf938b09366-goog


2023-03-30 14:11:40

by Peter Newman

[permalink] [raw]
Subject: [PATCH v5 3/3] Documentation/x86: Documentation for MON group move feature

Describe new support for moving MON groups to a new parent CTRL_MON
group and its restrictions.

Signed-off-by: Peter Newman <[email protected]>
---
Documentation/x86/resctrl.rst | 7 +++++++
1 file changed, 7 insertions(+)

diff --git a/Documentation/x86/resctrl.rst b/Documentation/x86/resctrl.rst
index 387ccbcb558f..cb05d90111b4 100644
--- a/Documentation/x86/resctrl.rst
+++ b/Documentation/x86/resctrl.rst
@@ -287,6 +287,13 @@ Removing a directory will move all tasks and cpus owned by the group it
represents to the parent. Removing one of the created CTRL_MON groups
will automatically remove all MON groups below it.

+Moving MON group directories to a new parent CTRL_MON group is supported
+for the purpose of changing the resource allocations of a MON group
+without impacting its monitoring data or assigned tasks. This operation
+is not allowed for MON groups which monitor CPUs. No other move
+operation is currently allowed other than simply renaming a CTRL_MON or
+MON group.
+
All groups contain the following files:

"tasks":
--
2.40.0.348.gf938b09366-goog

2023-03-30 14:13:17

by Peter Newman

[permalink] [raw]
Subject: [PATCH v5 1/3] x86/resctrl: Factor rdtgroup lock for multi-file ops

rdtgroup_kn_lock_live() can only release a kernfs reference for a single
file before waiting on the rdtgroup_mutex, limiting its usefulness for
operations on multiple files, such as rename.

Factor the work needed to respectively break and unbreak active
protection on an individual file into rdtgroup_kn_{get,put}().

No functional change.

Signed-off-by: Peter Newman <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
---
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 35 ++++++++++++++++----------
1 file changed, 22 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 6ad33f355861..51b869149e76 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -2301,6 +2301,26 @@ static struct rdtgroup *kernfs_to_rdtgroup(struct kernfs_node *kn)
}
}

+static void rdtgroup_kn_get(struct rdtgroup *rdtgrp, struct kernfs_node *kn)
+{
+ atomic_inc(&rdtgrp->waitcount);
+ kernfs_break_active_protection(kn);
+}
+
+static void rdtgroup_kn_put(struct rdtgroup *rdtgrp, struct kernfs_node *kn)
+{
+ if (atomic_dec_and_test(&rdtgrp->waitcount) &&
+ (rdtgrp->flags & RDT_DELETED)) {
+ if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKSETUP ||
+ rdtgrp->mode == RDT_MODE_PSEUDO_LOCKED)
+ rdtgroup_pseudo_lock_remove(rdtgrp);
+ kernfs_unbreak_active_protection(kn);
+ rdtgroup_remove(rdtgrp);
+ } else {
+ kernfs_unbreak_active_protection(kn);
+ }
+}
+
struct rdtgroup *rdtgroup_kn_lock_live(struct kernfs_node *kn)
{
struct rdtgroup *rdtgrp = kernfs_to_rdtgroup(kn);
@@ -2308,8 +2328,7 @@ struct rdtgroup *rdtgroup_kn_lock_live(struct kernfs_node *kn)
if (!rdtgrp)
return NULL;

- atomic_inc(&rdtgrp->waitcount);
- kernfs_break_active_protection(kn);
+ rdtgroup_kn_get(rdtgrp, kn);

mutex_lock(&rdtgroup_mutex);

@@ -2328,17 +2347,7 @@ void rdtgroup_kn_unlock(struct kernfs_node *kn)
return;

mutex_unlock(&rdtgroup_mutex);
-
- if (atomic_dec_and_test(&rdtgrp->waitcount) &&
- (rdtgrp->flags & RDT_DELETED)) {
- if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKSETUP ||
- rdtgrp->mode == RDT_MODE_PSEUDO_LOCKED)
- rdtgroup_pseudo_lock_remove(rdtgrp);
- kernfs_unbreak_active_protection(kn);
- rdtgroup_remove(rdtgrp);
- } else {
- kernfs_unbreak_active_protection(kn);
- }
+ rdtgroup_kn_put(rdtgrp, kn);
}

static int mkdir_mondata_all(struct kernfs_node *parent_kn,
--
2.40.0.348.gf938b09366-goog

2023-03-30 17:35:16

by Moger, Babu

[permalink] [raw]
Subject: Re: [PATCH v5 3/3] Documentation/x86: Documentation for MON group move feature

Hi Peter,

On 3/30/23 08:55, Peter Newman wrote:
> Describe new support for moving MON groups to a new parent CTRL_MON
> group and its restrictions.

Sorry for coming in late here. I am planning to test these patches. It
would be helpful to give a simple example to test this feature.
Thanks
Babu

>
> Signed-off-by: Peter Newman <[email protected]>
> ---
> Documentation/x86/resctrl.rst | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> diff --git a/Documentation/x86/resctrl.rst b/Documentation/x86/resctrl.rst
> index 387ccbcb558f..cb05d90111b4 100644
> --- a/Documentation/x86/resctrl.rst
> +++ b/Documentation/x86/resctrl.rst
> @@ -287,6 +287,13 @@ Removing a directory will move all tasks and cpus owned by the group it
> represents to the parent. Removing one of the created CTRL_MON groups
> will automatically remove all MON groups below it.
>
> +Moving MON group directories to a new parent CTRL_MON group is supported
> +for the purpose of changing the resource allocations of a MON group
> +without impacting its monitoring data or assigned tasks. This operation
> +is not allowed for MON groups which monitor CPUs. No other move
> +operation is currently allowed other than simply renaming a CTRL_MON or
> +MON group.
> +
> All groups contain the following files:
>
> "tasks":

2023-03-31 08:15:22

by Peter Newman

[permalink] [raw]
Subject: Re: [PATCH v5 3/3] Documentation/x86: Documentation for MON group move feature

Hi Babu,

On Thu, Mar 30, 2023 at 7:33 PM Moger, Babu <[email protected]> wrote:
> On 3/30/23 08:55, Peter Newman wrote:
> > Describe new support for moving MON groups to a new parent CTRL_MON
> > group and its restrictions.
>
> Sorry for coming in late here. I am planning to test these patches. It
> would be helpful to give a simple example to test this feature.

Do you mean inline in the documentation?

For now, you can also try the patch below. These are the testcases I
used.

I'm planning to convert many of our internal, shell script-based test
cases into kernel selftests so I can try to upstream them.

Thanks!
-Peter


---8<-------
From f6d215e90db3c416bd39889b9fa1143d798245e0 Mon Sep 17 00:00:00 2001
From: Peter Newman <[email protected]>
Date: Tue, 7 Mar 2023 11:57:03 +0100
Subject: [PATCH] selftests/resctrl: Test for MON group renaming

Signed-off-by: Peter Newman <[email protected]>
---
tools/testing/selftests/resctrl/Makefile | 2 +
.../selftests/resctrl/test_mongrp_move.sh | 120 ++++++++++++++++++
2 files changed, 122 insertions(+)
create mode 100755 tools/testing/selftests/resctrl/test_mongrp_move.sh

diff --git a/tools/testing/selftests/resctrl/Makefile b/tools/testing/selftests/resctrl/Makefile
index 73d53257df42..0b696e7cf19b 100644
--- a/tools/testing/selftests/resctrl/Makefile
+++ b/tools/testing/selftests/resctrl/Makefile
@@ -5,6 +5,8 @@ CFLAGS += $(KHDR_INCLUDES)

TEST_GEN_PROGS := resctrl_tests

+TEST_PROGS := test_mongrp_move.sh
+
include ../lib.mk

$(OUTPUT)/resctrl_tests: $(wildcard *.c)
diff --git a/tools/testing/selftests/resctrl/test_mongrp_move.sh b/tools/testing/selftests/resctrl/test_mongrp_move.sh
new file mode 100755
index 000000000000..6d9bfc4e0c8d
--- /dev/null
+++ b/tools/testing/selftests/resctrl/test_mongrp_move.sh
@@ -0,0 +1,120 @@
+#!/bin/sh
+
+set -e
+
+rc=0
+
+cleanup()
+{
+ rmdir _test_*
+ rmdir mon_groups/_test_*
+}
+
+skip_all()
+{
+ echo Bail out! $1
+
+ cleanup
+
+ # SKIP code is 4.
+ exit 4
+}
+
+expect_success()
+{
+ if [ "$1" -eq 0 ]; then
+ echo ok $2
+ else
+ echo not ok $2
+ rc=1
+ fi
+}
+
+expect_fail()
+{
+ if [ "$1" -eq 0 ]; then
+ echo not ok $2
+ rc=1
+ else
+ echo ok $2
+ fi
+}
+
+if [ "$(id -u)" != 0 ]; then
+ skip_all "must be run as root"
+fi
+
+if [ ! -d /sys/fs/resctrl/info ]; then
+ mount -t resctrl resctrl /sys/fs/resctrl || skip_all "no resctrlfs"
+fi
+
+cd /sys/fs/resctrl
+
+if [ ! -f info/L3_MON/mon_features ]; then
+ skip_all "no monitoring support"
+fi
+
+if [ ! -f schemata ]; then
+ skip_all "no allocation support"
+fi
+
+echo "1..11"
+
+if [ -d _test_c1 ] || [ -d _test_c2 ] || [ -d mon_groups/_test_m1 ]; then
+ skip_all "test directories already exist"
+fi
+
+mkdir _test_c1
+mkdir _test_c2
+mkdir _test_c3
+
+mkdir mon_groups/_test_m1
+echo 1 > mon_groups/_test_m1/cpus
+
+mkdir _test_c1/mon_groups/_test_m1
+echo $$ > _test_c1/tasks
+echo $$ > _test_c1/mon_groups/_test_m1/tasks
+
+if mv _test_c1/mon_groups/_test_m1 _test_c2/mon_groups; then
+ echo "ok 1 # MON group move to new parent succeeded"
+else
+ echo "1..0 # skip because MON group move to new parent not supported"
+ cleanup
+ exit 4
+fi
+
+set +e
+
+grep -q $$ _test_c2/tasks
+expect_success $? "2 # PID in new CTRL_MON group"
+
+grep -q $$ _test_c2/mon_groups/_test_m1/tasks
+expect_success $? "3 # PID remains in MON group after move"
+
+grep -q $$ _test_c1/tasks
+expect_fail $? "4 # PID no longer in previous CTRL_MON group"
+
+mv _test_c2/mon_groups/_test_m1/cpus mon_groups
+expect_fail $? "5 # moving files not allowed"
+
+mv _test_c2/mon_groups/_test_m1 _test_c2/mon_groups/_test_m2
+expect_success $? "6 # simple MON directory rename"
+
+mv _test_c2/mon_groups/_test_m2 info
+expect_fail $? "7 # move to info not allowed"
+
+mv _test_c2/mon_groups/_test_m2 _test_c2/mon_groups/mon_groups
+expect_fail $? "8 # rename to mon_groups not allowed"
+
+mv mon_groups/_test_m1 _test_c1/mon_groups
+expect_fail $? "9 # cannot move groups monitoring CPUs"
+
+mv mon_groups/_test_m1 mon_groups/_test_m2
+expect_success $? "10 # groups monitoring CPUs can be renamed"
+
+mv mon_groups/_test_m2/mon_data _test_c1/mon_groups
+expect_fail $? "11 # cannot move subdirectories of a mon_group"
+
+cleanup
+
+exit $rc
--
2.40.0.423.gd6c402a77b-goog

2023-03-31 21:07:36

by Moger, Babu

[permalink] [raw]
Subject: RE: [PATCH v5 3/3] Documentation/x86: Documentation for MON group move feature

[AMD Official Use Only - General]

Hi Peter,

> -----Original Message-----
> From: Peter Newman <[email protected]>
> Sent: Friday, March 31, 2023 3:12 AM
> To: Moger, Babu <[email protected]>
> Cc: [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected]
> Subject: Re: [PATCH v5 3/3] Documentation/x86: Documentation for MON
> group move feature
>
> Hi Babu,
>
> On Thu, Mar 30, 2023 at 7:33 PM Moger, Babu <[email protected]>
> wrote:
> > On 3/30/23 08:55, Peter Newman wrote:
> > > Describe new support for moving MON groups to a new parent CTRL_MON
> > > group and its restrictions.
> >
> > Sorry for coming in late here. I am planning to test these patches. It
> > would be helpful to give a simple example to test this feature.
>
> Do you mean inline in the documentation?

Yes. It would be helpful. I used these simple steps from your script.
# mount -t resctrl resctrl /sys/fs/resctrl/
# cd /sys/fs/resctrl/
# mkdir _test_c1
# mkdir _test_c2
l# mkdir mon_groups/_test_m1
# echo 1 > mon_groups/_test_m1/cpus
# mkdir _test_c1/mon_groups/_test_c1_m1
l# echo $$ > _test_c1/tasks
l# echo $$ > _test_c1/mon_groups/_test_c1_m1/tasks
l# mv _test_c1/mon_groups/_test_c1_m1/ _test_c2/mon_groups/
# cat info/last_cmd_status
ok
l# mv mon_groups/_test_m1 _test_c1/mon_groups/
mv: cannot move 'mon_groups/_test_m1' to '_test_c1/mon_groups/_test_m1': Operation not permitted
# cat info/last_cmd_status
Cannot move a MON group that monitors CPUs

>
> For now, you can also try the patch below. These are the testcases I used.
>
> I'm planning to convert many of our internal, shell script-based test cases into
> kernel selftests so I can try to upstream them.

Yes. That will be great.
Thanks
Babu


>
> Thanks!
> -Peter
>
>
> ---8<-------
> From f6d215e90db3c416bd39889b9fa1143d798245e0 Mon Sep 17 00:00:00
> 2001
> From: Peter Newman <[email protected]>
> Date: Tue, 7 Mar 2023 11:57:03 +0100
> Subject: [PATCH] selftests/resctrl: Test for MON group renaming
>
> Signed-off-by: Peter Newman <[email protected]>
> ---
> tools/testing/selftests/resctrl/Makefile | 2 +
> .../selftests/resctrl/test_mongrp_move.sh | 120 ++++++++++++++++++
> 2 files changed, 122 insertions(+)
> create mode 100755 tools/testing/selftests/resctrl/test_mongrp_move.sh
>
> diff --git a/tools/testing/selftests/resctrl/Makefile
> b/tools/testing/selftests/resctrl/Makefile
> index 73d53257df42..0b696e7cf19b 100644
> --- a/tools/testing/selftests/resctrl/Makefile
> +++ b/tools/testing/selftests/resctrl/Makefile
> @@ -5,6 +5,8 @@ CFLAGS += $(KHDR_INCLUDES)
>
> TEST_GEN_PROGS := resctrl_tests
>
> +TEST_PROGS := test_mongrp_move.sh
> +
> include ../lib.mk
>
> $(OUTPUT)/resctrl_tests: $(wildcard *.c) diff --git
> a/tools/testing/selftests/resctrl/test_mongrp_move.sh
> b/tools/testing/selftests/resctrl/test_mongrp_move.sh
> new file mode 100755
> index 000000000000..6d9bfc4e0c8d
> --- /dev/null
> +++ b/tools/testing/selftests/resctrl/test_mongrp_move.sh
> @@ -0,0 +1,120 @@
> +#!/bin/sh
> +
> +set -e
> +
> +rc=0
> +
> +cleanup()
> +{
> + rmdir _test_*
> + rmdir mon_groups/_test_*
> +}
> +
> +skip_all()
> +{
> + echo Bail out! $1
> +
> + cleanup
> +
> + # SKIP code is 4.
> + exit 4
> +}
> +
> +expect_success()
> +{
> + if [ "$1" -eq 0 ]; then
> + echo ok $2
> + else
> + echo not ok $2
> + rc=1
> + fi
> +}
> +
> +expect_fail()
> +{
> + if [ "$1" -eq 0 ]; then
> + echo not ok $2
> + rc=1
> + else
> + echo ok $2
> + fi
> +}
> +
> +if [ "$(id -u)" != 0 ]; then
> + skip_all "must be run as root"
> +fi
> +
> +if [ ! -d /sys/fs/resctrl/info ]; then
> + mount -t resctrl resctrl /sys/fs/resctrl || skip_all "no resctrlfs"
> +fi
> +
> +cd /sys/fs/resctrl
> +
> +if [ ! -f info/L3_MON/mon_features ]; then
> + skip_all "no monitoring support"
> +fi
> +
> +if [ ! -f schemata ]; then
> + skip_all "no allocation support"
> +fi
> +
> +echo "1..11"
> +
> +if [ -d _test_c1 ] || [ -d _test_c2 ] || [ -d mon_groups/_test_m1 ]; then
> + skip_all "test directories already exist"
> +fi
> +
> +mkdir _test_c1
> +mkdir _test_c2
> +mkdir _test_c3
> +
> +mkdir mon_groups/_test_m1
> +echo 1 > mon_groups/_test_m1/cpus
> +
> +mkdir _test_c1/mon_groups/_test_m1
> +echo $$ > _test_c1/tasks
> +echo $$ > _test_c1/mon_groups/_test_m1/tasks
> +
> +if mv _test_c1/mon_groups/_test_m1 _test_c2/mon_groups; then
> + echo "ok 1 # MON group move to new parent succeeded"
> +else
> + echo "1..0 # skip because MON group move to new parent not
> supported"
> + cleanup
> + exit 4
> +fi
> +
> +set +e
> +
> +grep -q $$ _test_c2/tasks
> +expect_success $? "2 # PID in new CTRL_MON group"
> +
> +grep -q $$ _test_c2/mon_groups/_test_m1/tasks
> +expect_success $? "3 # PID remains in MON group after move"
> +
> +grep -q $$ _test_c1/tasks
> +expect_fail $? "4 # PID no longer in previous CTRL_MON group"
> +
> +mv _test_c2/mon_groups/_test_m1/cpus mon_groups expect_fail $? "5 #
> +moving files not allowed"
> +
> +mv _test_c2/mon_groups/_test_m1 _test_c2/mon_groups/_test_m2
> +expect_success $? "6 # simple MON directory rename"
> +
> +mv _test_c2/mon_groups/_test_m2 info
> +expect_fail $? "7 # move to info not allowed"
> +
> +mv _test_c2/mon_groups/_test_m2 _test_c2/mon_groups/mon_groups
> +expect_fail $? "8 # rename to mon_groups not allowed"
> +
> +mv mon_groups/_test_m1 _test_c1/mon_groups expect_fail $? "9 # cannot
> +move groups monitoring CPUs"
> +
> +mv mon_groups/_test_m1 mon_groups/_test_m2 expect_success $? "10 #
> +groups monitoring CPUs can be renamed"
> +
> +mv mon_groups/_test_m2/mon_data _test_c1/mon_groups expect_fail $? "11
> +# cannot move subdirectories of a mon_group"
> +
> +cleanup
> +
> +exit $rc
> --
> 2.40.0.423.gd6c402a77b-goog

2023-04-18 22:15:05

by Reinette Chatre

[permalink] [raw]
Subject: Re: [PATCH v5 3/3] Documentation/x86: Documentation for MON group move feature

Hi Peter,

On 3/30/2023 6:55 AM, Peter Newman wrote:
> Describe new support for moving MON groups to a new parent CTRL_MON
> group and its restrictions.
>
> Signed-off-by: Peter Newman <[email protected]>
> ---
> Documentation/x86/resctrl.rst | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> diff --git a/Documentation/x86/resctrl.rst b/Documentation/x86/resctrl.rst
> index 387ccbcb558f..cb05d90111b4 100644
> --- a/Documentation/x86/resctrl.rst
> +++ b/Documentation/x86/resctrl.rst
> @@ -287,6 +287,13 @@ Removing a directory will move all tasks and cpus owned by the group it
> represents to the parent. Removing one of the created CTRL_MON groups
> will automatically remove all MON groups below it.
>
> +Moving MON group directories to a new parent CTRL_MON group is supported
> +for the purpose of changing the resource allocations of a MON group
> +without impacting its monitoring data or assigned tasks. This operation
> +is not allowed for MON groups which monitor CPUs. No other move
> +operation is currently allowed other than simply renaming a CTRL_MON or
> +MON group.
> +
> All groups contain the following files:
>
> "tasks":

Thank you.

Reviewed-by: Reinette Chatre <[email protected]>

Reinette