Hi Reinette, Fenghua,
This patch series implements the solution Reinette suggested in the
earlier RFD thread[1] for the problem of moving a container's tasks to a
different control group on systems that don't provide enough CLOSIDs to
give every container its own control group.
This patch series assumes that a MON group's CLOSID can simply be
changed to that of a new parent CTRL_MON group. This is allowed on Intel
and AMD, but not on MPAM implementations. While we (Google) foresee
needing this functionality only on Intel and AMD systems, this series should
hopefully be a good starting point for supporting MPAM.
Thanks!
-Peter
Updates:
v4:
- rebase to v6.2
- commit message updates suggested by Reinette
- replace rdt_move_one_task() patch with rdt_move_group_tasks() filter
function patch
- prevent rename on files or renaming to "mon_groups"
- optimize simple rename case
- disallow renaming groups with non-empty cpumask
- ensure source is a proper MON group directory
- fix missing rdtgrp->closid update
- add more last_command_status output
v3: use revised task CLOSID/RMID update IPI sync method from [3]
v2: reworded change logs based on what I've learned from review comments
in another patch series[2]
[v1] https://lore.kernel.org/lkml/[email protected]/
[v2] https://lore.kernel.org/lkml/[email protected]/
[v3] https://lore.kernel.org/lkml/[email protected]/
[1] https://lore.kernel.org/lkml/[email protected]/
[2] https://lore.kernel.org/lkml/[email protected]/
[3] https://lore.kernel.org/lkml/[email protected]/
Peter Newman (3):
x86/resctrl: Factor rdtgroup lock for multi-file ops
x86/resctrl: Parameterize rdt_move_group_tasks() task matching
x86/resctrl: Implement rename op for mon groups
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 196 ++++++++++++++++++++++---
1 file changed, 174 insertions(+), 22 deletions(-)
base-commit: c9c3395d5e3dcc6daee66c6908354d47bf98cb0c
--
2.40.0.rc0.216.gc4246ad0f0-goog
rdtgroup_kn_lock_live() can only release a kernfs reference for a single
file before waiting on the rdtgroup_mutex, limiting its usefulness for
operations on multiple files, such as rename.
Factor the work needed to respectively break and unbreak active
protection on an individual file into rdtgroup_kn_{get,put}().
No functional change.
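For example, an operation spanning two kernfs nodes, such as the rename
op added later in this series, can then follow a pattern roughly like
the sketch below (names taken from that patch), breaking active
protection on both nodes before taking the mutex:

	rdtgroup_kn_get(rdtgrp, kn);
	rdtgroup_kn_get(new_prdtgrp, new_parent);

	mutex_lock(&rdtgroup_mutex);

	/* ... operate on both rdtgroups ... */

	mutex_unlock(&rdtgroup_mutex);
	rdtgroup_kn_put(rdtgrp, kn);
	rdtgroup_kn_put(new_prdtgrp, new_parent);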
Signed-off-by: Peter Newman <[email protected]>
---
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 35 ++++++++++++++++----------
1 file changed, 22 insertions(+), 13 deletions(-)
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 5993da21d822..c3fb525d52e9 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -2028,6 +2028,26 @@ static struct rdtgroup *kernfs_to_rdtgroup(struct kernfs_node *kn)
}
}
+static void rdtgroup_kn_get(struct rdtgroup *rdtgrp, struct kernfs_node *kn)
+{
+ atomic_inc(&rdtgrp->waitcount);
+ kernfs_break_active_protection(kn);
+}
+
+static void rdtgroup_kn_put(struct rdtgroup *rdtgrp, struct kernfs_node *kn)
+{
+ if (atomic_dec_and_test(&rdtgrp->waitcount) &&
+ (rdtgrp->flags & RDT_DELETED)) {
+ if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKSETUP ||
+ rdtgrp->mode == RDT_MODE_PSEUDO_LOCKED)
+ rdtgroup_pseudo_lock_remove(rdtgrp);
+ kernfs_unbreak_active_protection(kn);
+ rdtgroup_remove(rdtgrp);
+ } else {
+ kernfs_unbreak_active_protection(kn);
+ }
+}
+
struct rdtgroup *rdtgroup_kn_lock_live(struct kernfs_node *kn)
{
struct rdtgroup *rdtgrp = kernfs_to_rdtgroup(kn);
@@ -2035,8 +2055,7 @@ struct rdtgroup *rdtgroup_kn_lock_live(struct kernfs_node *kn)
if (!rdtgrp)
return NULL;
- atomic_inc(&rdtgrp->waitcount);
- kernfs_break_active_protection(kn);
+ rdtgroup_kn_get(rdtgrp, kn);
mutex_lock(&rdtgroup_mutex);
@@ -2055,17 +2074,7 @@ void rdtgroup_kn_unlock(struct kernfs_node *kn)
return;
mutex_unlock(&rdtgroup_mutex);
-
- if (atomic_dec_and_test(&rdtgrp->waitcount) &&
- (rdtgrp->flags & RDT_DELETED)) {
- if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKSETUP ||
- rdtgrp->mode == RDT_MODE_PSEUDO_LOCKED)
- rdtgroup_pseudo_lock_remove(rdtgrp);
- kernfs_unbreak_active_protection(kn);
- rdtgroup_remove(rdtgrp);
- } else {
- kernfs_unbreak_active_protection(kn);
- }
+ rdtgroup_kn_put(rdtgrp, kn);
}
static int mkdir_mondata_all(struct kernfs_node *parent_kn,
--
2.40.0.rc0.216.gc4246ad0f0-goog
Allow rdt_move_group_tasks() to be used for new group-scope operations.
This function is currently only used to implement rmdir on a group or
unmounting resctrlfs.
Callers now provide a filtering function to indicate which tasks should
be moved.
No functional change.
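For illustration, the full change is in the hunks below, but the new
calling convention looks roughly like this, with the existing
rmdir/unmount behavior expressed as a filter:

	static bool rmdir_match(struct task_struct *t, struct rdtgroup *from)
	{
		return !from || is_closid_match(t, from) || is_rmid_match(t, from);
	}

	...

	/* Give any tasks back to the parent group */
	rdt_move_group_tasks(rdtgrp, prdtgrp, tmpmask, rmdir_match);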
Signed-off-by: Peter Newman <[email protected]>
---
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 34 +++++++++++++++++++-------
1 file changed, 25 insertions(+), 9 deletions(-)
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index c3fb525d52e9..84af23a29612 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -2393,22 +2393,29 @@ static int reset_all_ctrls(struct rdt_resource *r)
}
/*
- * Move tasks from one to the other group. If @from is NULL, then all tasks
- * in the systems are moved unconditionally (used for teardown).
+ * Move tasks from one to the other group.
+ *
+ * @from: passed unmodified to task_match_fn() for each task
+ * @to: group providing new config values for matching tasks
+ * @task_match_fn: callback returning true when a task requires update
+ * @mask: output-parameter indicating set of CPUs impacted by this
+ * operation
*
* If @mask is not NULL the cpus on which moved tasks are running are set
* in that mask so the update smp function call is restricted to affected
* cpus.
*/
-static void rdt_move_group_tasks(struct rdtgroup *from, struct rdtgroup *to,
- struct cpumask *mask)
+static void rdt_move_group_tasks(struct rdtgroup *from,
+ struct rdtgroup *to,
+ struct cpumask *mask,
+ bool task_match_fn(struct task_struct *,
+ struct rdtgroup *))
{
struct task_struct *p, *t;
read_lock(&tasklist_lock);
for_each_process_thread(p, t) {
- if (!from || is_closid_match(t, from) ||
- is_rmid_match(t, from)) {
+ if (task_match_fn(t, from)) {
WRITE_ONCE(t->closid, to->closid);
WRITE_ONCE(t->rmid, to->mon.rmid);
@@ -2451,6 +2458,15 @@ static void free_all_child_rdtgrp(struct rdtgroup *rdtgrp)
}
}
+/*
+ * If @from is NULL, then all tasks in the systems are moved unconditionally
+ * (used for teardown).
+ */
+static bool rmdir_match(struct task_struct *t, struct rdtgroup *from)
+{
+ return !from || is_closid_match(t, from) || is_rmid_match(t, from);
+}
+
/*
* Forcibly remove all of subdirectories under root.
*/
@@ -2459,7 +2475,7 @@ static void rmdir_all_sub(void)
struct rdtgroup *rdtgrp, *tmp;
/* Move all tasks to the default resource group */
- rdt_move_group_tasks(NULL, &rdtgroup_default, NULL);
+ rdt_move_group_tasks(NULL, &rdtgroup_default, NULL, rmdir_match);
list_for_each_entry_safe(rdtgrp, tmp, &rdt_all_groups, rdtgroup_list) {
/* Free any child rmids */
@@ -3124,7 +3140,7 @@ static int rdtgroup_rmdir_mon(struct rdtgroup *rdtgrp, cpumask_var_t tmpmask)
int cpu;
/* Give any tasks back to the parent group */
- rdt_move_group_tasks(rdtgrp, prdtgrp, tmpmask);
+ rdt_move_group_tasks(rdtgrp, prdtgrp, tmpmask, rmdir_match);
/* Update per cpu rmid of the moved CPUs first */
for_each_cpu(cpu, &rdtgrp->cpu_mask)
@@ -3164,7 +3180,7 @@ static int rdtgroup_rmdir_ctrl(struct rdtgroup *rdtgrp, cpumask_var_t tmpmask)
int cpu;
/* Give any tasks back to the default group */
- rdt_move_group_tasks(rdtgrp, &rdtgroup_default, tmpmask);
+ rdt_move_group_tasks(rdtgrp, &rdtgroup_default, tmpmask, rmdir_match);
/* Give any CPUs back to the default group */
cpumask_or(&rdtgroup_default.cpu_mask,
--
2.40.0.rc0.216.gc4246ad0f0-goog
To change the class of service for a large group of tasks, such as an
application container, a container manager must write all of the tasks'
IDs into the tasks file interface of the new control group.
If a container manager is tracking containers' bandwidth usage by
placing tasks from each into their own monitoring group, it must first
move the tasks to the default monitoring group of the new control group
before it can move the tasks into their new monitoring groups. This is
undesirable because it makes bandwidth usage during the move
unattributable to the correct tasks and resets monitoring event counters
and cache usage information for the group.
To address this, implement the rename operation for resctrlfs mon groups
to effect a change in CLOSID for a MON group while otherwise leaving the
monitoring group intact.
Signed-off-by: Peter Newman <[email protected]>
---
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 127 +++++++++++++++++++++++++
1 file changed, 127 insertions(+)
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 84af23a29612..6d576013fc16 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -3256,6 +3256,132 @@ static int rdtgroup_rmdir(struct kernfs_node *kn)
return ret;
}
+static bool mongrp_reparent_match(struct task_struct *t, struct rdtgroup *from)
+{
+ WARN_ON(from->type != RDTMON_GROUP);
+
+ /* RMID match implies CLOSID match. */
+ return is_rmid_match(t, from);
+}
+
+/**
+ * mongrp_reparent() - replace parent CTRL_MON group of a MON group
+ * @rdtgrp: the MON group whose parent should be replaced
+ * @new_prdtgrp: replacement parent CTRL_MON group for @rdtgrp
+ * @cpus: cpumask provided by the caller for use during this call
+ *
+ * Replaces the parent CTRL_MON group for a MON group, resulting in all member
+ * tasks' CLOSID immediately changing to that of the new parent group.
+ * Monitoring data for the group is unaffected by this operation.
+ */
+static void mongrp_reparent(struct rdtgroup *rdtgrp,
+ struct rdtgroup *new_prdtgrp,
+ cpumask_var_t cpus)
+{
+ struct rdtgroup *prdtgrp = rdtgrp->mon.parent;
+
+ WARN_ON(rdtgrp->type != RDTMON_GROUP);
+ WARN_ON(new_prdtgrp->type != RDTCTRL_GROUP);
+
+ /* Nothing to do when simply renaming a MON group. */
+ if (prdtgrp == new_prdtgrp)
+ return;
+
+ WARN_ON(list_empty(&prdtgrp->mon.crdtgrp_list));
+ list_del(&rdtgrp->mon.crdtgrp_list);
+
+ list_add_tail(&rdtgrp->mon.crdtgrp_list,
+ &new_prdtgrp->mon.crdtgrp_list);
+ rdtgrp->mon.parent = new_prdtgrp;
+ rdtgrp->closid = new_prdtgrp->closid;
+
+ /* Propagate updated closid to all tasks in this group. */
+ rdt_move_group_tasks(rdtgrp, rdtgrp, cpus, mongrp_reparent_match);
+
+ update_closid_rmid(cpus, NULL);
+}
+
+static int rdtgroup_rename(struct kernfs_node *kn,
+ struct kernfs_node *new_parent, const char *new_name)
+{
+ struct rdtgroup *new_prdtgrp;
+ struct rdtgroup *rdtgrp;
+ cpumask_var_t tmpmask;
+ int ret;
+
+ /*
+ * Don't allow kernfs_to_rdtgroup() to return a parent rdtgroup if
+ * either kernfs_node is a file.
+ */
+ if (kernfs_type(kn) != KERNFS_DIR ||
+ kernfs_type(new_parent) != KERNFS_DIR)
+ return -EPERM;
+
+ rdtgrp = kernfs_to_rdtgroup(kn);
+ new_prdtgrp = kernfs_to_rdtgroup(new_parent);
+ if (!rdtgrp || !new_prdtgrp)
+ return -EPERM;
+
+ if (!zalloc_cpumask_var(&tmpmask, GFP_KERNEL))
+ return -ENOMEM;
+
+ /* Release both kernfs active_refs before obtaining rdtgroup mutex. */
+ rdtgroup_kn_get(rdtgrp, kn);
+ rdtgroup_kn_get(new_prdtgrp, new_parent);
+
+ mutex_lock(&rdtgroup_mutex);
+
+ rdt_last_cmd_clear();
+
+ if ((rdtgrp->flags & RDT_DELETED) || (new_prdtgrp->flags & RDT_DELETED)) {
+ ret = -ESRCH;
+ goto out;
+ }
+
+ if (rdtgrp->type != RDTMON_GROUP || !kn->parent ||
+ !is_mon_groups(kn->parent, kn->name)) {
+ rdt_last_cmd_puts("source must be a MON group\n");
+ ret = -EPERM;
+ goto out;
+ }
+
+ if (!is_mon_groups(new_parent, new_name)) {
+ rdt_last_cmd_puts("destination must be a mon_groups subdirectory\n");
+ ret = -EPERM;
+ goto out;
+ }
+
+ /*
+ * If the MON group is monitoring CPUs, they must be assigned to the
+ * current parent CTRL_MON group and therefore cannot be assigned to
+ * the new parent, making the move illegal.
+ */
+ if (!cpumask_empty(&rdtgrp->cpu_mask) &&
+ (rdtgrp->mon.parent != new_prdtgrp)) {
+ rdt_last_cmd_puts("cannot move a MON group that monitors CPUs\n");
+ ret = -EPERM;
+ goto out;
+ }
+
+ /*
+ * Perform all input validation needed to ensure mongrp_reparent() will
+ * succeed before calling kernfs_rename(), otherwise it would be
+ * necessary to revert this call if mongrp_reparent() failed.
+ */
+ ret = kernfs_rename(kn, new_parent, new_name);
+ if (ret)
+ goto out;
+
+ mongrp_reparent(rdtgrp, new_prdtgrp, tmpmask);
+
+out:
+ mutex_unlock(&rdtgroup_mutex);
+ rdtgroup_kn_put(rdtgrp, kn);
+ rdtgroup_kn_put(new_prdtgrp, new_parent);
+ free_cpumask_var(tmpmask);
+ return ret;
+}
+
static int rdtgroup_show_options(struct seq_file *seq, struct kernfs_root *kf)
{
if (resctrl_arch_get_cdp_enabled(RDT_RESOURCE_L3))
@@ -3273,6 +3399,7 @@ static int rdtgroup_show_options(struct seq_file *seq, struct kernfs_root *kf)
static struct kernfs_syscall_ops rdtgroup_kf_syscall_ops = {
.mkdir = rdtgroup_mkdir,
.rmdir = rdtgroup_rmdir,
+ .rename = rdtgroup_rename,
.show_options = rdtgroup_show_options,
};
--
2.40.0.rc0.216.gc4246ad0f0-goog
Hi Peter,
On 3/8/2023 5:14 AM, Peter Newman wrote:
> rdtgroup_kn_lock_live() can only release a kernfs reference for a single
> file before waiting on the rdtgroup_mutex, limiting its usefulness for
> operations on multiple files, such as rename.
>
> Factor the work needed to respectively break and unbreak active
> protection on an individual file into rdtgroup_kn_{get,put}().
>
> No functional change.
>
> Signed-off-by: Peter Newman <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Reinette
Hi Peter,
On 3/8/2023 5:14 AM, Peter Newman wrote:
> Allow rdt_move_group_tasks() to be used for new group-scope operations.
This changelog jumps right into the solution. By doing so it makes what
follows hard to parse. Could you please start with the context, then
the problem and end with the solution?
> This function is currently only used to implement rmdir on a group or
> unmounting resctrlfs.
>
> Callers now provide a filtering function to indicate which tasks should
> be moved.
>
> No functional change.
>
> Signed-off-by: Peter Newman <[email protected]>
> ---
> arch/x86/kernel/cpu/resctrl/rdtgroup.c | 34 +++++++++++++++++++-------
> 1 file changed, 25 insertions(+), 9 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index c3fb525d52e9..84af23a29612 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -2393,22 +2393,29 @@ static int reset_all_ctrls(struct rdt_resource *r)
> }
>
> /*
> - * Move tasks from one to the other group. If @from is NULL, then all tasks
> - * in the systems are moved unconditionally (used for teardown).
> + * Move tasks from one to the other group.
> + *
> + * @from: passed unmodified to task_match_fn() for each task
When using this format it usually starts with a description of the parameter
before moving to how the parameter is used. Perhaps prefix the above with
something like "Resource group tasks are moved from." (Please feel free to
improve.) When starting to provide multiple sentences it helps formatting
to have each sentence start with a capital letter and end with a period.
> + * @to: group providing new config values for matching tasks
> + * @task_match_fn: callback returning true when a task requires update
Could this order please match the order of the parameters in the function?
> + * @mask: output-parameter indicating set of CPUs impacted by this
> + * operation
> *
> * If @mask is not NULL the cpus on which moved tasks are running are set
> * in that mask so the update smp function call is restricted to affected
> * cpus.
Could the above be merged with the earlier description of @mask? Please change
cpus to CPUs if you do.
> */
> -static void rdt_move_group_tasks(struct rdtgroup *from, struct rdtgroup *to,
> - struct cpumask *mask)
> +static void rdt_move_group_tasks(struct rdtgroup *from,
> + struct rdtgroup *to,
> + struct cpumask *mask,
> + bool task_match_fn(struct task_struct *,
> + struct rdtgroup *))
> {
> struct task_struct *p, *t;
>
> read_lock(&tasklist_lock);
> for_each_process_thread(p, t) {
> - if (!from || is_closid_match(t, from) ||
> - is_rmid_match(t, from)) {
> + if (task_match_fn(t, from)) {
> WRITE_ONCE(t->closid, to->closid);
> WRITE_ONCE(t->rmid, to->mon.rmid);
>
> @@ -2451,6 +2458,15 @@ static void free_all_child_rdtgrp(struct rdtgroup *rdtgrp)
> }
> }
>
> +/*
> + * If @from is NULL, then all tasks in the systems are moved unconditionally
> + * (used for teardown).
Could this description be expanded to describe what the matching does? Just jumping
in with the above sentence is quite cryptic.
> + */
> +static bool rmdir_match(struct task_struct *t, struct rdtgroup *from)
Could the function's name please reflect what the function does as opposed to
what the current users are doing at the time they call it? Perhaps
something like "task_in_any_group()" (thinking ahead about a possible
"task_in_mon_group()" for the next patch, please feel free to change).
Also note that the "from" is another naming that reflects the usage as
opposed to what the function does. It could just be "rdtgrp".
> +{
> + return !from || is_closid_match(t, from) || is_rmid_match(t, from);
> +}
> +
> /*
> * Forcibly remove all of subdirectories under root.
> */
> @@ -2459,7 +2475,7 @@ static void rmdir_all_sub(void)
> struct rdtgroup *rdtgrp, *tmp;
>
> /* Move all tasks to the default resource group */
> - rdt_move_group_tasks(NULL, &rdtgroup_default, NULL);
> + rdt_move_group_tasks(NULL, &rdtgroup_default, NULL, rmdir_match);
>
> list_for_each_entry_safe(rdtgrp, tmp, &rdt_all_groups, rdtgroup_list) {
> /* Free any child rmids */
> @@ -3124,7 +3140,7 @@ static int rdtgroup_rmdir_mon(struct rdtgroup *rdtgrp, cpumask_var_t tmpmask)
> int cpu;
>
> /* Give any tasks back to the parent group */
> - rdt_move_group_tasks(rdtgrp, prdtgrp, tmpmask);
> + rdt_move_group_tasks(rdtgrp, prdtgrp, tmpmask, rmdir_match);
>
> /* Update per cpu rmid of the moved CPUs first */
> for_each_cpu(cpu, &rdtgrp->cpu_mask)
> @@ -3164,7 +3180,7 @@ static int rdtgroup_rmdir_ctrl(struct rdtgroup *rdtgrp, cpumask_var_t tmpmask)
> int cpu;
>
> /* Give any tasks back to the default group */
> - rdt_move_group_tasks(rdtgrp, &rdtgroup_default, tmpmask);
> + rdt_move_group_tasks(rdtgrp, &rdtgroup_default, tmpmask, rmdir_match);
>
> /* Give any CPUs back to the default group */
> cpumask_or(&rdtgroup_default.cpu_mask,
This looks good. Thanks for creating the match function. I think it
turned out well.
Reinette
Hi Peter,
On 3/8/2023 5:14 AM, Peter Newman wrote:
> To change the class of service for a large group of tasks, such as an
how about "class of service for" -> "resources allocated to" (see later
comment)
> application container, a container manager must write all of the tasks'
> IDs into the tasks file interface of the new control group.
>
> If a container manager is tracking containers' bandwidth usage by
"is tracking" -> "is additionally"
> placing tasks from each into their own monitoring group, it must first
> move the tasks to the default monitoring group of the new control group
> before it can move the tasks into their new monitoring groups. This is
> undesirable because it makes bandwidth usage during the move
> unattributable to the correct tasks and resets monitoring event counters
> and cache usage information for the group.
>
> To address this, implement the rename operation for resctrlfs mon groups
> to effect a change in CLOSID for a MON group while otherwise leaving the
> monitoring group intact.
As I understand it, the main motivation of this work is to change the
resources allocated to a large number of tasks without impacting the
monitoring data. You do mention "changing the class of service" and "change
in CLOSID" but I think a simpler motivation can help to support this work.
Together with the earlier snippets, how about the final paragraph reads:
"Implement the rename operation only for resctrlfs monitor groups to enable
users to move a monitoring group from one control group to another. This
effects a change in resources allocated to all the tasks in the monitoring
group while otherwise leaving the monitoring data intact."
>
> Signed-off-by: Peter Newman <[email protected]>
> ---
> arch/x86/kernel/cpu/resctrl/rdtgroup.c | 127 +++++++++++++++++++++++++
> 1 file changed, 127 insertions(+)
>
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index 84af23a29612..6d576013fc16 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -3256,6 +3256,132 @@ static int rdtgroup_rmdir(struct kernfs_node *kn)
> return ret;
> }
>
> +static bool mongrp_reparent_match(struct task_struct *t, struct rdtgroup *from)
As a follow-up to the comments on the previous patch, please name the function
to reflect what it does as opposed to what the caller does when using it.
> +{
> + WARN_ON(from->type != RDTMON_GROUP);
> +
> + /* RMID match implies CLOSID match. */
I find this comment confusing considering how this function is used. The resource
group's CLOSID was changed and this function is used to find tasks that need their
CLOSID updated. Stating here that the RMID implies a CLOSID match when the
CLOSID is not expected to match in the usage is unclear to me.
> + return is_rmid_match(t, from);
> +}
> +
> +/**
> + * mongrp_reparent() - replace parent CTRL_MON group of a MON group
> + * @rdtgrp: the MON group whose parent should be replaced
> + * @new_prdtgrp: replacement parent CTRL_MON group for @rdtgrp
> + * @cpus: cpumask provided by the caller for use during this call
> + *
> + * Replaces the parent CTRL_MON group for a MON group, resulting in all member
> + * tasks' CLOSID immediately changing to that of the new parent group.
> + * Monitoring data for the group is unaffected by this operation.
> + */
> +static void mongrp_reparent(struct rdtgroup *rdtgrp,
> + struct rdtgroup *new_prdtgrp,
> + cpumask_var_t cpus)
> +{
> + struct rdtgroup *prdtgrp = rdtgrp->mon.parent;
> +
> + WARN_ON(rdtgrp->type != RDTMON_GROUP);
> + WARN_ON(new_prdtgrp->type != RDTCTRL_GROUP);
> +
> + /* Nothing to do when simply renaming a MON group. */
> + if (prdtgrp == new_prdtgrp)
> + return;
> +
> + WARN_ON(list_empty(&prdtgrp->mon.crdtgrp_list));
> + list_del(&rdtgrp->mon.crdtgrp_list);
> +
> + list_add_tail(&rdtgrp->mon.crdtgrp_list,
> + &new_prdtgrp->mon.crdtgrp_list);
Could list_move_tail() be used here?
> + rdtgrp->mon.parent = new_prdtgrp;
> + rdtgrp->closid = new_prdtgrp->closid;
> +
> + /* Propagate updated closid to all tasks in this group. */
> + rdt_move_group_tasks(rdtgrp, rdtgrp, cpus, mongrp_reparent_match);
> +
> + update_closid_rmid(cpus, NULL);
> +}
> +
> +static int rdtgroup_rename(struct kernfs_node *kn,
> + struct kernfs_node *new_parent, const char *new_name)
> +{
> + struct rdtgroup *new_prdtgrp;
> + struct rdtgroup *rdtgrp;
> + cpumask_var_t tmpmask;
> + int ret;
> +
> + /*
> + * Don't allow kernfs_to_rdtgroup() to return a parent rdtgroup if
> + * either kernfs_node is a file.
> + */
> + if (kernfs_type(kn) != KERNFS_DIR ||
> + kernfs_type(new_parent) != KERNFS_DIR)
> + return -EPERM;
This would be one scenario where the user may attempt an interaction
with resctrl that results in an error while peeking at "last_cmd_status"
will report "ok". This is not the only case in which this may happen and
I think the code is ok. To help users to not need to read the kernel code
to understand what is going on, could a snippet about this feature be added
to the "Resource alloc and monitor groups" section in
Documentation/x86/resctrl.rst. It does not have to be elaborate
but in the area where directory removal is discussed there could be
a snippet that documents this new feature.
> +
> + rdtgrp = kernfs_to_rdtgroup(kn);
> + new_prdtgrp = kernfs_to_rdtgroup(new_parent);
> + if (!rdtgrp || !new_prdtgrp)
> + return -EPERM;
Can this be ENOENT, to be consistent with how this error is reported in
existing resctrl interactions? Although I see that rmdir currently
returns EPERM as well, it is an outlier in that usage. ENODEV may
also work, to match the mkdir return - I do not know why it
was done differently.
> +
> + if (!zalloc_cpumask_var(&tmpmask, GFP_KERNEL))
> + return -ENOMEM;
It remains strange to do the allocation here. I understand its usage so maybe
just a comment like: "Perform early allocation as part
of ensuring the later resource group move cannot fail."
> +
> + /* Release both kernfs active_refs before obtaining rdtgroup mutex. */
> + rdtgroup_kn_get(rdtgrp, kn);
> + rdtgroup_kn_get(new_prdtgrp, new_parent);
> +
> + mutex_lock(&rdtgroup_mutex);
> +
> + rdt_last_cmd_clear();
> +
> + if ((rdtgrp->flags & RDT_DELETED) || (new_prdtgrp->flags & RDT_DELETED)) {
> + ret = -ESRCH;
> + goto out;
> + }
Same here, ENOENT?
> +
> + if (rdtgrp->type != RDTMON_GROUP || !kn->parent ||
> + !is_mon_groups(kn->parent, kn->name)) {
> + rdt_last_cmd_puts("source must be a MON group\n");
Please start the message with a capital letter.
> + ret = -EPERM;
> + goto out;
> + }
> +
> + if (!is_mon_groups(new_parent, new_name)) {
> + rdt_last_cmd_puts("destination must be a mon_groups subdirectory\n");
Also here a capital letter please.
> + ret = -EPERM;
> + goto out;
> + }
> +
> + /*
> + * If the MON group is monitoring CPUs, they must be assigned to the
Could this please be specific about what is meant with "they"?
> + * current parent CTRL_MON group and therefore cannot be assigned to
> + * the new parent, making the move illegal.
> + */
> + if (!cpumask_empty(&rdtgrp->cpu_mask) &&
> + (rdtgrp->mon.parent != new_prdtgrp)) {
> + rdt_last_cmd_puts("cannot move a MON group that monitors CPUs\n");
Also here a capital letter please.
> + ret = -EPERM;
> + goto out;
> + }
> +
> + /*
> + * Perform all input validation needed to ensure mongrp_reparent() will
> + * succeed before calling kernfs_rename(), otherwise it would be
> + * necessary to revert this call if mongrp_reparent() failed.
> + */
> + ret = kernfs_rename(kn, new_parent, new_name);
> + if (ret)
> + goto out;
> +
> + mongrp_reparent(rdtgrp, new_prdtgrp, tmpmask);
> +
> +out:
> + mutex_unlock(&rdtgroup_mutex);
> + rdtgroup_kn_put(rdtgrp, kn);
> + rdtgroup_kn_put(new_prdtgrp, new_parent);
> + free_cpumask_var(tmpmask);
> + return ret;
> +}
> +
> static int rdtgroup_show_options(struct seq_file *seq, struct kernfs_root *kf)
> {
> if (resctrl_arch_get_cdp_enabled(RDT_RESOURCE_L3))
> @@ -3273,6 +3399,7 @@ static int rdtgroup_show_options(struct seq_file *seq, struct kernfs_root *kf)
> static struct kernfs_syscall_ops rdtgroup_kf_syscall_ops = {
> .mkdir = rdtgroup_mkdir,
> .rmdir = rdtgroup_rmdir,
> + .rename = rdtgroup_rename,
> .show_options = rdtgroup_show_options,
> };
>
This looks good. Thank you.
Reinette
Hi Reinette,
On Thu, Mar 23, 2023 at 7:02 PM Reinette Chatre
<[email protected]> wrote:
> On 3/8/2023 5:14 AM, Peter Newman wrote:
> > +/*
> > + * If @from is NULL, then all tasks in the systems are moved unconditionally
> > + * (used for teardown).
>
> Could this description be expanded to describe what the matching does? Just jumping
> in with the above sentence is quite cryptic.
>
> > + */
> > +static bool rmdir_match(struct task_struct *t, struct rdtgroup *from)
>
> Could the function's name please reflect what the function does as opposed to
> what the current users are doing at the time they call it? Perhaps
> something like "task_in_any_group()" (thinking ahead about a possible
> "task_in_mon_group()" for the next patch, please feel free to change).
> Also note that the "from" is another naming that reflects the usage as
> opposed to what the function does. It could just be "rdtgrp".
I read over the behavior of this function more carefully to try to come
up with a more descriptive name and realized that for MON groups it
behaves identically to the new one I created for rename():
is_closid_match() can only match a CTRL_MON group, so for a MON group
the filter reduces to is_rmid_match().
The original rdt_move_group_tasks() was actually fine all along, so I'm
just going to drop this patch.
-Peter
Hi Reinette,
On Thu, Mar 23, 2023 at 7:08 PM Reinette Chatre
<[email protected]> wrote:
> On 3/8/2023 5:14 AM, Peter Newman wrote:
> > application container, a container manager must write all of the tasks'
> > IDs into the tasks file interface of the new control group.
> >
> > If a container manager is tracking containers' bandwidth usage by
>
> "is tracking" -> "is additionally"
That doesn't sound right. Did you mean "is additionally tracking"?
> > + WARN_ON(list_empty(&prdtgrp->mon.crdtgrp_list));
> > + list_del(&rdtgrp->mon.crdtgrp_list);
> > +
> > + list_add_tail(&rdtgrp->mon.crdtgrp_list,
> > + &new_prdtgrp->mon.crdtgrp_list);
>
> Could list_move_tail() be used here?
Yes
>
> > + rdtgrp->mon.parent = new_prdtgrp;
> > + rdtgrp->closid = new_prdtgrp->closid;
> > +
> > + /* Propagate updated closid to all tasks in this group. */
> > + rdt_move_group_tasks(rdtgrp, rdtgrp, cpus, mongrp_reparent_match);
> > +
> > + update_closid_rmid(cpus, NULL);
> > +}
> > +
> > +static int rdtgroup_rename(struct kernfs_node *kn,
> > + struct kernfs_node *new_parent, const char *new_name)
> > +{
> > + struct rdtgroup *new_prdtgrp;
> > + struct rdtgroup *rdtgrp;
> > + cpumask_var_t tmpmask;
> > + int ret;
> > +
> > + /*
> > + * Don't allow kernfs_to_rdtgroup() to return a parent rdtgroup if
> > + * either kernfs_node is a file.
> > + */
> > + if (kernfs_type(kn) != KERNFS_DIR ||
> > + kernfs_type(new_parent) != KERNFS_DIR)
> > + return -EPERM;
>
> This would be one scenario where the user may attempt an interaction
> with resctrl that results in an error while peeking at "last_cmd_status"
> will report "ok". This is not the only case in which this may happen and
> I think the code is ok. To help users to not need to read the kernel code
> to understand what is going on, could a snippet about this feature be added
> to the "Resource alloc and monitor groups" section in
> Documentation/x86/resctrl.rst. It does not have to be elaborate
> but in the area where directory removal is discussed there could be
> a snippet that documents this new feature.
I only skipped the last_cmd_status update here because it was before
obtaining the lock. Even though it doesn't need the lock for this check,
I don't see anything wrong with moving it until after the lock has been
obtained.
I'll also add some documentation.
> > +
> > + if (!zalloc_cpumask_var(&tmpmask, GFP_KERNEL))
> > + return -ENOMEM;
>
> It remains strange to do the allocation here. I understand its usage so maybe
> just a comment like: "Perform early allocation as part
> of ensuring the later resource group move cannot fail."
I think the other reason I put it so early was so all the other error
exits could free it unconditionally. The other clean approach is
allocating it last before calling kernfs_rename(). That should be easier
to follow.
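Something along these lines (just a sketch of the ordering, untested):

	/* Allocate only once all input validation has passed. */
	if (!zalloc_cpumask_var(&tmpmask, GFP_KERNEL)) {
		ret = -ENOMEM;
		goto out;
	}

	ret = kernfs_rename(kn, new_parent, new_name);
	if (!ret)
		mongrp_reparent(rdtgrp, new_prdtgrp, tmpmask);

	free_cpumask_var(tmpmask);
out:
	mutex_unlock(&rdtgroup_mutex);
	rdtgroup_kn_put(rdtgrp, kn);
	rdtgroup_kn_put(new_prdtgrp, new_parent);
	return ret;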
I will try to address all of your other comments.
Thanks again for your review!
-Peter
On 3/30/2023 6:43 AM, Peter Newman wrote:
> On Thu, Mar 23, 2023 at 7:08 PM Reinette Chatre
> <[email protected]> wrote:
>> On 3/8/2023 5:14 AM, Peter Newman wrote:
>>> application container, a container manager must write all of the tasks'
>>> IDs into the tasks file interface of the new control group.
>>>
>>> If a container manager is tracking containers' bandwidth usage by
>>
>> "is tracking" -> "is additionally"
>
> That doesn't sound right. Did you mean "is additionally tracking"?
>
Yes, sorry, I did mean "is additionally tracking". The first
paragraph introduced the container manager's role as one of
allocating resources. I found this adverb to help with the
transition to the container manager's monitoring role.
Reinette