2024-03-25 18:09:55

by Peter Newman

Subject: [PATCH v1 6/6] x86/resctrl: Don't search tasklist in mongroup rename

Iterating over all task_structs while read-locking the tasklist_lock
results in significant task creation/destruction latency. Back-to-back
move operations can thus be disastrous to the responsiveness of
threadpool-based services.

Now that the CLOSID is determined indirectly through a reference to the
task's current rdtgroup, it is no longer necessary to update the CLOSID
in all tasks belonging to the moved mongroup. The context switch handler
just needs to be prepared for concurrent writes to the parent pointer.

Signed-off-by: Peter Newman <[email protected]>
---
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 30 +++++++-------------------
1 file changed, 8 insertions(+), 22 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index bd067f7ed5b6..a007c0ec478f 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -388,8 +388,11 @@ void __resctrl_sched_in(struct task_struct *tsk)
* by a full barrier and synchronous IPI
* broadcast before proceeding to free the
* group.
+ *
+ * parent can be concurrently updated to a new
+ * group as a result of mongrp_reparent().
*/
- closid = rgrp->mon.parent->closid;
+ closid = READ_ONCE(rgrp->mon.parent)->closid;
} else {
closid = rgrp->closid;
}
@@ -3809,8 +3812,7 @@ static int rdtgroup_rmdir(struct kernfs_node *kn)
* Monitoring data for the group is unaffected by this operation.
*/
static void mongrp_reparent(struct rdtgroup *rdtgrp,
- struct rdtgroup *new_prdtgrp,
- cpumask_var_t cpus)
+ struct rdtgroup *new_prdtgrp)
{
struct rdtgroup *prdtgrp = rdtgrp->mon.parent;

@@ -3825,13 +3827,10 @@ static void mongrp_reparent(struct rdtgroup *rdtgrp,
list_move_tail(&rdtgrp->mon.crdtgrp_list,
&new_prdtgrp->mon.crdtgrp_list);

- rdtgrp->mon.parent = new_prdtgrp;
+ WRITE_ONCE(rdtgrp->mon.parent, new_prdtgrp);
rdtgrp->closid = new_prdtgrp->closid;

- /* Propagate updated closid to all tasks in this group. */
- rdt_move_group_tasks(rdtgrp, rdtgrp, cpus);
-
- update_closid_rmid(cpus, NULL);
+ update_closid_rmid(cpu_online_mask, NULL);
}

static int rdtgroup_rename(struct kernfs_node *kn,
@@ -3839,7 +3838,6 @@ static int rdtgroup_rename(struct kernfs_node *kn,
{
struct rdtgroup *new_prdtgrp;
struct rdtgroup *rdtgrp;
- cpumask_var_t tmpmask;
int ret;

rdtgrp = kernfs_to_rdtgroup(kn);
@@ -3909,16 +3907,6 @@ static int rdtgroup_rename(struct kernfs_node *kn,
goto out;
}

- /*
- * Allocate the cpumask for use in mongrp_reparent() to avoid the
- * possibility of failing to allocate it after kernfs_rename() has
- * succeeded.
- */
- if (!zalloc_cpumask_var(&tmpmask, GFP_KERNEL)) {
- ret = -ENOMEM;
- goto out;
- }
-
/*
* Perform all input validation and allocations needed to ensure
* mongrp_reparent() will succeed before calling kernfs_rename(),
@@ -3927,9 +3915,7 @@ static int rdtgroup_rename(struct kernfs_node *kn,
*/
ret = kernfs_rename(kn, new_parent, new_name);
if (!ret)
- mongrp_reparent(rdtgrp, new_prdtgrp, tmpmask);
-
- free_cpumask_var(tmpmask);
+ mongrp_reparent(rdtgrp, new_prdtgrp);

out:
mutex_unlock(&rdtgroup_mutex);
--
2.44.0.396.g6e790dbe36-goog
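
The reader/writer pattern the patch relies on can be illustrated with a minimal user-space sketch. This is not the kernel code: `struct group`, `effective_closid()` and `reparent()` are hypothetical names, the macros are simplified stand-ins for the kernel's READ_ONCE()/WRITE_ONCE() that assume the GCC/Clang `__typeof__` extension, and a NULL parent is assumed to mark a control group (the kernel checks the group type instead).

```c
#include <assert.h>
#include <stddef.h>

/*
 * Simplified user-space stand-ins for the kernel's READ_ONCE()/WRITE_ONCE().
 * They rely on the GCC/Clang __typeof__ extension and only make the compiler
 * emit a single access; they are not the kernel definitions.
 */
#define READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

/* Hypothetical, trimmed-down model of a resctrl group. */
struct group {
	int closid;            /* CLOSID owned by (or copied into) this group */
	struct group *parent;  /* assumed NULL for a control group */
};

/* Reader side, modelled on the lookup in __resctrl_sched_in(). */
static int effective_closid(struct group *g)
{
	struct group *parent;

	assert(g != NULL);
	/* Load the pointer once so the NULL check and the dereference agree. */
	parent = READ_ONCE(g->parent);
	return parent ? parent->closid : g->closid;
}

/* Writer side, modelled on mongrp_reparent(): one pointer store, no task walk. */
static void reparent(struct group *mon, struct group *new_parent)
{
	WRITE_ONCE(mon->parent, new_parent);
	mon->closid = new_parent->closid;
}
```

In this sketch, a monitor group whose parent has CLOSID 1 reports CLOSID 1 from `effective_closid()`; after `reparent()` to a parent with CLOSID 2, the next call reports 2 without any per-task update. In the patch itself, CPUs that are already running tasks of the moved group are refreshed by `update_closid_rmid(cpu_online_mask, NULL)`.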



2024-04-04 23:19:10

by Reinette Chatre

Subject: Re: [PATCH v1 6/6] x86/resctrl: Don't search tasklist in mongroup rename

Hi Peter,

On 3/25/2024 10:27 AM, Peter Newman wrote:
> Iterating over all task_structs while read-locking the tasklist_lock
> results in significant task creation/destruction latency. Back-to-back
> move operations can thus be disastrous to the responsiveness of
> threadpool-based services.

Please be specific with claims.

>
> Now that the CLOSID is determined indirectly through a reference to the
> task's current rdtgroup, it is no longer necessary to update the CLOSID
> in all tasks belonging to the moved mongroup. The context switch handler
> just needs to be prepared for concurrent writes to the parent pointer.

(insert text explaining how the context switch handler is prepared for
concurrent writes)

>
> Signed-off-by: Peter Newman <[email protected]>
> ---
> arch/x86/kernel/cpu/resctrl/rdtgroup.c | 30 +++++++-------------------
> 1 file changed, 8 insertions(+), 22 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index bd067f7ed5b6..a007c0ec478f 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -388,8 +388,11 @@ void __resctrl_sched_in(struct task_struct *tsk)
> * by a full barrier and synchronous IPI
> * broadcast before proceeding to free the
> * group.
> + *
> + * parent can be concurrently updated to a new
> + * group as a result of mongrp_reparent().
> */
> - closid = rgrp->mon.parent->closid;
> + closid = READ_ONCE(rgrp->mon.parent)->closid;
> } else {
> closid = rgrp->closid;
> }
> @@ -3809,8 +3812,7 @@ static int rdtgroup_rmdir(struct kernfs_node *kn)
> * Monitoring data for the group is unaffected by this operation.
> */
> static void mongrp_reparent(struct rdtgroup *rdtgrp,
> - struct rdtgroup *new_prdtgrp,
> - cpumask_var_t cpus)
> + struct rdtgroup *new_prdtgrp)
> {
> struct rdtgroup *prdtgrp = rdtgrp->mon.parent;
>
> @@ -3825,13 +3827,10 @@ static void mongrp_reparent(struct rdtgroup *rdtgrp,
> list_move_tail(&rdtgrp->mon.crdtgrp_list,
> &new_prdtgrp->mon.crdtgrp_list);
>
> - rdtgrp->mon.parent = new_prdtgrp;
> + WRITE_ONCE(rdtgrp->mon.parent, new_prdtgrp);
> rdtgrp->closid = new_prdtgrp->closid;
>
> - /* Propagate updated closid to all tasks in this group. */
> - rdt_move_group_tasks(rdtgrp, rdtgrp, cpus);
> -
> - update_closid_rmid(cpus, NULL);
> + update_closid_rmid(cpu_online_mask, NULL);

This deserves a mention in changelog.

There is a section in the documentation, "Resource alloc and monitor groups",
that describes moving monitor groups. Unless you receive better ideas to address
the concern about this impact on CPU-isolated realtime workloads, I would like
to suggest that you add a snippet there about the consequences of a move.

Reinette