Hi folks,
These two patches fix a couple of CPU affinity issues involving cpusets
on heterogeneous systems. A concrete example of this is running 32-bit
tasks on recent arm64 SoCs, where some of the cores are only capable of
64-bit execution.
The first patch (from Peter) fixes a regression introduced during the
recent merge window which is causing test failures in Android where the
problematic patches have been backported. The second patch fixes a
longer-standing issue, which I noticed while testing fixes for the
initial regression.
Ideally, both of these would land together, but fixing the regression
for 6.2 is my main concern.
Anyway, I don't think either Peter or I would call ourselves cpuset
experts (far from it!), so please have a look.
Cheers,
Will
Cc: Peter Zijlstra <[email protected]>
Cc: Waiman Long <[email protected]>
Cc: Zefan Li <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: [email protected]
--->8
Peter Zijlstra (1):
cpuset: Fix cpuset_cpus_allowed() to not filter offline CPUs
Will Deacon (1):
cpuset: Call set_cpus_allowed_ptr() with appropriate mask for task
kernel/cgroup/cpuset.c | 57 +++++++++++++++++++++++++++++++++---------
1 file changed, 45 insertions(+), 12 deletions(-)
--
2.39.1.456.gfc5497dd1b-goog
From: Peter Zijlstra <[email protected]>
There is a difference in behaviour between CPUSET={y,n} that is now
wreaking havoc with {relax,force}_compatible_cpus_allowed_ptr().
Specifically, since commit 8f9ea86fdf99 ("sched: Always preserve the
user requested cpumask") relax_compatible_cpus_allowed_ptr() is
calling __sched_setaffinity() unconditionally.
But the underlying problem goes back a lot further, possibly to
commit: ae1c802382f7 ("cpuset: apply cs->effective_{cpus,mems}") which
switched cpuset_cpus_allowed() from cs->cpus_allowed to
cs->effective_cpus.
The problem is that for CPUSET=y cpuset_cpus_allowed() will filter out
all offline CPUs. For tasks that are part of a (!root) cpuset this is
then later fixed up by the cpuset hotplug notifiers that re-evaluate
and re-apply cs->effective_cpus, but for (normal) tasks in the root
cpuset this does not happen and they will forever after be excluded
from CPUs onlined later.
As such, rewrite cpuset_cpus_allowed() to return a wider mask,
including the offline CPUs.
Fixes: 8f9ea86fdf99 ("sched: Always preserve the user requested cpumask")
Reported-by: Will Deacon <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/20230117160825.GA17756@willie-the-truck
Signed-off-by: Will Deacon <[email protected]>
---
kernel/cgroup/cpuset.c | 39 ++++++++++++++++++++++++++++++++++-----
1 file changed, 34 insertions(+), 5 deletions(-)
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index a29c0b13706b..8552cc2c586a 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -3683,23 +3683,52 @@ void __init cpuset_init_smp(void)
 	BUG_ON(!cpuset_migrate_mm_wq);
 }
+static const struct cpumask *__cs_cpus_allowed(struct cpuset *cs)
+{
+	const struct cpumask *cs_mask = cs->cpus_allowed;
+	if (!parent_cs(cs))
+		cs_mask = cpu_possible_mask;
+	return cs_mask;
+}
+
+static void cs_cpus_allowed(struct cpuset *cs, struct cpumask *pmask)
+{
+	do {
+		cpumask_and(pmask, pmask, __cs_cpus_allowed(cs));
+		cs = parent_cs(cs);
+	} while (cs);
+}
+
/**
* cpuset_cpus_allowed - return cpus_allowed mask from a tasks cpuset.
* @tsk: pointer to task_struct from which to obtain cpuset->cpus_allowed.
* @pmask: pointer to struct cpumask variable to receive cpus_allowed set.
*
- * Description: Returns the cpumask_var_t cpus_allowed of the cpuset
- * attached to the specified @tsk. Guaranteed to return some non-empty
- * subset of cpu_online_mask, even if this means going outside the
- * tasks cpuset.
+ * Description: Returns the cpumask_var_t cpus_allowed of the cpuset attached
+ * to the specified @tsk. Guaranteed to return some non-empty intersection
+ * with cpu_online_mask, even if this means going outside the tasks cpuset.
**/
 void cpuset_cpus_allowed(struct task_struct *tsk, struct cpumask *pmask)
 {
 	unsigned long flags;
+	struct cpuset *cs;
 	spin_lock_irqsave(&callback_lock, flags);
-	guarantee_online_cpus(tsk, pmask);
+	rcu_read_lock();
+
+	cs = task_cs(tsk);
+	do {
+		cpumask_copy(pmask, task_cpu_possible_mask(tsk));
+		cs_cpus_allowed(cs, pmask);
+
+		if (cpumask_intersects(pmask, cpu_online_mask))
+			break;
+
+		cs = parent_cs(cs);
+	} while (cs);
+
+	rcu_read_unlock();
 	spin_unlock_irqrestore(&callback_lock, flags);
 }
--
2.39.1.456.gfc5497dd1b-goog
set_cpus_allowed_ptr() will fail with -EINVAL if the requested
affinity mask is not a subset of the task_cpu_possible_mask() for the
task being updated. Consequently, on a heterogeneous system with cpusets
spanning the different CPU types, updates to the cgroup hierarchy can
silently fail to update task affinities when the effective affinity
mask for the cpuset is expanded.
For example, consider an arm64 system with 4 CPUs, where CPUs 2-3 are
the only cores capable of executing 32-bit tasks. Attaching a 32-bit
task to a cpuset containing CPUs 0-2 will correctly affine the task to
CPU 2. Extending the cpuset to CPUs 0-3, however, will fail to extend
the affinity mask of the 32-bit task because update_tasks_cpumask() will
pass the full 0-3 mask to set_cpus_allowed_ptr().
Extend update_tasks_cpumask() to take a temporary 'cpumask' parameter
and use it to mask the 'effective_cpus' mask with the possible mask for
each task being updated.
Fixes: 431c69fac05b ("cpuset: Honour task_cpu_possible_mask() in guarantee_online_cpus()")
Signed-off-by: Will Deacon <[email protected]>
---
Note: We wondered whether it was worth calling guarantee_online_cpus()
if the cpumask_and() returns 0 in update_tasks_cpumask(), but given that
this path is only called when the effective mask changes, it didn't
seem appropriate. Ultimately, if you have 32-bit tasks attached to a
cpuset containing only 64-bit cpus, then the affinity is going to be
forced.
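For reference, the variant we decided against would have looked roughly
like this inside the css_task_iter loop (sketch only, not part of the
patch):

	cpumask_and(new_cpus, cs->effective_cpus,
		    task_cpu_possible_mask(task));
	if (cpumask_empty(new_cpus))
		/* hypothetical fallback we chose not to add */
		guarantee_online_cpus(task, new_cpus);
	set_cpus_allowed_ptr(task, new_cpus);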
kernel/cgroup/cpuset.c | 18 +++++++++++-------
1 file changed, 11 insertions(+), 7 deletions(-)
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 8552cc2c586a..f15fb0426707 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -1205,12 +1205,13 @@ void rebuild_sched_domains(void)
/**
* update_tasks_cpumask - Update the cpumasks of tasks in the cpuset.
* @cs: the cpuset in which each task's cpus_allowed mask needs to be changed
+ * @new_cpus: the temp variable for the new effective_cpus mask
*
* Iterate through each task of @cs updating its cpus_allowed to the
* effective cpuset's. As this function is called with cpuset_rwsem held,
* cpuset membership stays stable.
*/
-static void update_tasks_cpumask(struct cpuset *cs)
+static void update_tasks_cpumask(struct cpuset *cs, struct cpumask *new_cpus)
 {
 	struct css_task_iter it;
 	struct task_struct *task;
@@ -1224,7 +1225,10 @@ static void update_tasks_cpumask(struct cpuset *cs)
 		if (top_cs && (task->flags & PF_KTHREAD) &&
 		    kthread_is_per_cpu(task))
 			continue;
-		set_cpus_allowed_ptr(task, cs->effective_cpus);
+
+		cpumask_and(new_cpus, cs->effective_cpus,
+			    task_cpu_possible_mask(task));
+		set_cpus_allowed_ptr(task, new_cpus);
 	}
 	css_task_iter_end(&it);
 }
@@ -1509,7 +1513,7 @@ static int update_parent_subparts_cpumask(struct cpuset *cs, int cmd,
spin_unlock_irq(&callback_lock);
if (adding || deleting)
- update_tasks_cpumask(parent);
+ update_tasks_cpumask(parent, tmp->new_cpus);
/*
* Set or clear CS_SCHED_LOAD_BALANCE when partcmd_update, if necessary.
@@ -1661,7 +1665,7 @@ static void update_cpumasks_hier(struct cpuset *cs, struct tmpmasks *tmp,
WARN_ON(!is_in_v2_mode() &&
!cpumask_equal(cp->cpus_allowed, cp->effective_cpus));
- update_tasks_cpumask(cp);
+ update_tasks_cpumask(cp, tmp->new_cpus);
/*
* On legacy hierarchy, if the effective cpumask of any non-
@@ -2309,7 +2313,7 @@ static int update_prstate(struct cpuset *cs, int new_prs)
}
}
- update_tasks_cpumask(parent);
+ update_tasks_cpumask(parent, tmpmask.new_cpus);
if (parent->child_ecpus_count)
update_sibling_cpumasks(parent, cs, &tmpmask);
@@ -3347,7 +3351,7 @@ hotplug_update_tasks_legacy(struct cpuset *cs,
* as the tasks will be migrated to an ancestor.
*/
if (cpus_updated && !cpumask_empty(cs->cpus_allowed))
- update_tasks_cpumask(cs);
+ update_tasks_cpumask(cs, new_cpus);
if (mems_updated && !nodes_empty(cs->mems_allowed))
update_tasks_nodemask(cs);
@@ -3384,7 +3388,7 @@ hotplug_update_tasks(struct cpuset *cs,
spin_unlock_irq(&callback_lock);
if (cpus_updated)
- update_tasks_cpumask(cs);
+ update_tasks_cpumask(cs, new_cpus);
if (mems_updated)
update_tasks_nodemask(cs);
}
--
2.39.1.456.gfc5497dd1b-goog
On 1/31/23 17:17, Will Deacon wrote:
> set_cpus_allowed_ptr() will fail with -EINVAL if the requested
> affinity mask is not a subset of the task_cpu_possible_mask() for the
> task being updated. Consequently, on a heterogeneous system with cpusets
> spanning the different CPU types, updates to the cgroup hierarchy can
> silently fail to update task affinities when the effective affinity
> mask for the cpuset is expanded.
>
> For example, consider an arm64 system with 4 CPUs, where CPUs 2-3 are
> the only cores capable of executing 32-bit tasks. Attaching a 32-bit
> task to a cpuset containing CPUs 0-2 will correctly affine the task to
> CPU 2. Extending the cpuset to CPUs 0-3, however, will fail to extend
> the affinity mask of the 32-bit task because update_tasks_cpumask() will
> pass the full 0-3 mask to set_cpus_allowed_ptr().
>
> Extend update_tasks_cpumask() to take a temporary 'cpumask' paramater
> and use it to mask the 'effective_cpus' mask with the possible mask for
> each task being updated.
>
> Fixes: 431c69fac05b ("cpuset: Honour task_cpu_possible_mask() in guarantee_online_cpus()")
> Signed-off-by: Will Deacon <[email protected]>
> ---
>
> Note: We wondered whether it was worth calling guarantee_online_cpus()
> if the cpumask_and() returns 0 in update_tasks_cpumask(), but given that
> this path is only called when the effective mask changes, it didn't
> seem appropriate. Ultimately, if you have 32-bit tasks attached to a
> cpuset containing only 64-bit cpus, then the affinity is going to be
> forced.
Now I see how the sched_setaffinity() change is impacting arm64. Instead
of putting a bandage in cpuset, I would suggest doing another cpu
masking in __set_cpus_allowed_ptr(), similar to what is now done for
user_cpus_ptr.
Another suggestion I have is to add a helper like
has_task_cpu_possible_mask() that returns true/false. That way we only
need to do the additional masking when the system actually has
mismatched cpu capabilities for running 32-bit binaries, so the step
can be skipped in most cases and compiled out entirely on non-arm64
systems. By doing that, we may also be able to remove some of the
existing usages of task_cpu_possible_mask().
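A rough sketch of what I have in mind; such a helper does not exist
today, so the name and placement are purely illustrative:

	static inline bool has_task_cpu_possible_mask(struct task_struct *p)
	{
		/*
		 * True only when the architecture restricts @p to a subset
		 * of the possible cpus (e.g. a 32-bit task on an asymmetric
		 * arm64 system); always false, and trivially compiled out,
		 * on architectures that don't override
		 * task_cpu_possible_mask().
		 */
		return !cpumask_equal(task_cpu_possible_mask(p),
				      cpu_possible_mask);
	}

Callers such as __set_cpus_allowed_ptr() could then do the extra
cpumask_and() against task_cpu_possible_mask() only when this returns
true.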
Thoughts?
Cheers,
Longman
On 1/31/23 17:17, Will Deacon wrote:
> From: Peter Zijlstra <[email protected]>
>
> There is a difference in behaviour between CPUSET={y,n} that is now
> wrecking havoc with {relax,force}_compatible_cpus_allowed_ptr().
>
> Specifically, since commit 8f9ea86fdf99 ("sched: Always preserve the
> user requested cpumask") relax_compatible_cpus_allowed_ptr() is
> calling __sched_setaffinity() unconditionally.
>
> But the underlying problem goes back a lot further, possibly to
> commit: ae1c802382f7 ("cpuset: apply cs->effective_{cpus,mems}") which
> switched cpuset_cpus_allowed() from cs->cpus_allowed to
> cs->effective_cpus.
>
> The problem is that for CPUSET=y cpuset_cpus_allowed() will filter out
> all offline CPUs. For tasks that are part of a (!root) cpuset this is
> then later fixed up by the cpuset hotplug notifiers that re-evaluate
> and re-apply cs->effective_cpus, but for (normal) tasks in the root
> cpuset this does not happen and they will forever after be excluded
> from CPUs onlined later.
>
> As such, rewrite cpuset_cpus_allowed() to return a wider mask,
> including the offline CPUs.
>
> Fixes: 8f9ea86fdf99 ("sched: Always preserve the user requested cpumask")
> Reported-by: Will Deacon <[email protected]>
> Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
> Link: https://lkml.kernel.org/r/20230117160825.GA17756@willie-the-truck
> Signed-off-by: Will Deacon <[email protected]>
Before cgroup v2, cpuset had only one cpumask - cpus_allowed. It only
tracked online cpus and ignored the offline ones. It behaves more like
effective_cpus in cpuset v2. With v2, we have 2 cpumasks - cpus_allowed
and effective_cpus. When cpuset v1 is mounted, cpus_allowed and
effective_cpus are effectively the same and track online cpus. With
cpuset v2, cpus_allowed contains what the user has written into it and
won't be changed until another write happens. However, what the user has
written may not be what the system can give, and effective_cpus is
what the system decides a cpuset can use.
Cpuset v2 is able to handle hotplug correctly and update the task's
cpumask accordingly. So missing previously offline cpus won't happen
with v2.
Since v1 keeps the old behavior, previously offlined cpus are lost in
the cpuset's cpus_allowed. However tasks in the root cpuset will still
be fine with cpu hotplug as its cpus_allowed should track
cpu_online_mask. IOW, only tasks in a non-root cpuset suffer this problem.
It was a known issue in v1 and I believe it is one of the major reasons
for the cpuset v2 redesign.
A major concern I have is the overhead of creating a poor man's version of
v2 cpus_allowed. This issue can be worked around even for cpuset v1 if
it is mounted with the cpuset_v2_mode option to behave more like v2 in
its cpumask handling. Alternatively we may be able to provide a config
option to make this the default for v1 without the special mount option,
if necessary.
Cheers,
Longman
On Tue, Jan 31, 2023 at 11:14:27PM -0500, Waiman Long wrote:
> On 1/31/23 17:17, Will Deacon wrote:
> > From: Peter Zijlstra <[email protected]>
> >
> > There is a difference in behaviour between CPUSET={y,n} that is now
> > wrecking havoc with {relax,force}_compatible_cpus_allowed_ptr().
> >
> > Specifically, since commit 8f9ea86fdf99 ("sched: Always preserve the
> > user requested cpumask") relax_compatible_cpus_allowed_ptr() is
> > calling __sched_setaffinity() unconditionally.
> >
> > But the underlying problem goes back a lot further, possibly to
> > commit: ae1c802382f7 ("cpuset: apply cs->effective_{cpus,mems}") which
> > switched cpuset_cpus_allowed() from cs->cpus_allowed to
> > cs->effective_cpus.
> >
> > The problem is that for CPUSET=y cpuset_cpus_allowed() will filter out
> > all offline CPUs. For tasks that are part of a (!root) cpuset this is
> > then later fixed up by the cpuset hotplug notifiers that re-evaluate
> > and re-apply cs->effective_cpus, but for (normal) tasks in the root
> > cpuset this does not happen and they will forever after be excluded
> > from CPUs onlined later.
> >
> > As such, rewrite cpuset_cpus_allowed() to return a wider mask,
> > including the offline CPUs.
> >
> > Fixes: 8f9ea86fdf99 ("sched: Always preserve the user requested cpumask")
> > Reported-by: Will Deacon <[email protected]>
> > Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
> > Link: https://lkml.kernel.org/r/20230117160825.GA17756@willie-the-truck
> > Signed-off-by: Will Deacon <[email protected]>
>
> Before cgroup v2, cpuset had only one cpumask - cpus_allowed. It only
> tracked online cpus and ignored the offline ones. It behaves more like
> effective_cpus in cpuset v2. With v2, we have 2 cpumasks - cpus_allowed and
> effective_cpus. When cpuset v1 is mounted, cpus_allowed and effective_cpus
> are effectively the same and track online cpus. With cpuset v2, cpus_allowed
> contains what the user has written into and it won't be changed until
> another write happen. However, what the user written may not be what the
> system can give it and effective_cpus is what the system decides a cpuset
> can use.
>
> Cpuset v2 is able to handle hotplug correctly and update the task's cpumask
> accordingly. So missing previously offline cpus won't happen with v2.
>
> Since v1 keeps the old behavior, previously offlined cpus are lost in the
> cpuset's cpus_allowed. However tasks in the root cpuset will still be fine
> with cpu hotplug as its cpus_allowed should track cpu_online_mask. IOW, only
> tasks in a non-root cpuset suffer this problem.
>
> It was a known issue in v1 and I believe is one of the major reasons of the
> cpuset v2 redesign.
>
> A major concern I have is the overhead of creating a poor man version of v2
> cpus_allowed. This issue can be worked around even for cpuset v1 if it is
> mounted with the cpuset_v2_mode option to behave more like v2 in its cpumask
> handling. Alternatively we may be able to provide a config option to make
> this the default for v1 without the special mount option, if necessary.
You're still not getting it -- even cpuset (be it v1 or v2) *MUST* *NOT*
mask offline cpus for root cgroup tasks, ever. (And the only reason it
gets away with masking offline for !root is that it re-applies the mask
every time it changes.)
Yes it did that for a fair while -- but it is wrong and broken and a
very big behavioural difference between CONFIG_CPUSET={y,n}. This must
not be.
Arguably cpuset-v2 is still wrong for masking offline cpus in its
effective_cpus mask, but I really didn't want to go rewrite cpuset.c for
something that needs to go into /urgent *now*.
Hence this minimal patch that at least lets sched_setaffinity() work as
intended.
On Tue, Jan 31, 2023 at 09:22:44PM -0500, Waiman Long wrote:
> On 1/31/23 17:17, Will Deacon wrote:
> > set_cpus_allowed_ptr() will fail with -EINVAL if the requested
> > affinity mask is not a subset of the task_cpu_possible_mask() for the
> > task being updated. Consequently, on a heterogeneous system with cpusets
> > spanning the different CPU types, updates to the cgroup hierarchy can
> > silently fail to update task affinities when the effective affinity
> > mask for the cpuset is expanded.
> >
> > For example, consider an arm64 system with 4 CPUs, where CPUs 2-3 are
> > the only cores capable of executing 32-bit tasks. Attaching a 32-bit
> > task to a cpuset containing CPUs 0-2 will correctly affine the task to
> > CPU 2. Extending the cpuset to CPUs 0-3, however, will fail to extend
> > the affinity mask of the 32-bit task because update_tasks_cpumask() will
> > pass the full 0-3 mask to set_cpus_allowed_ptr().
> >
> > Extend update_tasks_cpumask() to take a temporary 'cpumask' paramater
> > and use it to mask the 'effective_cpus' mask with the possible mask for
> > each task being updated.
> >
> > Fixes: 431c69fac05b ("cpuset: Honour task_cpu_possible_mask() in guarantee_online_cpus()")
> > Signed-off-by: Will Deacon <[email protected]>
> > ---
> >
> > Note: We wondered whether it was worth calling guarantee_online_cpus()
> > if the cpumask_and() returns 0 in update_tasks_cpumask(), but given that
> > this path is only called when the effective mask changes, it didn't
> > seem appropriate. Ultimately, if you have 32-bit tasks attached to a
> > cpuset containing only 64-bit cpus, then the affinity is going to be
> > forced.
>
> Now I see how the sched_setaffinity() change is impacting arm64. Instead of
> putting in the bandage in cpuset. I would suggest doing another cpu masking
> in __set_cpus_allowed_ptr() similar to what is now done for user_cpus_ptr.
NO! cpuset is *BROKEN* it has been for a while, it needs to get fixed.
Masking the offline CPUs is *WRONG*.
On Tue, Jan 31, 2023 at 10:17:19PM +0000, Will Deacon wrote:
> set_cpus_allowed_ptr() will fail with -EINVAL if the requested
> affinity mask is not a subset of the task_cpu_possible_mask() for the
> task being updated. Consequently, on a heterogeneous system with cpusets
> spanning the different CPU types, updates to the cgroup hierarchy can
> silently fail to update task affinities when the effective affinity
> mask for the cpuset is expanded.
>
> For example, consider an arm64 system with 4 CPUs, where CPUs 2-3 are
> the only cores capable of executing 32-bit tasks. Attaching a 32-bit
> task to a cpuset containing CPUs 0-2 will correctly affine the task to
> CPU 2. Extending the cpuset to CPUs 0-3, however, will fail to extend
> the affinity mask of the 32-bit task because update_tasks_cpumask() will
> pass the full 0-3 mask to set_cpus_allowed_ptr().
>
> Extend update_tasks_cpumask() to take a temporary 'cpumask' paramater
> and use it to mask the 'effective_cpus' mask with the possible mask for
> each task being updated.
>
> Fixes: 431c69fac05b ("cpuset: Honour task_cpu_possible_mask() in guarantee_online_cpus()")
> Signed-off-by: Will Deacon <[email protected]>
> ---
>
> Note: We wondered whether it was worth calling guarantee_online_cpus()
> if the cpumask_and() returns 0 in update_tasks_cpumask(), but given that
> this path is only called when the effective mask changes, it didn't
> seem appropriate. Ultimately, if you have 32-bit tasks attached to a
> cpuset containing only 64-bit cpus, then the affinity is going to be
> forced.
Right, so in the case above where the cpuset is shrunk to 0-1, the
intersection between the cpuset and task_cpu_possible_mask() is empty
and it currently simply fails to update the mask.
I argued it was probably desired to walk up the tree to find a wider
parent until the intersection of {cpuset, task_cpu_possible_mask,
online} becomes non-empty.
But I suppose that can wait until we have more time.
On 2/1/23 04:15, Peter Zijlstra wrote:
> On Tue, Jan 31, 2023 at 09:22:44PM -0500, Waiman Long wrote:
>> On 1/31/23 17:17, Will Deacon wrote:
>>> set_cpus_allowed_ptr() will fail with -EINVAL if the requested
>>> affinity mask is not a subset of the task_cpu_possible_mask() for the
>>> task being updated. Consequently, on a heterogeneous system with cpusets
>>> spanning the different CPU types, updates to the cgroup hierarchy can
>>> silently fail to update task affinities when the effective affinity
>>> mask for the cpuset is expanded.
>>>
>>> For example, consider an arm64 system with 4 CPUs, where CPUs 2-3 are
>>> the only cores capable of executing 32-bit tasks. Attaching a 32-bit
>>> task to a cpuset containing CPUs 0-2 will correctly affine the task to
>>> CPU 2. Extending the cpuset to CPUs 0-3, however, will fail to extend
>>> the affinity mask of the 32-bit task because update_tasks_cpumask() will
>>> pass the full 0-3 mask to set_cpus_allowed_ptr().
>>>
>>> Extend update_tasks_cpumask() to take a temporary 'cpumask' paramater
>>> and use it to mask the 'effective_cpus' mask with the possible mask for
>>> each task being updated.
>>>
>>> Fixes: 431c69fac05b ("cpuset: Honour task_cpu_possible_mask() in guarantee_online_cpus()")
>>> Signed-off-by: Will Deacon <[email protected]>
>>> ---
>>>
>>> Note: We wondered whether it was worth calling guarantee_online_cpus()
>>> if the cpumask_and() returns 0 in update_tasks_cpumask(), but given that
>>> this path is only called when the effective mask changes, it didn't
>>> seem appropriate. Ultimately, if you have 32-bit tasks attached to a
>>> cpuset containing only 64-bit cpus, then the affinity is going to be
>>> forced.
>> Now I see how the sched_setaffinity() change is impacting arm64. Instead of
>> putting in the bandage in cpuset. I would suggest doing another cpu masking
>> in __set_cpus_allowed_ptr() similar to what is now done for user_cpus_ptr.
> NO! cpuset is *BROKEN* it has been for a while, it needs to get fixed.
>
> Masking the offline CPUs is *WRONG*.
>
This patch is not related to offline cpus at all. It is all about the
32-bit misfit cpus in some arm64 systems.
Cheers,
Longman
On 2/1/23 04:14, Peter Zijlstra wrote:
> On Tue, Jan 31, 2023 at 11:14:27PM -0500, Waiman Long wrote:
>> On 1/31/23 17:17, Will Deacon wrote:
>>> From: Peter Zijlstra <[email protected]>
>>>
>>> There is a difference in behaviour between CPUSET={y,n} that is now
>>> wrecking havoc with {relax,force}_compatible_cpus_allowed_ptr().
>>>
>>> Specifically, since commit 8f9ea86fdf99 ("sched: Always preserve the
>>> user requested cpumask") relax_compatible_cpus_allowed_ptr() is
>>> calling __sched_setaffinity() unconditionally.
>>>
>>> But the underlying problem goes back a lot further, possibly to
>>> commit: ae1c802382f7 ("cpuset: apply cs->effective_{cpus,mems}") which
>>> switched cpuset_cpus_allowed() from cs->cpus_allowed to
>>> cs->effective_cpus.
>>>
>>> The problem is that for CPUSET=y cpuset_cpus_allowed() will filter out
>>> all offline CPUs. For tasks that are part of a (!root) cpuset this is
>>> then later fixed up by the cpuset hotplug notifiers that re-evaluate
>>> and re-apply cs->effective_cpus, but for (normal) tasks in the root
>>> cpuset this does not happen and they will forever after be excluded
>>> from CPUs onlined later.
>>>
>>> As such, rewrite cpuset_cpus_allowed() to return a wider mask,
>>> including the offline CPUs.
>>>
>>> Fixes: 8f9ea86fdf99 ("sched: Always preserve the user requested cpumask")
>>> Reported-by: Will Deacon <[email protected]>
>>> Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
>>> Link: https://lkml.kernel.org/r/20230117160825.GA17756@willie-the-truck
>>> Signed-off-by: Will Deacon <[email protected]>
>> Before cgroup v2, cpuset had only one cpumask - cpus_allowed. It only
>> tracked online cpus and ignored the offline ones. It behaves more like
>> effective_cpus in cpuset v2. With v2, we have 2 cpumasks - cpus_allowed and
>> effective_cpus. When cpuset v1 is mounted, cpus_allowed and effective_cpus
>> are effectively the same and track online cpus. With cpuset v2, cpus_allowed
>> contains what the user has written into and it won't be changed until
>> another write happen. However, what the user written may not be what the
>> system can give it and effective_cpus is what the system decides a cpuset
>> can use.
>>
>> Cpuset v2 is able to handle hotplug correctly and update the task's cpumask
>> accordingly. So missing previously offline cpus won't happen with v2.
>>
>> Since v1 keeps the old behavior, previously offlined cpus are lost in the
>> cpuset's cpus_allowed. However tasks in the root cpuset will still be fine
>> with cpu hotplug as its cpus_allowed should track cpu_online_mask. IOW, only
>> tasks in a non-root cpuset suffer this problem.
>>
>> It was a known issue in v1 and I believe is one of the major reasons of the
>> cpuset v2 redesign.
>>
>> A major concern I have is the overhead of creating a poor man version of v2
>> cpus_allowed. This issue can be worked around even for cpuset v1 if it is
>> mounted with the cpuset_v2_mode option to behave more like v2 in its cpumask
>> handling. Alternatively we may be able to provide a config option to make
>> this the default for v1 without the special mount option, if necessary.
> You're still not getting it -- even cpuset (be it v1 or v2) *MUST* *NOT*
> mask offline cpus for root cgroup tasks, ever. (And the only reason it
> gets away with masking offline for !root is that it re-applies the mask
> every time it changes.)
>
> Yes it did that for a fair while -- but it is wrong and broken and a
> very big behavioural difference between CONFIG_CPUSET={y,n}. This must
> not be.
>
> Arguably cpuset-v2 is still wrong for masking offline cpus in it's
> effective_cpus mask, but I really didn't want to go rewrite cpuset.c for
> something that needs to go into /urgent *now*.
>
> Hence this minimal patch that at least lets sched_setaffinity() work as
> intended.
I don't object to the general idea of keeping offline cpus in a task's
cpu affinity. In the case of a cpu offline event, we can skip removing
that offline cpu from the task's cpu affinity. That would partially
solve the problem here and is also simpler.
I believe a main reason why effective_cpus holds only online cpus is
the need to detect when there are no online cpus available in a given
cpuset. In that case, it falls back to the nearest ancestor with
online cpus.
This offline cpu problem with cpuset v1 has been known for a long
time. It is not a recent regression.
Cheers,
Longman
On 2/1/23 10:16, Waiman Long wrote:
> On 2/1/23 04:14, Peter Zijlstra wrote:
>> On Tue, Jan 31, 2023 at 11:14:27PM -0500, Waiman Long wrote:
>>> On 1/31/23 17:17, Will Deacon wrote:
>>>> From: Peter Zijlstra <[email protected]>
>>>>
>>>> There is a difference in behaviour between CPUSET={y,n} that is now
>>>> wrecking havoc with {relax,force}_compatible_cpus_allowed_ptr().
>>>>
>>>> Specifically, since commit 8f9ea86fdf99 ("sched: Always preserve the
>>>> user requested cpumask") relax_compatible_cpus_allowed_ptr() is
>>>> calling __sched_setaffinity() unconditionally.
>>>>
>>>> But the underlying problem goes back a lot further, possibly to
>>>> commit: ae1c802382f7 ("cpuset: apply cs->effective_{cpus,mems}") which
>>>> switched cpuset_cpus_allowed() from cs->cpus_allowed to
>>>> cs->effective_cpus.
>>>>
>>>> The problem is that for CPUSET=y cpuset_cpus_allowed() will filter out
>>>> all offline CPUs. For tasks that are part of a (!root) cpuset this is
>>>> then later fixed up by the cpuset hotplug notifiers that re-evaluate
>>>> and re-apply cs->effective_cpus, but for (normal) tasks in the root
>>>> cpuset this does not happen and they will forever after be excluded
>>>> from CPUs onlined later.
>>>>
>>>> As such, rewrite cpuset_cpus_allowed() to return a wider mask,
>>>> including the offline CPUs.
>>>>
>>>> Fixes: 8f9ea86fdf99 ("sched: Always preserve the user requested
>>>> cpumask")
>>>> Reported-by: Will Deacon <[email protected]>
>>>> Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
>>>> Link:
>>>> https://lkml.kernel.org/r/20230117160825.GA17756@willie-the-truck
>>>> Signed-off-by: Will Deacon <[email protected]>
>>> Before cgroup v2, cpuset had only one cpumask - cpus_allowed. It only
>>> tracked online cpus and ignored the offline ones. It behaves more like
>>> effective_cpus in cpuset v2. With v2, we have 2 cpumasks -
>>> cpus_allowed and
>>> effective_cpus. When cpuset v1 is mounted, cpus_allowed and
>>> effective_cpus
>>> are effectively the same and track online cpus. With cpuset v2,
>>> cpus_allowed
>>> contains what the user has written into and it won't be changed until
>>> another write happen. However, what the user written may not be what
>>> the
>>> system can give it and effective_cpus is what the system decides a
>>> cpuset
>>> can use.
>>>
>>> Cpuset v2 is able to handle hotplug correctly and update the task's
>>> cpumask
>>> accordingly. So missing previously offline cpus won't happen with v2.
>>>
>>> Since v1 keeps the old behavior, previously offlined cpus are lost
>>> in the
>>> cpuset's cpus_allowed. However tasks in the root cpuset will still
>>> be fine
>>> with cpu hotplug as its cpus_allowed should track cpu_online_mask.
>>> IOW, only
>>> tasks in a non-root cpuset suffer this problem.
>>>
>>> It was a known issue in v1 and I believe is one of the major reasons
>>> of the
>>> cpuset v2 redesign.
>>>
>>> A major concern I have is the overhead of creating a poor man
>>> version of v2
>>> cpus_allowed. This issue can be worked around even for cpuset v1 if
>>> it is
>>> mounted with the cpuset_v2_mode option to behave more like v2 in its
>>> cpumask
>>> handling. Alternatively we may be able to provide a config option to
>>> make
>>> this the default for v1 without the special mount option, if necessary.
>> You're still not getting it -- even cpuset (be it v1 or v2) *MUST* *NOT*
>> mask offline cpus for root cgroup tasks, ever. (And the only reason it
>> gets away with masking offline for !root is that it re-applies the mask
>> every time it changes.)
>>
>> Yes it did that for a fair while -- but it is wrong and broken and a
>> very big behavioural difference between CONFIG_CPUSET={y,n}. This must
>> not be.
>>
>> Arguably cpuset-v2 is still wrong for masking offline cpus in it's
>> effective_cpus mask, but I really didn't want to go rewrite cpuset.c for
>> something that needs to go into /urgent *now*.
>>
>> Hence this minimal patch that at least lets sched_setaffinity() work as
>> intended.
>
> I don't object to the general idea of keeping offline cpus in a task's
> cpu affinity. In the case of cpu offline event, we can skip removing
> that offline cpu from the task's cpu affinity. That will partially
> solve the problem here and is also simpler.
>
> I believe a main reason why effective_cpus holds only online cpus is
> because of the need to detect when there is no online cpus available
> in a given cpuset. In this case, it will fall back to the nearest
> ancestors with online cpus.
>
> This offline cpu problem with cpuset v1 is a known problem for a long
> time. It is not a recent regression.
Note that using cpus_allowed directly in cgroup v2 may not be right
because cpus_allowed may have no relationship to effective_cpus at all
in some cases, e.g.
root
|
V
A (cpus_allowed = 1-4, effective_cpus = 1-4)
|
V
B (cpus_allowed = 5-8, effective_cpus = 1-4)
In the case of cpuset B, passing back cpus 5-8 as the allowed_cpus is wrong.
I wonder how often cpu hotplug happens in those arm64 systems
that only have a subset of cpus that can run 32-bit programs.
Cheers,
Longman
On 2/1/23 13:46, Waiman Long wrote:
> On 2/1/23 10:16, Waiman Long wrote:
>> On 2/1/23 04:14, Peter Zijlstra wrote:
>>> On Tue, Jan 31, 2023 at 11:14:27PM -0500, Waiman Long wrote:
>>>> On 1/31/23 17:17, Will Deacon wrote:
>>>>> From: Peter Zijlstra <[email protected]>
>>>>>
>>>>> There is a difference in behaviour between CPUSET={y,n} that is now
>>>>> wrecking havoc with {relax,force}_compatible_cpus_allowed_ptr().
>>>>>
>>>>> Specifically, since commit 8f9ea86fdf99 ("sched: Always preserve the
>>>>> user requested cpumask") relax_compatible_cpus_allowed_ptr() is
>>>>> calling __sched_setaffinity() unconditionally.
>>>>>
>>>>> But the underlying problem goes back a lot further, possibly to
>>>>> commit: ae1c802382f7 ("cpuset: apply cs->effective_{cpus,mems}")
>>>>> which
>>>>> switched cpuset_cpus_allowed() from cs->cpus_allowed to
>>>>> cs->effective_cpus.
>>>>>
>>>>> The problem is that for CPUSET=y cpuset_cpus_allowed() will filter
>>>>> out
>>>>> all offline CPUs. For tasks that are part of a (!root) cpuset this is
>>>>> then later fixed up by the cpuset hotplug notifiers that re-evaluate
>>>>> and re-apply cs->effective_cpus, but for (normal) tasks in the root
>>>>> cpuset this does not happen and they will forever after be excluded
>>>>> from CPUs onlined later.
>>>>>
>>>>> As such, rewrite cpuset_cpus_allowed() to return a wider mask,
>>>>> including the offline CPUs.
>>>>>
>>>>> Fixes: 8f9ea86fdf99 ("sched: Always preserve the user requested
>>>>> cpumask")
>>>>> Reported-by: Will Deacon <[email protected]>
>>>>> Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
>>>>> Link:
>>>>> https://lkml.kernel.org/r/20230117160825.GA17756@willie-the-truck
>>>>> Signed-off-by: Will Deacon <[email protected]>
>>>> Before cgroup v2, cpuset had only one cpumask - cpus_allowed. It only
>>>> tracked online cpus and ignored the offline ones. It behaves more like
>>>> effective_cpus in cpuset v2. With v2, we have 2 cpumasks -
>>>> cpus_allowed and
>>>> effective_cpus. When cpuset v1 is mounted, cpus_allowed and
>>>> effective_cpus
>>>> are effectively the same and track online cpus. With cpuset v2,
>>>> cpus_allowed
>>>> contains what the user has written into and it won't be changed until
>>>> another write happen. However, what the user written may not be
>>>> what the
>>>> system can give it and effective_cpus is what the system decides a
>>>> cpuset
>>>> can use.
>>>>
>>>> Cpuset v2 is able to handle hotplug correctly and update the task's
>>>> cpumask
>>>> accordingly. So missing previously offline cpus won't happen with v2.
>>>>
>>>> Since v1 keeps the old behavior, previously offlined cpus are lost
>>>> in the
>>>> cpuset's cpus_allowed. However tasks in the root cpuset will still
>>>> be fine
>>>> with cpu hotplug as its cpus_allowed should track cpu_online_mask.
>>>> IOW, only
>>>> tasks in a non-root cpuset suffer this problem.
>>>>
>>>> It was a known issue in v1 and I believe is one of the major
>>>> reasons of the
>>>> cpuset v2 redesign.
>>>>
>>>> A major concern I have is the overhead of creating a poor man
>>>> version of v2
>>>> cpus_allowed. This issue can be worked around even for cpuset v1 if
>>>> it is
>>>> mounted with the cpuset_v2_mode option to behave more like v2 in
>>>> its cpumask
>>>> handling. Alternatively we may be able to provide a config option
>>>> to make
>>>> this the default for v1 without the special mount option, if
>>>> necessary.
>>> You're still not getting it -- even cpuset (be it v1 or v2) *MUST*
>>> *NOT*
>>> mask offline cpus for root cgroup tasks, ever. (And the only reason it
>>> gets away with masking offline for !root is that it re-applies the mask
>>> every time it changes.)
>>>
>>> Yes it did that for a fair while -- but it is wrong and broken and a
>>> very big behavioural difference between CONFIG_CPUSET={y,n}. This must
>>> not be.
>>>
>>> Arguably cpuset-v2 is still wrong for masking offline cpus in it's
>>> effective_cpus mask, but I really didn't want to go rewrite cpuset.c
>>> for
>>> something that needs to go into /urgent *now*.
>>>
>>> Hence this minimal patch that at least lets sched_setaffinity() work as
>>> intended.
>>
>> I don't object to the general idea of keeping offline cpus in a
>> task's cpu affinity. In the case of cpu offline event, we can skip
>> removing that offline cpu from the task's cpu affinity. That will
>> partially solve the problem here and is also simpler.
>>
>> I believe a main reason why effective_cpus holds only online cpus is
>> because of the need to detect when there is no online cpus available
>> in a given cpuset. In this case, it will fall back to the nearest
>> ancestors with online cpus.
>>
>> This offline cpu problem with cpuset v1 is a known problem for a long
>> time. It is not a recent regression.
>
> Note that using cpus_allowed directly in cgroup v2 may not be right
> because cpus_allowed may have no relationship to effective_cpus at all
> in some cases, e.g.
>
> root
> |
> V
> A (cpus_allowed = 1-4, effective_cpus = 1-4)
> |
> V
> B (cpus_allowed = 5-8, effective_cpus = 1-4)
>
> In the case of cpuset B, passing back cpus 5-8 as the allowed_cpus is
> wrong.
>
> I wonder how often is cpu hotplug happening in those arm64 cpu systems
> that only have a subset of cpus that can run 32-bit programs.
One possible solution is to use cpuset_cpus_allowed_fallback() in case
none of the cpus in the current cpuset is allowed to be used to run a
given task.
Cheers,
Longman
On 2/1/23 14:14, Waiman Long wrote:
> One possible solution is to use cpuset_cpus_allowed_fallback() in case
> none of the cpus in the current cpuset is allowed to be used to run a
> given task.
It looks like we will need to enhance cpuset_cpus_allowed_fallback() to
walk up the cpuset tree to find one that has usable cpus.
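Something along these lines, perhaps (very rough sketch; the function
name and the locking details are just illustrative):

	static void cpuset_cpus_allowed_climb(struct task_struct *tsk,
					      struct cpumask *pmask)
	{
		struct cpuset *cs;

		/* callback_lock/RCU handling elided for brevity */
		for (cs = task_cs(tsk); cs; cs = parent_cs(cs)) {
			cpumask_and(pmask, cs->effective_cpus,
				    task_cpu_possible_mask(tsk));
			if (cpumask_intersects(pmask, cpu_online_mask))
				return;
		}
		/* no usable ancestor: fall back to the task's possible mask */
		cpumask_copy(pmask, task_cpu_possible_mask(tsk));
	}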
Cheers,
Longman
On Wed, Feb 01, 2023 at 01:46:11PM -0500, Waiman Long wrote:
> Note that using cpus_allowed directly in cgroup v2 may not be right because
> cpus_allowed may have no relationship to effective_cpus at all in some
> cases, e.g.
>
>  root
>   |
>   V
>   A (cpus_allowed = 1-4, effective_cpus = 1-4)
>   |
>   V
>   B (cpus_allowed = 5-8, effective_cpus = 1-4)
>
> In the case of cpuset B, passing back cpus 5-8 as the allowed_cpus is wrong.
I think my patch as written does the right thing here. Since the
intersection of (1-4) and (5-8) is empty it will move up the hierarchy
and we'll end up with (1-4) from the cgroup side of things.
So the purpose of __cs_cpus_allowed() is to override the cpus_allowed of
the root set and force it to cpu_possible_mask.
Then cs_cpus_allowed() computes the intersection of cs->cpus_allowed and
all its parents. This will, in the case of B above, result in the empty
mask.
Then cpuset_cpus_allowed() has a loop that starts with
task_cpu_possible_mask(), intersects that with cs_cpus_allowed() and if
the intersection of that and cpu_online_mask is empty, moves up the
hierarchy. Given cs_cpus_allowed(B) is the empty mask, we'll move to A.
Note that since we force the mask of root to cpu_possible_mask,
cs_cpus_allowed(root) will be a no-op and if we guarantee (in arch code)
that cpu_online_mask always has a non-empty intersection with
task_cpu_possible_mask(), this loop is guaranteed to terminate with a
viable mask.
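To spell it out for the A/B example above (illustrative trace only):

	cs = B: pmask = task_cpu_possible_mask(tsk)
	        pmask &= B(5-8) & A(1-4) & root(possible) -> empty,
	        no intersection with cpu_online_mask, move up to A
	cs = A: pmask = task_cpu_possible_mask(tsk) again
	        pmask &= A(1-4) & root(possible) -> 1-4,
	        intersects cpu_online_mask, loop terminates with 1-4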
On 2/1/23 16:10, Peter Zijlstra wrote:
> On Wed, Feb 01, 2023 at 01:46:11PM -0500, Waiman Long wrote:
>
>> Note that using cpus_allowed directly in cgroup v2 may not be right because
>> cpus_allowed may have no relationship to effective_cpus at all in some
>> cases, e.g.
>>
>> root
>> |
>> V
>> A (cpus_allowed = 1-4, effective_cpus = 1-4)
>> |
>> V
>> B (cpus_allowed = 5-8, effective_cpus = 1-4)
>>
>> In the case of cpuset B, passing back cpus 5-8 as the allowed_cpus is wrong.
> I think my patch as written does the right thing here. Since the
> intersection of (1-4) and (5-8) is empty it will move up the hierarchy
> and we'll end up with (1-4) from the cgroup side of things.
>
> So the purpose of __cs_cpus_allowed() is to override the cpus_allowed of
> the root set and force it to cpu_possible_mask.
>
> Then cs_cpus_allowed() computes the intersection of cs->cpus_allowed and
> all it's parents. This will, in the case of B above, result in the empty
> mask.
>
> Then cpuset_cpus_allowed() has a loop that starts with
> task_cpu_possible_mask(), intersects that with cs_cpus_allowed() and if
> the intersection of that and cpu_online_mask is empty, moves up the
> hierarchy. Given cs_cpus_allowed(B) is the empty mask, we'll move to A.
>
> Note that since we force the mask of root to cpu_possible_mask,
> cs_cpus_allowed(root) will be a no-op and if we guarantee (in arch code)
> that cpu_online_mask always has a non-empty intersection with
> task_cpu_possible_mask(), this loop is guaranteed to terminate with a
> viable mask.
I will take a closer look at that tomorrow. I will be more comfortable
ack'ing it if this is specific to v1 cpuset instead of being applied to
both v1 and v2, since it is only v1 that is problematic.
Cheers,
Longman
On Tue, Jan 31, 2023 at 11:14:27PM -0500, Waiman Long wrote:
> A major concern I have is the overhead of creating a poor man version of v2
> cpus_allowed. This issue can be worked around even for cpuset v1 if it is
> mounted with the cpuset_v2_mode option to behave more like v2 in its cpumask
> handling. Alternatively we may be able to provide a config option to make
> this the default for v1 without the special mount option, if necessary.
It is equally broken for v2: it masks against effective_cpus. Not to
mention it explicitly starts with cpu_online_mask.
On 2/2/23 03:34, Peter Zijlstra wrote:
> On Tue, Jan 31, 2023 at 11:14:27PM -0500, Waiman Long wrote:
>
>> A major concern I have is the overhead of creating a poor man version of v2
>> cpus_allowed. This issue can be worked around even for cpuset v1 if it is
>> mounted with the cpuset_v2_mode option to behave more like v2 in its cpumask
>> handling. Alternatively we may be able to provide a config option to make
>> this the default for v1 without the special mount option, if necessary.
> It is equally broken for v2, it masks against effective_cpus. Not to
> mention it explicitly starts with cpu_online_mask.
After taking a close look at the patch, my understanding of what it is
doing is as follows:
v2: cpus_allowed will not be affected by hotplug. So the new
cpuset_cpus_allowed() will return effective_cpus plus any offline cpus
that would have been part of effective_cpus had they been online, masked
with the allowable cpus, and then go up the cpuset hierarchy if necessary.
v1: cpus_allowed is equivalent to v2 effective_cpus. It starts at the
current cpuset and moves up the hierarchy if necessary to find a cpuset
that has at least one allowable cpu.
First of all, it does not take into account the v2 partition feature,
which may cause it to produce an incorrect result if a partition is
enabled somewhere. Secondly, I don't see any benefit other than having
some additional offline cpus available in a task's cpumask, which the
scheduler will ignore anyway. v2 is able to recover a previously
offlined cpu, so we don't gain any net benefit other than the going up
the cpuset hierarchy part.
For v1, I agree we should go up the cpuset hierarchy to find a usable
cpuset. Instead of introducing such complexity in
cpuset_cpus_allowed(), my current preference is to do the hierarchy
climbing part in an enhanced cpuset_cpus_allowed_fallback() after an
initial failure of cpuset_cpus_allowed(). That will be easier to
understand than having such complexity and overhead in
cpuset_cpus_allowed() alone.
I will work on a patchset to do that as a counter offer.
Cheers,
Longman
On Thu, Feb 02, 2023 at 11:06:51AM -0500, Waiman Long wrote:
> After taking a close look at the patch, my understanding of what it is doing
> is as follows:
>
> v2: cpus_allowed will not be affected by hotplug. So the new
> cpuset_cpus_allowed() will return effective_cpus + offline cpus that should
> have been part of effective_cpus if online before masking it with allowable
> cpus and then go up the cpuset hierarchy if necessary.
>
> v1: cpus_allowed is equivalent to v2 effective_cpus. It starts at the
> current cpuset and move up the hierarchy if necessary to find a cpuset that
> have at least one allowable cpu.
>
> First of all, it does not take into account of the v2 partition feature that
> may cause it to produce incorrect result if partition is enabled somewhere.
How so? For a partition the cpus_allowed mask should be the partition
CPUs. The only magical bit about partitions is that any one CPU cannot
belong to two partitions and load-balancing is split.
> Secondly, I don't see any benefit other than having some additional offline
> cpu available in a task's cpumask which the scheduler will ignore anyway.
Those CPUs can come online again -- you're *again* dismissing the true
bug :/
If you filter out the offline CPUs at sched_setaffinity() time, you
forever lose those CPUs, the task will never again move to those CPUs,
even if they do come online after.
It is really simple to reproduce this:
- boot machine
- offline all CPUs except one
- taskset -p ffffffff $$
- online all CPUs
and observe your shell (and all its descendants) being stuck to the one
CPU. Do the same thing on a CPUSET=n build and note the difference (you
retain the full mask).
> v2 is able to recover a previously offlined cpu. So we don't gain any
> net benefit other than the going up the cpuset hierarchy part.
Only for !root tasks. Not even v2 will re-set the affinity of root tasks
afaict.
> For v1, I agree we should go up the cpuset hierarchy to find a usable
> cpuset. Instead of introducing such a complexity in cpuset_cpus_allowed(),
> my current preference is to do the hierarchy climbing part in an enhanced
> cpuset_cpus_allowed_fallback() after an initial failure of
> cpuset_cpus_allowed(). That will be easier to understand than having such
> complexity and overhead in cpuset_cpus_allowed() alone.
>
> I will work on a patchset to do that as a counter offer.
We will need a small and simple patch for /urgent, or I will need to
revert all your patches -- your call.
I also don't think you fully appreciate the ramifications of
task_cpu_possible_mask(); cpuset currently gets that quite wrong.
On 2/2/23 14:42, Peter Zijlstra wrote:
> On Thu, Feb 02, 2023 at 11:06:51AM -0500, Waiman Long wrote:
>
>> After taking a close look at the patch, my understanding of what it is doing
>> is as follows:
>>
>> v2: cpus_allowed will not be affected by hotplug. So the new
>> cpuset_cpus_allowed() will return effective_cpus + offline cpus that should
>> have been part of effective_cpus if online before masking it with allowable
>> cpus and then go up the cpuset hierarchy if necessary.
>>
>> v1: cpus_allowed is equivalent to v2 effective_cpus. It starts at the
>> current cpuset and move up the hierarchy if necessary to find a cpuset that
>> have at least one allowable cpu.
>>
>> First of all, it does not take into account of the v2 partition feature that
>> may cause it to produce incorrect result if partition is enabled somewhere.
> How so? For a partition the cpus_allowed mask should be the parition
> CPUs. The only magical bit about partitions is that any one CPU cannot
> belong to two partitions and load-balancing is split.
There can be a child partition underneath it that uses up some of the
cpus in the cpus_allowed mask. So if you cascade up the cpuset tree from
another child, your code will wrongly include those cpus that are
dedicated to the other child partition.
>
>> Secondly, I don't see any benefit other than having some additional offline
>> cpu available in a task's cpumask which the scheduler will ignore anyway.
> Those CPUs can come online again -- you're *again* dismissing the true
> bug :/
>
> If you filter out the offline CPUs at sched_setaffinity() time, you
> forever lose those CPUs, the task will never again move to those CPUs,
> even if they do come online after.
>
> It is really simple to reproduce this:
>
> - boot machine
> - offline all CPUs except one
> - taskset -p ffffffff $$
> - online all CPUs
>
> and observe your shell (and all its decendants) being stuck to the one
> CPU. Do the same thing on a CPUSET=n build and note the difference (you
> retain the full mask).
With the new sched_setaffinity(), the original mask should be stored in
user_cpus_ptr. The cpuset code is supposed to call
set_cpus_allowed_ptr(), which then sets the mask correctly. I will run
your test and figure out why it does not work as I would have expected.
>
>> v2 is able to recover a previously offlined cpu. So we don't gain any
>> net benefit other than the going up the cpuset hierarchy part.
> Only for !root tasks. Not even v2 will re-set the affinity of root tasks
> afaict.
>
>> For v1, I agree we should go up the cpuset hierarchy to find a usable
>> cpuset. Instead of introducing such a complexity in cpuset_cpus_allowed(),
>> my current preference is to do the hierarchy climbing part in an enhanced
>> cpuset_cpus_allowed_fallback() after an initial failure of
>> cpuset_cpus_allowed(). That will be easier to understand than having such
>> complexity and overhead in cpuset_cpus_allowed() alone.
>>
>> I will work on a patchset to do that as a counter offer.
> We will need a small and simple patch for /urgent, or I will need to
> revert all your patches -- your call.
>
> I also don't tihnk you fully appreciate the ramifications of
> task_cpu_possible_mask(), cpuset currently gets that quite wrong.
OK, I didn't realize the urgency of that. If it is that urgent, I have
no objection to getting it in for now. We can improve it later on. So
are you planning to get it into the current 6.2 rc or into 6.3?
Tejun, are you OK with that as you are the cgroup maintainer?
Cheers,
Longman
On Thu, Feb 02, 2023 at 03:46:02PM -0500, Waiman Long wrote:
> > > I will work on a patchset to do that as a counter offer.
> > We will need a small and simple patch for /urgent, or I will need to
> > revert all your patches -- your call.
> >
> > I also don't tihnk you fully appreciate the ramifications of
> > task_cpu_possible_mask(), cpuset currently gets that quite wrong.
>
> OK, I don't realize the urgency of that. If it is that urgent, I will have
> no objection to get it in for now. We can improve it later on. So are you
> planning to get it into the current 6.2 rc or 6.3?
>
> Tejun, are you OK with that as you are the cgroup maintainer?
Yeah, gotta fix the regression but is there currently a solution which fixes
the regression but doesn't further break other stuff?
Thanks.
--
tejun
On 2/2/23 15:48, Tejun Heo wrote:
> On Thu, Feb 02, 2023 at 03:46:02PM -0500, Waiman Long wrote:
>>>> I will work on a patchset to do that as a counter offer.
>>> We will need a small and simple patch for /urgent, or I will need to
>>> revert all your patches -- your call.
>>>
>>> I also don't tihnk you fully appreciate the ramifications of
>>> task_cpu_possible_mask(), cpuset currently gets that quite wrong.
>> OK, I don't realize the urgency of that. If it is that urgent, I will have
>> no objection to get it in for now. We can improve it later on. So are you
>> planning to get it into the current 6.2 rc or 6.3?
>>
>> Tejun, are you OK with that as you are the cgroup maintainer?
> Yeah, gotta fix the regression but is there currently a solution which fixes
> the regression but doesn't further break other stuff?
I believe there is a better way to do that, but it will need more time
to flesh out. Since cpuset_cpus_allowed() is only used by
kernel/sched/core.c, Peter will be responsible if it somehow breaks
other stuff.
Thanks,
Longman
On 2/2/23 15:53, Waiman Long wrote:
>
> On 2/2/23 15:48, Tejun Heo wrote:
>> On Thu, Feb 02, 2023 at 03:46:02PM -0500, Waiman Long wrote:
>>>>> I will work on a patchset to do that as a counter offer.
>>>> We will need a small and simple patch for /urgent, or I will need to
>>>> revert all your patches -- your call.
>>>>
>>>> I also don't tihnk you fully appreciate the ramifications of
>>>> task_cpu_possible_mask(), cpuset currently gets that quite wrong.
>>> OK, I don't realize the urgency of that. If it is that urgent, I
>>> will have
>>> no objection to get it in for now. We can improve it later on. So
>>> are you
>>> planning to get it into the current 6.2 rc or 6.3?
>>>
>>> Tejun, are you OK with that as you are the cgroup maintainer?
>> Yeah, gotta fix the regression but is there currently a solution
>> which fixes
>> the regression but doesn't further break other stuff?
>
> I believe there is a better way to do that, but it will need more time
> to flex out. Since cpuset_cpus_allowed() is only used by
> kernel/sched/core.c, Peter will be responsible if it somehow breaks
> other stuff.
Maybe my cpuset patch that doesn't update the task's cpumask on a cpu
offline event can help. However, I don't know the exact scenario where
the regression happens, so it may not.
Cheers,
Longman
On Thu, Feb 02, 2023 at 04:05:14PM -0500, Waiman Long wrote:
>
> On 2/2/23 15:53, Waiman Long wrote:
> >
> > On 2/2/23 15:48, Tejun Heo wrote:
> > > On Thu, Feb 02, 2023 at 03:46:02PM -0500, Waiman Long wrote:
> > > > > > I will work on a patchset to do that as a counter offer.
> > > > > We will need a small and simple patch for /urgent, or I will need to
> > > > > revert all your patches -- your call.
> > > > >
> > > > > I also don't tihnk you fully appreciate the ramifications of
> > > > > task_cpu_possible_mask(), cpuset currently gets that quite wrong.
> > > > OK, I don't realize the urgency of that. If it is that urgent, I
> > > > will have
> > > > no objection to get it in for now. We can improve it later on.
> > > > So are you
> > > > planning to get it into the current 6.2 rc or 6.3?
> > > >
> > > > Tejun, are you OK with that as you are the cgroup maintainer?
> > > Yeah, gotta fix the regression but is there currently a solution
> > > which fixes
> > > the regression but doesn't further break other stuff?
> >
> > I believe there is a better way to do that, but it will need more time
> > to flex out. Since cpuset_cpus_allowed() is only used by
> > kernel/sched/core.c, Peter will be responsible if it somehow breaks
> > other stuff.
>
> Maybe my cpuset patch that don't update task's cpumask on cpu offline event
> can help. However, I don't know the exact scenario where the regression
> happen, so it may not.
Neither patch looks like it would break anything. That said, the patches
aren't trivial and we're really close to the merge window, so I'd really
appreciate it if you could take a look and test a bit before we send
these Linus's way. We can replace them with a better solution afterwards.
Thanks.
--
tejun
On 2/2/23 16:50, Tejun Heo wrote:
> On Thu, Feb 02, 2023 at 04:05:14PM -0500, Waiman Long wrote:
>> On 2/2/23 15:53, Waiman Long wrote:
>>> On 2/2/23 15:48, Tejun Heo wrote:
>>>> On Thu, Feb 02, 2023 at 03:46:02PM -0500, Waiman Long wrote:
>>>>>>> I will work on a patchset to do that as a counter offer.
>>>>>> We will need a small and simple patch for /urgent, or I will need to
>>>>>> revert all your patches -- your call.
>>>>>>
>>>>>> I also don't tihnk you fully appreciate the ramifications of
>>>>>> task_cpu_possible_mask(), cpuset currently gets that quite wrong.
>>>>> OK, I don't realize the urgency of that. If it is that urgent, I
>>>>> will have
>>>>> no objection to get it in for now. We can improve it later on.
>>>>> So are you
>>>>> planning to get it into the current 6.2 rc or 6.3?
>>>>>
>>>>> Tejun, are you OK with that as you are the cgroup maintainer?
>>>> Yeah, gotta fix the regression but is there currently a solution
>>>> which fixes
>>>> the regression but doesn't further break other stuff?
>>> I believe there is a better way to do that, but it will need more time
>>> to flesh out. Since cpuset_cpus_allowed() is only used by
>>> kernel/sched/core.c, Peter will be responsible if it somehow breaks
>>> other stuff.
>> Maybe my cpuset patch that doesn't update a task's cpumask on a cpu offline
>> event can help. However, I don't know the exact scenario where the regression
>> happens, so it may not.
> Neither patch looks like it would break anything. That said, the patches
> aren't trivial and we're really close to the merge window, so I'd really
> appreciate it if you could take a look and test a bit before we send these
> Linus's way. We can replace them with a better solution afterwards.
OK, will do.
Cheers,
Longman
On Wed, Feb 01, 2023 at 10:34:00PM -0500, Waiman Long wrote:
> On 2/1/23 16:10, Peter Zijlstra wrote:
> > On Wed, Feb 01, 2023 at 01:46:11PM -0500, Waiman Long wrote:
> >
> > > Note that using cpus_allowed directly in cgroup v2 may not be right because
> > > cpus_allowed may have no relationship to effective_cpus at all in some
> > > cases, e.g.
> > >
> > >   root
> > >    |
> > >    V
> > >    A (cpus_allowed = 1-4, effective_cpus = 1-4)
> > >    |
> > >    V
> > >    B (cpus_allowed = 5-8, effective_cpus = 1-4)
> > >
> > > In the case of cpuset B, passing back cpus 5-8 as the allowed_cpus is wrong.
> > I think my patch as written does the right thing here. Since the
> > intersection of (1-4) and (5-8) is empty it will move up the hierarchy
> > and we'll end up with (1-4) from the cgroup side of things.
> >
> > So the purpose of __cs_cpus_allowed() is to override the cpus_allowed of
> > the root set and force it to cpu_possible_mask.
> >
> > Then cs_cpus_allowed() computes the intersection of cs->cpus_allowed and
> > the masks of all its parents. This will, in the case of B above, result
> > in the empty mask.
> >
> > Then cpuset_cpus_allowed() has a loop that starts with
> > task_cpu_possible_mask(), intersects that with cs_cpus_allowed() and if
> > the intersection of that and cpu_online_mask is empty, moves up the
> > hierarchy. Given cs_cpus_allowed(B) is the empty mask, we'll move to A.
> >
> > Note that since we force the mask of root to cpu_possible_mask,
> > cs_cpus_allowed(root) will be a no-op and if we guarantee (in arch code)
> > that cpu_online_mask always has a non-empty intersection with
> > task_cpu_possible_mask(), this loop is guaranteed to terminate with a
> > viable mask.
>
> I will take a closer look at that tomorrow. I will be more comfortable
> ack'ing that if this is specific to v1 cpuset instead of applying this in
> both v1 and v2 since it is only v1 that is problematic.
fwiw, the regression I'm seeing is with cgroup2. I haven't tried v1.
Will
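For reference, here is a minimal sketch of the lookup loop Peter describes
above. This is not the actual patch hunk: helper names follow
kernel/cgroup/cpuset.c, and locking/RCU details are omitted.

static void cpuset_cpus_allowed_sketch(struct task_struct *tsk,
                                       struct cpumask *pmask)
{
        struct cpuset *cs = task_cs(tsk);

        while (cs) {
                /* Start from every CPU this task could ever run on... */
                cpumask_copy(pmask, task_cpu_possible_mask(tsk));

                /* ...and intersect with cs->cpus_allowed and its ancestors. */
                cs_cpus_allowed(cs, pmask);

                /* Good enough once at least one of those CPUs is online. */
                if (cpumask_intersects(pmask, cpu_online_mask))
                        break;

                /* Otherwise widen the result by moving up the hierarchy. */
                cs = parent_cs(cs);
        }
}

Because the root's mask is forced to cpu_possible_mask and the architecture
guarantees that task_cpu_possible_mask() intersects cpu_online_mask, the
loop stops with a viable mask by the time it reaches the root cpuset.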
On 2/3/23 06:50, Will Deacon wrote:
> On Wed, Feb 01, 2023 at 10:34:00PM -0500, Waiman Long wrote:
>> On 2/1/23 16:10, Peter Zijlstra wrote:
>>> On Wed, Feb 01, 2023 at 01:46:11PM -0500, Waiman Long wrote:
>>>
>>>> Note that using cpus_allowed directly in cgroup v2 may not be right because
>>>> cpus_allowed may have no relationship to effective_cpus at all in some
>>>> cases, e.g.
>>>>
>>>> root
>>>> |
>>>> V
>>>> A (cpus_allowed = 1-4, effective_cpus = 1-4)
>>>> |
>>>> V
>>>> B (cpus_allowed = 5-8, effective_cpus = 1-4)
>>>>
>>>> In the case of cpuset B, passing back cpus 5-8 as the allowed_cpus is wrong.
>>> I think my patch as written does the right thing here. Since the
>>> intersection of (1-4) and (5-8) is empty it will move up the hierarchy
>>> and we'll end up with (1-4) from the cgroup side of things.
>>>
>>> So the purpose of __cs_cpus_allowed() is to override the cpus_allowed of
>>> the root set and force it to cpu_possible_mask.
>>>
>>> Then cs_cpus_allowed() computes the intersection of cs->cpus_allowed and
>>> the masks of all its parents. This will, in the case of B above, result
>>> in the empty mask.
>>>
>>> Then cpuset_cpus_allowed() has a loop that starts with
>>> task_cpu_possible_mask(), intersects that with cs_cpus_allowed() and if
>>> the intersection of that and cpu_online_mask is empty, moves up the
>>> hierarchy. Given cs_cpus_allowed(B) is the empty mask, we'll move to A.
>>>
>>> Note that since we force the mask of root to cpu_possible_mask,
>>> cs_cpus_allowed(root) will be a no-op and if we guarantee (in arch code)
>>> that cpu_online_mask always has a non-empty intersection with
>>> task_cpu_possible_mask(), this loop is guaranteed to terminate with a
>>> viable mask.
>> I will take a closer look at that tomorrow. I will be more comfortable
>> ack'ing that if this is specific to v1 cpuset instead of applying this in
>> both v1 and v2 since it is only v1 that is problematic.
> fwiw, the regression I'm seeing is with cgroup2. I haven't tried v1.
I think I know where the problem is. It is due to the fact that the
cpuset hotplug code doesn't update the cpumasks of the tasks in the top
cpuset (root) at all when there is a cpu offline or online event. That is
probably because, for some of the tasks in the top cpuset, especially the
percpu kthreads, changing their cpumasks can be catastrophic. The hotplug
code does update the cpumasks of the tasks that are not in the top
cpuset. This problem exists irrespective of whether v1 or v2 is in use.
The partition code does try to update the cpumasks of the tasks in the
top cpuset, but it skips the percpu kthreads. My testing shows that is
probably OK. For safety, I agree it is better to extend the allowed cpu
list to all possible cpus (including offline ones) for now, until more
testing is done to find a safe way to do that. That special case should
apply only to tasks in the top cpuset. For the rest, the current code
should be OK and is less risky than adopting the code in this patch.
Cheers,
Longman
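Purely as an illustration of the narrower special case suggested above
(this is not code from either patch; helper and field names follow
kernel/cgroup/cpuset.c, locking omitted):

static void cpuset_cpus_allowed_top_only(struct task_struct *tsk,
                                         struct cpumask *pmask)
{
        struct cpuset *cs = task_cs(tsk);

        if (!parent_cs(cs)) {
                /* Top cpuset: don't filter out offline CPUs. */
                cpumask_copy(pmask, task_cpu_possible_mask(tsk));
        } else {
                /* Other cpusets are kept up to date on hotplug events. */
                cpumask_and(pmask, task_cpu_possible_mask(tsk),
                            cs->effective_cpus);
        }
}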
On Fri, Feb 03, 2023 at 10:13:10AM -0500, Waiman Long wrote:
> I think I know where the problem is. It is due to the fact that the cpuset
> hotplug code doesn't update the cpumasks of the tasks in the top cpuset (root)
> at all when there is a cpu offline or online event. That is probably because,
> for some of the tasks in the top cpuset, especially the percpu kthreads,
> changing their cpumasks can be catastrophic. The hotplug code does update
> the cpumasks of the tasks that are not in the top cpuset. This problem exists
> irrespective of whether v1 or v2 is in use.
I've been saying this exact thing for how many mails now?
On 2/3/23 10:26, Peter Zijlstra wrote:
> On Fri, Feb 03, 2023 at 10:13:10AM -0500, Waiman Long wrote:
>
>> I think I know where the problem is. It is due to the fact that the cpuset
>> hotplug code doesn't update the cpumasks of the tasks in the top cpuset (root)
>> at all when there is a cpu offline or online event. That is probably because,
>> for some of the tasks in the top cpuset, especially the percpu kthreads,
>> changing their cpumasks can be catastrophic. The hotplug code does update
>> the cpumasks of the tasks that are not in the top cpuset. This problem exists
>> irrespective of whether v1 or v2 is in use.
> I've been saying this exact thing for how many mails now?
My bad. The fact that sched_getaffinity() masks off the offline cpus made
me think, incorrectly, that tasks in the top cpuset were also updated by
the hotplug code. Further testing indicates that this is indeed the
problem.
Thanks,
Longman
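As an aside, the sched_getaffinity() behaviour mentioned above is easy to
trip over when checking this by hand. A minimal userspace sketch (assuming
Linux with glibc) that prints what the syscall reports for the calling
task:

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void)
{
        cpu_set_t set;
        int cpu;

        if (sched_getaffinity(0, sizeof(set), &set)) {
                perror("sched_getaffinity");
                return 1;
        }

        /*
         * Offline CPUs are masked off from this result, so it cannot
         * distinguish "affinity was never widened" from "the CPU is
         * simply offline".
         */
        for (cpu = 0; cpu < CPU_SETSIZE; cpu++)
                if (CPU_ISSET(cpu, &set))
                        printf("cpu%d ", cpu);
        putchar('\n');
        return 0;
}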
On Thu, Feb 02, 2023 at 11:50:55AM -1000, Tejun Heo wrote:
> On Thu, Feb 02, 2023 at 04:05:14PM -0500, Waiman Long wrote:
> >
> > On 2/2/23 15:53, Waiman Long wrote:
> > >
> > > On 2/2/23 15:48, Tejun Heo wrote:
> > > > On Thu, Feb 02, 2023 at 03:46:02PM -0500, Waiman Long wrote:
> > > > > > > I will work on a patchset to do that as a counter offer.
> > > > > > We will need a small and simple patch for /urgent, or I will need to
> > > > > > revert all your patches -- your call.
> > > > > >
> > > > > > I also don't think you fully appreciate the ramifications of
> > > > > > task_cpu_possible_mask(), cpuset currently gets that quite wrong.
> > > > > OK, I didn't realize the urgency of that. If it is that urgent, I
> > > > > have no objection to getting it in for now. We can improve it later
> > > > > on. So are you planning to get it into the current 6.2 rc or 6.3?
> > > > >
> > > > > Tejun, are you OK with that as you are the cgroup maintainer?
> > > > Yeah, gotta fix the regression but is there currently a solution
> > > > which fixes
> > > > the regression but doesn't further break other stuff?
> > >
> > > I believe there is a better way to do that, but it will need more time
> > > to flesh out. Since cpuset_cpus_allowed() is only used by
> > > kernel/sched/core.c, Peter will be responsible if it somehow breaks
> > > other stuff.
> >
> > Maybe my cpuset patch that doesn't update a task's cpumask on a cpu offline
> > event can help. However, I don't know the exact scenario where the regression
> > happens, so it may not.
>
> Neither patch looks like it would break anything. That said, the patches
> aren't trivial and we're really close to the merge window, so I'd really
> appreciate it if you could take a look and test a bit before we send these
> Linus's way. We can replace them with a better solution afterwards.
FWIW, I tested this series in an arm64 heterogeneous setup with things
like hotplug and exec()ing between 32-bit and 64-bit tasks, and it all
seems good.
The alternative would be to revert Waiman's setaffinity changes, which
I've had a go at here:
https://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git/log/?h=ssa-reverts
and I _think_ I've rescued the UAF fix too.
What do people prefer?
Will
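For anyone wanting to reproduce that kind of hotplug exercise, here is a
rough sketch (assumptions: running as root, and a hot-pluggable CPU,
hypothetically cpu3) that offlines and re-onlines a CPU via sysfs and then
reports whether the calling task's affinity still contains it:

#define _GNU_SOURCE
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <unistd.h>

#define CPU_ID 3        /* hypothetical CPU under test */

static int set_online(int cpu, int online)
{
        char path[64];
        int fd, ret;

        snprintf(path, sizeof(path),
                 "/sys/devices/system/cpu/cpu%d/online", cpu);
        fd = open(path, O_WRONLY);
        if (fd < 0)
                return -1;
        ret = (write(fd, online ? "1" : "0", 1) == 1) ? 0 : -1;
        close(fd);
        return ret;
}

int main(void)
{
        cpu_set_t set;

        /* Offline the CPU, then bring it back. */
        if (set_online(CPU_ID, 0) || set_online(CPU_ID, 1)) {
                perror("hotplug");
                return 1;
        }

        if (sched_getaffinity(0, sizeof(set), &set)) {
                perror("sched_getaffinity");
                return 1;
        }

        printf("cpu%d %s in the affinity mask after the offline/online cycle\n",
               CPU_ID, CPU_ISSET(CPU_ID, &set) ? "is" : "is NOT");
        return 0;
}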
On 1/31/23 17:17, Will Deacon wrote:
> set_cpus_allowed_ptr() will fail with -EINVAL if the requested
> affinity mask is not a subset of the task_cpu_possible_mask() for the
> task being updated. Consequently, on a heterogeneous system with cpusets
> spanning the different CPU types, updates to the cgroup hierarchy can
> silently fail to update task affinities when the effective affinity
> mask for the cpuset is expanded.
>
> For example, consider an arm64 system with 4 CPUs, where CPUs 2-3 are
> the only cores capable of executing 32-bit tasks. Attaching a 32-bit
> task to a cpuset containing CPUs 0-2 will correctly affine the task to
> CPU 2. Extending the cpuset to CPUs 0-3, however, will fail to extend
> the affinity mask of the 32-bit task because update_tasks_cpumask() will
> pass the full 0-3 mask to set_cpus_allowed_ptr().
>
> Extend update_tasks_cpumask() to take a temporary 'cpumask' parameter
> and use it to mask the 'effective_cpus' mask with the possible mask for
> each task being updated.
>
> Fixes: 431c69fac05b ("cpuset: Honour task_cpu_possible_mask() in guarantee_online_cpus()")
> Signed-off-by: Will Deacon <[email protected]>
> ---
>
> Note: We wondered whether it was worth calling guarantee_online_cpus()
> if the cpumask_and() returns 0 in update_tasks_cpumask(), but given that
> this path is only called when the effective mask changes, it didn't
> seem appropriate. Ultimately, if you have 32-bit tasks attached to a
> cpuset containing only 64-bit cpus, then the affinity is going to be
> forced.
>
> kernel/cgroup/cpuset.c | 18 +++++++++++-------
> 1 file changed, 11 insertions(+), 7 deletions(-)
>
> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
> index 8552cc2c586a..f15fb0426707 100644
> --- a/kernel/cgroup/cpuset.c
> +++ b/kernel/cgroup/cpuset.c
> @@ -1205,12 +1205,13 @@ void rebuild_sched_domains(void)
> /**
> * update_tasks_cpumask - Update the cpumasks of tasks in the cpuset.
> * @cs: the cpuset in which each task's cpus_allowed mask needs to be changed
> + * @new_cpus: the temp variable for the new effective_cpus mask
> *
> * Iterate through each task of @cs updating its cpus_allowed to the
> * effective cpuset's. As this function is called with cpuset_rwsem held,
> * cpuset membership stays stable.
> */
> -static void update_tasks_cpumask(struct cpuset *cs)
> +static void update_tasks_cpumask(struct cpuset *cs, struct cpumask *new_cpus)
> {
> struct css_task_iter it;
> struct task_struct *task;
> @@ -1224,7 +1225,10 @@ static void update_tasks_cpumask(struct cpuset *cs)
> if (top_cs && (task->flags & PF_KTHREAD) &&
> kthread_is_per_cpu(task))
> continue;
> - set_cpus_allowed_ptr(task, cs->effective_cpus);
> +
> + cpumask_and(new_cpus, cs->effective_cpus,
> + task_cpu_possible_mask(task));
> + set_cpus_allowed_ptr(task, new_cpus);
> }
> css_task_iter_end(&it);
> }
> @@ -1509,7 +1513,7 @@ static int update_parent_subparts_cpumask(struct cpuset *cs, int cmd,
> spin_unlock_irq(&callback_lock);
>
> if (adding || deleting)
> - update_tasks_cpumask(parent);
> + update_tasks_cpumask(parent, tmp->new_cpus);
>
> /*
> * Set or clear CS_SCHED_LOAD_BALANCE when partcmd_update, if necessary.
> @@ -1661,7 +1665,7 @@ static void update_cpumasks_hier(struct cpuset *cs, struct tmpmasks *tmp,
> WARN_ON(!is_in_v2_mode() &&
> !cpumask_equal(cp->cpus_allowed, cp->effective_cpus));
>
> - update_tasks_cpumask(cp);
> + update_tasks_cpumask(cp, tmp->new_cpus);
>
> /*
> * On legacy hierarchy, if the effective cpumask of any non-
> @@ -2309,7 +2313,7 @@ static int update_prstate(struct cpuset *cs, int new_prs)
> }
> }
>
> - update_tasks_cpumask(parent);
> + update_tasks_cpumask(parent, tmpmask.new_cpus);
>
> if (parent->child_ecpus_count)
> update_sibling_cpumasks(parent, cs, &tmpmask);
> @@ -3347,7 +3351,7 @@ hotplug_update_tasks_legacy(struct cpuset *cs,
> * as the tasks will be migrated to an ancestor.
> */
> if (cpus_updated && !cpumask_empty(cs->cpus_allowed))
> - update_tasks_cpumask(cs);
> + update_tasks_cpumask(cs, new_cpus);
> if (mems_updated && !nodes_empty(cs->mems_allowed))
> update_tasks_nodemask(cs);
>
> @@ -3384,7 +3388,7 @@ hotplug_update_tasks(struct cpuset *cs,
> spin_unlock_irq(&callback_lock);
>
> if (cpus_updated)
> - update_tasks_cpumask(cs);
> + update_tasks_cpumask(cs, new_cpus);
> if (mems_updated)
> update_tasks_nodemask(cs);
> }
Acked-by: Waiman Long <[email protected]>
This change is good for backporting to stable releases. For the latest
kernel, I would prefer to centralize the masking in
__set_cpus_allowed_ptr(), where a scratch_mask is already being used for
masking purposes. Let's get this patch merged now; I will send a patch to
move the masking over afterwards.
Cheers,
Longman
On Tue, Jan 31, 2023 at 10:17:19PM +0000, Will Deacon wrote:
> set_cpus_allowed_ptr() will fail with -EINVAL if the requested
> affinity mask is not a subset of the task_cpu_possible_mask() for the
> task being updated. Consequently, on a heterogeneous system with cpusets
> spanning the different CPU types, updates to the cgroup hierarchy can
> silently fail to update task affinities when the effective affinity
> mask for the cpuset is expanded.
>
> For example, consider an arm64 system with 4 CPUs, where CPUs 2-3 are
> the only cores capable of executing 32-bit tasks. Attaching a 32-bit
> task to a cpuset containing CPUs 0-2 will correctly affine the task to
> CPU 2. Extending the cpuset to CPUs 0-3, however, will fail to extend
> the affinity mask of the 32-bit task because update_tasks_cpumask() will
> pass the full 0-3 mask to set_cpus_allowed_ptr().
>
> Extend update_tasks_cpumask() to take a temporary 'cpumask' parameter
> and use it to mask the 'effective_cpus' mask with the possible mask for
> each task being updated.
>
> Fixes: 431c69fac05b ("cpuset: Honour task_cpu_possible_mask() in guarantee_online_cpus()")
> Signed-off-by: Will Deacon <[email protected]>
Applied to cgroup/for-6.2-fixes.
Thanks.
--
tejun