2023-07-01 07:16:44

by Miaohe Lin

Subject: [PATCH] cgroup/cpuset: update parent subparts cpumask while holding css refcnt

update_parent_subparts_cpumask() is called outside the RCU read-side
critical section without holding an extra css refcnt on cp. In theory, cp
could be freed at any time. Hold an extra css refcnt to ensure cp stays
valid while updating the parent's subparts cpumask.

Fixes: d7c8142d5a55 ("cgroup/cpuset: Make partition invalid if cpumask change violates exclusivity rule")
Signed-off-by: Miaohe Lin <[email protected]>
---
kernel/cgroup/cpuset.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 58e6f18f01c1..632a9986d5de 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -1806,9 +1806,12 @@ static int update_cpumask(struct cpuset *cs, struct cpuset *trialcs,
 		cpuset_for_each_child(cp, css, parent)
 			if (is_partition_valid(cp) &&
 			    cpumask_intersects(trialcs->cpus_allowed, cp->cpus_allowed)) {
+				if (!css_tryget_online(&cp->css))
+					continue;
 				rcu_read_unlock();
 				update_parent_subparts_cpumask(cp, partcmd_invalidate, NULL, &tmp);
 				rcu_read_lock();
+				css_put(&cp->css);
 			}
 	rcu_read_unlock();
 	retval = 0;
--
2.33.0



2023-07-02 00:00:55

by Waiman Long

Subject: Re: [PATCH] cgroup/cpuset: update parent subparts cpumask while holding css refcnt

On 7/1/23 19:38, Waiman Long wrote:
> On 7/1/23 02:50, Miaohe Lin wrote:
>> update_parent_subparts_cpumask() is called outside RCU read-side
>> critical
>> section without holding extra css refcnt of cp. In theory, cp could be
>> freed at any time. Holding extra css refcnt to ensure cp is valid while
>> updating parent subparts cpumask.
>>
>> Fixes: d7c8142d5a55 ("cgroup/cpuset: Make partition invalid if
>> cpumask change violates exclusivity rule")
>> Signed-off-by: Miaohe Lin <[email protected]>
>> ---
>>   kernel/cgroup/cpuset.c | 3 +++
>>   1 file changed, 3 insertions(+)
>>
>> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
>> index 58e6f18f01c1..632a9986d5de 100644
>> --- a/kernel/cgroup/cpuset.c
>> +++ b/kernel/cgroup/cpuset.c
>> @@ -1806,9 +1806,12 @@ static int update_cpumask(struct cpuset *cs,
>> struct cpuset *trialcs,
>>           cpuset_for_each_child(cp, css, parent)
>>               if (is_partition_valid(cp) &&
>>                   cpumask_intersects(trialcs->cpus_allowed,
>> cp->cpus_allowed)) {
>> +                if (!css_tryget_online(&cp->css))
>> +                    continue;
>>                   rcu_read_unlock();
>>                   update_parent_subparts_cpumask(cp,
>> partcmd_invalidate, NULL, &tmp);
>>                   rcu_read_lock();
>> +                css_put(&cp->css);
>>               }
>>           rcu_read_unlock();
>>           retval = 0;
>
> Thanks for finding that. It looks good to me.
>
> Reviewed-by: Waiman Long <[email protected]>

Though, I will say that an offline cpuset cannot be a valid partition
root, so it is not really a problem. For correctness' sake and
consistency with other similar code, I am in favor of getting it merged.

Cheers,
Longman


2023-07-02 00:02:12

by Waiman Long

Subject: Re: [PATCH] cgroup/cpuset: update parent subparts cpumask while holding css refcnt

On 7/1/23 02:50, Miaohe Lin wrote:
> update_parent_subparts_cpumask() is called outside RCU read-side critical
> section without holding extra css refcnt of cp. In theory, cp could be
> freed at any time. Holding extra css refcnt to ensure cp is valid while
> updating parent subparts cpumask.
>
> Fixes: d7c8142d5a55 ("cgroup/cpuset: Make partition invalid if cpumask change violates exclusivity rule")
> Signed-off-by: Miaohe Lin <[email protected]>
> ---
> kernel/cgroup/cpuset.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
> index 58e6f18f01c1..632a9986d5de 100644
> --- a/kernel/cgroup/cpuset.c
> +++ b/kernel/cgroup/cpuset.c
> @@ -1806,9 +1806,12 @@ static int update_cpumask(struct cpuset *cs, struct cpuset *trialcs,
> 		cpuset_for_each_child(cp, css, parent)
> 			if (is_partition_valid(cp) &&
> 			    cpumask_intersects(trialcs->cpus_allowed, cp->cpus_allowed)) {
> +				if (!css_tryget_online(&cp->css))
> +					continue;
> 				rcu_read_unlock();
> 				update_parent_subparts_cpumask(cp, partcmd_invalidate, NULL, &tmp);
> 				rcu_read_lock();
> +				css_put(&cp->css);
> 			}
> 	rcu_read_unlock();
> 	retval = 0;

Thanks for finding that. It looks good to me.

Reviewed-by: Waiman Long <[email protected]>


2023-07-03 03:13:53

by Miaohe Lin

Subject: Re: [PATCH] cgroup/cpuset: update parent subparts cpumask while holding css refcnt

On 2023/7/2 7:46, Waiman Long wrote:
> On 7/1/23 19:38, Waiman Long wrote:
>> On 7/1/23 02:50, Miaohe Lin wrote:
>>> update_parent_subparts_cpumask() is called outside RCU read-side critical
>>> section without holding extra css refcnt of cp. In theory, cp could be
>>> freed at any time. Holding extra css refcnt to ensure cp is valid while
>>> updating parent subparts cpumask.
>>>
>>> Fixes: d7c8142d5a55 ("cgroup/cpuset: Make partition invalid if cpumask change violates exclusivity rule")
>>> Signed-off-by: Miaohe Lin <[email protected]>
>>> ---
>>>   kernel/cgroup/cpuset.c | 3 +++
>>>   1 file changed, 3 insertions(+)
>>>
>>> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
>>> index 58e6f18f01c1..632a9986d5de 100644
>>> --- a/kernel/cgroup/cpuset.c
>>> +++ b/kernel/cgroup/cpuset.c
>>> @@ -1806,9 +1806,12 @@ static int update_cpumask(struct cpuset *cs, struct cpuset *trialcs,
>>>           cpuset_for_each_child(cp, css, parent)
>>>               if (is_partition_valid(cp) &&
>>>                   cpumask_intersects(trialcs->cpus_allowed, cp->cpus_allowed)) {
>>> +                if (!css_tryget_online(&cp->css))
>>> +                    continue;
>>>                   rcu_read_unlock();
>>>                   update_parent_subparts_cpumask(cp, partcmd_invalidate, NULL, &tmp);
>>>                   rcu_read_lock();
>>> +                css_put(&cp->css);
>>>               }
>>>           rcu_read_unlock();
>>>           retval = 0;
>>
>> Thanks for finding that. It looks good to me.
>>
>> Reviewed-by: Waiman Long <[email protected]>
>
> Though, I will say that an offline cpuset cannot be a valid partition root, so it is not really a problem. For correctness' sake and consistency with other similar code, I am in favor of getting it merged.

Yes, cpuset_mutex will prevent the cpuset from going offline while its cpumask is being updated. And as you mentioned, this patch at least makes the code more consistent.
Thanks for your review and comment.




2023-07-03 19:55:31

by Tejun Heo

Subject: Re: [PATCH] cgroup/cpuset: update parent subparts cpumask while holding css refcnt

On Mon, Jul 03, 2023 at 10:58:19AM +0800, Miaohe Lin wrote:
> On 2023/7/2 7:46, Waiman Long wrote:
> > On 7/1/23 19:38, Waiman Long wrote:
> >> On 7/1/23 02:50, Miaohe Lin wrote:
> >>> update_parent_subparts_cpumask() is called outside RCU read-side critical
> >>> section without holding extra css refcnt of cp. In theory, cp could be
> >>> freed at any time. Holding extra css refcnt to ensure cp is valid while
> >>> updating parent subparts cpumask.
> >>>
> >>> Fixes: d7c8142d5a55 ("cgroup/cpuset: Make partition invalid if cpumask change violates exclusivity rule")
> >>> Signed-off-by: Miaohe Lin <[email protected]>
> >>> ---
> >>>   kernel/cgroup/cpuset.c | 3 +++
> >>>   1 file changed, 3 insertions(+)
> >>>
> >>> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
> >>> index 58e6f18f01c1..632a9986d5de 100644
> >>> --- a/kernel/cgroup/cpuset.c
> >>> +++ b/kernel/cgroup/cpuset.c
> >>> @@ -1806,9 +1806,12 @@ static int update_cpumask(struct cpuset *cs, struct cpuset *trialcs,
> >>>           cpuset_for_each_child(cp, css, parent)
> >>>               if (is_partition_valid(cp) &&
> >>>                   cpumask_intersects(trialcs->cpus_allowed, cp->cpus_allowed)) {
> >>> +                if (!css_tryget_online(&cp->css))
> >>> +                    continue;
> >>>                   rcu_read_unlock();
> >>>                   update_parent_subparts_cpumask(cp, partcmd_invalidate, NULL, &tmp);
> >>>                   rcu_read_lock();
> >>> +                css_put(&cp->css);
> >>>               }
> >>>           rcu_read_unlock();
> >>>           retval = 0;
> >>
> >> Thanks for finding that. It looks good to me.
> >>
> >> Reviewed-by: Waiman Long <[email protected]>
> >
> > Though, I will say that an offline cpuset cannot be a valid partition root, so it is not really a problem. For correctness' sake and consistency with other similar code, I am in favor of getting it merged.
>
> Yes, cpuset_mutex will prevent the cpuset from going offline while its cpumask is being updated. And as you mentioned, this patch at least makes the code more consistent.

Can you update the patch description to note that this isn't required for
correctness?

Thanks.

--
tejun

2023-07-10 15:29:01

by Michal Koutný

Subject: Re: [PATCH] cgroup/cpuset: update parent subparts cpumask while holding css refcnt

Hello.

On Sat, Jul 01, 2023 at 02:50:49PM +0800, Miaohe Lin <[email protected]> wrote:
> --- a/kernel/cgroup/cpuset.c
> +++ b/kernel/cgroup/cpuset.c
> @@ -1806,9 +1806,12 @@ static int update_cpumask(struct cpuset *cs, struct cpuset *trialcs,
> 		cpuset_for_each_child(cp, css, parent)
> 			if (is_partition_valid(cp) &&
> 			    cpumask_intersects(trialcs->cpus_allowed, cp->cpus_allowed)) {
> +				if (!css_tryget_online(&cp->css))
> +					continue;
> 				rcu_read_unlock();
> 				update_parent_subparts_cpumask(cp, partcmd_invalidate, NULL, &tmp);
> 				rcu_read_lock();
> +				css_put(&cp->css);

Apologies for a possibly noob question -- why is RCU read lock
temporarily dropped within the loop?
(Is it only because of callback_lock or cgroup_file_kn_lock (via
notify_partition_change()) on PREEMPT_RT?)



[
OT question:
    cpuset_for_each_child(cp, css, parent)                (1)
        if (is_partition_valid(cp) &&
            cpumask_intersects(trialcs->cpus_allowed, cp->cpus_allowed)) {
            if (!css_tryget_online(&cp->css))
                continue;
            rcu_read_unlock();
            update_parent_subparts_cpumask(cp, partcmd_invalidate, NULL, &tmp);
              ...
              update_tasks_cpumask(cp->parent)
                ...
                css_task_iter_start(&cp->parent->css, 0, &it);    (2)
                  ...
            rcu_read_lock();
            css_put(&cp->css);
        }

May this touch each task as many times as its depth within the
hierarchy?
]

Thanks,
Michal



2023-07-10 16:03:33

by Waiman Long

Subject: Re: [PATCH] cgroup/cpuset: update parent subparts cpumask while holding css refcnt

On 7/10/23 11:11, Michal Koutný wrote:
> Hello.
>
> On Sat, Jul 01, 2023 at 02:50:49PM +0800, Miaohe Lin <[email protected]> wrote:
>> --- a/kernel/cgroup/cpuset.c
>> +++ b/kernel/cgroup/cpuset.c
>> @@ -1806,9 +1806,12 @@ static int update_cpumask(struct cpuset *cs, struct cpuset *trialcs,
>> 		cpuset_for_each_child(cp, css, parent)
>> 			if (is_partition_valid(cp) &&
>> 			    cpumask_intersects(trialcs->cpus_allowed, cp->cpus_allowed)) {
>> +				if (!css_tryget_online(&cp->css))
>> +					continue;
>> 				rcu_read_unlock();
>> 				update_parent_subparts_cpumask(cp, partcmd_invalidate, NULL, &tmp);
>> 				rcu_read_lock();
>> +				css_put(&cp->css);
> Apologies for a possibly noob question -- why is RCU read lock
> temporarily dropped within the loop?
> (Is it only because of callback_lock or cgroup_file_kn_lock (via
> notify_partition_change()) on PREEMPT_RT?)
>
>
>
> [
> OT question:
>     cpuset_for_each_child(cp, css, parent)                (1)
>         if (is_partition_valid(cp) &&
>             cpumask_intersects(trialcs->cpus_allowed, cp->cpus_allowed)) {
>             if (!css_tryget_online(&cp->css))
>                 continue;
>             rcu_read_unlock();
>             update_parent_subparts_cpumask(cp, partcmd_invalidate, NULL, &tmp);
>               ...
>               update_tasks_cpumask(cp->parent)
>                 ...
>                 css_task_iter_start(&cp->parent->css, 0, &it);    (2)
>                   ...
>             rcu_read_lock();
>             css_put(&cp->css);
>         }
>
> May this touch each task as many times as its depth within the
> hierarchy?

I believe the primary reason is that update_parent_subparts_cpumask()
can potentially run for quite a while, so we don't want to hold the
rcu_read_lock for too long. There is also the potential that schedule()
may be called.

Cheers,
Longman


2023-07-10 16:13:56

by Michal Koutný

Subject: Re: [PATCH] cgroup/cpuset: update parent subparts cpumask while holding css refcnt

On Mon, Jul 10, 2023 at 11:40:36AM -0400, Waiman Long <[email protected]> wrote:
> I believe the primary reason is because update_parent_subparts_cpumask() can
> potential run for quite a while. So we don't want to hold the rcu_read_lock
> for too long.

But holding cpuset_mutex is even worse than rcu_read_lock()? IOW, is the
relief this brings worth it?

> There may also be a potential that schedule() may be called.

Do you mean the spinlocks with PREEMPT_RT or anything else? (That seems
like the actual reason IIUC.)

Thanks,
Michal



2023-07-11 03:00:41

by Miaohe Lin

Subject: Re: [PATCH] cgroup/cpuset: update parent subparts cpumask while holding css refcnt

On 2023/7/10 23:40, Waiman Long wrote:
> On 7/10/23 11:11, Michal Koutný wrote:
>> Hello.
>>
>> On Sat, Jul 01, 2023 at 02:50:49PM +0800, Miaohe Lin <[email protected]> wrote:
>>> --- a/kernel/cgroup/cpuset.c
>>> +++ b/kernel/cgroup/cpuset.c
>>> @@ -1806,9 +1806,12 @@ static int update_cpumask(struct cpuset *cs, struct cpuset *trialcs,
>>>           cpuset_for_each_child(cp, css, parent)
>>>               if (is_partition_valid(cp) &&
>>>                   cpumask_intersects(trialcs->cpus_allowed, cp->cpus_allowed)) {
>>> +                if (!css_tryget_online(&cp->css))
>>> +                    continue;
>>>                   rcu_read_unlock();
>>>                   update_parent_subparts_cpumask(cp, partcmd_invalidate, NULL, &tmp);
>>>                   rcu_read_lock();
>>> +                css_put(&cp->css);
>> Apologies for a possibly noob question -- why is RCU read lock
>> temporarily dropped within the loop?
>> (Is it only because of callback_lock or cgroup_file_kn_lock (via
>> notify_partition_change()) on PREEMPT_RT?)
>>
>>
>>
>> [
>> OT question:
>>     cpuset_for_each_child(cp, css, parent)                (1)
>>         if (is_partition_valid(cp) &&
>>             cpumask_intersects(trialcs->cpus_allowed, cp->cpus_allowed)) {
>>             if (!css_tryget_online(&cp->css))
>>                 continue;
>>             rcu_read_unlock();
>>             update_parent_subparts_cpumask(cp, partcmd_invalidate, NULL, &tmp);
>>               ...
>>               update_tasks_cpumask(cp->parent)
>>                 ...
>>                 css_task_iter_start(&cp->parent->css, 0, &it);    (2)
>>                   ...
>>             rcu_read_lock();
>>             css_put(&cp->css);
>>         }
>>
>> May this touch each task as many times as its depth within the
>> hierarchy?
>
> I believe the primary reason is that update_parent_subparts_cpumask() can potentially run for quite a while, so we don't want to hold the rcu_read_lock for too long. There is also the potential that schedule() may be called.

IMHO, the reason should be the same as in the commit below:

commit 2bdfd2825c9662463371e6691b1a794e97fa36b4
Author: Waiman Long <[email protected]>
Date: Wed Feb 2 22:31:03 2022 -0500

cgroup/cpuset: Fix "suspicious RCU usage" lockdep warning

It was found that a "suspicious RCU usage" lockdep warning was issued
with the rcu_read_lock() call in update_sibling_cpumasks(). It is
because the update_cpumasks_hier() function may sleep. So we have
to release the RCU lock, call update_cpumasks_hier() and reacquire
it afterward.

Also add a percpu_rwsem_assert_held() in update_sibling_cpumasks()
instead of stating that in the comment.

Thanks both.


2023-07-11 12:11:23

by Michal Koutný

Subject: Re: [PATCH] cgroup/cpuset: update parent subparts cpumask while holding css refcnt

On Tue, Jul 11, 2023 at 10:52:02AM +0800, Miaohe Lin <[email protected]> wrote:
> commit 2bdfd2825c9662463371e6691b1a794e97fa36b4
> Author: Waiman Long <[email protected]>
> Date: Wed Feb 2 22:31:03 2022 -0500
>
> cgroup/cpuset: Fix "suspicious RCU usage" lockdep warning

Aha, thanks for the pointer.

I've also found a paragraph in [1]:
> In addition, the -rt patchset turns spinlocks into sleeping locks so
> that the corresponding critical sections can be preempted, which also
> means that these sleeplockified spinlocks (but not other sleeping
> locks!) may be acquired within -rt-Linux-kernel RCU read-side critical
> sections.

That suggests (together with practical use) that the discussed spinlocks
should be fine in an RCU read section. And the possible reason is deeper in
generate_sched_domains(), which does kmalloc(..., GFP_KERNEL).

Alas, update_cpumasks_hier() still calls generate_sched_domains(); OTOH,
update_parent_subparts_cpumask() doesn't seem to.

The idea of not dropping rcu_read_lock() in the update_cpumask() iteration
(instead of the technically unneeded refcnt bump) would have to be
verified with CONFIG_PROVE_RCU && CONFIG_LOCKDEP. WDYT?

Michal

[1] https://www.kernel.org/doc/html/latest/RCU/Design/Requirements/Requirements.html?highlight=rcu+read+section#specialization



2023-07-12 02:42:01

by Miaohe Lin

Subject: Re: [PATCH] cgroup/cpuset: update parent subparts cpumask while holding css refcnt

On 2023/7/11 19:52, Michal Koutný wrote:
> On Tue, Jul 11, 2023 at 10:52:02AM +0800, Miaohe Lin <[email protected]> wrote:
>> commit 2bdfd2825c9662463371e6691b1a794e97fa36b4
>> Author: Waiman Long <[email protected]>
>> Date: Wed Feb 2 22:31:03 2022 -0500
>>
>> cgroup/cpuset: Fix "suspicious RCU usage" lockdep warning
>
> Aha, thanks for the pointer.
>
> I've also found a paragraph in [1]:
>> In addition, the -rt patchset turns spinlocks into sleeping locks so
>> that the corresponding critical sections can be preempted, which also
>> means that these sleeplockified spinlocks (but not other sleeping
>> locks!) may be acquired within -rt-Linux-kernel RCU read-side critical
>> sections.
>
> That suggests (together with practical use) that the discussed spinlocks
> should be fine in an RCU read section. And the possible reason is deeper in
> generate_sched_domains(), which does kmalloc(..., GFP_KERNEL).

update_parent_subparts_cpumask() would call update_flag(), which does kmemdup(..., GFP_KERNEL)?

>
> Alas, update_cpumasks_hier() still calls generate_sched_domains(); OTOH,
> update_parent_subparts_cpumask() doesn't seem to.

It seems update_parent_subparts_cpumask() doesn't call generate_sched_domains().

>
> The idea of not dropping rcu_read_lock() in the update_cpumask() iteration
> (instead of the technically unneeded refcnt bump) would have to be
> verified with CONFIG_PROVE_RCU && CONFIG_LOCKDEP. WDYT?

The idea of releasing rcu_read_lock() in the update_cpumask() iteration was
initially introduced by the commit below:

commit d7c8142d5a5534c3c7de214e35a40a493a32b98e
Author: Waiman Long <[email protected]>
Date: Thu Sep 1 16:57:43 2022 -0400

cgroup/cpuset: Make partition invalid if cpumask change violates exclusivity rule

Currently, changes in "cpuset.cpus" of a partition root is not allowed if
it violates the sibling cpu exclusivity rule when the check is done
in the validate_change() function. That is inconsistent with the
other cpuset changes that are always allowed but may make a partition
invalid.

Update the cpuset code to allow cpumask change even if it violates the
sibling cpu exclusivity rule, but invalidate the partition instead
just like the other changes. However, other sibling partitions with
conflicting cpumasks will also be invalidated in order not to violate
the exclusivity rule. This behavior is specific to this partition
rule violation.

Note that a previous commit has made sibling cpu exclusivity rule check
the last check of validate_change(). So if -EINVAL is returned, we can
be sure that sibling cpu exclusivity rule violation is the only rule
that is broken.

It would be really helpful if @Waiman could figure this out.

Thanks both.