2022-09-05 03:55:57

by Jing-Ting Wu

Subject: BUG: HANG_DETECT waiting for migration_cpu_stop() complete

Hi,

We hit HANG_DETECT on the T SW version with kernel-5.15.
Many tasks have been blocked for a long time.


Root cause:
migration_cpu_stop() never completes because is_migration_disabled(p)
is true, so complete stays false and complete_all() is never executed.
This leaves other tasks waiting on the rwsem.

Details:
system_server is waiting for cgroup_threadgroup_rwsem.
OomAdjuster is holding cgroup_threadgroup_rwsem and waiting for
cpuset_rwsem.
cpuset_hotplug_workfn is holding cpuset_rwsem and waiting for
affine_move_task() to complete.
affine_move_task() is waiting for migration_cpu_stop() to complete.

The backtrace of system_server:
__switch_to
__schedule
schedule
percpu_rwsem_wait
__percpu_down_read
cgroup_css_set_fork => wait for cgroup_threadgroup_rwsem
cgroup_can_fork
copy_process
kernel_clone

The backtrace of OomAdjuster:
__switch_to
__schedule
schedule
percpu_rwsem_wait
percpu_down_write
cpuset_can_attach => wait for cpuset_rwsem
cgroup_migrate_execute
cgroup_attach_task
__cgroup1_procs_write => hold cgroup_threadgroup_rwsem
cgroup1_procs_write
cgroup_file_write
kernfs_fop_write_iter
vfs_write
ksys_write

The backtrace of cpuset_hotplug_workfn:
__switch_to
__schedule
schedule
schedule_timeout
wait_for_common
affine_move_task => wait for complete
__set_cpus_allowed_ptr_locked
update_tasks_cpumask
cpuset_hotplug_update_tasks => hold cpuset_rwsem
cpuset_hotplug_workfn
process_one_work
worker_thread
kthread


affine_move_task() queues migration_cpu_stop() and waits for it to
complete.
In the normal case, when migration_cpu_stop() finishes it signals all
waiters that it is done.
But there is an exception path that never notifies them:
if is_migration_disabled(p) is true, complete stays false and
complete_all() is never executed.

static int migration_cpu_stop(void *data)
{
	...
	bool complete = false;
	...

	if (task_rq(p) == rq) {
		if (is_migration_disabled(p))
			goto out; /* => is_migration_disabled(p) is true, so complete stays false */
		...
	}
	...

out:
	...
	if (complete) /* => complete is false, so complete_all() is never executed */
		complete_all(&pending->done);

	return 0;
}
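
For reference, the waiting side looks roughly like this. This is a
heavily simplified sketch of the kernel-5.15 affine_move_task() flow
(locking, refcounting and several branches are elided), not the exact
source:

static int affine_move_task(struct rq *rq, struct task_struct *p,
			    struct rq_flags *rf, int dest_cpu, unsigned int flags)
{
	struct set_affinity_pending my_pending = { }, *pending = NULL;

	/* Install (or join) a pending affinity request on the task. */
	if (!p->migration_pending) {
		refcount_set(&my_pending.refs, 1);
		init_completion(&my_pending.done);
		p->migration_pending = &my_pending;
	}
	pending = p->migration_pending;

	/* Kick the stopper to push the task towards dest_cpu... */
	task_rq_unlock(rq, p, rf);
	stop_one_cpu_nowait(cpu_of(rq), migration_cpu_stop,
			    &pending->arg, &pending->stop_work);

	/*
	 * ...and block until migration_cpu_stop() calls
	 * complete_all(&pending->done).  If the stopper bails out early
	 * (the is_migration_disabled() case above) without completing,
	 * this wait never returns and cpuset_rwsem stays held.
	 */
	wait_for_completion(&pending->done);
	return 0;
}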


Reviewing the code, we found that there are many places that can
change the is_migration_disabled() state
(such as: __rt_spin_lock(), rt_read_lock(), rt_write_lock(), ...).
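
For context, migrate_disable() only pins the task to its current CPU
by bumping a per-task counter, and is_migration_disabled() reports
that counter; on PREEMPT_RT the rtmutex-based lock slow paths call it
around lock ownership, so a task can stay migration-disabled for a
while. A simplified sketch (close to, but not verbatim, the 5.15
sources):

static inline bool is_migration_disabled(struct task_struct *p)
{
	/* Non-zero while the task must not be moved off its CPU. */
	return p->migration_disabled;
}

void migrate_disable(void)
{
	struct task_struct *p = current;

	if (p->migration_disabled) {
		p->migration_disabled++;	/* nested disable, just count */
		return;
	}

	preempt_disable();
	this_rq()->nr_pinned++;			/* this rq now has a pinned task */
	p->migration_disabled = 1;
	preempt_enable();
}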

Do you have any suggestions for this issue?
Thank you.




Best regards,
Jing-Ting Wu



2022-09-05 06:57:15

by Mukesh Ojha

Subject: Re: BUG: HANG_DETECT waiting for migration_cpu_stop() complete


This is fixed by this.

https://lore.kernel.org/lkml/[email protected]/

-Mukesh

On 9/5/2022 8:17 AM, Jing-Ting Wu wrote:
> Hi,
>
> We hit HANG_DETECT on the T SW version with kernel-5.15.
> Many tasks have been blocked for a long time.
>
> [...]

2022-09-05 09:52:40

by Jing-Ting Wu

Subject: Re: BUG: HANG_DETECT waiting for migration_cpu_stop() complete

Hi, Mukesh



https://lore.kernel.org/lkml/[email protected]/ fixes the
cgroup_threadgroup_rwsem <-> cpus_read_lock() deadlock.
But this issue is a cgroup_threadgroup_rwsem <-> cpuset_rwsem deadlock.

I think they are not the same issue.
Is that patch useful for this issue?



Best regards,
Jing-Ting Wu


On Mon, 2022-09-05 at 12:14 +0530, Mukesh Ojha wrote:
> This is fixed by this.
>
> https://lore.kernel.org/lkml/[email protected]/
>
> -Mukesh
>
> On 9/5/2022 8:17 AM, Jing-Ting Wu wrote:
> > Hi,
> >
> > We hit HANG_DETECT on the T SW version with kernel-5.15.
> > Many tasks have been blocked for a long time.
> >
> > [...]

2022-09-06 18:38:06

by Tejun Heo

Subject: Re: BUG: HANG_DETECT waiting for migration_cpu_stop() complete

Hello,

(cc'ing Waiman in case he has a better idea)

On Mon, Sep 05, 2022 at 04:22:29PM +0800, Jing-Ting Wu wrote:
> https://lore.kernel.org/lkml/[email protected]/ fixes the
> cgroup_threadgroup_rwsem <-> cpus_read_lock() deadlock.
> But this issue is a cgroup_threadgroup_rwsem <-> cpuset_rwsem deadlock.

If I'm understanding what you're writing correctly, this isn't a deadlock.
The cpuset_hotplug_workfn simply isn't being woken up while holding
cpuset_rwsem and others are just waiting for that lock to be released.

Thanks.

--
tejun

2022-09-06 21:03:50

by Waiman Long

Subject: Re: BUG: HANG_DETECT waiting for migration_cpu_stop() complete


On 9/6/22 16:01, Waiman Long wrote:
> On 9/6/22 14:30, Tejun Heo wrote:
>> Hello,
>>
>> (cc'ing Waiman in case he has a better idea)
>>
>> On Mon, Sep 05, 2022 at 04:22:29PM +0800, Jing-Ting Wu wrote:
>>> https://lore.kernel.org/lkml/[email protected]/ fixes the
>>> cgroup_threadgroup_rwsem <-> cpus_read_lock() deadlock.
>>> But this issue is a cgroup_threadgroup_rwsem <-> cpuset_rwsem deadlock.
>> If I'm understanding what you're writing correctly, this isn't a
>> deadlock.
>> The cpuset_hotplug_workfn simply isn't being woken up while holding
>> cpuset_rwsem and others are just waiting for that lock to be released.
>
> I believe it is probably a bug in the scheduler core code.
> __set_cpus_allowed_ptr_locked() calls affine_move_task() to move the
> task to a cpu within the new set of allowable CPUs. However, if
> migration is disabled, it shouldn't call affine_move_task() at all.
> Instead, I would suggest that if the current cpu is within the new
> allowable cpus, it should just skip affine_move_task(). Otherwise,
> __set_cpus_allowed_ptr_locked() should fail.

Maybe like

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 838623b68031..5d9ea1553ec0 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2794,9 +2794,9 @@ static int __set_cpus_allowed_ptr_locked(struct task_struct *p,
 		if (cpumask_equal(&p->cpus_mask, new_mask))
 			goto out;
 
-		if (WARN_ON_ONCE(p == current &&
-				 is_migration_disabled(p) &&
-				 !cpumask_test_cpu(task_cpu(p), new_mask))) {
+		if (is_migration_disabled(p) &&
+		    !cpumask_test_cpu(task_cpu(p), new_mask)) {
+			WARN_ON_ONCE(p == current);
 			ret = -EBUSY;
 			goto out;
 		}
@@ -2818,7 +2818,11 @@ static int __set_cpus_allowed_ptr_locked(struct task_struct *p,
 	if (flags & SCA_USER)
 		user_mask = clear_user_cpus_ptr(p);
 
-	ret = affine_move_task(rq, p, rf, dest_cpu, flags);
+	if (!is_migration_disabled(p) || (flags & SCA_MIGRATE_ENABLE)) {
+		ret = affine_move_task(rq, p, rf, dest_cpu, flags);
+	} else {
+		task_rq_unlock(rq, p, rf);
+	}
 
 	kfree(user_mask);

I haven't tested it myself, though.

Cheers,
Longman

2022-09-06 21:27:21

by Waiman Long

Subject: Re: BUG: HANG_DETECT waiting for migration_cpu_stop() complete

On 9/6/22 14:30, Tejun Heo wrote:
> Hello,
>
> (cc'ing Waiman in case he has a better idea)
>
> On Mon, Sep 05, 2022 at 04:22:29PM +0800, Jing-Ting Wu wrote:
>> https://lore.kernel.org/lkml/[email protected]/ fixes the
>> cgroup_threadgroup_rwsem <-> cpus_read_lock() deadlock.
>> But this issue is a cgroup_threadgroup_rwsem <-> cpuset_rwsem deadlock.
> If I'm understanding what you're writing correctly, this isn't a deadlock.
> The cpuset_hotplug_workfn simply isn't being woken up while holding
> cpuset_rwsem and others are just waiting for that lock to be released.

I believe it is probably a bug in the scheduler core code.
__set_cpus_allowed_ptr_locked() calls affine_move_task() to move the
task to a cpu within the new set of allowable CPUs. However, if
migration is disabled, it shouldn't call affine_move_task() at all.
Instead, I would suggest that if the current cpu is within the new
allowable cpus, it should just skip affine_move_task(). Otherwise,
__set_cpus_allowed_ptr_locked() should fail.

My 2 cents.

Cheers,
Longman

2022-09-06 21:31:03

by Waiman Long

Subject: Re: BUG: HANG_DETECT waiting for migration_cpu_stop() complete


On 9/6/22 16:50, Peter Zijlstra wrote:
> On Tue, Sep 06, 2022 at 04:40:03PM -0400, Waiman Long wrote:
>
> I've not followed the earlier stuff due to being unreadable; just
> reacting to this..
>
>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> index 838623b68031..5d9ea1553ec0 100644
>> --- a/kernel/sched/core.c
>> +++ b/kernel/sched/core.c
>> @@ -2794,9 +2794,9 @@ static int __set_cpus_allowed_ptr_locked(struct
>> task_struct *p,
>>                 if (cpumask_equal(&p->cpus_mask, new_mask))
>>                         goto out;
>>
>> -               if (WARN_ON_ONCE(p == current &&
>> -                                is_migration_disabled(p) &&
>> -                                !cpumask_test_cpu(task_cpu(p), new_mask)))
>> {
>> +               if (is_migration_disabled(p) &&
>> +                   !cpumask_test_cpu(task_cpu(p), new_mask)) {
>> +                       WARN_ON_ONCE(p == current);
>>                         ret = -EBUSY;
>>                         goto out;
>>                 }
>> @@ -2818,7 +2818,11 @@ static int __set_cpus_allowed_ptr_locked(struct
>> task_struct *p,
>>         if (flags & SCA_USER)
>>                 user_mask = clear_user_cpus_ptr(p);
>>
>> -       ret = affine_move_task(rq, p, rf, dest_cpu, flags);
>> +       if (!is_migration_disabled(p) || (flags & SCA_MIGRATE_ENABLE)) {
>> +               ret = affine_move_task(rq, p, rf, dest_cpu, flags);
>> +       } else {
>> +               task_rq_unlock(rq, p, rf);
>> +       }
> This cannot be right. There might be previous set_cpus_allowed_ptr()
> callers that are blocked and waiting for the task to land on a valid
> CPU.

You are probably right. I haven't fully understood all the
migration-disable code yet. However, if migration is disabled, there
are some corner cases that we need to handle properly.

Cheers,
Longman

2022-09-06 21:43:13

by Peter Zijlstra

Subject: Re: BUG: HANG_DETECT waiting for migration_cpu_stop() complete

On Tue, Sep 06, 2022 at 04:40:03PM -0400, Waiman Long wrote:

I've not followed the earlier stuff due to being unreadable; just
reacting to this..

> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 838623b68031..5d9ea1553ec0 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -2794,9 +2794,9 @@ static int __set_cpus_allowed_ptr_locked(struct task_struct *p,
>  		if (cpumask_equal(&p->cpus_mask, new_mask))
>  			goto out;
>
> -		if (WARN_ON_ONCE(p == current &&
> -				 is_migration_disabled(p) &&
> -				 !cpumask_test_cpu(task_cpu(p), new_mask))) {
> +		if (is_migration_disabled(p) &&
> +		    !cpumask_test_cpu(task_cpu(p), new_mask)) {
> +			WARN_ON_ONCE(p == current);
>  			ret = -EBUSY;
>  			goto out;
>  		}
> @@ -2818,7 +2818,11 @@ static int __set_cpus_allowed_ptr_locked(struct task_struct *p,
>  	if (flags & SCA_USER)
>  		user_mask = clear_user_cpus_ptr(p);
>
> -	ret = affine_move_task(rq, p, rf, dest_cpu, flags);
> +	if (!is_migration_disabled(p) || (flags & SCA_MIGRATE_ENABLE)) {
> +		ret = affine_move_task(rq, p, rf, dest_cpu, flags);
> +	} else {
> +		task_rq_unlock(rq, p, rf);
> +	}

This cannot be right. There might be previous set_cpus_allowed_ptr()
callers that are blocked and waiting for the task to land on a valid
CPU.

2022-09-22 05:46:59

by Jing-Ting Wu

Subject: Re: BUG: HANG_DETECT waiting for migration_cpu_stop() complete

On Wed, 2022-09-07 at 08:07 +0800, Hillf Danton wrote:
> On 5 Sep 2022 10:47:36 +0800 Jing-Ting Wu <[email protected]>
> wrote
> >
> > We hit HANG_DETECT on the T SW version with kernel-5.15.
> > Many tasks have been blocked for a long time.
> >
> > Root cause:
> > migration_cpu_stop() never completes because is_migration_disabled(p)
> > is true, so complete stays false and complete_all() is never executed.
> > This leaves other tasks waiting on the rwsem.
>
> See if handing the task over to the stopper again in the
> migration-disabled case could survive your tests.
>
> Hillf
>
> --- linux-5.15/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -2322,9 +2322,7 @@ static int migration_cpu_stop(void *data
>  	 * holding rq->lock, if p->on_rq == 0 it cannot get enqueued because
>  	 * we're holding p->pi_lock.
>  	 */
> -	if (task_rq(p) == rq) {
> -		if (is_migration_disabled(p))
> -			goto out;
> +	if (task_rq(p) == rq && !is_migration_disabled(p)) {
>
>  		if (pending) {
>  			p->migration_pending = NULL;

Because Peter had some concerns about Waiman's patch, we added Hillf's
patch to our stability test.
But there is a side effect after patching:
the warning appears about once in less than two weeks of testing.

Backtrace as follows:
[name:panic&]WARNING: CPU: 6 PID: 32583 at affine_move_task
pc : affine_move_task
lr : __set_cpus_allowed_ptr_locked
Call trace:
affine_move_task
__set_cpus_allowed_ptr_locked
migrate_enable
__cgroup_bpf_run_filter_skb
ip_finish_output
ip_output


The root cause is that when is_migration_disabled(p) is true, the
patched version sets p->migration_pending to NULL in
migration_cpu_stop(), and affine_move_task() then raises
WARN_ON_ONCE(!pending).

Kernel-5.15/kernel/sched/core.c:
static int affine_move_task(struct rq *rq, struct task_struct *p,
			    struct rq_flags *rf, int dest_cpu, unsigned int flags)
{
	...
	if (WARN_ON_ONCE(!pending)) {
		task_rq_unlock(rq, p, rf);
		return -EINVAL;
	}
	...
}

But the task has not been migrated to a CPU in the new affinity mask,
so there is still a pending request to process; p->migration_pending
should not be NULL.
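
For context, the warning fires out of the migrate_enable() path: when
the affinity was changed while migration was disabled, migrate_enable()
replays the change with SCA_MIGRATE_ENABLE and expects the pending
request installed by the earlier set_cpus_allowed_ptr() caller to still
be there. A simplified sketch (paraphrased from the 5.15 sources, not
verbatim):

void migrate_enable(void)
{
	struct task_struct *p = current;

	if (p->migration_disabled > 1) {
		p->migration_disabled--;	/* still nested */
		return;
	}

	preempt_disable();
	/*
	 * cpus_ptr is still pinned to the current CPU if the affinity was
	 * changed while migration was disabled.  Restoring it goes through
	 * __set_cpus_allowed_ptr_locked() -> affine_move_task() with
	 * SCA_MIGRATE_ENABLE, which expects p->migration_pending to still
	 * hold the earlier request -- hence WARN_ON_ONCE(!pending) when
	 * the patched migration_cpu_stop() has already cleared it.
	 */
	if (p->cpus_ptr != &p->cpus_mask)
		__set_cpus_allowed_ptr(p, &p->cpus_mask, SCA_MIGRATE_ENABLE);

	barrier();
	p->migration_disabled = 0;
	this_rq()->nr_pinned--;
	preempt_enable();
}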



Without the patch:
When is_migration_disabled(p) is true, we goto out without setting
p->migration_pending to NULL.

static int migration_cpu_stop(void *data)
{
	...
	if (task_rq(p) == rq) {
		if (is_migration_disabled(p))
			goto out;
	}
	...
}


With the patch:
When is_migration_disabled(p) is true and pending is set, we fall into
the else-if branch. Because p->cpus_ptr is not updated while migration
is disabled, this condition is always true and p->migration_pending is
set to NULL.

static int migration_cpu_stop(void *data)
{
	...
	if (task_rq(p) == rq && !is_migration_disabled(p)) {
		...
	} else if (pending) {
		...
		if (cpumask_test_cpu(task_cpu(p), p->cpus_ptr)) {
			p->migration_pending = NULL;
			complete = true;
			goto out;
		}
		...
	}
	...
}






Best regards,
Jing-Ting Wu


2022-09-23 14:57:20

by Mukesh Ojha

Subject: Re: BUG: HANG_DETECT waiting for migration_cpu_stop() complete

Hi Peter,


On 9/7/2022 2:20 AM, Peter Zijlstra wrote:
> On Tue, Sep 06, 2022 at 04:40:03PM -0400, Waiman Long wrote:
>
> I've not followed the earlier stuff due to being unreadable; just
> reacting to this..

We are able to reproduce this issue explained at this link

https://lore.kernel.org/lkml/[email protected]/


>
>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> index 838623b68031..5d9ea1553ec0 100644
>> --- a/kernel/sched/core.c
>> +++ b/kernel/sched/core.c
>> @@ -2794,9 +2794,9 @@ static int __set_cpus_allowed_ptr_locked(struct
>> task_struct *p,
>>                 if (cpumask_equal(&p->cpus_mask, new_mask))
>>                         goto out;
>>
>> -               if (WARN_ON_ONCE(p == current &&
>> -                                is_migration_disabled(p) &&
>> -                                !cpumask_test_cpu(task_cpu(p), new_mask)))
>> {
>> +               if (is_migration_disabled(p) &&
>> +                   !cpumask_test_cpu(task_cpu(p), new_mask)) {
>> +                       WARN_ON_ONCE(p == current);
>>                         ret = -EBUSY;
>>                         goto out;
>>                 }
>> @@ -2818,7 +2818,11 @@ static int __set_cpus_allowed_ptr_locked(struct
>> task_struct *p,
>>         if (flags & SCA_USER)
>>                 user_mask = clear_user_cpus_ptr(p);
>>
>> -       ret = affine_move_task(rq, p, rf, dest_cpu, flags);
>> +       if (!is_migration_disabled(p) || (flags & SCA_MIGRATE_ENABLE)) {
>> +               ret = affine_move_task(rq, p, rf, dest_cpu, flags);
>> +       } else {
>> +               task_rq_unlock(rq, p, rf);
>> +       }
>
> This cannot be right. There might be previous set_cpus_allowed_ptr()
> callers that are blocked and waiting for the task to land on a valid
> CPU.
>

I was thinking whether just completing early, as below, would help
here, though I am not sure.

The idea is to keep the task as it is on the same CPU and wait for
migration to be enabled again for the task, which would take care of
the move later.

------------------->O------------------------------------------

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index d90d37c..7717733 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2390,8 +2390,10 @@ static int migration_cpu_stop(void *data)
 	 * we're holding p->pi_lock.
 	 */
 	if (task_rq(p) == rq) {
-		if (is_migration_disabled(p))
+		if (is_migration_disabled(p)) {
+			complete = true;
 			goto out;
+		}
 
 		if (pending) {

-Mukesh

2022-09-29 15:44:13

by Mukesh Ojha

Subject: Re: BUG: HANG_DETECT waiting for migration_cpu_stop() complete

Hi All,

On 9/23/2022 7:50 PM, Mukesh Ojha wrote:
> Hi Peter,
>
>
> On 9/7/2022 2:20 AM, Peter Zijlstra wrote:
>> On Tue, Sep 06, 2022 at 04:40:03PM -0400, Waiman Long wrote:
>>
>> I've not followed the earlier stuff due to being unreadable; just
>> reacting to this..
>
> We are able to reproduce this issue explained at this link
>
> https://lore.kernel.org/lkml/[email protected]/
>
>
>
>>
>>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>>> index 838623b68031..5d9ea1553ec0 100644
>>> --- a/kernel/sched/core.c
>>> +++ b/kernel/sched/core.c
>>> @@ -2794,9 +2794,9 @@ static int __set_cpus_allowed_ptr_locked(struct
>>> task_struct *p,
>>>                  if (cpumask_equal(&p->cpus_mask, new_mask))
>>>                          goto out;
>>>
>>> -               if (WARN_ON_ONCE(p == current &&
>>> -                                is_migration_disabled(p) &&
>>> -                                !cpumask_test_cpu(task_cpu(p),
>>> new_mask)))
>>> {
>>> +               if (is_migration_disabled(p) &&
>>> +                   !cpumask_test_cpu(task_cpu(p), new_mask)) {
>>> +                       WARN_ON_ONCE(p == current);
>>>                          ret = -EBUSY;
>>>                          goto out;
>>>                  }
>>> @@ -2818,7 +2818,11 @@ static int __set_cpus_allowed_ptr_locked(struct
>>> task_struct *p,
>>>          if (flags & SCA_USER)
>>>                  user_mask = clear_user_cpus_ptr(p);
>>>
>>> -       ret = affine_move_task(rq, p, rf, dest_cpu, flags);
>>> +       if (!is_migration_disabled(p) || (flags & SCA_MIGRATE_ENABLE)) {
>>> +               ret = affine_move_task(rq, p, rf, dest_cpu, flags);
>>> +       } else {
>>> +               task_rq_unlock(rq, p, rf);
>>> +       }
>>
>> This cannot be right. There might be previous set_cpus_allowed_ptr()
>> callers that are blocked and waiting for the task to land on a valid
>> CPU.
>>
>
> I was thinking whether just completing early, as below, would help
> here, though I am not sure.
>
> The idea is to keep the task as it is on the same CPU and wait for
> migration to be enabled again for the task, which would take care of
> the move later.
>
> ------------------->O------------------------------------------
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index d90d37c..7717733 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -2390,8 +2390,10 @@ static int migration_cpu_stop(void *data)
>          * we're holding p->pi_lock.
>          */
>         if (task_rq(p) == rq) {
> -               if (is_migration_disabled(p))
> +               if (is_migration_disabled(p)) {
> +                       complete = true;
>                         goto out;
> +               }
>
>                 if (pending) {
>

Any suggestions on this bug?


-Mukesh