2019-09-15 03:33:14

by Kemeng Shi

[permalink] [raw]
Subject: [PATCH] sched: fix migration to invalid cpu in __set_cpus_allowed_ptr

From: <[email protected]>

reason: migration to invalid cpu in __set_cpus_allowed_ptr
archive path: patches/euleros/sched

Oops occur when running qemu on arm64:
Unable to handle kernel paging request at virtual address ffff000008effe40
Internal error: Oops: 96000007 [#1] SMP
Process migration/0 (pid: 12, stack limit = 0x00000000084e3736)
pstate: 20000085 (nzCv daIf -PAN -UAO)
pc : __ll_sc___cmpxchg_case_acq_4+0x4/0x20
lr : move_queued_task.isra.21+0x124/0x298
...
Call trace:
__ll_sc___cmpxchg_case_acq_4+0x4/0x20
__migrate_task+0xc8/0xe0
migration_cpu_stop+0x170/0x180
cpu_stopper_thread+0xec/0x178
smpboot_thread_fn+0x1ac/0x1e8
kthread+0x134/0x138
ret_from_fork+0x10/0x18

__set_cpus_allowed_ptr will choose an active dest_cpu in affinity mask to migrage the process if process is not
currently running on any one of the CPUs specified in affinity mask.__set_cpus_allowed_ptr will choose an invalid
dest_cpu(>= nr_cpu_ids, 1024 in my virtual machine) if CPUS in affinity mask are deactived by cpu_down after
cpumask_intersects check.Cpumask_test_cpu of dest_cpu afterwards is overflow and may passes if corresponding bit
is coincidentally set.As a consequence, kernel will access a invalid rq address associate with the invalid cpu in
migration_cpu_stop->__migrate_task->move_queued_task and the Oops occurs. Process as follows may trigger the Oops:
1) A process repeatedly bind itself to cpu0 and cpu1 in turn by calling sched_setaffinity
2) A shell script repeatedly "echo 0 > /sys/devices/system/cpu/cpu1/online" and "echo 1 > /sys/devices/system/cpu/cpu1/online" in turn
3) Oops appears if the invalid cpu is set in memory after tested cpumask.

Change-Id: I9c2f95aecd3da568991b7408397215f26c990e40
Signed-off-by: <[email protected]>
---
kernel/sched/core.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 4b63fef..5181ea9 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1112,7 +1112,8 @@ static int __set_cpus_allowed_ptr(struct task_struct *p,
if (cpumask_equal(&p->cpus_allowed, new_mask))
goto out;

- if (!cpumask_intersects(new_mask, cpu_valid_mask)) {
+ dest_cpu = cpumask_any_and(cpu_valid_mask, new_mask);
+ if (dest_cpu >= nr_cpu_ids) {
ret = -EINVAL;
goto out;
}
@@ -1133,7 +1134,6 @@ static int __set_cpus_allowed_ptr(struct task_struct *p,
if (cpumask_test_cpu(task_cpu(p), new_mask))
goto out;

- dest_cpu = cpumask_any_and(cpu_valid_mask, new_mask);
if (task_running(rq, p) || p->state == TASK_WAKING) {
struct migration_arg arg = { p, dest_cpu };
/* Need help from migration thread: drop lock and wait. */
--
1.8.3.1



2019-09-15 19:27:29

by Valentin Schneider

[permalink] [raw]
Subject: Re: [PATCH] sched: fix migration to invalid cpu in __set_cpus_allowed_ptr

On 15/09/2019 04:07, shikemeng wrote:
> From: <[email protected]>
>
> reason: migration to invalid cpu in __set_cpus_allowed_ptr
> archive path: patches/euleros/sched
>
> Oops occur when running qemu on arm64:
> Unable to handle kernel paging request at virtual address ffff000008effe40
> Internal error: Oops: 96000007 [#1] SMP
> Process migration/0 (pid: 12, stack limit = 0x00000000084e3736)
> pstate: 20000085 (nzCv daIf -PAN -UAO)
> pc : __ll_sc___cmpxchg_case_acq_4+0x4/0x20
> lr : move_queued_task.isra.21+0x124/0x298
> ...
> Call trace:
> __ll_sc___cmpxchg_case_acq_4+0x4/0x20
> __migrate_task+0xc8/0xe0
> migration_cpu_stop+0x170/0x180
> cpu_stopper_thread+0xec/0x178
> smpboot_thread_fn+0x1ac/0x1e8
> kthread+0x134/0x138
> ret_from_fork+0x10/0x18
>
> __set_cpus_allowed_ptr will choose an active dest_cpu in affinity mask to migrage the process if process is not
> currently running on any one of the CPUs specified in affinity mask.__set_cpus_allowed_ptr will choose an invalid
> dest_cpu(>= nr_cpu_ids, 1024 in my virtual machine) if CPUS in affinity mask are deactived by cpu_down after
> cpumask_intersects check.Cpumask_test_cpu of dest_cpu afterwards is overflow and may passes if corresponding bit
> is coincidentally set.As a consequence, kernel will access a invalid rq address associate with the invalid cpu in
> migration_cpu_stop->__migrate_task->move_queued_task and the Oops occurs. Process as follows may trigger the Oops:
> 1) A process repeatedly bind itself to cpu0 and cpu1 in turn by calling sched_setaffinity
> 2) A shell script repeatedly "echo 0 > /sys/devices/system/cpu/cpu1/online" and "echo 1 > /sys/devices/system/cpu/cpu1/online" in turn
> 3) Oops appears if the invalid cpu is set in memory after tested cpumask.
>
> Change-Id: I9c2f95aecd3da568991b7408397215f26c990e40
> Signed-off-by: <[email protected]>

The log still isn't wrapped to 75 chars, and the change-id still hasn't been
removed.

The subject should also mention that this is v2 of the patch, again this is
all in the process documentation.

The fix itself looks fine though, so once the log respects the rules:
Reviewed-by: Valentin Schneider <[email protected]>