2017-11-15 23:24:31

by Joe Korty

Subject: [PATCH] 4.4.86-rt99: fix sync breakage between nr_cpus_allowed and cpus_allowed

4.4.86-rt99's patch

0037-Intrduce-migrate_disable-cpu_light.patch

introduces a place where a task's cpus_allowed mask is
updated without a corresponding update to nr_cpus_allowed.

This path is executed when task affinity is changed while
migrate_disabled() is true. As there is no code present
to set nr_cpus_allowed when the migrate_disable state is
dropped, the scheduler may from that point on make incorrect
scheduling decisions for this task.
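
To illustrate the desync, here is a toy user-space model of the early-return path. It is only a sketch: struct toy_task, toy_weight(), toy_set_cpus_allowed() and toy_in_sync() are hypothetical names invented for this example, not kernel code, and the mask is a plain unsigned long rather than a struct cpumask.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the two task_struct fields discussed above. */
struct toy_task {
	unsigned long cpus_allowed;	/* bit n set => CPU n is allowed */
	int nr_cpus_allowed;		/* cached weight of cpus_allowed */
	bool migrate_disabled;		/* models __migrate_disabled(p) */
};

/* Number of set bits, standing in for cpumask_weight(). */
static int toy_weight(unsigned long mask)
{
	return __builtin_popcountl(mask);
}

/*
 * Models the shape of the affinity update. With fixed == false the
 * migrate-disabled branch copies the mask but leaves the cached count
 * stale; with fixed == true it refreshes the count first.
 */
static void toy_set_cpus_allowed(struct toy_task *p, unsigned long new_mask,
				 bool fixed)
{
	if (p->migrate_disabled) {
		if (fixed)
			p->nr_cpus_allowed = toy_weight(new_mask);
		p->cpus_allowed = new_mask;
		return;
	}

	p->cpus_allowed = new_mask;
	p->nr_cpus_allowed = toy_weight(new_mask);
}

/* True when the cached count agrees with the mask. */
static bool toy_in_sync(const struct toy_task *p)
{
	return p->nr_cpus_allowed == toy_weight(p->cpus_allowed);
}
```

In this model, shrinking a four-CPU mask to one CPU while migrate_disabled is set leaves nr_cpus_allowed at 4 unless the count is refreshed in the same branch.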

My testing consists of temporarily adding a

	if (tsk_nr_cpus_allowed(p) != cpumask_weight(tsk_cpus_allowed(p)))
		printk_ratelimited(...)

statement to schedule() and running a simple affinity rotation
program I wrote, one that rotates the threads of stress(1).
While rotating, I got the expected kernel error messages.
With this patch applied the messages disappeared.
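
The rotation program itself is not included in this mail. As a rough idea of what such a tool might do, the sketch below moves each thread to the next CPU in round-robin order using sched_setaffinity(2). The helper names next_cpu(), pin_to_cpu() and rotate_once() are hypothetical, and how the thread IDs of stress(1) are collected (for example from /proc/<pid>/task) is left out and assumed.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <sys/types.h>

/* Next CPU in a round-robin over CPUs 0..ncpus-1. */
int next_cpu(int cpu, int ncpus)
{
	return (cpu + 1) % ncpus;
}

/*
 * Restrict thread 'tid' (0 means the caller) to a single CPU.
 * Returns 0 on success or -1 on error, like sched_setaffinity(2).
 */
int pin_to_cpu(pid_t tid, int cpu)
{
	cpu_set_t set;

	CPU_ZERO(&set);
	CPU_SET(cpu, &set);
	return sched_setaffinity(tid, sizeof(set), &set);
}

/*
 * One rotation pass: advance every thread to its next CPU.
 * cpus[i] holds the CPU last assigned to tids[i] and is updated in place.
 * Returns the number of threads whose affinity could not be changed.
 */
int rotate_once(const pid_t *tids, int *cpus, int ntids, int ncpus)
{
	int failed = 0;

	for (int i = 0; i < ntids; i++) {
		cpus[i] = next_cpu(cpus[i], ncpus);
		if (pin_to_cpu(tids[i], cpus[i]) != 0)
			failed++;
	}
	return failed;
}
```

Calling rotate_once() in a tight loop against the threads of a running stress(1) repeatedly changes their affinity, which is the condition under which the migrate-disabled path described above can be reached.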

Signed-off-by: Joe Korty <[email protected]>

Index: b/kernel/sched/core.c
===================================================================
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1220,6 +1220,7 @@ void do_set_cpus_allowed(struct task_str
 	lockdep_assert_held(&p->pi_lock);

 	if (__migrate_disabled(p)) {
+		p->nr_cpus_allowed = cpumask_weight(new_mask);
 		cpumask_copy(&p->cpus_allowed, new_mask);
 		return;
 	}
