2015-05-15 15:51:39

by Peter Zijlstra

Subject: [RFC][PATCH 3/4] sched: Change sched_class::set_cpus_allowed calling context

Change the calling context of sched_class::set_cpus_allowed() such
that we can assume the task is inactive.

This allows us to easily make changes that affect accounting done by
enqueue/dequeue. This does in fact completely remove
set_cpus_allowed_rt() and greatly reduces set_cpus_allowed_dl().


Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
---
kernel/sched/core.c | 23 +++++++++++++++++++++++
kernel/sched/deadline.c | 39 ++-------------------------------------
kernel/sched/rt.c | 45 +--------------------------------------------
3 files changed, 26 insertions(+), 81 deletions(-)

--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4803,8 +4803,31 @@ void set_cpus_allowed_common(struct task

void do_set_cpus_allowed(struct task_struct *p, const struct cpumask *new_mask)
{
+ struct rq *rq = task_rq(p);
+ bool queued, running;
+
lockdep_assert_held(&p->pi_lock);
+
+ queued = task_on_rq_queued(p);
+ running = task_current(rq, p);
+
+ if (queued) {
+ /*
+ * Because __kthread_bind() calls this on blocked tasks without
+ * holding rq->lock.
+ */
+ lockdep_assert_held(&rq->lock);
+ dequeue_task(rq, p, 0);
+ }
+ if (running)
+ put_prev_task(rq, p);
+
p->sched_class->set_cpus_allowed(p, new_mask);
+
+ if (running)
+ p->sched_class->set_curr_task(rq);
+ if (queued)
+ enqueue_task(rq, p, 0);
}

/*
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -1569,9 +1569,8 @@ static void task_woken_dl(struct rq *rq,
static void set_cpus_allowed_dl(struct task_struct *p,
const struct cpumask *new_mask)
{
- struct rq *rq;
struct root_domain *src_rd;
- int weight;
+ struct rq *rq;

BUG_ON(!dl_task(p));

@@ -1597,41 +1596,7 @@ static void set_cpus_allowed_dl(struct t
raw_spin_unlock(&src_dl_b->lock);
}

- weight = cpumask_weight(new_mask);
-
- /*
- * Only update if the process changes its state from whether it
- * can migrate or not.
- */
- if ((p->nr_cpus_allowed > 1) == (weight > 1))
- goto done;
-
- /*
- * Update only if the task is actually running (i.e.,
- * it is on the rq AND it is not throttled).
- */
- if (!on_dl_rq(&p->dl))
- goto done;
-
- /*
- * The process used to be able to migrate OR it can now migrate
- */
- if (weight <= 1) {
- if (!task_current(rq, p))
- dequeue_pushable_dl_task(rq, p);
- BUG_ON(!rq->dl.dl_nr_migratory);
- rq->dl.dl_nr_migratory--;
- } else {
- if (!task_current(rq, p))
- enqueue_pushable_dl_task(rq, p);
- rq->dl.dl_nr_migratory++;
- }
-
- update_dl_migration(&rq->dl);
-
-done:
- cpumask_copy(&p->cpus_allowed, new_mask);
- p->nr_cpus_allowed = weight;
+ set_cpus_allowed_common(p, new_mask);
}

/* Assumes rq->lock is held */
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -2057,49 +2057,6 @@ static void task_woken_rt(struct rq *rq,
push_rt_tasks(rq);
}

-static void set_cpus_allowed_rt(struct task_struct *p,
- const struct cpumask *new_mask)
-{
- struct rq *rq;
- int weight;
-
- BUG_ON(!rt_task(p));
-
- weight = cpumask_weight(new_mask);
-
- /*
- * Only update if the process changes its state from whether it
- * can migrate or not.
- */
- if ((p->nr_cpus_allowed > 1) == (weight > 1))
- goto done;
-
- if (!task_on_rq_queued(p))
- goto done;
-
- rq = task_rq(p);
-
- /*
- * The process used to be able to migrate OR it can now migrate
- */
- if (weight <= 1) {
- if (!task_current(rq, p))
- dequeue_pushable_task(rq, p);
- BUG_ON(!rq->rt.rt_nr_migratory);
- rq->rt.rt_nr_migratory--;
- } else {
- if (!task_current(rq, p))
- enqueue_pushable_task(rq, p);
- rq->rt.rt_nr_migratory++;
- }
-
- update_rt_migration(&rq->rt);
-
-done:
- cpumask_copy(&p->cpus_allowed, new_mask);
- p->nr_cpus_allowed = weight;
-}
-
/* Assumes rq->lock is held */
static void rq_online_rt(struct rq *rq)
{
@@ -2313,7 +2270,7 @@ const struct sched_class rt_sched_class
#ifdef CONFIG_SMP
.select_task_rq = select_task_rq_rt,

- .set_cpus_allowed = set_cpus_allowed_rt,
+ .set_cpus_allowed = set_cpus_allowed_common,
.rq_online = rq_online_rt,
.rq_offline = rq_offline_rt,
.post_schedule = post_schedule_rt,
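The bracket that do_set_cpus_allowed() grows in this patch can be modeled outside the kernel. The sketch below uses invented toy types (it is not kernel code, and put_prev_task()/set_curr_task() are elided): a queued task is taken out of runqueue bookkeeping, the class callback runs while the task is inactive, and the prior state is restored so enqueue/dequeue accounting sees the new mask.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_task {
    bool queued;        /* models task_on_rq_queued(p) */
    unsigned mask;      /* models p->cpus_allowed      */
    int dequeues;       /* counters to observe the flow */
    int enqueues;
};

static void dequeue_task(struct toy_task *p) { p->queued = false; p->dequeues++; }
static void enqueue_task(struct toy_task *p) { p->queued = true;  p->enqueues++; }

/* models set_cpus_allowed_common(): a plain copy is safe here because
 * the caller guarantees the task is not enqueued at this point */
static void set_cpus_allowed_common(struct toy_task *p, unsigned new_mask)
{
    p->mask = new_mask;
}

static void do_set_cpus_allowed(struct toy_task *p, unsigned new_mask)
{
    bool queued = p->queued;

    if (queued)
        dequeue_task(p);        /* accounting torn down with the old mask */
    /* put_prev_task(rq, p) would run here for the current task */

    set_cpus_allowed_common(p, new_mask);

    /* set_curr_task(rq) would run here for the current task */
    if (queued)
        enqueue_task(p);        /* accounting rebuilt with the new mask */
}
```

A blocked task (queued == false) goes straight through the callback with no runqueue traffic, which is exactly why the per-class migratory bookkeeping above becomes unnecessary.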


2015-05-18 08:32:37

by Peter Zijlstra

Subject: Re: [RFC][PATCH 3/4] sched: Change sched_class::set_cpus_allowed calling context

On Mon, May 18, 2015 at 03:37:43PM +0800, [email protected] wrote:
> Hi Peter,
>
> With this modification, I think the pushing action in my previous patch
> "Check to push the task away after its affinity was changed" will not
> be able to be implemented inside sched_class::set_cpus_allowed().

Ah, right, I knew there was a patch I needed to look at.

I'll try and not forget, but there's a few regression reports I need to
look at first.

2015-05-18 09:35:02

by Juri Lelli

Subject: Re: [RFC][PATCH 3/4] sched: Change sched_class::set_cpus_allowed calling context

Hi Peter,

On 05/18/2015 09:32 AM, Peter Zijlstra wrote:
> On Mon, May 18, 2015 at 03:37:43PM +0800, [email protected] wrote:
>> Hi Peter,
>>
>> With this modification, I think the pushing action in my previous patch
>> "Check to push the task away after its affinity was changed" will not
>> be able to be implemented inside sched_class::set_cpus_allowed().
>
> Ah, right, I knew there was a patch I needed to look at.
>
> I'll try and not forget, but there's a few regression reports I need to
> look at first.
>

Apart from this (and the fact that I still have to look at Xunlei's patches too)
the changes seem ok with DL. Didn't test them yet though. I'll do it soon.

Best,

- Juri

2015-05-18 20:04:33

by Peter Zijlstra

Subject: Re: [RFC][PATCH 3/4] sched: Change sched_class::set_cpus_allowed calling context

On Mon, May 18, 2015 at 10:32:16AM +0200, Peter Zijlstra wrote:
> On Mon, May 18, 2015 at 03:37:43PM +0800, [email protected] wrote:
> > Hi Peter,
> >
> > With this modification, I think the pushing action in my previous patch
> > "Check to push the task away after its affinity was changed" will not
> > be able to be implemented inside sched_class::set_cpus_allowed().
>
> Ah, right, I knew there was a patch I needed to look at.

So basically you want to do:

+check_push:
+ if (weight > 1 &&
+ !task_running(rq, p) &&
+ !test_tsk_need_resched(rq->curr) &&
+ !cpumask_subset(new_mask, &p->cpus_allowed)) {
+ /* Update new affinity and try to push. */
+ cpumask_copy(&p->cpus_allowed, new_mask);
+ p->nr_cpus_allowed = weight;
+ push_rt_tasks(rq);
+ return true;
+ }

in set_cpus_allowed_rt(), which would not work because of us calling
put_prev_task(), which does enqueue_pushable_task() and would allow
pick_next_pushable_task() to select the current task, which would then
BUG_ON().

Note however that you already test for !task_running(), which precludes
that entire argument, because if @p is not running, we did not call
put_prev_task() etc..

So I think the above would still work, though it needs a comment
explaining why.
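The guard in the proposed check_push hunk above can be distilled into a plain predicate. This is a toy model in ordinary C, not kernel code; the mask helpers are stand-ins for cpumask_weight()/cpumask_subset(): a push is only attempted when the task remains migratable, is not the CPU's current task (Peter's point about put_prev_task() not having run), the current task is not already marked for resched, and the new mask actually adds CPUs.

```c
#include <assert.h>
#include <stdbool.h>

static int mask_weight(unsigned mask)           /* stands in for cpumask_weight() */
{
    int w = 0;
    while (mask) { w += mask & 1; mask >>= 1; }
    return w;
}

static bool mask_is_subset(unsigned sub, unsigned super) /* cpumask_subset() */
{
    return (sub & ~super) == 0;
}

static bool should_try_push(unsigned new_mask, unsigned old_mask,
                            bool task_running, bool curr_need_resched)
{
    return mask_weight(new_mask) > 1 &&   /* still migratable afterwards  */
           !task_running &&               /* put_prev_task() was not run  */
           !curr_need_resched &&          /* no resched already pending   */
           !mask_is_subset(new_mask, old_mask); /* mask actually grew     */
}
```

The `!task_running` leg is the one that makes the scheme safe under the new calling context: if the task is not current, the enqueue_pushable path Peter worries about was never taken.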

Subject: [tip:sched/core] sched: Change the sched_class::set_cpus_allowed( ) calling context

Commit-ID: 6c37067e27867db172b988cc11b9ff921175dee5
Gitweb: http://git.kernel.org/tip/6c37067e27867db172b988cc11b9ff921175dee5
Author: Peter Zijlstra <[email protected]>
AuthorDate: Fri, 15 May 2015 17:43:36 +0200
Committer: Ingo Molnar <[email protected]>
CommitDate: Wed, 12 Aug 2015 12:06:10 +0200

sched: Change the sched_class::set_cpus_allowed() calling context

Change the calling context of sched_class::set_cpus_allowed() such
that we can assume the task is inactive.

This allows us to easily make changes that affect accounting done by
enqueue/dequeue. This does in fact completely remove
set_cpus_allowed_rt() and greatly reduces set_cpus_allowed_dl().

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
kernel/sched/core.c | 23 +++++++++++++++++++++++
kernel/sched/deadline.c | 39 ++-------------------------------------
kernel/sched/rt.c | 45 +--------------------------------------------
3 files changed, 26 insertions(+), 81 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 740f90b..9917c96 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1163,8 +1163,31 @@ void set_cpus_allowed_common(struct task_struct *p, const struct cpumask *new_ma

void do_set_cpus_allowed(struct task_struct *p, const struct cpumask *new_mask)
{
+ struct rq *rq = task_rq(p);
+ bool queued, running;
+
lockdep_assert_held(&p->pi_lock);
+
+ queued = task_on_rq_queued(p);
+ running = task_current(rq, p);
+
+ if (queued) {
+ /*
+ * Because __kthread_bind() calls this on blocked tasks without
+ * holding rq->lock.
+ */
+ lockdep_assert_held(&rq->lock);
+ dequeue_task(rq, p, 0);
+ }
+ if (running)
+ put_prev_task(rq, p);
+
p->sched_class->set_cpus_allowed(p, new_mask);
+
+ if (running)
+ p->sched_class->set_curr_task(rq);
+ if (queued)
+ enqueue_task(rq, p, 0);
}

/*
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index dc357fa..b473056 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -1668,9 +1668,8 @@ static void task_woken_dl(struct rq *rq, struct task_struct *p)
static void set_cpus_allowed_dl(struct task_struct *p,
const struct cpumask *new_mask)
{
- struct rq *rq;
struct root_domain *src_rd;
- int weight;
+ struct rq *rq;

BUG_ON(!dl_task(p));

@@ -1696,41 +1695,7 @@ static void set_cpus_allowed_dl(struct task_struct *p,
raw_spin_unlock(&src_dl_b->lock);
}

- weight = cpumask_weight(new_mask);
-
- /*
- * Only update if the process changes its state from whether it
- * can migrate or not.
- */
- if ((p->nr_cpus_allowed > 1) == (weight > 1))
- goto done;
-
- /*
- * Update only if the task is actually running (i.e.,
- * it is on the rq AND it is not throttled).
- */
- if (!on_dl_rq(&p->dl))
- goto done;
-
- /*
- * The process used to be able to migrate OR it can now migrate
- */
- if (weight <= 1) {
- if (!task_current(rq, p))
- dequeue_pushable_dl_task(rq, p);
- BUG_ON(!rq->dl.dl_nr_migratory);
- rq->dl.dl_nr_migratory--;
- } else {
- if (!task_current(rq, p))
- enqueue_pushable_dl_task(rq, p);
- rq->dl.dl_nr_migratory++;
- }
-
- update_dl_migration(&rq->dl);
-
-done:
- cpumask_copy(&p->cpus_allowed, new_mask);
- p->nr_cpus_allowed = weight;
+ set_cpus_allowed_common(p, new_mask);
}

/* Assumes rq->lock is held */
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 63692ef..d2ea593 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -2076,49 +2076,6 @@ static void task_woken_rt(struct rq *rq, struct task_struct *p)
push_rt_tasks(rq);
}

-static void set_cpus_allowed_rt(struct task_struct *p,
- const struct cpumask *new_mask)
-{
- struct rq *rq;
- int weight;
-
- BUG_ON(!rt_task(p));
-
- weight = cpumask_weight(new_mask);
-
- /*
- * Only update if the process changes its state from whether it
- * can migrate or not.
- */
- if ((p->nr_cpus_allowed > 1) == (weight > 1))
- goto done;
-
- if (!task_on_rq_queued(p))
- goto done;
-
- rq = task_rq(p);
-
- /*
- * The process used to be able to migrate OR it can now migrate
- */
- if (weight <= 1) {
- if (!task_current(rq, p))
- dequeue_pushable_task(rq, p);
- BUG_ON(!rq->rt.rt_nr_migratory);
- rq->rt.rt_nr_migratory--;
- } else {
- if (!task_current(rq, p))
- enqueue_pushable_task(rq, p);
- rq->rt.rt_nr_migratory++;
- }
-
- update_rt_migration(&rq->rt);
-
-done:
- cpumask_copy(&p->cpus_allowed, new_mask);
- p->nr_cpus_allowed = weight;
-}
-
/* Assumes rq->lock is held */
static void rq_online_rt(struct rq *rq)
{
@@ -2327,7 +2284,7 @@ const struct sched_class rt_sched_class = {
#ifdef CONFIG_SMP
.select_task_rq = select_task_rq_rt,

- .set_cpus_allowed = set_cpus_allowed_rt,
+ .set_cpus_allowed = set_cpus_allowed_common,
.rq_online = rq_online_rt,
.rq_offline = rq_offline_rt,
.task_woken = task_woken_rt,

2015-08-13 18:50:11

by Sasha Levin

Subject: Re: [tip:sched/core] sched: Change the sched_class::set_cpus_allowed( ) calling context

On 08/12/2015 08:38 AM, tip-bot for Peter Zijlstra wrote:
> Commit-ID: 6c37067e27867db172b988cc11b9ff921175dee5
> Gitweb: http://git.kernel.org/tip/6c37067e27867db172b988cc11b9ff921175dee5
> Author: Peter Zijlstra <[email protected]>
> AuthorDate: Fri, 15 May 2015 17:43:36 +0200
> Committer: Ingo Molnar <[email protected]>
> CommitDate: Wed, 12 Aug 2015 12:06:10 +0200
>
> sched: Change the sched_class::set_cpus_allowed() calling context
>
> Change the calling context of sched_class::set_cpus_allowed() such
> that we can assume the task is inactive.
>
> This allows us to easily make changes that affect accounting done by
> enqueue/dequeue. This does in fact completely remove
> set_cpus_allowed_rt() and greatly reduces set_cpus_allowed_dl().
>
> Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
> Cc: Linus Torvalds <[email protected]>
> Cc: Mike Galbraith <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Link: http://lkml.kernel.org/r/[email protected]
> Signed-off-by: Ingo Molnar <[email protected]>

Hey Peter,

This patch breaks boot inside my VM:

[79817.224383] bad: scheduling from the idle thread!
[79817.224900] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.2.0-rc6-next-20150813-sasha-00040-g059fd6d #2431
[79817.225709] ffffffffae0c6320 ffff880051b77c00 ffffffffade7f739 ffff8800a69e15c0
[79817.226419] ffff880051b77c18 ffffffffa43c4ed3 ffff880051be8000 ffff880051b77c48
[79817.227122] ffffffffa43a7ffb ffff880051be8000 ffff8800a69e15c0 ffffffffae094d40
[79817.227806] Call Trace:
[79817.228039] dump_stack (lib/dump_stack.c:52)
[79817.228503] dequeue_task_idle (kernel/sched/idle_task.c:45)
[79817.229005] dequeue_task (kernel/sched/core.c:839)
[79817.229485] do_set_cpus_allowed (kernel/sched/core.c:1178 (discriminator 9))
[79817.230006] init_idle (kernel/sched/core.c:4937)
[79817.230467] idle_thread_get (kernel/smpboot.c:35)
[79817.230945] ? cpu_up (kernel/cpu.c:569)
[79817.231374] _cpu_up (kernel/cpu.c:513)
[79817.231801] ? cpu_down (kernel/cpu.c:500)
[79817.232254] ? fork_idle (kernel/fork.c:1688)
[79817.232731] ? find_next_bit (lib/find_bit.c:65)
[79817.233216] ? idle_threads_init (include/linux/cpumask.h:189 kernel/smpboot.c:71)
[79817.233715] cpu_up (kernel/cpu.c:574)
[79817.234132] smp_init (kernel/smp.c:578)
[79817.234557] kernel_init_freeable (init/main.c:690 init/main.c:884 init/main.c:1009)
[79817.235095] ? start_kernel (init/main.c:979)
[79817.235575] ? finish_task_switch (kernel/sched/sched.h:1087 kernel/sched/core.c:2526)
[79817.236106] ? rest_init (init/main.c:934)
[79817.236567] kernel_init (init/main.c:939)
[79817.237020] ? rest_init (init/main.c:934)
[79817.237496] ret_from_fork (arch/x86/entry/entry_64.S:473)
[79817.237994] ? rest_init (init/main.c:934)
[79817.238477] BUG: unable to handle kernel NULL pointer dereference at (null)
[79817.239170] IP: [< (null)>] null) (??:?)
[79817.239608] PGD 0
[79817.239802] Oops: 0010 [#1] SMP DEBUG_PAGEALLOC KASAN
[79817.240295] Dumping ftrace buffer:
[79817.240593] (ftrace buffer empty)
[79817.240899] Modules linked in:
[79817.241180] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.2.0-rc6-next-20150813-sasha-00040-g059fd6d #2431
[79817.241957] task: ffff880278228000 ti: ffff880051b70000 task.ti: ffff880051b70000
[79817.242573] RIP: [< (null)>] null) (??:?)
[79817.243203] RSP: 0000:ffff880051b77c18 EFLAGS: 00010046
[79817.243646] RAX: dffffc0000000000 RBX: ffff880051be8000 RCX: 0000000000000004
[79817.244229] RDX: 0000000000000000 RSI: ffff880051be8000 RDI: ffff8800a69e15c0
[79817.244815] RBP: ffff880051b77c48 R08: 0000000000000001 R09: 0000000000000004
[79817.245404] R10: ffffed00ff815e03 R11: ffffed00ff815e01 R12: ffff8800a69e15c0
[79817.245987] R13: ffffffffae0c6320 R14: 0000000000000000 R15: 00004897e6254646
[79817.246577] FS: 0000000000000000(0000) GS:ffff880052600000(0000) knlGS:0000000000000000
[79817.247243] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[79817.247723] CR2: 0000000000000000 CR3: 0000000030a2d000 CR4: 00000000000006b0
[79817.248319] Stack:
[79817.248496] ffffffffa43a7d9b ffff880051be8000 ffff8800a69e15c0 ffffffffae0c6320
[79817.249176] ffff880051be8058 0000000000000001 ffff880051b77c88 ffffffffa43aab66
[79817.249850] ffff880051be8000 ffff8800a69e15c0 0000000000000001 0000000000000001
[79817.250521] Call Trace:
[79817.250735] ? enqueue_task (kernel/sched/core.c:832)
[79817.251211] do_set_cpus_allowed (kernel/sched/core.c:1189)
[79817.251720] init_idle (kernel/sched/core.c:4937)
[79817.252153] idle_thread_get (kernel/smpboot.c:35)
[79817.252623] ? cpu_up (kernel/cpu.c:569)
[79817.253045] _cpu_up (kernel/cpu.c:513)
[79817.253471] ? cpu_down (kernel/cpu.c:500)
[79817.253902] ? fork_idle (kernel/fork.c:1688)
[79817.254354] ? find_next_bit (lib/find_bit.c:65)
[79817.254820] ? idle_threads_init (include/linux/cpumask.h:189 kernel/smpboot.c:71)
[79817.255326] cpu_up (kernel/cpu.c:574)
[79817.255729] smp_init (kernel/smp.c:578)
[79817.256158] kernel_init_freeable (init/main.c:690 init/main.c:884 init/main.c:1009)
[79817.256675] ? start_kernel (init/main.c:979)
[79817.257158] ? finish_task_switch (kernel/sched/sched.h:1087 kernel/sched/core.c:2526)
[79817.257677] ? rest_init (init/main.c:934)
[79817.258143] kernel_init (init/main.c:939)
[79817.258604] ? rest_init (init/main.c:934)
[79817.259094] ret_from_fork (arch/x86/entry/entry_64.S:473)
[79817.259558] ? rest_init (init/main.c:934)
[79817.260021] Code: Bad RIP value.

Code starting with the faulting instruction
===========================================
[79817.260342] RIP [< (null)>] null) (??:?)
[79817.260791] RSP <ffff880051b77c18>
[79817.261098] CR2: 0000000000000000
[79817.261415] ---[ end trace eddc979a4104e4f3 ]---
[79817.261818] Kernel panic - not syncing: Fatal exception


Thanks,
Sasha

2015-08-13 20:37:39

by Peter Zijlstra

Subject: Re: [tip:sched/core] sched: Change the sched_class::set_cpus_allowed( ) calling context

On Thu, Aug 13, 2015 at 02:47:20PM -0400, Sasha Levin wrote:
> [79817.224383] bad: scheduling from the idle thread!
> [79817.224900] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.2.0-rc6-next-20150813-sasha-00040-g059fd6d #2431
> [79817.225709] ffffffffae0c6320 ffff880051b77c00 ffffffffade7f739 ffff8800a69e15c0
> [79817.226419] ffff880051b77c18 ffffffffa43c4ed3 ffff880051be8000 ffff880051b77c48
> [79817.227122] ffffffffa43a7ffb ffff880051be8000 ffff8800a69e15c0 ffffffffae094d40
> [79817.227806] Call Trace:
> [79817.228039] dump_stack (lib/dump_stack.c:52)
> [79817.228503] dequeue_task_idle (kernel/sched/idle_task.c:45)
> [79817.229005] dequeue_task (kernel/sched/core.c:839)
> [79817.229485] do_set_cpus_allowed (kernel/sched/core.c:1178 (discriminator 9))
> [79817.230006] init_idle (kernel/sched/core.c:4937)
> [79817.230467] idle_thread_get (kernel/smpboot.c:35)

Dhurr.. so oddly enough my actual hardware boots without issue.

But I can see how that could go wrong, does the below make your
(virtual) machine happy again?

---
kernel/sched/core.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 56aed8fce3cb..2d2871e2cf5d 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4924,7 +4924,7 @@ void init_idle(struct task_struct *idle, int cpu)
idle->state = TASK_RUNNING;
idle->se.exec_start = sched_clock();

- do_set_cpus_allowed(idle, cpumask_of(cpu));
+ set_cpus_allowed_common(idle, cpumask_of(cpu));
/*
* We're having a chicken and egg problem, even though we are
* holding rq->lock, the cpu isn't yet set to this cpu so the

2015-08-13 20:59:41

by Sasha Levin

Subject: Re: [tip:sched/core] sched: Change the sched_class::set_cpus_allowed( ) calling context

On 08/13/2015 04:37 PM, Peter Zijlstra wrote:
> On Thu, Aug 13, 2015 at 02:47:20PM -0400, Sasha Levin wrote:
>> > [79817.224383] bad: scheduling from the idle thread!
>> > [79817.224900] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.2.0-rc6-next-20150813-sasha-00040-g059fd6d #2431
>> > [79817.225709] ffffffffae0c6320 ffff880051b77c00 ffffffffade7f739 ffff8800a69e15c0
>> > [79817.226419] ffff880051b77c18 ffffffffa43c4ed3 ffff880051be8000 ffff880051b77c48
>> > [79817.227122] ffffffffa43a7ffb ffff880051be8000 ffff8800a69e15c0 ffffffffae094d40
>> > [79817.227806] Call Trace:
>> > [79817.228039] dump_stack (lib/dump_stack.c:52)
>> > [79817.228503] dequeue_task_idle (kernel/sched/idle_task.c:45)
>> > [79817.229005] dequeue_task (kernel/sched/core.c:839)
>> > [79817.229485] do_set_cpus_allowed (kernel/sched/core.c:1178 (discriminator 9))
>> > [79817.230006] init_idle (kernel/sched/core.c:4937)
>> > [79817.230467] idle_thread_get (kernel/smpboot.c:35)
> Dhurr.. so oddly enough my actual hardware boots without issue.
>
> But I can see how that could go wrong, does the below make your
> (virtual) machine happy again?

Seems to work fine now, thanks!


Thanks,
Sasha

2015-08-13 21:30:13

by Peter Zijlstra

Subject: Re: [tip:sched/core] sched: Change the sched_class::set_cpus_allowed( ) calling context

On Thu, Aug 13, 2015 at 04:59:01PM -0400, Sasha Levin wrote:

> Seems to work fine now, thanks!

I've no clue how this doesn't also explode on actual hardware, that code
is convoluted, read and weep.

Ingo, please stick this somewhere appropriate :-)

---
Subject: sched: Avoid trying to dequeue/enqueue the idle thread
From: Peter Zijlstra <[email protected]>
Date: Thu Aug 13 23:09:29 CEST 2015

Sasha reports that his virtual machine tries to schedule the idle
thread since commit 6c37067e2786 ("sched: Change the
sched_class::set_cpus_allowed() calling context").

His trace shows this happening from idle_thread_get()->init_idle(),
which is the _second_ init_idle() invocation on that task_struct, the
first being done through idle_init()->fork_idle(). (this code is
insane...)

Because we call init_idle() twice in a row, its ->sched_class ==
&idle_sched_class and ->on_rq = TASK_ON_RQ_QUEUED. This means
do_set_cpus_allowed() thinks we're queued and will call dequeue_task(),
which is implemented with BUG() for the idle class, seeing how
dequeueing the idle task is a daft thing.

Aside from the whole insanity of calling init_idle() _twice_, change the
code to call set_cpus_allowed_common() instead, as this is 'obviously'
before the idle task ever gets run etc..

Fixes: 6c37067e2786 ("sched: Change the sched_class::set_cpus_allowed() calling context")
Reported-by: Sasha Levin <[email protected]>
Tested-by: Sasha Levin <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
---
kernel/sched/core.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4924,7 +4924,7 @@ void init_idle(struct task_struct *idle,
idle->state = TASK_RUNNING;
idle->se.exec_start = sched_clock();

- do_set_cpus_allowed(idle, cpumask_of(cpu));
+ set_cpus_allowed_common(idle, cpumask_of(cpu));
/*
* We're having a chicken and egg problem, even though we are
* holding rq->lock, the cpu isn't yet set to this cpu so the
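The failure mode and the one-line fix can be modeled in plain C (a toy sketch, not kernel code; names mirror the kernel flow but all types are invented): after the first init_idle() the idle task already looks queued, so a second trip through do_set_cpus_allowed() attempts a dequeue the idle class cannot perform, while set_cpus_allowed_common() touches nothing but the mask.

```c
#include <assert.h>
#include <stdbool.h>

struct idle_task {
    bool on_rq_queued;  /* models ->on_rq == TASK_ON_RQ_QUEUED */
    unsigned mask;
};

/* counts where the kernel would yell "bad: scheduling from the idle thread!" */
static int idle_dequeue_attempts;

static void dequeue_task(struct idle_task *p)
{
    idle_dequeue_attempts++;    /* the idle class has no sane dequeue */
    p->on_rq_queued = false;
}

static void set_cpus_allowed_common(struct idle_task *p, unsigned mask)
{
    p->mask = mask;             /* no runqueue interaction at all */
}

static void do_set_cpus_allowed(struct idle_task *p, unsigned mask)
{
    bool queued = p->on_rq_queued;

    if (queued)
        dequeue_task(p);        /* pre-fix path: this is what blew up */
    set_cpus_allowed_common(p, mask);
    if (queued)
        p->on_rq_queued = true;
}
```

Pre-fix, the second init_idle() drives the first function and trips the dequeue; post-fix it calls set_cpus_allowed_common() directly and the runqueue is never touched.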