2015-02-24 09:28:40

by Juri Lelli

Subject: [PATCH 1/2] sched/deadline,core: fix bandwidth update when changing cpuset cpumask

When the cpumask of an exclusive cpuset is changed, the associated
root_domain has to be destroyed and recreated with the new span. Since
we keep track of the currently used deadline bandwidth in the
root_domain(s) associated with exclusive cpusets, that information is
lost when the root_domain is destroyed.

Add two helpers to save and restore such bandwidth across cpuset
reconfiguration.

Signed-off-by: Juri Lelli <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Juri Lelli <[email protected]>
Cc: [email protected]
---
 kernel/sched/core.c     |  3 +++
 kernel/sched/deadline.c | 24 ++++++++++++++++++++++++
 kernel/sched/sched.h    |  4 ++++
 3 files changed, 31 insertions(+)
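
For clarity, the resulting flow in partition_sched_domains() looks
roughly like this (a condensed sketch of the hunks below; the
surrounding loops, match labels and locking are omitted):

	unsigned long long dl_bw = 0;

	/* old root_domain is going away: stash its accounted DL bandwidth */
	save_dl_bw_of_rd(doms_cur[i], &dl_bw);
	detach_destroy_domains(doms_cur[i]);	/* frees the root_domain */

	/* the new root_domain starts with total_bw == 0 */
	build_sched_domains(doms_new[i], NULL);
	restore_dl_bw_of_rd(doms_new[i], dl_bw);	/* carry accounting over */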

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 97fe79c..5827ff4 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6923,6 +6923,7 @@ void partition_sched_domains(int ndoms_new, cpumask_var_t doms_new[],
 {
 	int i, j, n;
 	int new_topology;
+	unsigned long long dl_bw = 0;
 
 	mutex_lock(&sched_domains_mutex);
 
@@ -6942,6 +6943,7 @@ void partition_sched_domains(int ndoms_new, cpumask_var_t doms_new[],
 			goto match1;
 		}
 		/* no match - a current sched domain not in new doms_new[] */
+		save_dl_bw_of_rd(doms_cur[i], &dl_bw);
 		detach_destroy_domains(doms_cur[i]);
 match1:
 		;
@@ -6964,6 +6966,7 @@ match1:
 		}
 		/* no match - add a new doms_new */
 		build_sched_domains(doms_new[i], dattr_new ? dattr_new + i : NULL);
+		restore_dl_bw_of_rd(doms_new[i], dl_bw);
 match2:
 		;
 	}
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 3fa8fa6..dbf12a9 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -69,6 +69,30 @@ void init_dl_bw(struct dl_bw *dl_b)
 	dl_b->total_bw = 0;
 }
 
+void save_dl_bw_of_rd(const struct cpumask *cpu_map,
+		      unsigned long long *dl_bw)
+{
+	int cpu;
+
+	cpu = cpumask_any(cpu_map);
+	rcu_read_lock_sched();
+	if (cpu < num_online_cpus())
+		*dl_bw = dl_bw_of(cpu)->total_bw;
+	rcu_read_unlock_sched();
+}
+
+void restore_dl_bw_of_rd(const struct cpumask *cpu_map,
+			 unsigned long long dl_bw)
+{
+	int cpu;
+
+	cpu = cpumask_any(cpu_map);
+	rcu_read_lock_sched();
+	if (cpu < num_online_cpus())
+		dl_bw_of(cpu)->total_bw = dl_bw;
+	rcu_read_unlock_sched();
+}
+
 void init_dl_rq(struct dl_rq *dl_rq, struct rq *rq)
 {
 	dl_rq->rb_root = RB_ROOT;
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index dc0f435..46d231a 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1276,6 +1276,10 @@ extern void init_rt_bandwidth(struct rt_bandwidth *rt_b, u64 period, u64 runtime
 extern struct dl_bandwidth def_dl_bandwidth;
 extern void init_dl_bandwidth(struct dl_bandwidth *dl_b, u64 period, u64 runtime);
 extern void init_dl_task_timer(struct sched_dl_entity *dl_se);
+void save_dl_bw_of_rd(const struct cpumask *cpu_map,
+		      unsigned long long *dl_bw);
+void restore_dl_bw_of_rd(const struct cpumask *cpu_map,
+			 unsigned long long dl_bw);
 
 unsigned long to_ratio(u64 period, u64 runtime);

--
2.3.0


2015-02-24 09:28:56

by Juri Lelli

Subject: [PATCH 2/2] sched/deadline: always enqueue on previous rq when dl_task_timer fires

dl_task_timer() may fire on a different rq from where a task was removed
after throttling. Since the call path is:

   dl_task_timer() ->
     enqueue_task_dl() ->
       enqueue_dl_entity() ->
         replenish_dl_entity()

and replenish_dl_entity() uses dl_se's rq, we can't use current's rq
in dl_task_timer(); we need to lock the task's previous one instead.

Signed-off-by: Juri Lelli <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Kirill Tkhai <[email protected]>
Cc: Juri Lelli <[email protected]>
Cc: [email protected]
Fixes: 3960c8c0c789 ("sched: Make dl_task_time() use task_rq_lock()")
---
kernel/sched/deadline.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
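
To make the fix concrete, this is the shape of the timer callback
after the change (a simplified sketch of dl_task_timer(); the race
handling in the body is elided):

	static enum hrtimer_restart dl_task_timer(struct hrtimer *timer)
	{
		struct sched_dl_entity *dl_se = container_of(timer,
							     struct sched_dl_entity,
							     dl_timer);
		struct task_struct *p = dl_task_of(dl_se);	/* timer owner */
		unsigned long flags;
		struct rq *rq;

		/* lock p's rq: it may not be current's rq on this CPU */
		rq = task_rq_lock(p, &flags);

		/* ... replenish dl_se and enqueue p back on its rq ... */

		task_rq_unlock(rq, p, &flags);
		return HRTIMER_NORESTART;
	}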

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index dbf12a9..519e468 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -538,7 +538,7 @@ static enum hrtimer_restart dl_task_timer(struct hrtimer *timer)
 	unsigned long flags;
 	struct rq *rq;
 
-	rq = task_rq_lock(current, &flags);
+	rq = task_rq_lock(p, &flags);
 
 	/*
	 * We need to take care of several possible races here:
@@ -593,7 +593,7 @@ static enum hrtimer_restart dl_task_timer(struct hrtimer *timer)
 		push_dl_task(rq);
 #endif
 unlock:
-	task_rq_unlock(rq, current, &flags);
+	task_rq_unlock(rq, p, &flags);
 
 	return HRTIMER_NORESTART;
 }
--
2.3.0

2015-02-26 01:01:34

by Wanpeng Li

Subject: Re: [PATCH 2/2] sched/deadline: always enqueue on previous rq when dl_task_timer fires

On Tue, Feb 24, 2015 at 09:28:35AM +0000, Juri Lelli wrote:
>dl_task_timer() may fire on a different rq from where a task was removed
>after throttling. Since the call path is:
>
>   dl_task_timer() ->
>     enqueue_task_dl() ->
>       enqueue_dl_entity() ->
>         replenish_dl_entity()
>
>and replenish_dl_entity() uses dl_se's rq, we can't use current's rq
>in dl_task_timer(); we need to lock the task's previous one instead.
>
>Signed-off-by: Juri Lelli <[email protected]>

Tested-by: Wanpeng Li <[email protected]>

I see a panic when I run a dl task and kill it after several seconds,
then retry the process several times. The bug is triggered by commit
3960c8c0c789 ("sched: Make dl_task_time() use task_rq_lock()"); Juri's
patch fixes it.
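
For reference, a minimal reproducer along the lines described above (my
reconstruction, not the exact test that was run; SCHED_DEADLINE is
policy 6 and sched_setattr() has no glibc wrapper, hence the raw
syscall and the local struct definition; needs kernel headers >= 3.14
for SYS_sched_setattr):

#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef SCHED_DEADLINE
#define SCHED_DEADLINE	6
#endif

struct sched_attr {			/* layout per sched_setattr(2) */
	uint32_t size;
	uint32_t sched_policy;
	uint64_t sched_flags;
	int32_t  sched_nice;
	uint32_t sched_priority;
	uint64_t sched_runtime;
	uint64_t sched_deadline;
	uint64_t sched_period;
};

int main(void)
{
	int i;

	for (i = 0; i < 10; i++) {
		pid_t pid = fork();

		if (pid == 0) {
			struct sched_attr attr = {
				.size		= sizeof(attr),
				.sched_policy	= SCHED_DEADLINE,
				.sched_runtime	= 10 * 1000 * 1000,	/* 10ms */
				.sched_deadline	= 30 * 1000 * 1000,	/* 30ms */
				.sched_period	= 30 * 1000 * 1000,
			};

			if (syscall(SYS_sched_setattr, 0, &attr, 0))
				perror("sched_setattr");
			for (;;)
				;	/* burn budget until throttled */
		}
		sleep(3);
		kill(pid, SIGKILL);	/* kill the (possibly throttled) dl task */
		waitpid(pid, NULL, 0);
	}
	return 0;
}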

[ 313.352676] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 313.353483] IP: [<ffffffff8139ee28>] rb_erase+0x118/0x390
[ 313.354060] PGD b5ddb067 PUD b5d96067 PMD 0
[ 313.354501] Oops: 0002 [#1] SMP
[...]
[ 313.356633] Call Trace:
[ 313.356633] [<ffffffff810b2cb7>] dequeue_pushable_dl_task+0x47/0x80
[ 313.356633] [<ffffffff810b46ff>] pick_next_task_dl+0x7f/0x150
[ 313.356633] [<ffffffff8178f7b9>] __schedule+0x839/0x8cb
[ 313.356633] [<ffffffff8178f947>] schedule+0x37/0x90
[ 313.356633] [<ffffffff8178fbae>] schedule_preempt_disabled+0xe/0x10
[ 313.356633] [<ffffffff810b5b18>] cpu_startup_entry+0x168/0x380
[ 313.356633] [<ffffffff810eb2e3>] ? clockevents_register_device+0xe3/0x150
[ 313.356633] [<ffffffff810eba96>] ? clockevents_config_and_register+0x26/0x30
[ 313.356633] [<ffffffff8104a96c>] start_secondary+0x14c/0x170
[ 313.356633] Code: e2 fc 74 ab 48 89 c1 48 89 d0 48 8b 50 08 48 39 ca 74 48 f6 02 01 75 b3 48 8b 4a 10 48 89 c7 48 83 cf 01 48 89 48 08 48 89
42 10 <48> 89 39 48 8b 38 48 89 3a 48 83 e7 fc 48 89 10 0f 84 02 01 00
[ 313.356633] RIP [<ffffffff8139ee28>] rb_erase+0x118/0x390
[ 313.356633] RSP <ffff8800ba3efdc8>
[ 313.356633] CR2: 0000000000000000
[ 313.356633] ---[ end trace 5fbbfdbbc196604d ]---
[ 313.356633] Kernel panic - not syncing: Attempted to kill the idle task!
[ 313.356633] Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff)
