2022-08-22 08:02:37

by Ankit Jain

Subject: [PATCH v4.19.y 0/4] sched/deadline: Fix panic due to nested priority inheritance

When a CFS task that was boosted by a SCHED_DEADLINE
task boosts another CFS task (nested priority inheritance),
a kernel panic is observed.
Fixing priority inheritance changes how the sched_deadline
attributes are inherited from the original donor task.

Additional supporting patches are added to fix throttling of
boosted tasks.

Daniel Bristot de Oliveira (1):
  sched/deadline: Unthrottle PI boosted threads while enqueuing

Lucas Stach (1):
  sched/deadline: Fix stale throttling on de-/boosted tasks

Juri Lelli (1):
  sched/deadline: Fix priority inheritance with multiple scheduling
    classes

Hui Su (1):
  kernel/sched: Remove dl_boosted flag comment

 include/linux/sched.h   |  13 ++--
 kernel/sched/core.c     |  11 ++--
 kernel/sched/deadline.c | 131 +++++++++++++++++++++++++---------------
 3 files changed, 96 insertions(+), 59 deletions(-)

--
2.34.1


2022-08-22 08:03:20

by Ankit Jain

Subject: [PATCH v4.19.y 1/4] sched/deadline: Unthrottle PI boosted threads while enqueuing

From: Daniel Bristot de Oliveira <[email protected]>

commit feff2e65efd8d84cf831668e182b2ce73c604bbb upstream.

stress-ng has a test (stress-ng --cyclic) that creates a set of threads
under SCHED_DEADLINE with the following parameters:

dl_runtime = 10000 (10 us)
dl_deadline = 100000 (100 us)
dl_period = 100000 (100 us)

These parameters are very aggressive. When using a system without HRTICK
set, these threads can easily execute longer than the dl_runtime because
the throttling happens with 1/HZ resolution.
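
To put the 1/HZ resolution in perspective, here is a minimal sketch (not
kernel code) that compares the tick length with the 10 us budget. The helper
names are hypothetical, the HZ values are example configurations, and the
model assumes the budget is checked only at tick boundaries, which is a
simplification.

```c
#include <stdint.h>

/* Length of one scheduler tick in nanoseconds; hz must be non-zero. */
static uint64_t tick_ns(unsigned int hz)
{
	return 1000000000ULL / hz;
}

/*
 * How many budgets fit into one tick. Under the simplified model, a task
 * whose budget expires right after a tick may keep running for close to
 * one more tick, so this is roughly the worst-case overrun expressed in
 * multiples of dl_runtime. dl_runtime_ns must be non-zero.
 */
static uint64_t overrun_factor(uint64_t dl_runtime_ns, unsigned int hz)
{
	return tick_ns(hz) / dl_runtime_ns;
}
```

Under these assumptions, overrun_factor(10000, 250) evaluates to 400: with
HZ=250 a tick lasts 4 ms, so a thread could run for hundreds of budgets
before the overrun is noticed.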

During the main part of the test, the system works just fine because
the workload does not try to run over the 10 us. The problem happens at
the end of the test, on the exit() path. During exit(), the threads need
to do some cleanups that require real-time mutex locks, mainly those
related to memory management, resulting in this scenario:

    Note: locks are rt_mutexes...
 ------------------------------------------------------------------------
    TASK A:             TASK B:                 TASK C:
    activation
                                                activation
                        activation

    lock(a): OK!        lock(b): OK!
                        <overrun runtime>
                        lock(a)
                        -> block (task A owns it)
                          -> self notice/set throttled
 +--<                     -> arm replenished timer
 |                      switch-out
 |                                              lock(b)
 |                                              -> <C prio > B prio>
 |                                              -> boost TASK B
 |  unlock(a)                                   switch-out
 |  -> handle lock a to B
 |    -> wakeup(B)
 |      -> B is throttled:
 |        -> do not enqueue
 |     switch-out
 |
 |
 +---------------------> replenishment timer
                        -> TASK B is boosted:
                          -> do not enqueue
 ------------------------------------------------------------------------

BOOM: TASK B is runnable but !enqueued, holding TASK C: the system
crashes with hung task C.

This problem is avoided by removing the throttle state from the boosted
thread while boosting it (by TASK A in the example above), allowing it to
be queued and run boosted.

The next replenishment will take care of the runtime overrun, pushing
the deadline further away. See the "while (dl_se->runtime <= 0)" loop in
replenish_dl_entity() for more information.
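
The effect of that loop can be sketched in user space as follows. This is a
simplified model, not the kernel function: struct dl_model and replenish()
are hypothetical names, and the sketch leaves out everything else that
replenish_dl_entity() does (for example, PI parameters and handling of a
deadline that is already in the past).

```c
#include <stdint.h>

/* Simplified stand-in for the fields the replenishment loop touches. */
struct dl_model {
	int64_t  runtime;     /* remaining budget in ns; negative after an overrun */
	uint64_t deadline;    /* absolute deadline in ns */
	int64_t  dl_runtime;  /* budget granted per period in ns; must be > 0 */
	uint64_t dl_period;   /* period in ns */
};

/*
 * Pay back an overrun: every period that is needed to bring the budget
 * above zero pushes the deadline one period further away.
 * Returns the number of periods consumed.
 */
static unsigned int replenish(struct dl_model *se)
{
	unsigned int periods = 0;

	while (se->runtime <= 0) {
		se->deadline += se->dl_period;
		se->runtime += se->dl_runtime;
		periods++;
	}
	return periods;
}
```

With the parameters above (10 us runtime, 100 us period), a thread that
overran by 25 us needs three iterations: its deadline moves 300 us further
away and it is left with 5 us of budget.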

Reported-by: Mark Simmons <[email protected]>
Signed-off-by: Daniel Bristot de Oliveira <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Juri Lelli <[email protected]>
Tested-by: Mark Simmons <[email protected]>
Link: https://lkml.kernel.org/r/5076e003450835ec74e6fa5917d02c4fa41687e6.1600170294.git.bristot@redhat.com
[Ankit: Regenerated the patch for v4.19.y]
Signed-off-by: Ankit Jain <[email protected]>
---
kernel/sched/deadline.c | 21 +++++++++++++++++++++
1 file changed, 21 insertions(+)

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index beec5081a55a..1f8444d9df9d 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -1484,6 +1484,27 @@ static void enqueue_task_dl(struct rq *rq, struct task_struct *p, int flags)
 	 */
 	if (pi_task && dl_prio(pi_task->normal_prio) && p->dl.dl_boosted) {
 		pi_se = &pi_task->dl;
+		/*
+		 * Because of delays in the detection of the overrun of a
+		 * thread's runtime, it might be the case that a thread
+		 * goes to sleep in a rt mutex with negative runtime. As
+		 * a consequence, the thread will be throttled.
+		 *
+		 * While waiting for the mutex, this thread can also be
+		 * boosted via PI, resulting in a thread that is throttled
+		 * and boosted at the same time.
+		 *
+		 * In this case, the boost overrides the throttle.
+		 */
+		if (p->dl.dl_throttled) {
+			/*
+			 * The replenish timer needs to be canceled. No
+			 * problem if it fires concurrently: boosted threads
+			 * are ignored in dl_task_timer().
+			 */
+			hrtimer_try_to_cancel(&p->dl.dl_timer);
+			p->dl.dl_throttled = 0;
+		}
 	} else if (!dl_prio(p->normal_prio)) {
 		/*
 		 * Special case in which we have a !SCHED_DEADLINE task
--
2.34.1

2022-08-22 08:03:42

by Ankit Jain

Subject: [PATCH v4.19.y 2/4] sched/deadline: Fix stale throttling on de-/boosted tasks

From: Lucas Stach <[email protected]>

commit 46fcc4b00c3cca8adb9b7c9afdd499f64e427135 upstream.

When a boosted task gets throttled, what normally happens is that it's
immediately enqueued again with ENQUEUE_REPLENISH, which replenishes the
runtime and clears the dl_throttled flag. There is a special case however:
if the throttling happened on sched-out and the task has been deboosted in
the meantime, the replenish is skipped as the task will return to its
normal scheduling class. This leaves the task with the dl_throttled flag
set.

Now if the task gets boosted up to the deadline scheduling class again
while it is sleeping, it's still in the throttled state. The normal wakeup
however will enqueue the task with ENQUEUE_REPLENISH not set, so we don't
actually place it on the rq. Thus we end up with a task that is runnable
but not actually on the rq: no immediate replenishment happens and the
replenishment timer is not set up, so the task is stuck in
forever-throttled limbo.

Clear the dl_throttled flag before dropping back to the normal scheduling
class to fix this issue.
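
The sequence described above can be modelled with a small user-space sketch.
It is an illustration only: struct task_model, MODEL_ENQUEUE_REPLENISH and
the functions below are hypothetical stand-ins for the kernel's state and
enqueue path, reduced to the two facts that matter here (the throttled flag
and whether the task is on the deadline runqueue).

```c
#include <stdbool.h>

#define MODEL_ENQUEUE_REPLENISH 0x1	/* hypothetical flag value */

struct task_model {
	bool dl_throttled;	/* stale flag left behind by the throttle */
	bool on_dl_rq;		/* is the task queued on the deadline rq? */
};

/* Deboost: the task drops back to its normal scheduling class. */
static void deboost(struct task_model *t, bool clear_throttle)
{
	if (clear_throttle)
		t->dl_throttled = false;	/* what the fix adds */
	t->on_dl_rq = false;
}

/* Enqueue: a throttled task is only queued when a replenish is requested. */
static void enqueue(struct task_model *t, int flags)
{
	if (t->dl_throttled && !(flags & MODEL_ENQUEUE_REPLENISH))
		return;
	t->on_dl_rq = true;
}

/*
 * Replay the scenario: throttled while boosted, deboosted, boosted again
 * while sleeping, then woken by a normal wakeup (no replenish flag).
 * Returns true if the task ends up runnable but not queued.
 */
static bool stuck_after_reboost(bool clear_throttle)
{
	struct task_model t = { .dl_throttled = true, .on_dl_rq = true };

	deboost(&t, clear_throttle);
	enqueue(&t, 0);
	return !t.on_dl_rq;
}
```

In this model stuck_after_reboost(false) returns true (the task is never
queued), while stuck_after_reboost(true), which clears the flag on deboost,
returns false.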

Signed-off-by: Lucas Stach <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Acked-by: Juri Lelli <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
[Ankit: Regenerated the patch for v4.19.y]
Signed-off-by: Ankit Jain <[email protected]>
---
kernel/sched/deadline.c | 13 ++++++++-----
1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 1f8444d9df9d..29cd4c0a92c0 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -1507,12 +1507,15 @@ static void enqueue_task_dl(struct rq *rq, struct task_struct *p, int flags)
 		}
 	} else if (!dl_prio(p->normal_prio)) {
 		/*
-		 * Special case in which we have a !SCHED_DEADLINE task
-		 * that is going to be deboosted, but exceeds its
-		 * runtime while doing so. No point in replenishing
-		 * it, as it's going to return back to its original
-		 * scheduling class after this.
+		 * Special case in which we have a !SCHED_DEADLINE task that is going
+		 * to be deboosted, but exceeds its runtime while doing so. No point in
+		 * replenishing it, as it's going to return back to its original
+		 * scheduling class after this. If it has been throttled, we need to
+		 * clear the flag, otherwise the task may wake up as throttled after
+		 * being boosted again with no means to replenish the runtime and clear
+		 * the throttle.
 		 */
+		p->dl.dl_throttled = 0;
 		BUG_ON(!p->dl.dl_boosted || flags != ENQUEUE_REPLENISH);
 		return;
 	}
--
2.34.1

2022-08-22 08:26:15

by Ankit Jain

Subject: [PATCH v4.19.y 4/4] kernel/sched: Remove dl_boosted flag comment

From: Hui Su <[email protected]>

commit 0e3872499de1a1230cef5221607d71aa09264bd5 upstream.

Since commit 2279f540ea7d ("sched/deadline: Fix priority
inheritance with multiple scheduling classes"), the comment
describing the dl_boosted flag should no longer be kept here.

Signed-off-by: Hui Su <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Daniel Bristot de Oliveira <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
[Ankit: Regenerated the patch for v4.19.y]
Signed-off-by: Ankit Jain <[email protected]>
---
include/linux/sched.h | 4 ----
1 file changed, 4 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 2ed820558da1..bc04745da6c1 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -528,10 +528,6 @@ struct sched_dl_entity {
 	 * task has to wait for a replenishment to be performed at the
 	 * next firing of dl_timer.
 	 *
-	 * @dl_boosted tells if we are boosted due to DI. If so we are
-	 * outside bandwidth enforcement mechanism (but only until we
-	 * exit the critical section);
-	 *
 	 * @dl_yielded tells if task gave up the CPU before consuming
 	 * all its available runtime during the last job.
 	 *
--
2.34.1

2022-08-25 12:53:00

by Greg Kroah-Hartman

Subject: Re: [PATCH v4.19.y 0/4] sched/deadline: Fix panic due to nested priority inheritance

On Mon, Aug 22, 2022 at 01:13:44PM +0530, Ankit Jain wrote:
> When a CFS task that was boosted by a SCHED_DEADLINE
> task boosts another CFS task (nested priority inheritance),
> Kernel panic is observed.
> Fixing priority inheritance changes the way how sched_deadline
> attributes are being inherited from original donor task.
>
> Additional supporting patches are added to fix throttling of
> boosted tasks.
>
> Daniel Bristot de Oliveira (1):
> sched/deadline: Unthrottle PI boosted threads while enqueuing
>
> Lucas Stach (1):
> sched/deadline: Fix stale throttling on de-/boosted tasks
>
> Juri Lelli (1):
> sched/deadline: Fix priority inheritance with multiple scheduling
> classes
>
> Hui Su (1):
> kernel/sched: Remove dl_boosted flag comment
>
> include/linux/sched.h | 13 ++--
> kernel/sched/core.c | 11 ++--
> kernel/sched/deadline.c | 131 +++++++++++++++++++++++++---------------
> 3 files changed, 96 insertions(+), 59 deletions(-)
>
> --
> 2.34.1
>

Both sets of backports now queued up, thanks.

greg k-h