2014-02-18 23:57:58

by George McCollister

Subject: [PATCH] sched: fix double normalization of vruntime

dequeue_entity() is called when p->on_rq and sets se->on_rq = 0
which appears to guarantee that the !se->on_rq condition is met.
If the task has done set_current_state(TASK_INTERRUPTIBLE) without
schedule() the second condition will be met and vruntime will be
incorrectly adjusted twice.

In certain cases this can result in the task's vruntime never increasing
past the vruntime of other tasks on the CFS' run queue, starving them of
CPU time.

This patch changes switched_from_fair() to use !p->on_rq instead of
!se->on_rq.

I'm able to cause a task with a priority of 120 to starve all other
tasks with the same priority on an ARM platform running 3.2.51-rt72
PREEMPT RT by writing one character at a time to a serial tty (16550 UART)
in a tight loop. I'm also able to verify making this change corrects the
problem on that platform and kernel version.

I haven't had, and am not sure I will have, an opportunity to get a newer
kernel version running on the platform mentioned above, and I have yet to
reproduce the problem on another platform.
---
kernel/sched/fair.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 966cc2b..fa1c6df 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6999,15 +6999,15 @@ static void switched_from_fair(struct rq *rq, struct task_struct *p)
struct cfs_rq *cfs_rq = cfs_rq_of(se);

/*
- * Ensure the task's vruntime is normalized, so that when its
+ * Ensure the task's vruntime is normalized, so that when it's
* switched back to the fair class the enqueue_entity(.flags=0) will
* do the right thing.
*
- * If it was on_rq, then the dequeue_entity(.flags=0) will already
- * have normalized the vruntime, if it was !on_rq, then only when
+ * If it's on_rq, then the dequeue_entity(.flags=0) will already
+ * have normalized the vruntime, if it's !on_rq, then only when
* the task is sleeping will it still have non-normalized vruntime.
*/
- if (!se->on_rq && p->state != TASK_RUNNING) {
+ if (!p->on_rq && p->state != TASK_RUNNING) {
/*
* Fix up our vruntime so that the current sleep doesn't
* cause 'unlimited' sleep bonus.
--
1.8.2.1
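
To make the double adjustment described above concrete, here is a minimal
user-space sketch. It is a toy model, not kernel code: struct toy_task,
toy_dequeue_entity() and toy_switch_from_fair() are hypothetical names, and
"normalization" is reduced to subtracting min_vruntime once. It assumes, as
the changelog states, that the dequeue clears only the entity-level flag
while the task-level flag stays set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for the fields discussed in the patch (hypothetical type). */
struct toy_task {
	bool p_on_rq;        /* models p->on_rq (task-level flag) */
	bool se_on_rq;       /* models se->on_rq (entity-level flag) */
	bool state_running;  /* true models p->state == TASK_RUNNING */
	int64_t vruntime;
};

/* Models dequeue_entity(.flags=0): normalize once, clear the entity flag. */
void toy_dequeue_entity(struct toy_task *t, int64_t min_vruntime)
{
	assert(t->se_on_rq);
	t->vruntime -= min_vruntime;
	t->se_on_rq = false;
}

/*
 * Models a class change away from fair. The task is passed by value so the
 * caller's copy is untouched. use_task_flag selects the patched check
 * (!p->on_rq) instead of the original check (!se->on_rq).
 */
int64_t toy_switch_from_fair(struct toy_task t, int64_t min_vruntime,
			     bool use_task_flag)
{
	bool off_rq;

	/* A queued task is dequeued first; p_on_rq is deliberately left set. */
	if (t.p_on_rq)
		toy_dequeue_entity(&t, min_vruntime);

	off_rq = use_task_flag ? !t.p_on_rq : !t.se_on_rq;

	/* The fix-up intended only for tasks that are really asleep. */
	if (off_rq && !t.state_running)
		t.vruntime -= min_vruntime;

	return t.vruntime;
}
```

With a queued task that has set a sleeping state but not yet called
schedule() (p_on_rq and se_on_rq both true, state_running false, vruntime
1500, min_vruntime 1000), the original check subtracts twice and returns
-500, while the patched check subtracts once and returns 500. A task that is
genuinely asleep (both flags false) gets 500 from either check.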


2014-02-26 13:28:56

by Peter Zijlstra

Subject: Re: [PATCH] sched: fix double normalization of vruntime

On Tue, Feb 18, 2014 at 05:56:51PM -0600, George McCollister wrote:
> dequeue_entity() is called when p->on_rq and sets se->on_rq = 0
> which appears to guarantee that the !se->on_rq condition is met.
> If the task has done set_current_state(TASK_INTERRUPTIBLE) without
> schedule() the second condition will be met and vruntime will be
> incorrectly adjusted twice.
>
> In certain cases this can result in the task's vruntime never increasing
> past the vruntime of other tasks on the CFS' run queue, starving them of
> CPU time.
>
> This patch changes switched_from_fair() to use !p->on_rq instead of
> !se->on_rq.
>
> I'm able to cause a task with a priority of 120 to starve all other
> tasks with the same priority on an ARM platform running 3.2.51-rt72
> PREEMPT RT by writing one character at a time to a serial tty (16550 UART)
> in a tight loop. I'm also able to verify making this change corrects the
> problem on that platform and kernel version.
>
> I haven't had, and am not sure I will have, an opportunity to get a newer
> kernel version running on the platform mentioned above, and I have yet to
> reproduce the problem on another platform.

Yes, I think you're quite right. Another way to look at this is that
p->on_rq is the one matching p->state.

Can I have (or add) your Signed-off-by for this patch?

2014-02-26 19:01:34

by George McCollister

Subject: Re: [PATCH] sched: fix double normalization of vruntime

On Wed, Feb 26, 2014 at 7:28 AM, Peter Zijlstra <[email protected]> wrote:
> On Tue, Feb 18, 2014 at 05:56:51PM -0600, George McCollister wrote:
>> dequeue_entity() is called when p->on_rq and sets se->on_rq = 0
>> which appears to guarantee that the !se->on_rq condition is met.
>> If the task has done set_current_state(TASK_INTERRUPTIBLE) without
>> schedule() the second condition will be met and vruntime will be
>> incorrectly adjusted twice.
>>
>> In certain cases this can result in the task's vruntime never increasing
>> past the vruntime of other tasks on the CFS' run queue, starving them of
>> CPU time.
>>
>> This patch changes switched_from_fair() to use !p->on_rq instead of
>> !se->on_rq.
>>
>> I'm able to cause a task with a priority of 120 to starve all other
>> tasks with the same priority on an ARM platform running 3.2.51-rt72
>> PREEMPT RT by writing one character at a time to a serial tty (16550 UART)
>> in a tight loop. I'm also able to verify making this change corrects the
>> problem on that platform and kernel version.
>>
>> I haven't had, and am not sure I will have, an opportunity to get a newer
>> kernel version running on the platform mentioned above, and I have yet to
>> reproduce the problem on another platform.
>
> Yes, I think you're quite right. Another way to look at this is that
> p->on_rq is the one matching p->state.
Yes, correct

>
> Can I have (or add) your Signed-off-by for this patch?

Go ahead and add my sign off. I didn't want to add it before
discussing the issue with someone.

Thanks,
George McCollister

Subject: [tip:sched/urgent] sched: Fix double normalization of vruntime

Commit-ID: 791c9e0292671a3bfa95286bb5c08129d8605618
Gitweb: http://git.kernel.org/tip/791c9e0292671a3bfa95286bb5c08129d8605618
Author: George McCollister <[email protected]>
AuthorDate: Tue, 18 Feb 2014 17:56:51 -0600
Committer: Ingo Molnar <[email protected]>
CommitDate: Thu, 27 Feb 2014 12:29:38 +0100

sched: Fix double normalization of vruntime

dequeue_entity() is called when p->on_rq and sets se->on_rq = 0
which appears to guarantee that the !se->on_rq condition is met.
If the task has done set_current_state(TASK_INTERRUPTIBLE) without
schedule() the second condition will be met and vruntime will be
incorrectly adjusted twice.

In certain cases this can result in the task's vruntime never increasing
past the vruntime of other tasks on the CFS' run queue, starving them of
CPU time.

This patch changes switched_from_fair() to use !p->on_rq instead of
!se->on_rq.

I'm able to cause a task with a priority of 120 to starve all other
tasks with the same priority on an ARM platform running 3.2.51-rt72
PREEMPT RT by writing one character at a time to a serial tty (16550 UART)
in a tight loop. I'm also able to verify making this change corrects the
problem on that platform and kernel version.

Signed-off-by: George McCollister <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
kernel/sched/fair.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 7815709..9b4c4f3 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7001,15 +7001,15 @@ static void switched_from_fair(struct rq *rq, struct task_struct *p)
struct cfs_rq *cfs_rq = cfs_rq_of(se);

/*
- * Ensure the task's vruntime is normalized, so that when its
+ * Ensure the task's vruntime is normalized, so that when it's
* switched back to the fair class the enqueue_entity(.flags=0) will
* do the right thing.
*
- * If it was on_rq, then the dequeue_entity(.flags=0) will already
- * have normalized the vruntime, if it was !on_rq, then only when
+ * If it's on_rq, then the dequeue_entity(.flags=0) will already
+ * have normalized the vruntime, if it's !on_rq, then only when
* the task is sleeping will it still have non-normalized vruntime.
*/
- if (!se->on_rq && p->state != TASK_RUNNING) {
+ if (!p->on_rq && p->state != TASK_RUNNING) {
/*
* Fix up our vruntime so that the current sleep doesn't
* cause 'unlimited' sleep bonus.