2007-02-22 00:28:27

by Thomas Gleixner

Subject: [PATCH] Fix posix-cpu-timer breakage caused by stale p->last_ran value

Problem description at:
http://bugzilla.kernel.org/show_bug.cgi?id=8048

Commit b18ec80396834497933d77b81ec0918519f4e2a7
[PATCH] sched: improve migration accuracy
optimized the scheduler time calculations, but broke posix-cpu-timers.

The problem is that the p->last_ran value of the incoming task is not
updated after a context switch. So a subsequent call to
current_sched_time() calculates with a stale p->last_ran value, i.e. it
also accounts the full time during which the task was scheduled away.

Signed-off-by: Thomas Gleixner <[email protected]>

Index: linux-2.6/kernel/sched.c
===================================================================
--- linux-2.6.orig/kernel/sched.c
+++ linux-2.6/kernel/sched.c
@@ -3566,7 +3566,7 @@ switch_tasks:
 
 	sched_info_switch(prev, next);
 	if (likely(prev != next)) {
-		next->timestamp = now;
+		next->timestamp = next->last_ran = now;
 		rq->nr_switches++;
 		rq->curr = next;
 		++*switch_count;




2007-02-22 07:51:22

by Ingo Molnar

Subject: Re: [PATCH] Fix posix-cpu-timer breakage caused by stale p->last_ran value


* Thomas Gleixner <[email protected]> wrote:

> The problem is that the p->last_ran value is not updated after a
> context switch. So a subsequent call to current_sched_time()
> calculates with a stale p->last_ran value, i.e. accounts the full
> time, which the task was scheduled away.
>
> Signed-off-by: Thomas Gleixner <[email protected]>

>  	sched_info_switch(prev, next);
>  	if (likely(prev != next)) {
> -		next->timestamp = now;
> +		next->timestamp = next->last_ran = now;

ouch! nice catch. Also for v2.6.20.2 i think. 2.6.19 should be
unaffected.

Acked-by: Ingo Molnar <[email protected]>

Ingo

2007-02-22 07:56:35

by Thomas Gleixner

Subject: Re: [PATCH] Fix posix-cpu-timer breakage caused by stale p->last_ran value

On Thu, 2007-02-22 at 08:46 +0100, Ingo Molnar wrote:
> * Thomas Gleixner <[email protected]> wrote:
>
> > The problem is that the p->last_ran value is not updated after a
> > context switch. So a subsequent call to current_sched_time()
> > calculates with a stale p->last_ran value, i.e. accounts the full
> > time, which the task was scheduled away.
> >
> > Signed-off-by: Thomas Gleixner <[email protected]>
>
> >  	sched_info_switch(prev, next);
> >  	if (likely(prev != next)) {
> > -		next->timestamp = now;
> > +		next->timestamp = next->last_ran = now;
>
> ouch! nice catch. Also for v2.6.20.2 i think. 2.6.19 should be
> unaffected.

Yes, it was introduced in 2.6.20 and definitely should hit the stable tree.

tglx


2007-02-22 09:16:40

by Mike Galbraith

Subject: Re: [PATCH] Fix posix-cpu-timer breakage caused by stale p->last_ran value

On Thu, 2007-02-22 at 01:33 +0100, Thomas Gleixner wrote:
> Problem description at:
> http://bugzilla.kernel.org/show_bug.cgi?id=8048
>
> Commit b18ec80396834497933d77b81ec0918519f4e2a7
> [PATCH] sched: improve migration accuracy
> optimized the scheduler time calculations, but broke posix-cpu-timers.
>
> The problem is that the p->last_ran value is not updated after a context
> switch. So a subsequent call to current_sched_time() calculates with a
> stale p->last_ran value, i.e. accounts the full time, which the task was
> scheduled away.

Oops, missed that. You could also remove the prev->last_ran assignment
just above your addition, and turn this into a negative cost bugfix :)

-Mike

2007-02-22 16:48:50

by John Sigler

Subject: Re: [PATCH] Fix posix-cpu-timer breakage caused by stale p->last_ran value

Thomas Gleixner wrote:

> Problem description at:
> http://bugzilla.kernel.org/show_bug.cgi?id=8048
>
> Commit b18ec80396834497933d77b81ec0918519f4e2a7
> [PATCH] sched: improve migration accuracy
> optimized the scheduler time calculations, but broke posix-cpu-timers.
>
> The problem is that the p->last_ran value is not updated after a context
> switch. So a subsequent call to current_sched_time() calculates with a
> stale p->last_ran value, i.e. accounts the full time, which the task was
> scheduled away.

Could you expand on the impact of this bug for non-kernel hackers? :-)

I'm currently testing 2.6.20-rt5. My app runs in SCHED_RR and just
blocks waiting for several signals. Since my app is the only SCHED_RR
process running on the system, I get the impression that I'm not
affected by this bug. Is that correct?

> Signed-off-by: Thomas Gleixner <[email protected]>
>
> Index: linux-2.6/kernel/sched.c
> ===================================================================
> --- linux-2.6.orig/kernel/sched.c
> +++ linux-2.6/kernel/sched.c
> @@ -3566,7 +3566,7 @@ switch_tasks:
>
>  	sched_info_switch(prev, next);
>  	if (likely(prev != next)) {
> -		next->timestamp = now;
> +		next->timestamp = next->last_ran = now;
>  		rq->nr_switches++;
>  		rq->curr = next;
>  		++*switch_count;

Regards,

John