2007-10-12 20:32:01

by Frans Pop

[permalink] [raw]
Subject: 2.6.23 regression: top displaying 9999% CPU usage

Please consider this patch for 2.6.23.2
http://lkml.org/lkml/2007/10/4/389

tested by me in
http://lkml.org/lkml/2007/10/5/150

to fix the regression first reported in
http://lkml.org/lkml/2007/10/3/123

Cause of the wrong display in top is the value of stime in /proc/<pid>/stat
decreasing occasionally.

The issue is also registered as:
http://bugzilla.kernel.org/show_bug.cgi?id=9135

Thanks,
Frans Pop


2007-10-12 21:22:52

by Greg KH

[permalink] [raw]
Subject: Re: [stable] 2.6.23 regression: top displaying 9999% CPU usage

On Fri, Oct 12, 2007 at 10:31:50PM +0200, Frans Pop wrote:
> Please consider this patch for 2.6.23.2
> http://lkml.org/lkml/2007/10/4/389
>
> tested by me in
> http://lkml.org/lkml/2007/10/5/150
>
> to fix the regression first reported in
> http://lkml.org/lkml/2007/10/3/123
>
> Cause of the wrong display in top is the value of stime in /proc/<pid>/stat
> decreasing occasionally.
>
> The issue is also registered as:
> http://bugzilla.kernel.org/show_bug.cgi?id=9135

Is it already in Linus's tree? If so, do you have a git commit id? If
not, please let us (stable@) know when it is, and what the id is, and
then we can add it to our tree.

thanks,

greg k-h

2007-10-13 07:53:34

by Frans Pop

[permalink] [raw]
Subject: Re: [stable] 2.6.23 regression: top displaying 9999% CPU usage

On Friday 12 October 2007, Greg KH wrote:
> On Fri, Oct 12, 2007 at 10:31:50PM +0200, Frans Pop wrote:
> > Please consider this patch for 2.6.23.2
> > http://lkml.org/lkml/2007/10/4/389
> >
> > tested by me in
> > http://lkml.org/lkml/2007/10/5/150
> >
> > to fix the regression first reported in
> > http://lkml.org/lkml/2007/10/3/123
> >
> > Cause of the wrong display in top is the value of stime in
> > /proc/<pid>/stat decreasing occasionally.
> >
> > The issue is also registered as:
> > http://bugzilla.kernel.org/show_bug.cgi?id=9135
>
> Is it already in Linus's tree? If so, do you have a git commit id? If
> not, please let us (stable@) know when it is, and what the id is, and
> then we can add it to our tree.

Not AFAICT.
CCing Christian (as patch author) and Ingo (as author of the change that
caused the regression) so they can push it through the correct channels.

2007-10-14 20:36:58

by Christian Borntraeger

[permalink] [raw]
Subject: Re: [stable] 2.6.23 regression: top displaying 9999% CPU usage

Am Samstag, 13. Oktober 2007 schrieb Frans Pop:
> > > Please consider this patch for 2.6.23.2
> > > http://lkml.org/lkml/2007/10/4/389
> > Is it already in Linus's tree? If so, do you have a git commit id? If
> > not, please let us (stable@) know when it is, and what the id is, and
> > then we can add it to our tree.
>
> Not AFAICT.
> CCing Christian (as patch author) and Ingo (as author of the change that
> caused the regression) so they can push it through the correct channels.
>

I dont know how to proceed with this issue. The more I think about it, the
more I am convinced that using sum_exec_runtime together with sampled utime
and stime will never guarantee monotonicity for utime and stime in proc.
Just imagine an process with 9 ticks for utime and 0 ticks for stime. If
we now sample one tick for stime (but having only a small increase in
sum_exec_runtime) the next utime value will only be 90% of the last value.
So returning to the 2.6.22 model seems to be the safest solution until
somebody else comes up with an idea that works proper.

Ingo, any opinion?

Christian

2007-10-16 08:30:27

by Christian Borntraeger

[permalink] [raw]
Subject: Re: [stable] 2.6.23 regression: top displaying 9999% CPU usage

Chuck, Balbir,

we still have a problem with stime occosionally going backwards. I stated
below that I think this is not fixable with the current utime/stime split
algorithm.

Balbir, you wrote this code, Chuck you tried to fix it. Any ideas how to
fix this properly? The only idea I have requires that we save the old value
of utime and stime and therefore requires additional locking.

Otherwise I suggest to apply my reversion patch from below:

Am Sonntag, 14. Oktober 2007 schrieb Christian Borntraeger:
> Am Samstag, 13. Oktober 2007 schrieb Frans Pop:
> > > > Please consider this patch for 2.6.23.2
> > > > http://lkml.org/lkml/2007/10/4/389
> > > Is it already in Linus's tree? If so, do you have a git commit id? If
> > > not, please let us (stable@) know when it is, and what the id is, and
> > > then we can add it to our tree.
> >
> > Not AFAICT.
> > CCing Christian (as patch author) and Ingo (as author of the change that
> > caused the regression) so they can push it through the correct channels.
> >
>
> I dont know how to proceed with this issue. The more I think about it, the
> more I am convinced that using sum_exec_runtime together with sampled utime
> and stime will never guarantee monotonicity for utime and stime in proc.
> Just imagine an process with 9 ticks for utime and 0 ticks for stime. If
> we now sample one tick for stime (but having only a small increase in
> sum_exec_runtime) the next utime value will only be 90% of the last value.
> So returning to the 2.6.22 model seems to be the safest solution until
> somebody else comes up with an idea that works proper.
>
> Ingo, any opinion?

--------- patch-----------

Fix stime going backwards

after 2.6.22 the accounting was changes to use sched_clock and to use stime
and utime only for the split. This is where task_utime and task_stime were
introduced. See the code in 2.6.23-rc1.
Unfortunately this broke the accouting on s390 which already has precise
numbers in utime and stime. So the code was partially reverted to use stime
and utime again where appropriate, which are of type ?cputime_t.

It now seems, that the rest of the logic is still broken.

This patch basically reverts the rest of b27f03d4bdc145a09fb7b0c0e004b29f1ee555fa
and should restore the 2.6.22 behavior. The process time is used from tasks
utime and stime instead of the scheduler clock. That means, in general after
a long period of time, it is less accurate than the sum_exec_runtime based
code, but it doesnt break top.

Signed-off-by: Christian Borntraeger <[email protected]>

---
fs/proc/array.c | 37 -------------------------------------
1 file changed, 37 deletions(-)

Index: linux-2.6/fs/proc/array.c
===================================================================
--- linux-2.6.orig/fs/proc/array.c
+++ linux-2.6/fs/proc/array.c
@@ -323,7 +323,6 @@ int proc_pid_status(struct task_struct *
/*
* Use precise platform statistics if available:
*/
-#ifdef CONFIG_VIRT_CPU_ACCOUNTING
static cputime_t task_utime(struct task_struct *p)
{
return p->utime;
@@ -333,42 +332,6 @@ static cputime_t task_stime(struct task_
{
return p->stime;
}
-#else
-static cputime_t task_utime(struct task_struct *p)
-{
- clock_t utime = cputime_to_clock_t(p->utime),
- total = utime + cputime_to_clock_t(p->stime);
- u64 temp;
-
- /*
- * Use CFS's precise accounting:
- */
- temp = (u64)nsec_to_clock_t(p->se.sum_exec_runtime);
-
- if (total) {
- temp *= utime;
- do_div(temp, total);
- }
- utime = (clock_t)temp;
-
- return clock_t_to_cputime(utime);
-}
-
-static cputime_t task_stime(struct task_struct *p)
-{
- clock_t stime;
-
- /*
- * Use CFS's precise accounting. (we subtract utime from
- * the total, to make sure the total observed by userspace
- * grows monotonically - apps rely on that):
- */
- stime = nsec_to_clock_t(p->se.sum_exec_runtime) -
- cputime_to_clock_t(task_utime(p));
-
- return clock_t_to_cputime(stime);
-}
-#endif

static int do_task_stat(struct task_struct *task, char *buffer, int whole)
{

2007-10-16 09:31:51

by Balbir Singh

[permalink] [raw]
Subject: Re: [stable] 2.6.23 regression: top displaying 9999% CPU usage

Christian Borntraeger wrote:
> Chuck, Balbir,
>
> we still have a problem with stime occosionally going backwards. I stated
> below that I think this is not fixable with the current utime/stime split
> algorithm.
>

Hi,

I missed seeing this problem before, sorry about that. Thanks for the
link below, I now understand the problem.

> Balbir, you wrote this code, Chuck you tried to fix it. Any ideas how to
> fix this properly? The only idea I have requires that we save the old value
> of utime and stime and therefore requires additional locking.
>

I am trying to think out loud as to what the root cause of the problem
might be. In one of the discussion threads, I saw utime going backwards,
which seemed very odd, I suspect that those are rounding errors.

I don't understand your explanation below

Initially utime = 9, stime = 0, sum_exec_runtime = S1

Later

utime = 9, stime = 1, sum_exec_runtime = S2

We can be sure that S >= (utime + stime)

If S2 = S1 + delta, then as per our calculation

Initially

utime_proc = (utime * (S1))/(utime + stime)
= nsec_to_clock_t(9 * S1 / 9)

later

utime_proc = nsec_to_clock_t(9 * S2/10)

Given that S >= (utime + stime), we should be fine.
The only problem I see is with rounding, like I mentioned before at two
places

1. Rounding at do_div() in task_utime()
2. Rounding in conversion from clock_t_to_cputime()

I have tried and not had any success reproducing the problem, could you
please help me with some pointers/steps to reproduce the problem?

--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL

2007-10-16 10:11:53

by Frans Pop

[permalink] [raw]
Subject: Re: [stable] 2.6.23 regression: top displaying 9999% CPU usage

On Tuesday 16 October 2007, Balbir Singh wrote:
> I am trying to think out loud as to what the root cause of the problem
> might be. In one of the discussion threads, I saw utime going backwards,
> which seemed very odd, I suspect that those are rounding errors.

I only remembered stime going backwards, but looking back at the mails
you are right, there has been one case of utime going backwards too.
stime going backwards happens _much_ more frequently.

> I have tried and not had any success reproducing the problem, could you
> please help me with some pointers/steps to reproduce the problem?

I can reproduce the problem reliably for two KDE programs I have running
permanently on my system: kontact and amarok. Both of these wake up
regularly and use some (not very much) processor capacity before going to
sleep again. Any process that has that that characteristic should do.

/me thinks a bit and tries something

lol - I have just managed to reproduce this with 'top' itself :-P

Run 'top' in one terminal; then in another terminal:
$ ps ax | grep " [t]op"
17869 pts/0 S+ 0:00 top
$ while true; do awk '{print $14" "$15}' /proc/17869/stat; sleep 1; done | ts
[...]
Oct 16 11:54:48 8 10
Oct 16 11:54:49 6 12 <-- utime
Oct 16 11:54:50 6 12
Oct 16 11:54:51 6 12
Oct 16 11:54:52 8 10 <-- stime
Oct 16 11:54:53 8 10
Oct 16 11:54:54 8 10
Oct 16 11:54:55 8 12
Oct 16 11:54:56 8 12

This example happens to have both stime _and_ utime decreasing...

'ts' is a small utility that prints the timestamps before the readings;
the test will work just as well without it.
Note that it may take a while for the error to show up. In this case it
was 40 seconds.

Cheers,
Frans Pop

2007-10-16 10:35:30

by Christian Borntraeger

[permalink] [raw]
Subject: Re: [stable] 2.6.23 regression: top displaying 9999% CPU usage

Am Dienstag, 16. Oktober 2007 schrieb Balbir Singh:
> I am trying to think out loud as to what the root cause of the problem
> might be. In one of the discussion threads, I saw utime going backwards,
> which seemed very odd, I suspect that those are rounding errors.
>
> I don't understand your explanation below
>
> Initially utime = 9, stime = 0, sum_exec_runtime = S1
>
> Later
>
> utime = 9, stime = 1, sum_exec_runtime = S2
>
> We can be sure that S >= (utime + stime)

I think here is the problem. How can we be sure? We cant. utime and stime
are sampled, so they can be largely off in any direction,if the program
sleeps often and manages to synchronize itself to the timer tick. Lets say
a program only does a simple system call and then sleeps. So sum_exec_runtime
is increased by lets say 1000 cycles on a 1Ghz box which means 1000ns. If now
the timer tick happens exactly at this moment, stime is increased by 1 tick
= 1000000ns.

Maybe there is some magic in the code which I did not see, but obviously
the problem exists and looking at Frans data (stime+utime) are not decreasing,
but stime isnt and utime is. If you look at Frans data you see:
Oct 16 11:54:48 8 10
Oct 16 11:54:49 6 12 ?<-- utime
Oct 16 11:54:50 6 12
Oct 16 11:54:51 6 12
Oct 16 11:54:52 8 10 ?<-- stime
Oct 16 11:54:53 8 10
Oct 16 11:54:54 8 10
Oct 16 11:54:55 8 12
Oct 16 11:54:56 8 12

(stime+utime) is constant. That means that S2-S1 is obviously smaller than
one tick (See the calculation in task_stime). I am quite sure it is caused
by changes in the sampled values p->utime and p->stime.

>
> If S2 = S1 + delta, then as per our calculation
>
> Initially
>
> utime_proc = (utime * (S1))/(utime + stime)
> = nsec_to_clock_t(9 * S1 / 9)
>
> later
>
> utime_proc = nsec_to_clock_t(9 * S2/10)
>
> Given that S >= (utime + stime), we should be fine.



2007-10-16 10:39:27

by Balbir Singh

[permalink] [raw]
Subject: Re: [stable] 2.6.23 regression: top displaying 9999% CPU usage

Frans Pop wrote:
> On Tuesday 16 October 2007, Balbir Singh wrote:
>> I am trying to think out loud as to what the root cause of the problem
>> might be. In one of the discussion threads, I saw utime going backwards,
>> which seemed very odd, I suspect that those are rounding errors.
>
> I only remembered stime going backwards, but looking back at the mails
> you are right, there has been one case of utime going backwards too.
> stime going backwards happens _much_ more frequently.
>
>> I have tried and not had any success reproducing the problem, could you
>> please help me with some pointers/steps to reproduce the problem?
>
> I can reproduce the problem reliably for two KDE programs I have running
> permanently on my system: kontact and amarok. Both of these wake up
> regularly and use some (not very much) processor capacity before going to
> sleep again. Any process that has that that characteristic should do.
>
> /me thinks a bit and tries something
>
> lol - I have just managed to reproduce this with 'top' itself :-P
>

Thanks, I know why I am not seeing the problem. My box has
CONFIG_VIRT_CPU_ACCOUNTING set. I'll find another box and try.



--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL

2007-10-16 13:00:10

by Balbir Singh

[permalink] [raw]
Subject: Re: [stable] 2.6.23 regression: top displaying 9999% CPU usage

Christian Borntraeger wrote:
> Am Dienstag, 16. Oktober 2007 schrieb Balbir Singh:
>> I am trying to think out loud as to what the root cause of the problem
>> might be. In one of the discussion threads, I saw utime going backwards,
>> which seemed very odd, I suspect that those are rounding errors.
>>
>> I don't understand your explanation below
>>
>> Initially utime = 9, stime = 0, sum_exec_runtime = S1
>>
>> Later
>>
>> utime = 9, stime = 1, sum_exec_runtime = S2
>>
>> We can be sure that S >= (utime + stime)
>
> I think here is the problem. How can we be sure? We cant. utime and stime
> are sampled, so they can be largely off in any direction,if the program
> sleeps often and manages to synchronize itself to the timer tick. Lets say
> a program only does a simple system call and then sleeps. So sum_exec_runtime
> is increased by lets say 1000 cycles on a 1Ghz box which means 1000ns. If now
> the timer tick happens exactly at this moment, stime is increased by 1 tick
> = 1000000ns.
>

Yes, I thought of that just after I sent out my email. In the case that
you mention, the utime and stime accounting is incorrect anyway :-)
I think we need to find a better solution. I was going to propose that
we round correctly in (the divisions in)

1. task_utime()
2. clock_t_to_cputime()

I suspect we'll need to round task_utime() to p->utime if the value of
task_utime() < p->utime and the same thing for task_stime(). I've tried
reproducing the problem on my UML setup without any success. Let me
try and grab an x86 box.

> Maybe there is some magic in the code which I did not see, but obviously
> the problem exists and looking at Frans data (stime+utime) are not decreasing,
> but stime isnt and utime is. If you look at Frans data you see:
> Oct 16 11:54:48 8 10
> Oct 16 11:54:49 6 12 <-- utime
> Oct 16 11:54:50 6 12
> Oct 16 11:54:51 6 12
> Oct 16 11:54:52 8 10 <-- stime
> Oct 16 11:54:53 8 10
> Oct 16 11:54:54 8 10
> Oct 16 11:54:55 8 12
> Oct 16 11:54:56 8 12
>
> (stime+utime) is constant. That means that S2-S1 is obviously smaller than
> one tick (See the calculation in task_stime). I am quite sure it is caused
> by changes in the sampled values p->utime and p->stime.
>

Yes, very interesting observation.

[snip]


--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL

2007-10-29 12:06:10

by Frans Pop

[permalink] [raw]
Subject: Re: [stable] 2.6.23 regression: top displaying 9999% CPU usage

Hello Balbir,

On Tuesday 16 October 2007, Balbir Singh wrote:
> Christian Borntraeger wrote:
> > Am Dienstag, 16. Oktober 2007 schrieb Balbir Singh:
> >> I am trying to think out loud as to what the root cause of the problem
> >> might be. In one of the discussion threads, I saw utime going
> >> backwards, which seemed very odd, I suspect that those are rounding
> >> errors.
> >>
> >> I don't understand your explanation below
> >>
> >> Initially utime = 9, stime = 0, sum_exec_runtime = S1
> >>
> >> Later
> >>
> >> utime = 9, stime = 1, sum_exec_runtime = S2
> >>
> >> We can be sure that S >= (utime + stime)
> >
> > I think here is the problem. How can we be sure? We cant. utime and
> > stime are sampled, so they can be largely off in any direction,if the
> > program sleeps often and manages to synchronize itself to the timer
> > tick. Lets say a program only does a simple system call and then
> > sleeps. So sum_exec_runtime is increased by lets say 1000 cycles on a
> > 1Ghz box which means 1000ns. If now the timer tick happens exactly at
> > this moment, stime is increased by 1 tick = 1000000ns.
>
> Yes, I thought of that just after I sent out my email. In the case that
> you mention, the utime and stime accounting is incorrect anyway :-)
> I think we need to find a better solution. I was going to propose that
> we round correctly in (the divisions in)
>
> 1. task_utime()
> 2. clock_t_to_cputime()
>
> I suspect we'll need to round task_utime() to p->utime if the value of
> task_utime() < p->utime and the same thing for task_stime(). I've tried
> reproducing the problem on my UML setup without any success. Let me
> try and grab an x86 box.

Any progress on this issue? I noticed that it's still there in current git.

If a better implementation is not expected any time soon, how about an ACK
on the reversion patch Christian proposed in
http://lkml.org/lkml/2007/10/16/76
so we can at least get rid of the regression?

Thanks,
Frans Pop

2007-10-29 12:32:08

by Balbir Singh

[permalink] [raw]
Subject: Re: [stable] 2.6.23 regression: top displaying 9999% CPU usage

Frans Pop wrote:
> Hello Balbir,

[snip]

> Any progress on this issue? I noticed that it's still there in current git.
>

I have been looking elsewhere, sorry, I'll look into this issue tonight

> If a better implementation is not expected any time soon, how about an ACK
> on the reversion patch Christian proposed in
> http://lkml.org/lkml/2007/10/16/76
> so we can at least get rid of the regression?
>

We'll I initially thought of it and then I remembered that the
regression occurs only when the accounting itself is inaccurate.
I am tempted to ack the removal, but I would like to get input
from others. Meanwhile, I'll try and see if I can fix the problem

> Thanks,
> Frans Pop


--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL

2007-10-29 20:05:18

by Ingo Molnar

[permalink] [raw]
Subject: Re: [stable] 2.6.23 regression: top displaying 9999% CPU usage


* Balbir Singh <[email protected]> wrote:

> We'll I initially thought of it and then I remembered that the
> regression occurs only when the accounting itself is inaccurate. I am
> tempted to ack the removal, but I would like to get input from others.
> Meanwhile, I'll try and see if I can fix the problem

i've got a patch from Peter queued up. (see below) This should fix the
main issue.

Ingo

-------------------->
Subject: sched: keep utime/stime monotonic
From: Peter Zijlstra <[email protected]>

keep utime/stime monotonic.

cpustats use utime/stime as a ratio against sum_exec_runtime, as a
consequence it can happen - when the ratio changes faster than time
accumulates - that either can be appear to go backwards.

Signed-off-by: Peter Zijlstra <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
fs/proc/array.c | 3 ++-
include/linux/sched.h | 1 +
kernel/fork.c | 1 +
3 files changed, 4 insertions(+), 1 deletion(-)

Index: linux/fs/proc/array.c
===================================================================
--- linux.orig/fs/proc/array.c
+++ linux/fs/proc/array.c
@@ -358,7 +358,8 @@ static cputime_t task_utime(struct task_
}
utime = (clock_t)temp;

- return clock_t_to_cputime(utime);
+ p->prev_utime = max(p->prev_utime, clock_t_to_cputime(utime));
+ return p->prev_utime;
}

static cputime_t task_stime(struct task_struct *p)
Index: linux/include/linux/sched.h
===================================================================
--- linux.orig/include/linux/sched.h
+++ linux/include/linux/sched.h
@@ -1009,6 +1009,7 @@ struct task_struct {
unsigned int rt_priority;
cputime_t utime, stime, utimescaled, stimescaled;
cputime_t gtime;
+ cputime_t prev_utime;
unsigned long nvcsw, nivcsw; /* context switch counts */
struct timespec start_time; /* monotonic time */
struct timespec real_start_time; /* boot based time */
Index: linux/kernel/fork.c
===================================================================
--- linux.orig/kernel/fork.c
+++ linux/kernel/fork.c
@@ -1056,6 +1056,7 @@ static struct task_struct *copy_process(
p->gtime = cputime_zero;
p->utimescaled = cputime_zero;
p->stimescaled = cputime_zero;
+ p->prev_utime = cputime_zero;

#ifdef CONFIG_TASK_XACCT
p->rchar = 0; /* I/O counter: bytes read */

2007-10-29 20:34:28

by Christian Borntraeger

[permalink] [raw]
Subject: Re: [stable] 2.6.23 regression: top displaying 9999% CPU usage

Am Montag, 29. Oktober 2007 schrieb Ingo Molnar:
> i've got a patch from Peter queued up. (see below) This should fix the
> main issue.
[...]
> --- linux.orig/fs/proc/array.c
> +++ linux/fs/proc/array.c
> @@ -358,7 +358,8 @@ static cputime_t task_utime(struct task_
> }
> utime = (clock_t)temp;
>
> - return clock_t_to_cputime(utime);
> + p->prev_utime = max(p->prev_utime, clock_t_to_cputime(utime));
> + return p->prev_utime;
> }
[...]

I dont think it will work. It will make utime monotic, but stime can still
decrease. For example let sum_exec_runtime increase by a tiny little bit while
utime will get a full additional tick. stime is sum-utime. So stime can still
go backwards. So I think that we need this kind of logic for stime as well,
no?

Christian



2007-10-29 20:41:51

by Ingo Molnar

[permalink] [raw]
Subject: Re: [stable] 2.6.23 regression: top displaying 9999% CPU usage


* Christian Borntraeger <[email protected]> wrote:

> > - return clock_t_to_cputime(utime);
> > + p->prev_utime = max(p->prev_utime, clock_t_to_cputime(utime));
> > + return p->prev_utime;
> > }
> [...]
>
> I dont think it will work. It will make utime monotic, but stime can
> still decrease. For example let sum_exec_runtime increase by a tiny
> little bit while utime will get a full additional tick. stime is
> sum-utime. So stime can still go backwards. So I think that we need
> this kind of logic for stime as well, no?

yeah, probably. Peter?

Ingo

2007-10-29 21:12:11

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [stable] 2.6.23 regression: top displaying 9999% CPU usage


On Mon, 2007-10-29 at 21:41 +0100, Ingo Molnar wrote:
> * Christian Borntraeger <[email protected]> wrote:
>
> > > - return clock_t_to_cputime(utime);
> > > + p->prev_utime = max(p->prev_utime, clock_t_to_cputime(utime));
> > > + return p->prev_utime;
> > > }
> > [...]
> >
> > I dont think it will work. It will make utime monotic, but stime can
> > still decrease. For example let sum_exec_runtime increase by a tiny
> > little bit while utime will get a full additional tick. stime is
> > sum-utime. So stime can still go backwards. So I think that we need
> > this kind of logic for stime as well, no?
>
> yeah, probably. Peter?

/me dons the brown paper bag while mumbling an agreement of sorts.

I'll not attempt to come up with a patch as I fear I'll just make a
bigger mess in my current state, hope to feel better tomorrow..

2007-10-29 21:22:54

by Frans Pop

[permalink] [raw]
Subject: Re: [stable] 2.6.23 regression: top displaying 9999% CPU usage

On Monday 29 October 2007, Ingo Molnar wrote:
> * Christian Borntraeger <[email protected]> wrote:
> > > - return clock_t_to_cputime(utime);
> > > + p->prev_utime = max(p->prev_utime, clock_t_to_cputime(utime));
> > > + return p->prev_utime;
> > > }
> >
> > [...]
> >
> > I dont think it will work. It will make utime monotic, but stime can
> > still decrease. For example let sum_exec_runtime increase by a tiny
> > little bit while utime will get a full additional tick. stime is
> > sum-utime. So stime can still go backwards. So I think that we need
> > this kind of logic for stime as well, no?
>
> yeah, probably. Peter?

Yes, definitely :-)

With this patch stime is still all over the place.

Oct 29 22:12:39 314 64
Oct 29 22:12:40 392 68
Oct 29 22:12:41 408 67 <--
Oct 29 22:12:42 410 67
Oct 29 22:12:43 416 68
Oct 29 22:12:44 420 68
Oct 29 22:12:45 424 68
Oct 29 22:12:46 426 68
Oct 29 22:12:47 430 70
Oct 29 22:12:48 430 70
Oct 29 22:12:49 430 70
Oct 29 22:12:50 432 68 <--
Oct 29 22:12:51 432 69
Oct 29 22:12:52 432 69
Oct 29 22:12:53 432 69
Oct 29 22:12:54 432 69
Oct 29 22:12:55 432 69
Oct 29 22:12:56 433 70
Oct 29 22:12:57 434 69 <--
Oct 29 22:12:58 443 71

utime looks OK now, though I'd like to test it a bit more (when stime is
fixed too) before giving a final verdict on that.

2007-10-29 22:13:40

by Balbir Singh

[permalink] [raw]
Subject: Re: [stable] 2.6.23 regression: top displaying 9999% CPU usage

On Mon, Oct 29, 2007 at 09:04:25PM +0100, Ingo Molnar wrote:
>
> * Balbir Singh <[email protected]> wrote:
>
> > We'll I initially thought of it and then I remembered that the
> > regression occurs only when the accounting itself is inaccurate. I am
> > tempted to ack the removal, but I would like to get input from others.
> > Meanwhile, I'll try and see if I can fix the problem
>
> i've got a patch from Peter queued up. (see below) This should fix the
> main issue.
>
> Ingo
>

We'll also need this additional patch (untested), but in the long run
I think the approach needs to be

1. Update stime and utime at the time of context switching -- keep it
in sync with p->sum_exec_runtime
2. Keep track of system/user context at system call entry points


Extend Peter's patch to fix accounting issues.

Signed-off-by: Balbir Singh <[email protected]>
---

fs/proc/array.c | 3 ++-
include/linux/sched.h | 2 +-
kernel/fork.c | 1 +
3 files changed, 4 insertions(+), 2 deletions(-)

diff -puN fs/proc/array.c~fix-accounting-issue-extended fs/proc/array.c
--- 2.6.24-rc1/fs/proc/array.c~fix-accounting-issue-extended 2007-10-30 03:08:18.000000000 +0530
+++ 2.6.24-rc1-balbir/fs/proc/array.c 2007-10-30 03:09:46.000000000 +0530
@@ -374,7 +374,8 @@ static cputime_t task_stime(struct task_
stime = nsec_to_clock_t(p->se.sum_exec_runtime) -
cputime_to_clock_t(task_utime(p));

- return clock_t_to_cputime(stime);
+ p->prev_stime = max(p->prev_stime, clock_t_to_cputime(stime));
+ return p->prev_stime;
}
#endif

diff -puN include/linux/sched.h~fix-accounting-issue-extended include/linux/sched.h
--- 2.6.24-rc1/include/linux/sched.h~fix-accounting-issue-extended 2007-10-30 03:08:18.000000000 +0530
+++ 2.6.24-rc1-balbir/include/linux/sched.h 2007-10-30 03:08:32.000000000 +0530
@@ -1004,7 +1004,7 @@ struct task_struct {
unsigned int rt_priority;
cputime_t utime, stime, utimescaled, stimescaled;
cputime_t gtime;
- cputime_t prev_utime;
+ cputime_t prev_utime, prev_stime;
unsigned long nvcsw, nivcsw; /* context switch counts */
struct timespec start_time; /* monotonic time */
struct timespec real_start_time; /* boot based time */
diff -puN kernel/fork.c~fix-accounting-issue-extended kernel/fork.c
--- 2.6.24-rc1/kernel/fork.c~fix-accounting-issue-extended 2007-10-30 03:08:18.000000000 +0530
+++ 2.6.24-rc1-balbir/kernel/fork.c 2007-10-30 03:08:42.000000000 +0530
@@ -1057,6 +1057,7 @@ static struct task_struct *copy_process(
p->utimescaled = cputime_zero;
p->stimescaled = cputime_zero;
p->prev_utime = cputime_zero;
+ p->prev_stime = cputime_zero;

#ifdef CONFIG_TASK_XACCT
p->rchar = 0; /* I/O counter: bytes read */
_

--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL

2007-10-29 23:19:39

by Frans Pop

[permalink] [raw]
Subject: Re: [stable] 2.6.23 regression: top displaying 9999% CPU usage

On Monday 29 October 2007, Balbir Singh wrote:
> We'll also need this additional patch (untested),

OK. Both patches together do the trick. Gave it a nice long test run and got
no more weirdness.
Tested-by: Frans Pop <[email protected]>

> but in the long run I think the approach needs to be
>
> 1. Update stime and utime at the time of context switching -- keep it
> in sync with p->sum_exec_runtime
> 2. Keep track of system/user context at system call entry points

Feel free to send me a patch for testing when you have one.

Cheers,
Frans

2007-10-29 23:22:57

by Ingo Molnar

[permalink] [raw]
Subject: Re: [stable] 2.6.23 regression: top displaying 9999% CPU usage


* Frans Pop <[email protected]> wrote:

> On Monday 29 October 2007, Balbir Singh wrote:
> > We'll also need this additional patch (untested),
>
> OK. Both patches together do the trick. Gave it a nice long test run and got
> no more weirdness.
> Tested-by: Frans Pop <[email protected]>

cool, thanks! I've queued it up for the next scheduler batch.

Ingo

2007-10-29 23:24:48

by Balbir Singh

[permalink] [raw]
Subject: Re: [stable] 2.6.23 regression: top displaying 9999% CPU usage

Frans Pop wrote:
> On Monday 29 October 2007, Balbir Singh wrote:
>> We'll also need this additional patch (untested),
>
> OK. Both patches together do the trick. Gave it a nice long test run and got
> no more weirdness.
> Tested-by: Frans Pop <[email protected]>
>

Thanks for testing it

Ingo, I think this fix needs to be picked as well.

>> but in the long run I think the approach needs to be
>>
>> 1. Update stime and utime at the time of context switching -- keep it
>> in sync with p->sum_exec_runtime
>> 2. Keep track of system/user context at system call entry points
>
> Feel free to send me a patch for testing when you have one.
>

I hope to find some free time soon and get to it.

> Cheers,
> Frans


--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL

2007-10-30 05:56:51

by Christian Borntraeger

[permalink] [raw]
Subject: Re: [stable] 2.6.23 regression: top displaying 9999% CPU usage

Am Montag, 29. Oktober 2007 schrieb Balbir Singh:
> We'll also need this additional patch (untested), but in the long run
> I think the approach needs to be
>
> 1. Update stime and utime at the time of context switching -- keep it
> in sync with p->sum_exec_runtime
> 2. Keep track of system/user context at system call entry points

This is similar to what s390 and ppc64 do when CONFIG_VIRT_CPU_ACCOUNTING is
set. Whenever you do a patch, please make sure to not break the already
precise values for utime and stime on s390 and ppc. Feel free to CC me
whenever you have a patch for this.

Christian

2007-10-30 06:00:27

by Balbir Singh

[permalink] [raw]
Subject: Re: [stable] 2.6.23 regression: top displaying 9999% CPU usage

Christian Borntraeger wrote:
> Am Montag, 29. Oktober 2007 schrieb Balbir Singh:
>> We'll also need this additional patch (untested), but in the long run
>> I think the approach needs to be
>>
>> 1. Update stime and utime at the time of context switching -- keep it
>> in sync with p->sum_exec_runtime
>> 2. Keep track of system/user context at system call entry points
>
> This is similar to what s390 and ppc64 do when CONFIG_VIRT_CPU_ACCOUNTING is
> set. Whenever you do a patch, please make sure to not break the already
> precise values for utime and stime on s390 and ppc. Feel free to CC me
> whenever you have a patch for this.
>

Yes, true. I'll cc everyone on this list and send out the patch as RFC.
Much of this code will be architecture specific and as you rightly
pointed out, needs to happen under !CONFIG_VIRT_CPU_ACCOUNTING

> Christian


--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL

2007-10-30 20:23:19

by Otavio Salvador

[permalink] [raw]
Subject: Re: [stable] 2.6.23 regression: top displaying 9999% CPU usage

Ingo Molnar <[email protected]> writes:

> * Frans Pop <[email protected]> wrote:
>
>> On Monday 29 October 2007, Balbir Singh wrote:
>> > We'll also need this additional patch (untested),
>>
>> OK. Both patches together do the trick. Gave it a nice long test run and got
>> no more weirdness.
>> Tested-by: Frans Pop <[email protected]>
>
> cool, thanks! I've queued it up for the next scheduler batch.

It would be nice to put those two on stable point release too.

--
O T A V I O S A L V A D O R
---------------------------------------------
E-mail: [email protected] UIN: 5906116
GNU/Linux User: 239058 GPG ID: 49A5F855
Home Page: http://otavio.ossystems.com.br
---------------------------------------------
"Microsoft sells you Windows ... Linux gives
you the whole house."