2007-06-14 18:57:37

by malc

[permalink] [raw]
Subject: Re: [patch] sched: accurate user accounting

Hello Ingo and others,

After reading http://lwn.net/Articles/236485/ and noticing a few references
to accounting i decided to give CFS a try. With sched-cfs-v2.6.21.4-16
i get pretty weird results; it seems like the scheduler is dead set on trying
to move the processes to different CPUs/cores all the time. And with hog
(manually tweaking the amount of iterations) i get fairly strange results:
first of all the process is split between two cores, secondly while the
integral load provided by the kernel looks correct, it's off by a good
20 percent on each individual core.

(http://www.boblycat.org/~malc/apc/hog-cfs-v16.png)

Thought this information might be of some interest.

P.S. How come the /proc/stat information is much closer to reality now?
Something like what Con Kolivas suggested was added to sched.c?

--
vale


2007-06-14 20:43:20

by Ingo Molnar

[permalink] [raw]
Subject: Re: [patch] sched: accurate user accounting


* Vassili Karpov <[email protected]> wrote:

> Hello Ingo and others,
>
> After reading http://lwn.net/Articles/236485/ and noticing a few
> references to accounting i decided to give CFS a try. With
> sched-cfs-v2.6.21.4-16 i get pretty weird results; it seems like the
> scheduler is dead set on trying to move the processes to different
> CPUs/cores all the time. And with hog (manually tweaking the amount of
> iterations) i get fairly strange results: first of all the process is
> split between two cores, secondly while the integral load provided by the
> kernel looks correct, it's off by a good 20 percent on each individual
> core.
>
> (http://www.boblycat.org/~malc/apc/hog-cfs-v16.png)
>
> Thought this information might be of some interest.

hm - what does 'hog' do, can i download hog.c from somewhere?

the alternating balancing might be due to an uneven number of tasks
perhaps? If you have 3 tasks on 2 cores then there's no other solution
to achieve even performance of each task but to rotate them amongst the
cores.

> P.S. How come the /proc/stat information is much closer to reality
> now? Something like what Con Kolivas suggested was added to
> sched.c?

well, precise/finegrained accounting patches have been available for
years; the thing with CFS is that we get them 'for free', because
CFS needs those metrics for its own logic. That's why this information
is much closer to reality now. But note: right now what is affected by
the changes in the CFS patches is /proc/PID/stat (i.e. the per-task
information that 'top' and 'ps' display, _not_ /proc/stat) - but more
accurate /proc/stat could certainly come later on too.

Ingo

2007-06-14 20:57:44

by malc

[permalink] [raw]
Subject: Re: [patch] sched: accurate user accounting

On Thu, 14 Jun 2007, Ingo Molnar wrote:

>
> * Vassili Karpov <[email protected]> wrote:
>
>> Hello Ingo and others,
>>
>> After reading http://lwn.net/Articles/236485/ and noticing a few
>> references to accounting i decided to give CFS a try. With
>> sched-cfs-v2.6.21.4-16 i get pretty weird results; it seems like the
>> scheduler is dead set on trying to move the processes to different
>> CPUs/cores all the time. And with hog (manually tweaking the amount of
>> iterations) i get fairly strange results: first of all the process is
>> split between two cores, secondly while the integral load provided by the
>> kernel looks correct, it's off by a good 20 percent on each individual
>> core.
>>
>> (http://www.boblycat.org/~malc/apc/hog-cfs-v16.png)
>>
>> Thought this information might be of some interest.
>
> hm - what does 'hog' do, can i download hog.c from somewhere?

http://www.boblycat.org/~malc/apc/hog.c and also in
Documentation/cpu-load.txt.

>
> the alternating balancing might be due to an uneven number of tasks
> perhaps? If you have 3 tasks on 2 cores then there's no other solution
> to achieve even performance of each task but to rotate them amongst the
> cores.

One task, one thread. I have also tried to watch a fairly demanding video
(Elephants Dream in 1920x1080/MPEG4) with mplayer, and CFS moves the
only task between cores almost every second.

>> P.S. How come the /proc/stat information is much closer to reality
>> now? Something like what Con Kolivas suggested was added to
>> sched.c?
>
> well, precise/finegrained accounting patches have been available for
> years; the thing with CFS is that we get them 'for free', because
> CFS needs those metrics for its own logic. That's why this information
> is much closer to reality now. But note: right now what is affected by
> the changes in the CFS patches is /proc/PID/stat (i.e. the per-task
> information that 'top' and 'ps' display, _not_ /proc/stat) - but more
> accurate /proc/stat could certainly come later on too.

Aha. I see, it's just that the integral load for hog is vastly improved
compared to vanilla 2.6.21 (then again, some other tests are off by at
least a few percent, though they were fine with Con's patch, which was
announced at the beginning of this thread).

--
vale

2007-06-14 21:19:23

by Ingo Molnar

[permalink] [raw]
Subject: Re: [patch] sched: accurate user accounting


* malc <[email protected]> wrote:

> > the alternating balancing might be due to an uneven number of tasks
> > perhaps? If you have 3 tasks on 2 cores then there's no other
> > solution to achieve even performance of each task but to rotate them
> > amongst the cores.
>
> One task, one thread. I have also tried to watch a fairly demanding
> video (Elephants Dream in 1920x1080/MPEG4) with mplayer, and CFS moves
> the only task between cores almost every second.

hm, mplayer is not running alone when it does video playback: Xorg is
also pretty active. Furthermore, the task you are using to monitor
mplayer counts too. The Core2Duo has a shared L2 cache between cores, so
it is pretty cheap to move tasks between the cores.

> > well, precise/finegrained accounting patches have been available for
> > years, the thing with CFS is that there we get them 'for free',
> > because CFS needs those metrics for its own logic. That's why this
> > information is much closer to reality now. But note: right now what
> > is affected by the changes in the CFS patches is /proc/PID/stat
> > (i.e. the per-task information that 'top' and 'ps' displays, _not_
> > /proc/stat) - but more accurate /proc/stat could certainly come
> > later on too.
>
> Aha. I see, it's just that the integral load for hog is vastly improved
> compared to vanilla 2.6.21 [...]

hm, which ones are improved? Could this be due to some other property of
CFS? If your app relies on /proc/stat then there's no extra precision in
those cpustat values yet.

i've Cc:-ed Balbir Singh and Dmitry Adamushko who are the main authors
of the current precise accounting code in CFS. Maybe i missed some
detail :-)

Ingo

2007-06-14 21:37:55

by malc

[permalink] [raw]
Subject: Re: [patch] sched: accurate user accounting

On Thu, 14 Jun 2007, Ingo Molnar wrote:

>
> * malc <[email protected]> wrote:
>
>>> the alternating balancing might be due to an uneven number of tasks
>>> perhaps? If you have 3 tasks on 2 cores then there's no other
>>> solution to achieve even performance of each task but to rotate them
>>> amongst the cores.
>>
>> One task, one thread. I have also tried to watch a fairly demanding
>> video (Elephants Dream in 1920x1080/MPEG4) with mplayer, and CFS moves
>> the only task between cores almost every second.
>
> hm, mplayer is not running alone when it does video playback: Xorg is
> also pretty active. Furthermore, the task you are using to monitor
> mplayer counts too. The Core2Duo has a shared L2 cache between cores, so
> it is pretty cheap to move tasks between the cores.
>

Well, just to be sure i reran the test with `-vo null' (and fwiw i tried
a few completely different output drivers); the behavior is the same. I'm
not running a Core2Duo but an X2, but i guess that does not really matter
here.

As for the task that monitors, i've written it myself (there are two
monitoring methods; one (the accurate one) does not depend on the
contents of `/proc/stat' at all), so it can be cheaply (for me) changed
in any way one wants. Sources are available at the same place where the
screenshot was found.

>>> well, precise/finegrained accounting patches have been available for
>>> years; the thing with CFS is that we get them 'for free', because
>>> CFS needs those metrics for its own logic. That's why this
>>> information is much closer to reality now. But note: right now what
>>> is affected by the changes in the CFS patches is /proc/PID/stat
>>> (i.e. the per-task information that 'top' and 'ps' display, _not_
>>> /proc/stat) - but more accurate /proc/stat could certainly come
>>> later on too.
>>
>> Aha. I see, it's just that the integral load for hog is vastly improved
>> compared to vanilla 2.6.21 [...]
>
> hm, which ones are improved? Could this be due to some other property of
> CFS? If your app relies on /proc/stat then there's no extra precision in
> those cpustat values yet.

This is what it looked like before:
http://www.boblycat.org/~malc/apc/load-x2-hog.png

Now the integral load matches the one obtained via the "accurate" method.
However the reports for the individual cores are off by around 20 percent.

Though i'm not quite sure what you mean by "which ones are improved".

> i've Cc:-ed Balbir Singh and Dmitry Adamushko who are the main authors
> of the current precise accounting code in CFS. Maybe i missed some
> detail :-)

Oh, the famous "With enough eyeballs, all bugs are shallow." in action.

--
vale

2007-06-15 03:44:46

by Balbir Singh

[permalink] [raw]
Subject: Re: [patch] sched: accurate user accounting

malc wrote:
> On Thu, 14 Jun 2007, Ingo Molnar wrote:
>
>>
>> * malc <[email protected]> wrote:
>>
>>>> the alternating balancing might be due to an uneven number of tasks
>>>> perhaps? If you have 3 tasks on 2 cores then there's no other
>>>> solution to achieve even performance of each task but to rotate them
>>>> amongst the cores.
>>>
>>> One task, one thread. I have also tried to watch a fairly demanding
>>> video (Elephants Dream in 1920x1080/MPEG4) with mplayer, and CFS moves
>>> the only task between cores almost every second.
>>
>> hm, mplayer is not running alone when it does video playback: Xorg is
>> also pretty active. Furthermore, the task you are using to monitor
>> mplayer counts too. The Core2Duo has a shared L2 cache between cores, so
>> it is pretty cheap to move tasks between the cores.
>>
>
> Well, just to be sure i reran the test with `-vo null' (and fwiw i tried
> a few completely different output drivers); the behavior is the same. I'm
> not running a Core2Duo but an X2, but i guess that does not really matter
> here.
>
> As for the task that monitors, i've written it myself (there are two
> monitoring methods; one (the accurate one) does not depend on the
> contents of `/proc/stat' at all), so it can be cheaply (for me) changed
> in any way one wants. Sources are available at the same place where the
> screenshot was found.
>
>>>> well, precise/finegrained accounting patches have been available for
>>>> years; the thing with CFS is that we get them 'for free', because
>>>> CFS needs those metrics for its own logic. That's why this
>>>> information is much closer to reality now. But note: right now what
>>>> is affected by the changes in the CFS patches is /proc/PID/stat
>>>> (i.e. the per-task information that 'top' and 'ps' display, _not_
>>>> /proc/stat) - but more accurate /proc/stat could certainly come
>>>> later on too.
>>>
>>> Aha. I see, it's just that the integral load for hog is vastly improved
>>> compared to vanilla 2.6.21 [...]
>>
>> hm, which ones are improved? Could this be due to some other property of
>> CFS? If your app relies on /proc/stat then there's no extra precision in
>> those cpustat values yet.
>
> This is what it looked like before:
> http://www.boblycat.org/~malc/apc/load-x2-hog.png
>
> Now the integral load matches the one obtained via the "accurate" method.
> However the reports for the individual cores are off by around 20 percent.
>

I think I missed some of the context, is the accounting of individual tasks
or cpustat values off by 20%? I'll try and reproduce this problem.

Could you provide more details on the APC tool that you are using -- I
do not understand the orange and yellow lines, do they represent system
and user time?

NOTE: There is some inconsistency in the values reported by /usr/bin/time
(getrusage) and values reported in /proc or through delay accounting.


> Though i'm not quite sure what you mean by "which ones are improved".
>
>> i've Cc:-ed Balbir Singh and Dmitry Adamushko who are the main authors
>> of the current precise accounting code in CFS. Maybe i missed some
>> detail :-)
>
> Oh, the famous "With enough eyeballs, all bugs are shallow." in action.
>


--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL

2007-06-15 06:08:40

by malc

[permalink] [raw]
Subject: Re: [patch] sched: accurate user accounting

On Fri, 15 Jun 2007, Balbir Singh wrote:

> malc wrote:
>> On Thu, 14 Jun 2007, Ingo Molnar wrote:
>>

[..snip..]

>>
>> Now the integral load matches the one obtained via the "accurate" method.
>> However the reports for the individual cores are off by around 20 percent.
>>
>
> I think I missed some of the context, is the accounting of individual tasks
> or cpustat values off by 20%? I'll try and reproduce this problem.

Neither actually, the individual core idle times reported via `/proc/stat'
are off by 20 percent: one overestimates and the other underestimates,
and the sum is right on the mark.

>
> Could you provide more details on the APC tool that you are using -- I
> do not understand the orange and yellow lines, do they represent system
> and user time?

It's somewhat documented on the page: http://www.boblycat.org/~malc/apc
Anyway, the left bar is based on information from `/proc/stat'; the right
one is derived from a kernel module that just times how much time was
spent in the idle handler. The graphs: red - `/proc/stat', yellow - module.

> NOTE: There is some inconsistency in the values reported by /usr/bin/time
> (getrusage) and values reported in /proc or through delay accounting.

I don't really use `getrusage'.

--
vale

2007-06-16 13:21:49

by Balbir Singh

[permalink] [raw]
Subject: Re: [patch] sched: accurate user accounting

malc wrote:
> On Fri, 15 Jun 2007, Balbir Singh wrote:
>
>> malc wrote:
>>> On Thu, 14 Jun 2007, Ingo Molnar wrote:
>>>
>
> [..snip..]
>
>>>
>>> Now the integral load matches the one obtained via the "accurate" method.
>>> However the reports for the individual cores are off by around 20 percent.
>>>
>>
>> I think I missed some of the context, is the accounting of individual
>> tasks or cpustat values off by 20%? I'll try and reproduce this problem.
>
> Neither actually, the individual core idle times reported via `/proc/stat'
> are off by 20 percent: one overestimates and the other underestimates,
> and the sum is right on the mark.
>

Interesting, the idle time accounting (done from account_system_time())
has not changed. Has your .config changed? Could you please send
it across. I've downloaded apc and I am trying to reproduce your problem.

>>
>> Could you provide more details on the APC tool that you are using -- I
>> do not understand the orange and yellow lines, do they represent system
>> and user time?
>
> It's somewhat documented on the page: http://www.boblycat.org/~malc/apc
> Anyway, the left bar is based on information from `/proc/stat'; the right
> one is derived from a kernel module that just times how much time was
> spent in the idle handler. The graphs: red - `/proc/stat', yellow - module.
>
>> NOTE: There is some inconsistency in the values reported by /usr/bin/time
>> (getrusage) and values reported in /proc or through delay accounting.
>
> I don't really use `getrusage'.
>

Tools like /usr/bin/time do.

--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL

2007-06-16 14:08:18

by malc

[permalink] [raw]
Subject: Re: [patch] sched: accurate user accounting

On Sat, 16 Jun 2007, Balbir Singh wrote:

> malc wrote:
>> On Fri, 15 Jun 2007, Balbir Singh wrote:
>>
>>> malc wrote:
>>>> On Thu, 14 Jun 2007, Ingo Molnar wrote:
>>>>
>>
>> [..snip..]
>>
>>>>
>>>> Now the integral load matches the one obtained via the "accurate" method.
>>>> However the reports for the individual cores are off by around 20 percent.
>>>>
>>>
>>> I think I missed some of the context, is the accounting of individual
>>> tasks or cpustat values off by 20%? I'll try and reproduce this problem.
>>
>> Neither actually, the individual core idle times reported via `/proc/stat'
>> are off by 20 percent: one overestimates and the other underestimates,
>> and the sum is right on the mark.
>>
>
> Interesting, the idle time accounting (done from account_system_time())
> has not changed. Has your .config changed? Could you please send
> it across. I've downloaded apc and I am trying to reproduce your problem.

http://www.boblycat.org/~malc/apc/cfs/ has the config for 2.6.21 and the
diff against 2.6.21.4-cfs-v16.

I updated hog (it can be found in the above directory) to call setitimer
with somewhat saner values (apparently tickless has a profound effect on
the itimer interface). While fooling around with this version of hog
on 2.6.21.4-cfs-v16 i stumbled upon rather strange behavior (the
screenshot is also at the address above; note that the kernel was booted
with maxcpus=1)

[..snip..]

--
vale

2007-06-16 18:41:20

by Ingo Molnar

[permalink] [raw]
Subject: Re: [patch] sched: accurate user accounting


* malc <[email protected]> wrote:

> > Interesting, the idle time accounting (done from
> > account_system_time()) has not changed. Has your .config changed?
> > Could you please send it across. I've downloaded apc and I am trying
> > to reproduce your problem.
>
> http://www.boblycat.org/~malc/apc/cfs/ has the config for 2.6.21 and the
> diff against 2.6.21.4-cfs-v16.

hm. Could you add this to your kernel boot command line:

highres=off nohz=off

and retest, to inquire whether this problem is independent of any
timer-events effects?

Ingo

2007-06-16 20:33:31

by malc

[permalink] [raw]
Subject: Re: [patch] sched: accurate user accounting

On Sat, 16 Jun 2007, Ingo Molnar wrote:

>
> * malc <[email protected]> wrote:
>
>>> Interesting, the idle time accounting (done from
>>> account_system_time()) has not changed. Has your .config changed?
>>> Could you please send it across. I've downloaded apc and I am trying
>>> to reproduce your problem.
>>
>> http://www.boblycat.org/~malc/apc/cfs/ has the config for 2.6.21 and the
>> diff against 2.6.21.4-cfs-v16.
>
> hm. Could you add this to your kernel boot command line:
>
> highres=off nohz=off
>
> and retest, to inquire whether this problem is independent of any
> timer-events effects?

It certainly makes a difference. Without dynticks, however, the scheduler
still moves the task (be it hog or mplayer) between CPUs for no good
reason; for hog the switching is every few seconds (instead of more or
less all the time in the case of dynticks). What's rather strange is that
while it hogs 7x% of CPU on core#1 it only hogs 3x% on core#2
(the percentage is obtained by timing the idle handler, not from
/proc/stat; according to /proc/stat either core is 0% loaded).

Live report... After a while it stabilized and was running on core#2
all the time; when the process was stopped and restarted it started to
run constantly on core#1 (with ~70% load).

Btw. i don't want to waste anyone's time here. All i originally wanted
was to know whether something was done (as per the LWN article) about
increasing the accuracy of the system-wide statistics (in /proc/stat).
Turns out that nothing really happened in this area, but the latest
developments (CFS/dynticks) have some peculiar effect on hog. Plus this
constant migration of hog/mplayer is somewhat strange.

Live report... again.. Okay, now that hog stabilized on running
exclusively on core#1 at 70% load, i switched to the machine where it
runs, and just switching the windows in IceWM resulted in the system
load dropping to 30%. Quite reproducible, too.

--
vale