LinuxLists.cc - (ondemand) CPU governor regression between 2.6.23 and 2.6.24

2008-01-26 14:06:44

Subject: (ondemand) CPU governor regression between 2.6.23 and 2.6.24

I use a one-liner for a simple performance check: "time factor 819734028463158891"
Here is the result for the new (Gentoo) kernel 2.6.24.

With the ondemand governor I get:

tfoerste@n22 ~/tmp $ time factor 819734028463158891
819734028463158891: 3 273244676154386297

real 0m32.997s
user 0m15.732s
sys 0m0.014s

With the ondemand governor the CPU runs at 600 MHz,
whereas with the performance governor I get:

tfoerste@n22 ~/tmp $ time factor 819734028463158891
819734028463158891: 3 273244676154386297

real 0m10.893s
user 0m5.444s
sys 0m0.000s

(~5.5 sec, as I expected) because the CPU is set to 1.7 GHz.

The ondemand governor of previous kernel versions, however, automatically increased
the CPU speed from 600 MHz to 1.7 GHz.
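
A quick sanity check of the figures above, as a minimal sketch (the variable names are made up for illustration): if the CPU really stayed at 600 MHz under ondemand, the ratio of the two user times should roughly match the ratio of the two clock speeds.

```python
# Figures copied from the two "time factor" runs above.
user_ondemand_s = 15.732      # user time with the ondemand governor (600 MHz)
user_performance_s = 5.444    # user time with the performance governor (1.7 GHz)

# How much longer the same computation took under ondemand.
slowdown = user_ondemand_s / user_performance_s

# Ratio of the two clock speeds mentioned in the report.
clock_ratio = 1700 / 600

print(f"slowdown    = {slowdown:.2f}x")
print(f"clock ratio = {clock_ratio:.2f}x")
```

Both ratios come out close to 2.8-2.9, which fits a CPU that never left its lowest frequency during the ondemand run.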

My system is a ThinkPad T41; I'll attach the .config.

--
MfG/Sincerely

Toralf Förster
pgp finger print: 7B1A 07F4 EC82 0F90 D4C2 8936 872A E508 7DB6 9DA3

2008-01-26 17:12:01

by Tomasz Chmielewski

Subject: Re: (ondemand) CPU governor regression between 2.6.23 and 2.6.24

During the test, run top and watch your CPU usage. Does it go above 80%
(the default for
/sys/devices/system/cpu/cpu0/cpufreq/ondemand/up_threshold)?

ondemand CPUfreq governor has a few tunables, described in
Documentation/cpu-freq. One of them is up_threshold:

up_threshold: defines what the average CPU usage between the samplings
of 'sampling_rate' needs to be for the kernel to make a decision on
whether it should increase the frequency. For example when it is set
to its default value of '80' it means that between the checking
intervals the CPU needs to be on average more than 80% in use to then
decide that the CPU frequency needs to be increased.
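
The rule quoted above can be sketched as a single comparison. This is only an illustration of the documented behaviour, not the kernel's code; the function name is hypothetical.

```python
def should_raise_frequency(avg_load_percent: float, up_threshold: int = 80) -> bool:
    """Return True if the average load over one sampling interval
    is strictly above up_threshold (both given in percent)."""
    return avg_load_percent > up_threshold


# A task that gets nearly the whole CPU crosses the default threshold,
# a task that only gets about half of it does not.
print(should_raise_frequency(95.5))
print(should_raise_frequency(48.5))
```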

What CPUFreq processor driver are you using?

I had a similar problem with CPUfreq and dm-crypt (slow reads); it was
more a setup problem than something kernel-related. See:

http://blog.wpkg.org/2008/01/22/cpufreq-and-dm-crypt-performance-problems/

--
Tomasz Chmielewski

2008-01-26 18:47:17

by Toralf Förster

Subject: Re: (ondemand) CPU governor regression between 2.6.23 and 2.6.24

The problem is the same as described here: http://lkml.org/lkml/2007/10/21/85
If I run dnetc, even at the lowest priority, then the CPU stays at 600 MHz regardless
of any other load (e.g. rsyncing, svn update, compiling, ...).

Stopping the dnetc process immediately speeds the CPU up to 1.7 GHz.

On Saturday, 26 January 2008, you wrote:
> During the test, run top, and watch your CPU usage. Does it go above 80%
> (the default for
> /sys/devices/system/cpu/cpu0/cpufreq/ondemand/up_threshold).

No, instead I get:

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
7294 dnetc 39 19 664 348 264 R 49.5 0.0 0:48.68 dnetc
7310 tfoerste 20 0 1796 492 428 R 48.5 0.0 0:07.19 factor
7050 root 20 0 96736 8872 3972 S 0.7 0.9 0:02.99 X

> What CPUFreq processor driver are you using?
I use the native kernel built-in ondemand governor. BTW, here are the settings:

n22 /sys/devices/system/cpu/cpu0/cpufreq/ondemand # tail -v *
==> ignore_nice_load <==
1

==> powersave_bias <==
0

==> sampling_rate <==
500000

==> sampling_rate_max <==
250000000

==> sampling_rate_min <==
250000

==> up_threshold <==
80
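
The same listing can be collected programmatically. The following is a minimal sketch, assuming the sysfs directory shown in the prompt above exists (it is only present while the ondemand governor is active); the helper name `read_tunables` is made up for illustration.

```python
from pathlib import Path


def read_tunables(directory: str) -> dict:
    """Map every readable regular file in `directory` to its stripped contents."""
    tunables = {}
    for entry in sorted(Path(directory).iterdir()):
        if entry.is_file():
            try:
                tunables[entry.name] = entry.read_text().strip()
            except OSError:
                # Some sysfs attributes may be unreadable; skip them.
                continue
    return tunables


if __name__ == "__main__":
    # Path taken from the shell prompt above.
    ondemand_dir = "/sys/devices/system/cpu/cpu0/cpufreq/ondemand"
    if Path(ondemand_dir).is_dir():
        for name, value in read_tunables(ondemand_dir).items():
            print(f"{name} = {value}")
```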

--
MfG/Sincerely

Toralf Förster
pgp finger print: 7B1A 07F4 EC82 0F90 D4C2 8936 872A E508 7DB6 9DA3

2008-01-26 21:38:31

by Toralf Förster

Subject: Re: (ondemand) CPU governor regression between 2.6.23 and 2.6.24

It seems to be a scheduler issue rather than a governor issue, because
the issue went away after unsetting CONFIG_FAIR_GROUP_SCHED.

If I unselect CONFIG_FAIR_GROUP_SCHED, then the %CPU value rises above 80%,
which forces the ondemand governor to speed up the CPU frequency:

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
7137 tfoerste 20 0 1796 488 428 R 95.5 0.0 0:01.40 factor
7083 dnetc 39 19 664 348 264 R 2.1 0.0 3:08.33 dnetc
4033 root 20 0 97252 9420 4008 R 0.7 0.9 0:09.43 X

--
MfG/Sincerely

Toralf Förster
pgp finger print: 7B1A 07F4 EC82 0F90 D4C2 8936 872A E508 7DB6 9DA3

2008-01-26 21:45:35

by Sam Ravnborg

Subject: Re: (ondemand) CPU governor regression between 2.6.23 and 2.6.24

Added Ingo + Peter.

Sam

2008-01-27 12:41:29

by Toralf Förster

Subject: Re: (ondemand) CPU governor regression between 2.6.23 and 2.6.24

On Sunday, 27 January 2008, you wrote:
>
> On Sun, 2008-01-27 at 12:00 +0100, Toralf Förster wrote:
> > BTW the dnetc process runs under the user "dnetc" with nice level 19,
> > my process runs under my own user id "tfoerste", therefore I wouldn't expect
> > both processes to get the same processor resources, would I?
>
> Normal. Nice level controls cpu distribution _within_ a task group,
> whereas distribution between groups is controlled by cpu_share. It's
> going to take a while for folks to get used to having two levels of cpu
> distribution.

Ouch, does this mean that in a multi-user scenario with 2 non-root users "A" and
"B", each running exactly 1 process with nice level 0 and 19 respectively,
both get ~50% of the CPU *and furthermore* that user "B" never
has a chance to be nice to user "A", although his process should really
use only those CPU cycles not eaten by any other user?

If the answer is yes, what about extending the current behaviour to (optionally)
consider the nice level of running processes when
CONFIG_FAIR_GROUP_SCHED is set?

In any case, the initial email does not report a regression in the ondemand
governor after all.

--
MfG/Sincerely

Toralf Förster
pgp finger print: 7B1A 07F4 EC82 0F90 D4C2 8936 872A E508 7DB6 9DA3

2008-01-27 14:32:55

by Srivatsa Vaddagiri

Subject: Re: (ondemand) CPU governor regression between 2.6.23 and 2.6.24

On Sat, Jan 26, 2008 at 07:46:51PM +0100, Toralf Förster wrote:
>
> The problem is the same as described here: http://lkml.org/lkml/2007/10/21/85
> If I run dnetc, even at the lowest priority, then the CPU stays at 600 MHz regardless
> of any other load (e.g. rsyncing, svn update, compiling, ...).
>
> Stopping the dnetc process immediately speeds the CPU up to 1.7 GHz.
>
>
> On Saturday, 26 January 2008, you wrote:
> > During the test, run top, and watch your CPU usage. Does it go above 80%
> > (the default for
> > /sys/devices/system/cpu/cpu0/cpufreq/ondemand/up_threshold).
>
> No, instead I get :
>
> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> 7294 dnetc 39 19 664 348 264 R 49.5 0.0 0:48.68 dnetc
> 7310 tfoerste 20 0 1796 492 428 R 48.5 0.0 0:07.19 factor
> 7050 root 20 0 96736 8872 3972 S 0.7 0.9 0:02.99 X

Hi Toralf,
Can you list the output you see for overall cpu usage? You should
see something like below right at the top of the output:

top - 20:03:59 up 12 days, 21:39, 18 users, load average: 0.22, 0.20, 0.25
Tasks: 200 total, 5 running, 193 sleeping, 0 stopped, 2 zombie
Cpu(s): 2.6% us, 1.3% sy, 0.0% ni, 96.0% id, 0.0% wa, 0.0% hi, 0.0% si, 0.0% st

The third line (giving overall cpu usage stats) is what is interesting here.
If you have more than one cpu, you can get cpu usage stats for each cpu
in top by pressing 1. Can you provide this information with and w/o
CONFIG_FAIR_GROUP_SCHED?

If I am not mistaken, cpu ondemand gov goes by the cpu idle time stats,
which should not be affected by FAIR_GROUP_SCHED. I will look around for
other possible causes.

--
Regards,
vatsa

2008-01-27 15:06:34

by Toralf Förster

Subject: Re: (ondemand) CPU governor regression between 2.6.23 and 2.6.24

On Sunday, 27 January 2008, Srivatsa Vaddagiri wrote:
> Hi Toralf,
> Can you list the output you see for overall cpu usage? You should
> see something like below right at the top of the output:
>
> top - 20:03:59 up 12 days, 21:39, 18 users, load average: 0.22, 0.20, 0.25
> Tasks: 200 total, 5 running, 193 sleeping, 0 stopped, 2 zombie
> Cpu(s): 2.6% us, 1.3% sy, 0.0% ni, 96.0% id, 0.0% wa, 0.0% hi, 0.0% si, 0.0% st
>
> The third line (giving overall cpu usage stats) is what is interesting here.
> If you have more than one cpu, you can get cpu usage stats for each cpu
> in top by pressing 1. Can you provide this information with and w/o
> CONFIG_FAIR_GROUP_SCHED?

This is what I get if I set CONFIG_FAIR_GROUP_SCHED to "y"

top - 16:00:59 up 2 min, 1 user, load average: 2.56, 1.60, 0.65
Tasks: 84 total, 3 running, 81 sleeping, 0 stopped, 0 zombie
Cpu(s): 49.7%us, 0.3%sy, 49.7%ni, 0.0%id, 0.0%wa, 0.3%hi, 0.0%si, 0.0%st
Mem: 1036180k total, 322876k used, 713304k free, 13164k buffers
Swap: 997880k total, 0k used, 997880k free, 149208k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
6070 dnetc 39 19 664 348 264 R 49.7 0.0 1:09.71 dnetc
6676 tfoerste 20 0 1796 488 428 R 49.3 0.0 0:02.72 factor

Stopping dnetc gives:

top - 16:02:36 up 4 min, 1 user, load average: 2.50, 1.87, 0.83
Tasks: 89 total, 3 running, 86 sleeping, 0 stopped, 0 zombie
Cpu(s): 99.3%us, 0.7%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 1036180k total, 378760k used, 657420k free, 14736k buffers
Swap: 997880k total, 0k used, 997880k free, 180868k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
6766 tfoerste 20 0 1796 488 428 R 84.9 0.0 0:05.41 factor

> If I am not mistaken, cpu ondemand gov goes by the cpu idle time stats,
> which should not be affected by FAIR_GROUP_SCHED. I will lookaround for
> other possible causes.

As I stated in http://lkml.org/lkml/2008/1/26/207, the issue is solved
after unselecting FAIR_GROUP_SCHED.

BTW my answer to an email from Mike Galbraith was Cc'ed to the lkml here:
http://lkml.org/lkml/2008/1/27/116

--
MfG/Sincerely

Toralf Förster
pgp finger print: 7B1A 07F4 EC82 0F90 D4C2 8936 872A E508 7DB6 9DA3

2008-01-27 16:41:04

by Srivatsa Vaddagiri

Subject: Re: (ondemand) CPU governor regression between 2.6.23 and 2.6.24

On Sun, Jan 27, 2008 at 04:06:17PM +0100, Toralf Förster wrote:
> > The third line (giving overall cpu usage stats) is what is interesting here.
> > If you have more than one cpu, you can get cpu usage stats for each cpu
> > in top by pressing 1. Can you provide this information with and w/o
> > CONFIG_FAIR_GROUP_SCHED?
>
> This is what I get if I set CONFIG_FAIR_GROUP_SCHED to "y"
>
> top - 16:00:59 up 2 min, 1 user, load average: 2.56, 1.60, 0.65
> Tasks: 84 total, 3 running, 81 sleeping, 0 stopped, 0 zombie
> Cpu(s): 49.7%us, 0.3%sy, 49.7%ni, 0.0%id, 0.0%wa, 0.3%hi, 0.0%si, 0.0%st
> Mem: 1036180k total, 322876k used, 713304k free, 13164k buffers
> Swap: 997880k total, 0k used, 997880k free, 149208k cached
>
> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> 6070 dnetc 39 19 664 348 264 R 49.7 0.0 1:09.71 dnetc
> 6676 tfoerste 20 0 1796 488 428 R 49.3 0.0 0:02.72 factor
>
> Stopping dnetc gives:
>
> top - 16:02:36 up 4 min, 1 user, load average: 2.50, 1.87, 0.83
> Tasks: 89 total, 3 running, 86 sleeping, 0 stopped, 0 zombie
> Cpu(s): 99.3%us, 0.7%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> Mem: 1036180k total, 378760k used, 657420k free, 14736k buffers
> Swap: 997880k total, 0k used, 997880k free, 180868k cached
>
> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> 6766 tfoerste 20 0 1796 488 428 R 84.9 0.0 0:05.41 factor

Thanks for this response. This confirms that the cpu's idle time is close to
zero, as I intended to verify.

> > If I am not mistaken, cpu ondemand gov goes by the cpu idle time stats,
> > which should not be affected by FAIR_GROUP_SCHED. I will look around for
> > other possible causes.

On further examination, ondemand governor seems to have a tunable to
ignore nice load. In your case, I see that dnetc is running at a
positive nice value (19) which could explain why ondemand gov thinks
that the cpu is only ~50% loaded.
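
To make the ~50% figure concrete, here is a minimal sketch. It assumes that with ignore_nice_load set, time spent in niced tasks is simply counted as idle time; that is an interpretation of the tunable, not a copy of the governor's code, and the function name is hypothetical.

```python
def load_seen_by_governor(idle_pct: float, nice_pct: float, ignore_nice_load: bool) -> float:
    """Busy percentage after niced time is optionally treated as idle."""
    effective_idle = idle_pct + (nice_pct if ignore_nice_load else 0.0)
    return 100.0 - effective_idle


# Values from the "Cpu(s)" line of the CONFIG_FAIR_GROUP_SCHED=y run:
# 0.0% idle, 49.7% nice.
with_ignore = load_seen_by_governor(idle_pct=0.0, nice_pct=49.7, ignore_nice_load=True)
without_ignore = load_seen_by_governor(idle_pct=0.0, nice_pct=49.7, ignore_nice_load=False)

UP_THRESHOLD = 80  # default up_threshold reported earlier in the thread
print("ignore_nice_load=1:", with_ignore, "-> scale up?", with_ignore > UP_THRESHOLD)
print("ignore_nice_load=0:", without_ignore, "-> scale up?", without_ignore > UP_THRESHOLD)
```

Under this assumption the governor sees roughly 50% load with the knob set to 1 and a fully loaded CPU with the knob set to 0.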

Can you check what is the setting of this knob in your case?

# cat /sys/devices/system/cpu/cpu0/cpufreq/ondemand/ignore_nice_load

You can set that to 0 to ask the ondemand gov to take nice load into
account while calculating cpu freq changes:

# echo 0 > /sys/devices/system/cpu/cpu0/cpufreq/ondemand/ignore_nice_load

This should restore the behavior of the ondemand governor as seen in 2.6.23
in your case (even with CONFIG_FAIR_GROUP_SCHED enabled). Can you please confirm
if that happens?

> As I stated in http://lkml.org/lkml/2008/1/26/207, the issue is solved
> after unselecting FAIR_GROUP_SCHED.

I understand, but we want to keep CONFIG_FAIR_GROUP_SCHED enabled by
default.

Ingo,
Most folks seem to be used to a global nice-domain, where a nice 19
task gives up cpu in competition with a nice-0 task (irrespective of which
userids they belong to). CONFIG_FAIR_USER_SCHED brings noticeable changes wrt
that. We could possibly let it be as it is (since that is what a server
admin may possibly want when managing university servers) or modify it to be
aware of nice-level (priority of user-sched entity is equivalent to highest
prio task it has).

In any case, I will send across a patch to turn off CONFIG_FAIR_USER_SCHED by
default (and instead turn on CONFIG_FAIR_CGROUP_SCHED by default).

--
Regards,
vatsa

2008-01-27 16:57:23

by Toralf Förster

Subject: Re: (ondemand) CPU governor regression between 2.6.23 and 2.6.24

On Sunday, 27 January 2008, Srivatsa Vaddagiri wrote:
> You can set that to 0 to ask the ondemand gov to take nice load into
> account while calculating cpu freq changes:
>
> # echo 0 > /sys/devices/system/cpu/cpu0/cpufreq/ondemand/ignore_nice_load
>
> This should restore the behavior of the ondemand governor as seen in 2.6.23
> in your case (even with CONFIG_FAIR_GROUP_SCHED enabled). Can you please confirm
> if that happens?

Yes, of course. Unfortunately this speeds the CPU up to maximum power consumption,
which isn't wanted, at least on a notebook, because temperature and fan speed are at
maximum in that case :-(

It would be nice to run a grid application at lowest priority without impact on
power / fan / temperature, but OTOH have full performance for desktop
applications, wouldn't it?

--
MfG/Sincerely

Toralf Förster
pgp finger print: 7B1A 07F4 EC82 0F90 D4C2 8936 872A E508 7DB6 9DA3

2008-01-27 18:59:06

by Mike Galbraith

Subject: Re: (ondemand) CPU governor regression between 2.6.23 and 2.6.24

On Sun, 2008-01-27 at 13:39 +0100, Toralf Förster wrote:
> On Sunday, 27 January 2008, you wrote:
> >
> > On Sun, 2008-01-27 at 12:00 +0100, Toralf Förster wrote:
> > > BTW the dnetc process runs under the user "dnetc" with nice level 19,
> > > my process runs under my own user id "tfoerste", therefore I wouldn't expect
> > > both processes to get the same processor resources, would I?
> >
> > Normal. Nice level controls cpu distribution _within_ a task group,
> > whereas distribution between groups is controlled by cpu_share. It's
> > going to take a while for folks to get used to having two levels of cpu
> > distribution.
>
> Ouch, does this mean that in a multi-user scenario with 2 non-root users "A" and
> "B", each running exactly 1 process with nice level 0 and 19 respectively,
> both get ~50% of the CPU *and furthermore* that user "B" never
> has a chance to be nice to user "A", although his process should really
> use only those CPU cycles not eaten by any other user?

Yes. If you want one task group to receive fewer cpu cycles, you have to
'nice' that task group by reducing its share.
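
The two levels of distribution can be illustrated with a rough model. This is a sketch under stated assumptions, not scheduler code: the weights 1024 (nice 0) and 15 (nice 19) are the values I recall from the kernel's nice-to-weight table, every task is assumed to be permanently runnable on a single CPU, and the function names are hypothetical.

```python
NICE_0_WEIGHT = 1024   # assumed scheduler weight of a nice-0 task
NICE_19_WEIGHT = 15    # assumed scheduler weight of a nice-19 task


def flat_shares(task_weights):
    """One level: CPU time is split between tasks by nice weight only."""
    total = sum(task_weights.values())
    return {task: weight / total for task, weight in task_weights.items()}


def grouped_shares(groups, cpu_share):
    """Two levels: split between groups by cpu_share first,
    then between the tasks inside each group by nice weight."""
    total_share = sum(cpu_share[group] for group in groups)
    shares = {}
    for group, tasks in groups.items():
        group_fraction = cpu_share[group] / total_share
        group_weight = sum(tasks.values())
        for task, weight in tasks.items():
            shares[task] = group_fraction * weight / group_weight
    return shares


# Without group scheduling: factor (nice 0) against dnetc (nice 19).
flat = flat_shares({"factor": NICE_0_WEIGHT, "dnetc": NICE_19_WEIGHT})

# With per-user groups and the default cpu_share of 1024 for both users.
grouped = grouped_shares(
    {"tfoerste": {"factor": NICE_0_WEIGHT}, "dnetc": {"dnetc": NICE_19_WEIGHT}},
    {"tfoerste": 1024, "dnetc": 1024},
)

print(flat)
print(grouped)
```

In this model the flat case gives factor nearly the whole CPU, while the grouped case gives each user half regardless of nice level, which is in line with the two top listings earlier in the thread.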

> If the answer is yes what's about extending the current behaviour to consider
> (optionally) nice level of running processes in the case where
> CONFIG_FAIR_GROUP_SCHED is set ?

I think it's better to just disable fair group scheduling if it doesn't
suit your needs. It's not going to be everyone's cup of tea.

-Mike

2008-01-27 21:15:03

by Toralf Förster

Subject: Re: (ondemand) CPU governor regression between 2.6.23 and 2.6.24

On Sunday, 27 January 2008, Mike Galbraith wrote:
>
> On Sun, 2008-01-27 at 13:39 +0100, Toralf Förster wrote:
> > Ouch, does this mean that in a multi-user scenario with 2 non-root users "A" and
> > "B", each running exactly 1 process with nice level 0 and 19 respectively,
> > both get ~50% of the CPU *and furthermore* that user "B" never
> > has a chance to be nice to user "A", although his process should really
> > use only those CPU cycles not eaten by any other user?
>
> Yes. If you want one task group to receive fewer cpu cycles, you have to
> 'nice' that task group by reducing its share.

> I think it's better to just disable fair group scheduling if it doesn't
> suit your needs. It's not going to be everyone's cup of tea.

Yes, disabling this kernel option is much better for me as a notebook user.

BTW, I have one more question related to this topic:

Is it correct that within the scenario described above user "A" never gets more
than 50% of the CPU as soon as user "B" is logged into the system (because of
the login process itself) ?

> -Mike
>

--
MfG/Sincerely

Toralf Förster
pgp finger print: 7B1A 07F4 EC82 0F90 D4C2 8936 872A E508 7DB6 9DA3

2008-01-27 21:26:17

by Peter Zijlstra

Subject: Re: (ondemand) CPU governor regression between 2.6.23 and 2.6.24

On Sun, 2008-01-27 at 22:14 +0100, Toralf Förster wrote:

> Is it correct that within the scenario described above user "A" never gets more
> than 50% of the CPU as soon as user "B" is logged into the system (because of
> the login process itself) ?

No, the login process doesn't normally consume any significant amount of
cpu time.

2008-01-27 21:27:40

by Peter Zijlstra

Subject: Re: (ondemand) CPU governor regression between 2.6.23 and 2.6.24

On Sun, 2008-01-27 at 17:57 +0100, Toralf Förster wrote:

> It would be nice to run a grid application at lowest priority without impact on
> power / fan / temperature, but OTOH have full performance for desktop
> applications, wouldn't it?

This can be achieved by giving the group/uid the grid application uses a
weight of 2.

2008-01-27 22:32:41

by Ingo Molnar

Subject: Re: (ondemand) CPU governor regression between 2.6.23 and 2.6.24

* Peter Zijlstra wrote:

> On Sun, 2008-01-27 at 17:57 +0100, Toralf Förster wrote:
>
> > It would be nice to run a grid application at lowest priority
> > without impact on power / fan / temperature, but OTOH have full
> > performance for desktop applications, wouldn't it?
>
> This can be achieved by giving the group/uid the grid application uses
> a weight of 2.

yes, that's the correct solution. For example, the following line in
/etc/rc.d/rc.local:

echo 2 > /sys/kernel/uids/`grep -w nobody /etc/passwd | cut -d: -f3`/cpu_share

sets user 'nobody' to a very low cpu weight. If there's any grid user,
it can be done similarly. The default is 1024.
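
As a rough worked example of what a weight of 2 means against the default of 1024: the sketch below assumes the CPU is split between busy groups in proportion to their cpu_share, and that only one other group at the default share is competing. The function name is made up for illustration.

```python
DEFAULT_CPU_SHARE = 1024  # default stated above


def group_fraction(own_share, other_shares):
    """Fraction of CPU time a group gets when all listed groups are busy."""
    return own_share / (own_share + sum(other_shares))


# A grid user with cpu_share 2 competing against one default-share user.
grid = group_fraction(2, [DEFAULT_CPU_SHARE])
print(f"grid user gets about {grid:.2%} of the CPU while the other user is busy")
```

When no other group wants the CPU, the low-share group can still use all of it; the share only matters under contention.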

Ingo

2008-01-28 08:39:36

by Helge Hafting

Subject: Re: (ondemand) CPU governor regression between 2.6.23 and 2.6.24

Toralf Förster wrote:
> On Sunday, 27 January 2008, Srivatsa Vaddagiri wrote:
>
>> You can set that to 0 to ask the ondemand gov to take nice load into
>> account while calculating cpu freq changes:
>>
>> # echo 0 > /sys/devices/system/cpu/cpu0/cpufreq/ondemand/ignore_nice_load
>>
>> This should restore the behavior of the ondemand governor as seen in 2.6.23
>> in your case (even with CONFIG_FAIR_GROUP_SCHED enabled). Can you please confirm
>> if that happens?
>>
>
> Yes, of course. Unfortunately this speeds the CPU up to maximum power consumption,
> which isn't wanted, at least on a notebook, because temperature and fan speed are at
> maximum in that case :-(
>
> It would be nice to run a grid application at lowest priority without impact on
> power / fan / temperature, but OTOH have full performance for desktop
> applications, wouldn't it?
>
In theory, the fix is simple:
If _non-niced_ tasks use more than 80% of the cputime _made available to them_,
then increase the processor speed.

The cputime allocated to niced tasks (that may be cpu intensive but shouldn't
cause max speed on their own) won't matter then.
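
One possible reading of this proposal, as a minimal sketch: the function below is hypothetical and only models the rule as stated here, taking "cputime made available" to mean everything that was not given to niced tasks.

```python
def non_nice_demand_exceeds(non_nice_busy_pct, nice_busy_pct, up_threshold=80):
    """True if non-niced tasks used more than up_threshold percent of the
    CPU time that was not handed to niced tasks."""
    available_pct = 100.0 - nice_busy_pct
    if available_pct <= 0:
        # Only niced work ran, so it should not push the frequency up.
        return False
    return 100.0 * non_nice_busy_pct / available_pct > up_threshold


# Group-scheduling case from the thread: about 50.3% non-niced, 49.7% niced.
print(non_nice_demand_exceeds(50.3, 49.7))

# Mostly niced load with little foreground work.
print(non_nice_demand_exceeds(5.0, 90.0))
```

With these inputs the first case would raise the frequency (the foreground task uses all the time it was given), while the second would not.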

Helge Hafting

2008-01-28 13:18:49

by Ingo Molnar

Subject: Re: (ondemand) CPU governor regression between 2.6.23 and 2.6.24

Toralf, for me the group scheduler offers superior interactivity on my
laptop for a number of reasons. The biggest practical effect is because
it splits the CPU time between Xorg (root UID) and desktop apps. This
helps particularly well when there's compile jobs going on, etc. - Xorg
still gets a guaranteed share of CPU time which is a nice touch. The
mouse does not lag that much under load, etc. It's not always possible
to renice every aspect of my desktop.

i wasn't using dnetc myself, so i never triggered your particular issue -
but i met a similar issue with the distcc user. I think it's more robust
in general to isolate the dnetc user a bit from the rest of the system -
even at nice +19 dnetc can interact with your desktop apps.

( In the long run, dnetc (and distcc, and all the other batch/clustering
apps) would automatically set their uid to a lower cpu_share value, so
this manual tweaking would not be needed. )

So if you have some time to play with this, could you please try the
following experiment. Put the following line into your
/etc/rc.d/rc.local file:

echo 2 > /sys/kernel/uids/`grep -w dnetc /etc/passwd | cut -d: -f3`/cpu_share

with group scheduling (CONFIG_FAIR_GROUP_SCHED=y) enabled. Also apply
the patch attached below as well - which fixes some interactivity
problems with group scheduling.
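
The rc.local line above looks up the numeric uid of the dnetc user and builds the sysfs path from it. A minimal sketch of the same lookup follows; the sample passwd content and the helper names are made up for illustration. Note that it matches only the user-name field, which is slightly stricter than `grep -w`.

```python
def uid_from_passwd(passwd_text, user):
    """Return the numeric uid (third colon-separated field) for `user`, or None."""
    for line in passwd_text.splitlines():
        fields = line.split(":")
        if len(fields) >= 3 and fields[0] == user:
            return int(fields[2])
    return None


def cpu_share_path(uid):
    """Per-uid sysfs file used in the rc.local line above."""
    return f"/sys/kernel/uids/{uid}/cpu_share"


# Hypothetical passwd content; the real uid of dnetc is system-specific.
sample = (
    "root:x:0:0:root:/root:/bin/bash\n"
    "dnetc:x:1001:1001::/home/dnetc:/bin/false\n"
)

uid = uid_from_passwd(sample, "dnetc")
print(cpu_share_path(uid))
```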

Could you try that kernel and compare it to a FAIR_GROUP_SCHED-disabled
kernel's interactivity, and send us your observations?

the group scheduler needs tuning in your case, but in the end, i believe
it can offer even better interactivity than what we had before - so it
would be nice if you could try it and compare.

If this still doesn't do the trick and the group scheduler is worse in
your testing then there's something else going on as well which we need
to fix. (even if you ultimately decide to disable the group scheduler)
At minimum we should be able to reach a "works just as well as with
group scheduling disabled" state. Thanks,

Ingo

Index: linux/kernel/sched_fair.c
===================================================================
--- linux.orig/kernel/sched_fair.c
+++ linux/kernel/sched_fair.c
@@ -520,7 +520,7 @@ place_entity(struct cfs_rq *cfs_rq, stru

         if (!initial) {
                 /* sleeps upto a single latency don't count. */
-                if (sched_feat(NEW_FAIR_SLEEPERS) && entity_is_task(se))
+                if (sched_feat(NEW_FAIR_SLEEPERS))
                         vruntime -= sysctl_sched_latency;

                 /* ensure we never gain time by being placed backwards. */
@@ -1106,7 +1106,11 @@ static void check_preempt_wakeup(struct
         }

         gran = sysctl_sched_wakeup_granularity;
-        if (unlikely(se->load.weight != NICE_0_LOAD))
+        /*
+         * More easily preempt - nice tasks, while not making
+         * it harder for + nice tasks.
+         */
+        if (unlikely(se->load.weight > NICE_0_LOAD))
                 gran = calc_delta_fair(gran, &se->load);

         if (pse->vruntime + gran < se->vruntime)
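
The second hunk only changes which entities get their wakeup granularity rescaled. The sketch below models just that condition; it does not model `calc_delta_fair()` itself, the function name is hypothetical, and the value 1024 for NICE_0_LOAD is an assumption.

```python
NICE_0_LOAD = 1024  # assumed weight of a nice-0 entity


def rescales_granularity(weight, patched):
    """Whether the wakeup granularity is rescaled for the running entity."""
    if patched:
        # After the patch: only entities heavier than nice 0.
        return weight > NICE_0_LOAD
    # Before the patch: every entity whose weight differs from nice 0.
    return weight != NICE_0_LOAD


# A light (positively niced) entity, using the assumed nice-19 weight of 15.
print(rescales_granularity(15, patched=False))
print(rescales_granularity(15, patched=True))
```

Under this reading, lighter-than-nice-0 entities keep the default granularity after the patch, while heavier ones are still rescaled.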

2008-01-28 15:16:22

by Toralf Förster

Subject: Re: (ondemand) CPU governor regression between 2.6.23 and 2.6.24

Hello,

On Monday, 28 January 2008, Ingo Molnar wrote:
>
> it splits the CPU time between Xorg (root UID) and desktop apps. This
> helps particularly well when there's compile jobs going on, etc. - Xorg
good news for all Gentoo users ;)

> So if you have some time to play with this, could you please try the
> following experiment. Put the following line into your
> /etc/rc.d/rc.local file:
>
> echo 2 > /sys/kernel/uids/`grep -w dnetc /etc/passwd | cut -d: -f3`/cpu_share
>
> with group scheduling (CONFIG_FAIR_GROUP_SCHED=y) enabled. Also apply
> the patch attached below as well - which fixes some interactivity
> problems with group scheduling.
> Could you try that kernel and compare it to a FAIR_GROUP_SCHED-disabled
> kernel's interactivity, and send us your observations?

With the patch and the sysfs option my system works OK and, last but not least,
with the expected behaviour compared to a previous kernel.

In addition, my first impression is that its responsiveness is better compared to
previous kernel versions and nearly the same compared to a kernel without
FAIR_GROUP_SCHED.

Compared to kernel 2.6.23, the one-liner "time factor 819734028463158891" now needs
~5.61 sec, which is a little bit higher than the previous value of 5.44 sec.

Thanks for the solution (BTW, because FAIR_GROUP_SCHED defaults to "y", I would bet
that more people will run into this case).

--
MfG/Sincerely

Toralf Förster
pgp finger print: 7B1A 07F4 EC82 0F90 D4C2 8936 872A E508 7DB6 9DA3

2008-02-04 00:32:49

by Andrew Morton

Subject: Re: (ondemand) CPU governor regression between 2.6.23 and 2.6.24

On Sat, 26 Jan 2008 15:06:25 +0100 Toralf Förster <[email protected]> wrote:

> I use a 1-liner for a simple performance check : "time factor 819734028463158891"
> Here is the result for the new (Gentoo) kernel 2.6.24:
>
> With the ondemand governor of the I get:
>
> tfoerste@n22 ~/tmp $ time factor 819734028463158891
> 819734028463158891: 3 273244676154386297
>
> real 0m32.997s
> user 0m15.732s
> sys 0m0.014s
>
> With the ondemand governor the CPU runs at 600 MHz,
> whereas with the performance governor I get :
>
> tfoerste@n22 ~/tmp $ time factor 819734028463158891
> 819734028463158891: 3 273244676154386297
>
> real 0m10.893s
> user 0m5.444s
> sys 0m0.000s
>
> (~5.5 sec as I expected) b/c the CPU is set to 1.7 GHz.
>
> The ondeman governor of previous kernel versions however automatically increased
> the CPU speed from 600 MHz to 1.7 GHz.
>
> My system is a ThinkPad T41, I'll attach the .config
>

Let's cc the cpufreq list.

If nothing happens (often the case), please raise a report at
bugzilla.kernel.org so we can track the presence of the regression.

Thanks.

2008-02-04 00:36:18

by Andrew Morton

Subject: Re: (ondemand) CPU governor regression between 2.6.23 and 2.6.24

On Sun, 3 Feb 2008 16:32:55 -0800 Andrew Morton <[email protected]> wrote:

> If nothing happens (often the case), please raise a report at
> bugzilla.kernel.org so we can track the presence of the regression.

argh, please ignore. I got bitten by the
im-too-lame-to-get-my-References:-header-right blight. Again.

2008-02-04 17:43:24

by Pallipadi, Venkatesh

Subject: RE: (ondemand) CPU governor regression between 2.6.23 and 2.6.24

>-----Original Message-----
>From: [email protected]
>[mailto:[email protected]] On Behalf Of Andrew Morton
>Sent: Sunday, February 03, 2008 4:33 PM
>To: Toralf Förster
>Cc: [email protected]; cpufreq@http://www.linux.org.uk
>Subject: Re: (ondemand) CPU governor regression between 2.6.23 and 2.6.24
>
>On Sat, 26 Jan 2008 15:06:25 +0100 Toralf Förster <[email protected]> wrote:
>
>> I use a 1-liner for a simple performance check : "time factor 819734028463158891"
>> Here is the result for the new (Gentoo) kernel 2.6.24:
>>
>> With the ondemand governor of the I get:
>>
>> tfoerste@n22 ~/tmp $ time factor 819734028463158891
>> 819734028463158891: 3 273244676154386297
>>
>> real 0m32.997s
>> user 0m15.732s
>> sys 0m0.014s
>>
>> With the ondemand governor the CPU runs at 600 MHz,
>> whereas with the performance governor I get :
>>
>> tfoerste@n22 ~/tmp $ time factor 819734028463158891
>> 819734028463158891: 3 273244676154386297
>>
>> real 0m10.893s
>> user 0m5.444s
>> sys 0m0.000s
>>
>> (~5.5 sec as I expected) b/c the CPU is set to 1.7 GHz.
>>
>> The ondeman governor of previous kernel versions however automatically increased
>> the CPU speed from 600 MHz to 1.7 GHz.
>>
>> My system is a ThinkPad T41, I'll attach the .config
>>
>

This looks like it is related to the report here:
http://www.ussg.iu.edu/hypermail/linux/kernel/0801.3/1260.html

Can you try the workarounds in that thread and see whether the problem goes away?

Thanks,
Venki

2008-02-04 19:18:32

by Toralf Förster

Subject: Re: (ondemand) CPU governor regression between 2.6.23 and 2.6.24

On Monday 04 February 2008 Pallipadi, Venkatesh wrote:
>
> This looks like is related to the report here
> http://www.ussg.iu.edu/hypermail/linux/kernel/0801.3/1260.html
>
> Can you try the workarounds on that thread and see whether the problem goes away.
>
Yes, I already answered here:

http://lkml.org/lkml/2008/1/28/195

:-)
--
MfG/Sincerely

Toralf Förster
pgp finger print: 7B1A 07F4 EC82 0F90 D4C2 8936 872A E508 7DB6 9DA3
