LinuxLists.cc - [PATCH] cpufreq

2004-10-17 22:29:30

Subject: [PATCH] cpufreq_ondemand

Hi all,

After playing with the cpufreq_ondemand governor (many thanks to those who
made it) I made a number of alterations which suit me at least. I am really
looking for feedback and, of course, once people have fixed any bugs they find
and made the code look neater, possible inclusion?

The improvements (well I think they are) I have made:

1. I have replaced the algorithm it used with one which calculates the number of
cpu idle cycles that have passed and compares it to the number of cpu
cycles it would have expected to pass (for the defaults, 20%/80%)

this means a couple of divisions have been removed, which is always
nice and it led to clearer code (for me at least), that was
until I added the handful of 'if' conditionals though.... :-/

2. 'nice' handling is controllable through
/sys/.../ondemand/ignore_nice: you can tell it to also count 'nice'
time as idle cpu cycles. Set it to '1' to treat 'nice' time as the cpu
being in an active state.

3. (major) the scaling up and down of the cpufreq is now smoother. I found
it really nasty that if it tripped < 20% idle time that the freq was
set to 100%. This code smoothly increases the cpufreq as well as
doing a better job of decreasing it too

4. (minor) I changed DEF_SAMPLING_RATE_LATENCY_MULTIPLIER to 50000 and
DEF_SAMPLING_DOWN_FACTOR to 5 as I found the defaults a bit annoying
on my system and resulted in the cpufreq constantly jumping.

For my patch it works far better if the sampling rate is much lower
anyway, which can only be good for cpu efficiency in the long run

5. the granularity of how much cpufreq is increased or decreased is controlled
by sending a percentage to /sys/.../ondemand/freq_step_percent

6. debugging (with 'watch -n1 cat /sys/.../ondemand/requested_freq') and
backwards 'compatibility' to act like the 'userspace' governor is
available with /sys/.../ondemand/requested_freq if
'freq_step_percent' is set to zero

7. there are extra checks to not bother to try increasing/decreasing the
cpufreq if there is nothing to do, or even can be done as it might
already be at min/max (or freq_step_percent is zero)
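
A rough sketch of the division-free check from item 1 and the 'nice' handling from item 2, written in Python for illustration rather than taken from the patch. The function names, the return strings, and the reading of 20%/80% as idle-time thresholds (speed up below 20% idle, slow down above 80% idle) are assumptions.

```python
def decide(idle_ticks, total_ticks, up_idle_pct=20, down_idle_pct=80):
    """Return 'up', 'down' or 'hold' for one sampling period.

    Rather than computing idle_ticks * 100 / total_ticks and comparing
    the percentage, both sides are multiplied out, so no division is
    needed.
    """
    scaled_idle = idle_ticks * 100
    if scaled_idle < up_idle_pct * total_ticks:
        return "up"      # less idle time than the lower threshold: busy
    if scaled_idle > down_idle_pct * total_ticks:
        return "down"    # more idle time than the upper threshold: quiet
    return "hold"


def idle_with_nice(idle_ticks, nice_ticks, ignore_nice):
    # Assumption, following item 2: ignore_nice=1 counts niced work as
    # active time, ignore_nice=0 folds it into the idle total.
    return idle_ticks if ignore_nice else idle_ticks + nice_ticks
```

Multiplying both sides by the sample length keeps the comparison exact with integer tick counts, which is presumably why the divisions could go.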

The code seems to work for me fine. This is my first patch and the first
thing I have really posted here so be gentle with me :)

Comments and improvements are of course more than welcome.

Of course full thanks go to all the original authors; my C coding is naff and
I would not have been able to do this if it was not for the pretty much
complete (for my needs) cpufreq_ondemand module. Venkatesh did say we could
rip out the core algorithm and replace it with our own easily, and he was right
:)

Cheers

Alex

--
 ___________________________________
< Two is company, three is an orgy. >
 -----------------------------------
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||


2004-10-17 22:36:04

by Con Kolivas


Subject: Re: [PATCH] cpufreq_ondemand

Alexander Clouter wrote:
>> 3. (major) the scaling up and down of the cpufreq is now smoother. I found
> it really nasty that if it tripped < 20% idle time that the freq was
> set to 100%. This code smoothly increases the cpufreq as well as
> doing a better job of decreasing it too

I'd much prefer it shot up to 100% or else every time the cpu usage went
up there'd be an obvious lag till the machine ran at its capable speed.
I very much doubt the small amount of time it spent at 100% speed with
the default design would decrease the battery life significantly as well.

Cheers,
Con

2004-10-17 22:44:36

by Alexander Clouter


Subject: Re: [PATCH] cpufreq_ondemand

On Oct 18, Con Kolivas wrote:
>
> I'd much prefer it shot up to 100% or else every time the cpu usage went
> up there'd be an obvious lag till the machine ran at it's capable speed.
> I very much doubt the small amount of time it spent at 100% speed with
> the default design would decrease the battery life significantly as well.
>
The issue I found was that if you are running a process that is io bound, for
example, then you may never need to run your cpu at 100%; it will speed up
bit by bit[1] till it gets to a speed that is fast enough to deal with it
without max'ing the cpufreq.

This is after all exactly what most (if not all) of the userspace daemons try to
do anyway.

Cheers

Alex

[1] also you might find that the task does not last long enough to warrant
jumping and lurking at 100% speed anyway

--
 _________________________________________
/ It's always darkest just before it gets \
\ pitch black.                            /
 -----------------------------------------
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||


2004-10-18 04:56:55

by Pallipadi, Venkatesh


Subject: RE: [PATCH] cpufreq_ondemand

>-----Original Message-----
>From: Con Kolivas [mailto:[email protected]]
>Sent: Sunday, October 17, 2004 3:36 PM
>To: Alexander Clouter
>Cc: Pallipadi, Venkatesh; cpufreq@www.linux.org.uk;
>[email protected]
>Subject: Re: [PATCH] cpufreq_ondemand
>
>Alexander Clouter wrote:
>>> 3. (major) the scaling up and down of the cpufreq is now
>smoother. I found
>> it really nasty that if it tripped < 20% idle time that
>the freq was
>> set to 100%. This code smoothly increases the cpufreq
>as well as
>> doing a better job of decreasing it too
>
>I'd much prefer it shot up to 100% or else every time the cpu
>usage went
>up there'd be an obvious lag till the machine ran at it's
>capable speed.
> I very much doubt the small amount of time it spent at 100%
>speed with
>the default design would decrease the battery life
>significantly as well.
>

True. The current ondemand behaviour is by design. When CPU
is at the lowest freq, and there is a sudden surge in load,
we want it to go to max freq immediately, rather than wait
for some more polling intervals. If max freq is too high,
it will naturally lower to some intermediate freq later.

We can never accurately predict freq for some future load.
Say a CPU capable of 600, 800, 1000, 1200 and 1400 MHz is
running at 600 and we have sudden 100% CPU utilization, then
we cannot precisely say which should be the next freq. It
can be any of the higher possible freqs. And we felt performance
should get a higher priority whenever there are
tradeoffs like this.

Thanks,
Venki

2004-10-18 07:29:27

by Dominik Brodowski


Subject: Re: [PATCH] cpufreq_ondemand

Hi,

On Sun, Oct 17, 2004 at 11:29:16PM +0100, Alexander Clouter wrote:
> After playing with the cpufreq_ondemand governor (many thanks to those whom
> made it) I made a number of alterations which suit me at least. Really
> looking for feedback and of course once people have fixed any bugs they find
> and made the code look neater, possible inclusion?

Or possibly a "fork" -- different dynamic cpufreq governors aren't a bad
thing to have. Else the whole modular approach would be wrong... So, even
if it doesn't get merged into cpufreq_ondemand, you can maintain it as a
differently named cpufreq governor.

> 2. controllable through
> /sys/.../ondemand/ignore_nice, you can tell it to consider 'nice'
> time as also idle cpu cycles. Set it to '1' to treat 'nice' as cpu
> in an active state.

Interesting bit, IIRC some userspace tool also does that.

> 4. (minor) I changed DEF_SAMPLING_RATE_LATENCY_MULTIPLIER to 50000 and
> DEF_SAMPLING_DOWN_FACTOR to 5 as I found the defaults a bit annoying
> on my system and resulted in the cpufreq constantly jumping.
>
> For my patch it works far better if the sampling rate is much lower
> anyway, which can only be good for cpu efficiency in the long run

However, this means it takes much longer for the system to react to changes
in load... it's a tricky issue.

> 6. debugging (with 'watch -n1 cat /sys/.../ondemand/requested_freq') and
> backwards 'compatibility' to act like the 'userspace' governor is
> avaliable with /sys/.../ondemand/requested_freq if
> 'freq_step_percent' is set to zero

Please don't do that. Userspace is the governor for userspace frequency
setting; if you want it, switch to userspace, if you want dynamic frequency
selection, use the original ondemand or your governor.

Dominik

2004-10-18 08:12:09

by Mattia Dongili


Subject: Re: [PATCH] cpufreq_ondemand

On Mon, Oct 18, 2004 at 09:20:45AM +0200, Dominik Brodowski wrote:
> Hi,
[...]
> > 2. controllable through
> > /sys/.../ondemand/ignore_nice, you can tell it to consider 'nice'
> > time as also idle cpu cycles. Set it to '1' to treat 'nice' as cpu
> > in an active state.
>
> Interesting bit, IIRC some userspace tool also does that.

I'm implementing a "nice_scale" parameter in cpufreqd that offers more
control over nice cpu time. It's just a parameter (whose value must be >=
1, or 0 to not care about nice time at all) that tells _how_much_ the nice
time has to be taken into consideration. It would be nice to have it in
the ondemand governor too.

bye
--
mattia
:wq!

2004-10-18 08:25:50

by Alexander Clouter


Subject: Re: [PATCH] cpufreq_ondemand

Morning all,

On Oct 18, Dominik Brodowski wrote:
>
> Or possibly a "fork" -- different dynamic cpufreq governors aren't a bad
> thing to have. Else the whole modular approach would be wrong... So, even
> if it doesn't get merged into cpufreq_ondemand, you can maintain it as a
> differently named cpufreq governor.
>
but but...that ruins my plans for world domination....

>
> > 2. controllable through
> > /sys/.../ondemand/ignore_nice, you can tell it to consider 'nice'
> > time as also idle cpu cycles. Set it to '1' to treat 'nice' as cpu
> > in an active state.
>
> Interesting bit, IIRC some userspace tool also does that.
>
if I recall they have to munch through the whole of /proc to get this
information; then again there is probably a clean and fast way of pulling
those time values from /proc that I do not know of.
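
On the question of a clean way to get these values: the first line of /proc/stat carries the aggregate user, nice, system and idle counters, so a userspace tool can read one line instead of walking per-process entries. A minimal sketch in Python; the function names are made up for the example.

```python
def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into tick counters."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("not the aggregate cpu line")
    user, nice, system, idle = (int(x) for x in fields[1:5])
    return {"user": user, "nice": nice, "system": system, "idle": idle}


def read_cpu_times(path="/proc/stat"):
    # The aggregate line is always the first line of the file.
    with open(path) as f:
        return parse_cpu_line(f.readline())
```

Sampling `read_cpu_times()` twice and subtracting gives the user, nice, system and idle time spent in between.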

> > 4. (minor) I changed DEF_SAMPLING_RATE_LATENCY_MULTIPLIER to 50000 and
> > DEF_SAMPLING_DOWN_FACTOR to 5 as I found the defaults a bit annoying
> > on my system and resulted in the cpufreq constantly jumping.
> >
> > For my patch it works far better if the sampling rate is much lower
> > anyway, which can only be good for cpu efficiency in the long run
>
> However, this means it takes much longer for the system to react to changes
> in load... it's a tricky issue.
>
it's all a case of trade-offs and of course everyone's mileage will vary. For
me I want the CPU to slowly get faster and faster as a task might complete
fast enough without vamping it up to 100%. Then again Con will probably
point out "pah, then the difference in battery saving is negligible" :)

On a laptop (regardless of whether it gives an overall order of magnitude
power saving or not) I would prefer the cpu speed to be as low as possible.
Again everyone (well here in the UK) I chat to seems to prefer the slow
increasing method which many of the userspace tools try to do anyway; then of
course the argument "userland userland userland....".

> > 6. debugging (with 'watch -n1 cat /sys/.../ondemand/requested_freq') and
> > backwards 'compatibility' to act like the 'userspace' governor is
> > avaliable with /sys/.../ondemand/requested_freq if
> > 'freq_step_percent' is set to zero
>
> Please don't do that. Userspace is the governor for userspace frequency
> setting; if you want it, switch to userspace, if you want dynamic frequency
> selection, use the original ondemand or your governor.
>
I thought a few people would grumble about that. I needed a way to store the
variable speed knob and that struct was the best place for it; looks like me
tarting it up as a 'debugging' feature was not good enough :)

Cheers

Alex

--
 ________________________________________
/ All articles that coruscate with       \
\ resplendence are not truly auriferous. /
 ----------------------------------------
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||


2004-10-18 08:38:56

by Alexander Clouter


Subject: Re: [PATCH] cpufreq_ondemand

On Oct 17, Pallipadi, Venkatesh wrote:
>
> [snipped]
>
> We can never accurately predict freq for some future load.
> Say a CPU capable of 600, 800, 1000, 1200 and 1400 MHz, is
> running at 600 and we have sudden 100% CPU utilization, then
> we cannot precisely say which should be the next freq. It
> can be any of the higher possible freqs. And we felt performance
> should get a higher priority whenever there is some
> tradeoffs like this.
>
it took me a while to work out why speed decreasing was 'working' whilst
speed increasing was not with my method; a good hour finding out that the
cpufreq (correctly) goes to the lowest match.

My approach was not to try and predict the desired freq; it was just
to increase it...well on demand at a steady rate towards 100% and then once
the load disappears to reduce it. Having used powernowd and found it does that
rather nicely, then seeing the inclusion of cpufreq_ondemand, I tweaked
cpufreq_ondemand to replace powernowd.

I'm all for "this really should be done in userspace", but for something like
this I have a nagging feeling that it's neater in kernel-space. Of course the
userspace one has the advantage (I think cpufreqd does it) that you can
decide if you want to increase the freq depending on what applications are
running.

Of course you are using CPU cycles, though barely any, to have this floating
requested_freq variable. Of course I would love this to be in the kernel,
mainly though I wanted people to improve upon it and such.

Meanwhile I am thinking of moving the freq_step variable bits to the /sys
show/store functions to remove an avoidable divide.
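
One way to picture the floating requested_freq and the precomputed step, as a Python sketch rather than the actual patch. The class and method names are hypothetical, the frequency table is the one from Venkatesh's example, and taking the step as a percentage of the maximum frequency is an assumption.

```python
class SteppedGovernor:
    def __init__(self, freqs, step_pct=5):
        self.freqs = sorted(freqs)
        self.requested = self.freqs[0]   # floating target, not a table entry
        self.set_step_percent(step_pct)

    def set_step_percent(self, pct):
        # Computed once when the tunable is written, so the per-sample
        # path below needs no division.
        self.delta = self.freqs[-1] * pct // 100

    def sample(self, idle_pct, up_idle=20, down_idle=80):
        """Move the request one step and return the table frequency used."""
        if idle_pct < up_idle:
            self.requested = min(self.requested + self.delta, self.freqs[-1])
        elif idle_pct > down_idle:
            self.requested = max(self.requested - self.delta, self.freqs[0])
        # The hardware only runs at table entries: take the highest one
        # that does not exceed the request (the "lowest match").
        return max(f for f in self.freqs if f <= self.requested)
```

With a 5% step the first two busy samples leave the hardware at 600, because the request (670, then 740) still rounds down to the lowest match; only the third (810) reaches 800. Setting the step to zero freezes the request.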

Cheers

Alex

--
 ____________________________________
/ Let your conscience be your guide. \
|                                    |
\ -- Pope                            /
 ------------------------------------
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||


2004-10-18 22:49:39

by Pallipadi, Venkatesh


Subject: RE: [PATCH] cpufreq_ondemand

>-----Original Message-----
>From: Alexander Clouter <alex-kernel () digriz ! org ! uk>
>Date: 2004-10-17 22:29:16
>Message-ID: <20041017222916.GA30841 () inskipp ! digriz ! org ! uk>
>
>Hi all,
>
>After playing with the cpufreq_ondemand governor (many thanks
>to those whom
>made it) I made a number of alterations which suit me at
>least. Really
>looking for feedback and of course once people have fixed any
>bugs they find
>and made the code look neater, possible inclusion?
>
>The improvements (well I think they are) I have made:
>
>1. I have replaced the algoritm it used to one which
>calculates the number of
> cpu idle cycles that have passed and compares it to the
>number of cpu
> cycles it would have expected to pass (for, the
>defaults, 20%/80%)
>
> this means a couple of divisions have been removed,
>which is always
> nice and it lead to clearer code (for me at least), that was
> until I added the handful of 'if' conditionals though.... :-/

Good idea. This part of the patch has to go into the ondemand governor.
But I think there is a minor bug in the code.
With the current ondemand governor, we poll at some interval X and check
whether we need to increase the freq. And at some interval Y (Y > X and
a multiple of it), we check whether we need to decrease the freq.
That is the reason I have two different variables,
prev_cpu_idle_down and prev_cpu_idle_up, to store the previous idle
times at these two different polling intervals (X and Y).
Now, you have the previous idle time at only one point. So, this may
not work cleanly. From the code I feel what will happen is that
you will only see the CPU activity in the last X time when making
frequency down decisions (even though you check this with the Y polling
interval). Not sure whether I was clear with this explanation.

Note, I haven't really run your version yet. This is what
I feel by looking at the patch. I may well be wrong.
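
The two-baseline point can be shown with a small Python sketch. It is an illustration of the idea, not the governor's code: the class name and `down_factor` parameter are invented here, and the idle counter is assumed to be cumulative.

```python
class TwoWindowSampler:
    """Track idle time over a short (up) and a long (down) window."""

    def __init__(self, down_factor, idle_now=0):
        self.down_factor = down_factor    # Y = down_factor * X
        self.prev_idle_up = idle_now      # baseline for the X window
        self.prev_idle_down = idle_now    # baseline for the Y window
        self.samples = 0

    def sample(self, idle_now):
        """Return (idle over last X, idle over last Y or None)."""
        idle_up = idle_now - self.prev_idle_up
        self.prev_idle_up = idle_now
        self.samples += 1
        idle_down = None
        if self.samples % self.down_factor == 0:
            # Only here is the long-window baseline moved forward.
            idle_down = idle_now - self.prev_idle_down
            self.prev_idle_down = idle_now
        return idle_up, idle_down
```

Feeding cumulative idle values 10, 10, 40 with `down_factor=3` gives an up-window delta of 30 on the third sample but a down-window delta of 40; a single baseline would only ever see the 30.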

> 2. controllable through
> /sys/.../ondemand/ignore_nice, you can tell it to
>consider 'nice'
> time as also idle cpu cycles. Set it to '1' to treat
>'nice' as cpu
> in an active state.
>

OK. This has to be in ondemand governor as well.

>3. (major) the scaling up and down of the cpufreq is now
>smoother. I found
> it really nasty that if it tripped < 20% idle time that
>the freq was
> set to 100%. This code smoothly increases the cpufreq
>as well as
> doing a better job of decreasing it too
>
>4. (minor) I changed DEF_SAMPLING_RATE_LATENCY_MULTIPLIER to 50000 and
> DEF_SAMPLING_DOWN_FACTOR to 5 as I found the defaults a
>bit annoying
> on my system and resulted in the cpufreq constantly jumping.
>
> For my patch it works far better if the sampling rate
>is much lower
> anyway, which can only be good for cpu efficiency in
>the long run

Somehow, I feel quick response time for increased load is more
important than a smooth increase in frequency. As the CPU latency for
doing the freq transition is lower, I think the governor should use that
and do quick adjustments to the freq depending on the load. Probably, I
am thinking more in terms of places where performance is critical.
As Dominik pointed out, it's time to fork and put out a new ondemand
governor with this algorithm....

>5. the grainity of how much cpufreq is increased or decreased
>is controlled
> with sending a percentage to /sys/.../ondemand/freq_step_percent
>
>6. debugging (with 'watch -n1 cat
>/sys/.../ondemand/requested_freq') and
> backwards 'compatibility' to act like the 'userspace'
>governor is
> avaliable with /sys/.../ondemand/requested_freq if
> 'freq_step_percent' is set to zero

I again agree with Dominik's opinion on this :)

Thanks for all the experiments and all these improvements.
I will rollout a patch for ondemand governor soon, by
stealing some code from your patch below :)

Thanks,
Venki

2004-10-18 23:20:37

by Alexander Clouter


Subject: Re: [PATCH] cpufreq_ondemand

On Oct 18, Pallipadi, Venkatesh wrote:
>
> >The improvements (well I think they are) I have made:
> >
> >1. I have replaced the algoritm it used to one which
> >calculates the number of
> > cpu idle cycles that have passed and compares it to the
> >number of cpu
> > cycles it would have expected to pass (for, the
> >defaults, 20%/80%)
> >
> > this means a couple of divisions have been removed,
> >which is always
> > nice and it lead to clearer code (for me at least), that was
> > until I added the handful of 'if' conditionals though.... :-/
>
>
> Good idea. This part of the patch has to go into ondemand governor.
>
What I will do over the next few days is split up the patch to little bits
(seems to keep the kernel gods happier, cannot say I blame them) and then
post that for you all to pull apart and mull over?

> But, I think there is a minor bug in the code though.
> With current ondemand governor, we poll at some X freq and check
> whether we need to increase the freq. And with some Y freq (Y > X and
> a multiple of it), we check whether we need to decrase the freq.
> That is the reason I have two different variables
> prev_cpu_idle_down and prev_cpu_idle_up to store the previous idle
> times at these two different polling intervals (X and Y).
> Now, you have previous idle time at only one point. So, this may
> not work cleanly. From the code I feel what will happen is
> You will only see the CPU activity in last X time and decide on
> frequency down decisions (even though you check this with Y polling
> interval). Not sure whether I was clear with this explanation.
>
My code records both the total idle ticks and the overall ticks
at the last interval. This means if I subtract those values from the ones at
the next interval I can work out what the 'cpu use' was over the period that
has just passed, by looking at the difference (total-idle) as a percentage, and
whether it trips the expected values, i.e. if an increase or decrease in
frequency is needed.

This is really the main reason why the polling rate has to be decreased
by a large amount (I make it occur 50 times less often), so the period does
not get skewed by *very* brief cpu spikes.
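
The snapshot-and-subtract bookkeeping described above might look like this in Python. The function name and the tuple layout are assumptions made for the example.

```python
def usage_between(prev, cur):
    """Busy percentage between two (idle_ticks, total_ticks) snapshots."""
    idle = cur[0] - prev[0]
    total = cur[1] - prev[1]
    if total <= 0:
        return 0          # no time has passed, nothing to report
    return 100 * (total - idle) // total
```

A 10-tick burst is 100% busy in a 10-tick window but only 2% busy in a window 50 times longer, which is the smoothing effect of the lower polling rate.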

> Note, I haven't really run your version yet. This is what
> I feel by looking at the patch. I may well be wrong.
>
Well in the fashion of the netfilter folk, "Works for Me(tm)" :) Sitting
there with 'watch' on /sys/.../ondemand/requested_freq seems to return
perfectly sane results.

> > 2. controllable through
> > /sys/.../ondemand/ignore_nice, you can tell it to
> >consider 'nice'
> > time as also idle cpu cycles. Set it to '1' to treat
> >'nice' as cpu
> > in an active state.
> >
>
> OK. This has to be in ondemand governor as well.
>
I'll split this out as I think it should be in there.

> >3. (major) the scaling up and down of the cpufreq is now
> >smoother. I found
> > it really nasty that if it tripped < 20% idle time that
> >the freq was
> > set to 100%. This code smoothly increases the cpufreq
> >as well as
> > doing a better job of decreasing it too
> >
> >4. (minor) I changed DEF_SAMPLING_RATE_LATENCY_MULTIPLIER to 50000 and
> > DEF_SAMPLING_DOWN_FACTOR to 5 as I found the defaults a
> >bit annoying
> > on my system and resulted in the cpufreq constantly jumping.
> >
> > For my patch it works far better if the sampling rate
> >is much lower
> > anyway, which can only be good for cpu efficiency in
> >the long run
>
> Somehow, I feel quick response time for increased load is more
> important than smooth increase in frequency. As the CPU latency for
> doing the freq transition is lower, I think governor should use that
> and do quick adjustments to the freq depending on the load. Probably, I
> am thinking more in terms of places where performance is critical.
> As Dominik pointed out, it's the time to fork put a new ondemand
> governor with this algorithm....
>
I have been chatting to a few people and on desktop machines this is the
behaviour they of course prefer. Overshoot and then pull down. However all
us laptop users have a crotch to protect :) We (well, four people out of the
*whole* linux community; better than a US poll I hear though) prefer an
overly conservative approach; hence my approach. I did write it to suit
my needs obviously :)

> >5. the grainity of how much cpufreq is increased or decreased
> >is controlled
> > with sending a percentage to /sys/.../ondemand/freq_step_percent
> >
> >6. debugging (with 'watch -n1 cat
> >/sys/.../ondemand/requested_freq') and
> > backwards 'compatibility' to act like the 'userspace'
> >governor is
> > avaliable with /sys/.../ondemand/requested_freq if
> > 'freq_step_percent' is set to zero
>
> I again agree with Dominik's opinion on this :)
>
guess the world domination plans go back to....

1. steal all the pants....
2. ....
3. rule the world

I do though think the step_freq bit should be there.

> Thanks for all the experiments and all these improvements.
> I will rollout a patch for ondemand governor soon, by
> stealing some code from your patch below :)
>
Not a problem. I'm in a 'powersaving' mode so will race you if you want to
produce those patches :) After that I have to tell 'wmpower' it's a Bad
Idea(tm) to suck up 5% cpu time to poll for the whole ACPI state every 0.5s
with a host of other major issues :-/ Then there is....and...<complains to
himself>

Cheers all

Alex

--
 _____________________________________
/ A bird in the hand is worth what it \
\ will bring.                         /
 -------------------------------------
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||


2004-10-19 05:06:33

by Willy Tarreau


Subject: Re: [PATCH] cpufreq_ondemand

Hi,

On Mon, Oct 18, 2004 at 09:39:05AM +0100, Alexander Clouter wrote:
> I'm all for "this really should be done in userspace", but for something like
> this I have a nagging feeling that its neater in kernel-space. Of course the
> userspace one has the advantage (I think cpufreqd does it) that you can
> decide if you want to increase the freq depending on what applications are
> running.

Well, I've used a very simple daemon I wrote for more than a year now on a
vaio, and considering that I sometimes wanted to change it or even stop it,
I clearly prefer it in userspace than in kernel. It was so convenient to
issue a "killall cpufrqd" whenever I wanted 'time' to return accurate values
on a particular process, that I cannot imagine what it would have been if it
had been in the kernel. Moreover, the vaio was unreliable with certain
intermediate frequencies, and it took me a lot of time to discover this
(burnBX was the only reliable trigger). I simply had to change a few lines
in my daemon to use different frequencies and that was all.

Cheers,
Willy

2004-10-19 19:04:45

by Bruno Ducrot


Subject: Re: [PATCH] cpufreq_ondemand

Hi,

On Mon, Oct 18, 2004 at 08:35:49AM +1000, Con Kolivas wrote:
> Alexander Clouter wrote:
> >>3. (major) the scaling up and down of the cpufreq is now smoother. I
> >>found
> > it really nasty that if it tripped < 20% idle time that the freq was
> > set to 100%. This code smoothly increases the cpufreq as well as
> > doing a better job of decreasing it too
>
> I'd much prefer it shot up to 100% or else every time the cpu usage went
> up there'd be an obvious lag till the machine ran at it's capable speed.
> I very much doubt the small amount of time it spent at 100% speed with
> the default design would decrease the battery life significantly as well.
>

I mostly agree with you, but amd64 processors do have unacceptable
latency for a transition between min and max freq, due to the step-by-step
requirements (200MHz IIRC).
Alexander's governor may then be OK for that kind of processor.

--
Bruno Ducrot

-- Which is worse: ignorance or apathy?
-- Don't know. Don't care.

2004-10-20 05:05:10

by Andre Eisenbach


Subject: Re: [PATCH] cpufreq_ondemand

On Mon, 18 Oct 2004 08:35:49 +1000, Con Kolivas <[email protected]> wrote:
> I'd much prefer it shot up to 100% or else every time the cpu usage went
> up there'd be an obvious lag till the machine ran at it's capable speed.
> I very much doubt the small amount of time it spent at 100% speed with
> the default design would decrease the battery life significantly as well.

I like Alexander's idea better and will give it a good try. If the
speed steps down slowly but shoots up to 100% quickly (as it does right
now), even a small task (like opening a folder, or scrolling down in a
document) will cause a tiny spike to 100% which takes a while to go
back down. The result is that the CPU spends most of its time at 100%
or calming down. I wrote a small test program on my notebook which
confirms this.

It's either or. Either you go up AND down slowly (which I would
prefer), or you go up and down immediately. But spiking up and slowly
going back down is not a good combo.
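
The effect can be reproduced with a toy simulation in Python. This is not the test program mentioned above; the frequency table is borrowed from Venkatesh's example, and moving one table entry per sample is a simplifying assumption.

```python
FREQS = [600, 800, 1000, 1200, 1400]


def simulate(trace, jump_to_max, freqs=FREQS, up_idle=20, down_idle=80):
    """Return the frequency chosen after each idle-percentage sample."""
    i = 0                      # start at the lowest frequency
    top = len(freqs) - 1
    history = []
    for idle_pct in trace:
        if idle_pct < up_idle:
            # Busy: either shoot straight to max or climb one entry.
            i = top if jump_to_max else min(i + 1, top)
        elif idle_pct > down_idle:
            i = max(i - 1, 0)  # quiet: always come down one entry
        history.append(freqs[i])
    return history


# One fully busy sample followed by four idle ones, repeated twice.
spiky = [0, 100, 100, 100, 100] * 2
fast_up = simulate(spiky, jump_to_max=True)
stepwise = simulate(spiky, jump_to_max=False)
```

With a fully busy sample every fifth interval, the jump-to-max variant is back at 600 for only one sample in five, while the step-wise variant sits at 600 for four in five.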

Alex has my vote, even though I have to give it some more testing.

Cheers,
Andre

2004-10-20 07:43:00

by Brown, Len


Subject: Re: [PATCH] cpufreq_ondemand

On Wed, 2004-10-20 at 01:03, Andre Eisenbach wrote:

> ... If the
> speed steps down slowly but shoots up 100% quickly (as it is right
> now), even a small task (like opening a folder, or scrolling down in a
> document) will cause a tiny spike to 100% which takes a while to go
> back down. The result is that the CPU spends most of it's time at 100%
> or calming down. I wrote a small test program on my notebook which
> confirms this.

The question is what POLICY we're trying to implement. If the goal is
to be energy efficient while the user notices no performance hit,
then fast-up/slow-down is an EXCELLENT strategy. But if the goal is to
optimize for power savings at the cost of impacting performance, then
another strategy may work better.

The point is that no strategy will be optimal for all policies. Linux
needs a global power policy manager that the rest of the system can ask
about the current policy. This way sub-systems can (automatically)
implement whatever local strategies are consistent with that global
policy.

-Len

2004-10-20 15:18:34

by Dominik Brodowski


Subject: Re: [PATCH] cpufreq_ondemand

On Wed, Oct 20, 2004 at 03:35:35AM -0400, Len Brown wrote:
> On Wed, 2004-10-20 at 01:03, Andre Eisenbach wrote:
>
> > ... If the
> > speed steps down slowly but shoots up 100% quickly (as it is right
> > now), even a small task (like opening a folder, or scrolling down in a
> > document) will cause a tiny spike to 100% which takes a while to go
> > back down. The result is that the CPU spends most of it's time at 100%
> > or calming down. I wrote a small test program on my notebook which
> > confirms this.
>
> The question is what POLICY we're trying to implement.

This is why there may be DIFFERENT policies a.k.a. governors in cpufreq.

> If the goal is
> to to be energy efficient while the user notices no performance hit,
> then fast-up/slow-down is an EXCELLENT strategy. But if the goal is to
> optimize for power savings at the cost of impacting performance, then
> another strategy may work better.

> The point is that no strategy will be optimal for all policies. Linux
> needs a global power policy manager that the rest of the system can ask
> about the current policy. This way sub-systems can (automatically)
> implement whatever local strategies are consistent with that global
> policy.

Put it in userspace, and let it ask the cpufreq core in the kernel to use a
specific governor or another depending on what you want. That's what certain
userspace daemons / scripts already do, btw.
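
As a sketch of that userspace side: a script can pick a governor for the user's policy and write it to the cpufreq sysfs directory. `scaling_governor` and `scaling_available_governors` are the standard cpufreq sysfs files; the policy names and their mapping to governors are hypothetical, and writing the file needs root.

```python
import os

CPUFREQ_DIR = "/sys/devices/system/cpu/cpu0/cpufreq"

# Hypothetical user-facing policy names mapped to cpufreq governors.
POLICY_TO_GOVERNOR = {
    "performance": "performance",
    "balanced": "ondemand",
    "battery": "powersave",
}


def set_governor(policy, cpufreq_dir=CPUFREQ_DIR):
    """Ask the cpufreq core to use the governor matching the policy."""
    governor = POLICY_TO_GOVERNOR[policy]
    with open(os.path.join(cpufreq_dir, "scaling_available_governors")) as f:
        available = f.read().split()
    if governor not in available:
        raise ValueError("governor %r is not available" % governor)
    with open(os.path.join(cpufreq_dir, "scaling_governor"), "w") as f:
        f.write(governor)
    return governor
```

A daemon could call `set_governor("battery")` when the machine goes onto battery power and `set_governor("balanced")` when it is plugged back in.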

Dominik

2004-10-20 21:04:58

by Brown, Len


Subject: Re: [PATCH] cpufreq_ondemand

On Wed, 2004-10-20 at 10:30, Dominik Brodowski wrote:
> On Wed, Oct 20, 2004 at 03:35:35AM -0400, Len Brown wrote:

> > The question is what POLICY we're trying to implement.
>
> This is why there may be DIFFERENT policies a.k.a. governors in
> cpufreq.
....
>
> Put it in userspace, and let it ask the cpufreq core in the kernel to
> use a specific governor or another depending on what you want. That's
> what certain userspace daemons / scripts already do, btw.

Processors are not the only devices with power management. When a
device driver, say USB, or any ACPI or PCI power-managed device,
recognizes that its device is idle, who does it ask to find out what
power state to put the hardware in? Today there is nobody to tell it
what to do.

The user's global desired power policy needs to be represented in the
kernel where all devices can get at it so they can make low-latency
policy-based decisions. It isn't clear that the cpufreq multiple
governor implementation model would work well for the system as a whole.

-Len

2004-10-20 21:26:50

by Dominik Brodowski


Subject: Re: [PATCH] cpufreq_ondemand

On Wed, Oct 20, 2004 at 05:03:45PM -0400, Len Brown wrote:
> On Wed, 2004-10-20 at 10:30, Dominik Brodowski wrote:
> > On Wed, Oct 20, 2004 at 03:35:35AM -0400, Len Brown wrote:
>
> > > The question is what POLICY we're trying to implement.
> >
> > This is why there may be DIFFERENT policies a.k.a. governors in
> > cpufreq.
> ....
> >
> > Put it in userspace, and let it ask the cpufreq core in the kernel to
> > use a specific governor or another depending on what you want. That's
> > what certain userspace daemons / scripts already do, btw.
>
> Processors are not the only devices with power management. When a
> device driver, say USB, or any ACPI or PCI power-managed device,
> recognizes that its device is idle, who does it ask to find out what
> power state to put the hardware in? Today there is nobody to tell it
> what to do.

Something like sysfs' "detach_state" comes to my mind...

> The user's global desired power policy needs to be represented in the
> kernel where all devices can get at it so they can make low-latency
> policy-based decisions. It isn't clear that the cpufreq multiple
> governor implementation model would work well for the system as whole.

The question is how much policy we want in the kernel instead of in
userspace. The actual implementation (i.e. fast transitions to idle states)
must be in the kernel, of course. However the policy decision of whether to
do such idling can and IMHO should be done in userspace.

My $0.02,

Dominik