Hi,
depending on HZ set to:
100
250
1000
the ondemand governor is currently limited to poll the CPU load
and adjust the frequency (sampling rate sysfs variable) every:
200ms
80ms
20ms
This limitation does not consider NO_HZ which looks wrong?
If this is correct, can someone give me a pointer, I'd like
to understand why.
If NO_HZ can/should go down to 20ms polling and more (current
CPUs are able to switch fast enough, so that the ondemand governor
would calculate the default polling interval below 80ms for them),
this would hurt in respect of C-states at some point.
For performance reasons, one wants to poll as much as possible, for
powersaving reasons (C-states), one wants to poll as seldom as
possible.
I wonder whether it makes sense to dynamically adjust the polling
interval (e.g. by a hint (and initial wakeup) from the scheduler or
taking C-states into account) to:
- increase the sampling rate, e.g. based on context switching
activity
- lower sampling rate when the system is idle (to gain
full C-state efficiency)
Or in what other way deep C-states could be taken into account
in respect of ondemand polling?
Thanks,
Thomas
On Fri, 2009-01-30 at 06:59 -0800, Thomas Renninger wrote:
> Hi,
>
> depending on HZ set to:
>
> 100
> 250
> 1000
>
> the ondemand governor is currently limited to poll the CPU load
> and adjust the frequency (sampling rate sysfs variable) every:
>
> 200ms
> 80ms
> 20ms
>
> This limitation does not consider NO_HZ which looks wrong?
> If this is correct, can someone give me a pointer, I'd like
> to understand why.
>
That is wrong. ondemand sampling_rate should not limit the sampling rate
based on HZ when NO_HZ is configured. The idle statistics is not limited
by HZ rate with NO_HZ, as we will have idle microaccounting.
> If NO_HZ can/should go down to 20ms polling and more (current
> CPUs are able to switch fast enough, so that the ondemand governor
> would calculate the default polling interval below 80ms for them),
> this would hurt in respect of C-states at some point.
>
> For performance reasons, one wants to poll as much as possible, for
> powersaving reasons (C-states), one wants to poll as seldom as
> possible.
>
> I wonder whether it makes sense to dynamically adjust the polling
> interval (e.g. by a hint (and initial wakeup) from the scheduler or
> taking C-states into account) to:
> - increase the sampling rate, e.g. based on context switching
> activity
> - lower sampling rate when the system is idle (to gain
> full C-state efficiency)
> Or in what other way deep C-states could be taken into account
> in respect of ondemand polling?
>
ondemand polling uses deferrable timer and hence will not be called
frequently on a totally idle CPU. The main reason we did not do the
dynamic sampling_rate is because it increases the ondemand response time
with a sudden increase of load, which is not liked by most workloads.
Thanks,
Venki