2012-02-14 02:26:21

by MyungJoo Ham

Subject: [RFC PATCH] PM / QoS: Introduce new classes: DMA-Throughput and DVFS-Latency

1. CPU_DMA_THROUGHPUT

This might look similar to CPU_DMA_LATENCY. However, there are H/W
blocks that create QoS requirements based on DMA throughput, not
latency, while their (those QoS requester H/W blocks) services are
short-term bursts that DVFS mechanisms (CPUFreq and Devfreq) cannot
respond to effectively.

In the Exynos4412 systems that are being tested, such H/W blocks include
the MFC (multi-function codec) decoding and encoding features, TV-out
(including HDMI), and cameras. When the display is operated at 60Hz,
each chunk of work should be done within 16ms. The DMA workload is
not well spread and fluctuates between frames: some frames require more
and some require less. Within a frame, the workload also fluctuates
heavily, and the tasks within a frame are usually not parallelized; they
are processed through specific H/W blocks, not CPU cores. These blocks
often have PPMU capabilities; however, they need to be polled very
frequently (at intervals of less than 5ms) in order to let DVFS
mechanisms react properly.

For such specific tasks, allowing them to make QoS requests seems
adequate because DVFS mechanisms (as long as the polling interval is 5ms
or longer) cannot keep up with them. Besides, the device drivers know
exactly when to request and cancel QoS.

2. DVFS_LATENCY

Both CPUFreq and Devfreq have response latency to a sudden workload
increase. With a near-100% (e.g., 95%) up-threshold, the average response
latency is approximately 1.5 x the polling interval.
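
The 1.5 x figure can be reproduced with a small userspace model. This is
only a sketch under stated assumptions, not kernel code: the governor samples
every `poll` time units, the load jumps from idle to fully busy at a
uniformly random point in a sampling window, and the governor reacts at the
first sample whose window was busy for at least `up_threshold` of its length.
The names response_latency() and average_latency() are made up for this
illustration.

```c
#include <assert.h>

/*
 * Latency from the start of a sustained burst until the governor reacts.
 * `offset` is how far into the current sampling window the burst starts
 * (0 <= offset < poll). Simplified model, see the assumptions above.
 */
static double response_latency(double poll, double up_threshold, double offset)
{
    assert(poll > 0.0 && offset >= 0.0 && offset < poll);

    /* Fraction of the first (partial) window that was busy. */
    double busy_fraction = (poll - offset) / poll;

    if (busy_fraction >= up_threshold)
        return poll - offset;       /* caught at the very next sample */
    return 2.0 * poll - offset;     /* needs one more full window */
}

/* Average latency over burst start times spread uniformly over a window. */
static double average_latency(double poll, double up_threshold)
{
    const int steps = 100000;
    double sum = 0.0;

    for (int i = 0; i < steps; i++) {
        double offset = (i + 0.5) * poll / steps;   /* midpoint rule */
        sum += response_latency(poll, up_threshold, offset);
    }
    return sum / steps;
}
```

Under this model, average_latency(100.0, 0.95) comes out at roughly 1.45 x
the polling interval, and the worst case approaches 2 x the polling interval
when the burst starts just after the point where the first window could
still reach the threshold.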

A specific polling rate (e.g., 100ms) may generally suit a system;
however, there can be exceptions. For example,
- When user input suddenly starts (typing, clicking, moving the cursor,
and such), the user might need full performance immediately. However,
we do not know whether full performance is actually needed
until we calculate the utilization; thus, we need to calculate it
sooner after user input or similar events. Specifying QoS on CPU
processing power or memory bandwidth at every user input is
overkill because there are many cases where such a speed-up isn't
necessary.
- When a device driver needs a faster performance response from the DVFS
mechanism. This could be addressed by simply putting QoS requests.
However, such QoS requests may keep the system running fast
unnecessarily in some cases, especially if a) the device's resource
usage comes in bursts of some duration (e.g., 100ms-long bursts) and
b) the driver doesn't know when such bursts come. MMC/WiFi often show
such behavior, although part (b) might be addressed with further
effort.

The cases shown above can be tackled by putting QoS requests on the
response time or latency of the DVFS mechanism, which is directly related
to its polling interval (if the DVFS mechanism is polling based).

Signed-off-by: MyungJoo Ham <[email protected]>
Signed-off-by: Kyungmin Park <[email protected]>
---
include/linux/pm_qos.h | 6 +++++-
kernel/power/qos.c | 31 ++++++++++++++++++++++++++++++-
2 files changed, 35 insertions(+), 2 deletions(-)

diff --git a/include/linux/pm_qos.h b/include/linux/pm_qos.h
index e5bbcba..f8ccb7b 100644
--- a/include/linux/pm_qos.h
+++ b/include/linux/pm_qos.h
@@ -13,13 +13,17 @@
#define PM_QOS_CPU_DMA_LATENCY 1
#define PM_QOS_NETWORK_LATENCY 2
#define PM_QOS_NETWORK_THROUGHPUT 3
+#define PM_QOS_CPU_DMA_THROUGHPUT 4
+#define PM_QOS_DVFS_RESPONSE_LATENCY 5

-#define PM_QOS_NUM_CLASSES 4
+#define PM_QOS_NUM_CLASSES 6
#define PM_QOS_DEFAULT_VALUE -1

#define PM_QOS_CPU_DMA_LAT_DEFAULT_VALUE (2000 * USEC_PER_SEC)
#define PM_QOS_NETWORK_LAT_DEFAULT_VALUE (2000 * USEC_PER_SEC)
#define PM_QOS_NETWORK_THROUGHPUT_DEFAULT_VALUE 0
+#define PM_QOS_CPU_DMA_THROUGHPUT_DEFAULT_VALUE 0
+#define PM_QOS_DVFS_LAT_DEFAULT_VALUE (2000 * USEC_PER_SEC)
#define PM_QOS_DEV_LAT_DEFAULT_VALUE 0

struct pm_qos_request {
diff --git a/kernel/power/qos.c b/kernel/power/qos.c
index 995e3bd..b15e0b7 100644
--- a/kernel/power/qos.c
+++ b/kernel/power/qos.c
@@ -101,11 +101,40 @@ static struct pm_qos_object network_throughput_pm_qos = {
};


+static BLOCKING_NOTIFIER_HEAD(cpu_dma_throughput_notifier);
+static struct pm_qos_constraints cpu_dma_tput_constraints = {
+ .list = PLIST_HEAD_INIT(cpu_dma_tput_constraints.list),
+ .target_value = PM_QOS_CPU_DMA_THROUGHPUT_DEFAULT_VALUE,
+ .default_value = PM_QOS_CPU_DMA_THROUGHPUT_DEFAULT_VALUE,
+ .type = PM_QOS_MAX,
+ .notifiers = &cpu_dma_throughput_notifier,
+};
+static struct pm_qos_object cpu_dma_throughput_pm_qos = {
+ .constraints = &cpu_dma_tput_constraints,
+ .name = "cpu_dma_throughput",
+};
+
+
+static BLOCKING_NOTIFIER_HEAD(dvfs_lat_notifier);
+static struct pm_qos_constraints dvfs_lat_constraints = {
+ .list = PLIST_HEAD_INIT(dvfs_lat_constraints.list),
+ .target_value = PM_QOS_DVFS_LAT_DEFAULT_VALUE,
+ .default_value = PM_QOS_DVFS_LAT_DEFAULT_VALUE,
+ .type = PM_QOS_MIN,
+ .notifiers = &dvfs_lat_notifier,
+};
+static struct pm_qos_object dvfs_lat_pm_qos = {
+ .constraints = &dvfs_lat_constraints,
+ .name = "dvfs_latency",
+};
+
static struct pm_qos_object *pm_qos_array[] = {
&null_pm_qos,
&cpu_dma_pm_qos,
&network_lat_pm_qos,
- &network_throughput_pm_qos
+ &network_throughput_pm_qos,
+ &cpu_dma_throughput_pm_qos,
+ &dvfs_lat_pm_qos,
};

static ssize_t pm_qos_power_write(struct file *filp, const char __user *buf,
--
1.7.4.1


2012-02-14 05:04:15

by MyungJoo Ham

Subject: Re: [RFC PATCH] PM / QoS: Introduce new classes: DMA-Throughput and DVFS-Latency

On Tue, Feb 14, 2012 at 11:26 AM, MyungJoo Ham <[email protected]> wrote:
> 1. CPU_DMA_THROUGHPUT
> 2. DVFS_LATENCY

In this PM-QoS patch, register_pm_qos_misc() for the new classes in
pm_qos_power_init() is missing.

Those will be included in the next version of the patch.






--
MyungJoo Ham, Ph.D.
System S/W Lab, S/W Center, Samsung Electronics

2012-02-14 22:07:55

by Rafael J. Wysocki

Subject: Re: [RFC PATCH] PM / QoS: Introduce new classes: DMA-Throughput and DVFS-Latency

Hi,

On Tuesday, February 14, 2012, MyungJoo Ham wrote:
> 1. CPU_DMA_THROUGHPUT
> 2. DVFS_LATENCY

Who's going to use the new classes?

Rafael



2012-02-15 10:44:29

by MyungJoo Ham

Subject: Re: [RFC PATCH] PM / QoS: Introduce new classes: DMA-Throughput and DVFS-Latency

2012/2/15 Rafael J. Wysocki <[email protected]>:
> Hi,
>
> On Tuesday, February 14, 2012, MyungJoo Ham wrote:
>> 1. CPU_DMA_THROUGHPUT
>> 2. DVFS_LATENCY
> Who's going to use the new classes?
>
> Rafael
>

Hello,

1. CPU_DMA_THROUGHPUT:
QoS-request handler: bus/memory devfreq driver.
QoS-requester: multimedia block device drivers (MFC decoder, MFC
encoder, TV-out, Camera, ...)

2. DVFS_LATENCY:
QoS-request handler: devfreq framework and cpufreq governors
(ondemand/conservative).
QoS-requester: those who need faster DVFS reactions temporarily, as
mentioned in the thread on pm_qos_update_request_timeout().
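
To illustrate how a handler would see the combined demand of several
requesters, here is a minimal userspace sketch of the aggregation implied by
the PM_QOS_MAX type (throughput) and PM_QOS_MIN type (latency) in the patch.
It is not the kernel implementation; aggregate() and enum agg_type are
hypothetical names used only for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for PM_QOS_MIN / PM_QOS_MAX. */
enum agg_type { AGG_MIN, AGG_MAX };

/*
 * Combine all outstanding request values into one target value.
 * With no requests, the class default applies.
 */
static int aggregate(enum agg_type type, const int *requests, size_t n,
                     int default_value)
{
    if (n == 0)
        return default_value;
    assert(requests != NULL);

    int target = requests[0];
    for (size_t i = 1; i < n; i++) {
        /* Throughput: the largest demand wins. Latency: the smallest. */
        if (type == AGG_MAX ? requests[i] > target : requests[i] < target)
            target = requests[i];
    }
    return target;
}
```

With throughput requests of 200, 800 and 400 from three multimedia drivers,
the bus/memory handler would have to satisfy 800; with DVFS latency requests
of 50000, 10000 and 20000 us, the governors would have to honor 10000 us.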


Cheers!
MyungJoo.




--
MyungJoo Ham, Ph.D.
Mobile Software Platform Lab, DMC Business, Samsung Electronics

2012-02-22 08:03:36

by MyungJoo Ham

Subject: [RFC PATCH 1/2] CPUfreq ondemand: update sampling rate without waiting for next sampling

When a new sampling rate is shorter than the current one (e.g., 1 sec
--> 10 ms), regardless of how short the new one is, the current ondemand
mechanism waits for the previously set timer to expire.

For example, if the user has just requested that the sampling rate
be 10 ms from now and the previous rate was 1000 ms, the new rate may
take effect as late as 999 ms later, which may not be acceptable if
the user intended to speed up sampling because the system is
expected to react to CPU load fluctuations quickly from __now__.

In order to address this issue, we need to cancel the previously set
timer (schedule_delayed_work) and reset it if resetting it is
expected to trigger the delayed_work earlier.
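
The rescheduling decision described above can be sketched in userspace C as
follows. This is an illustration only: tick_before() imitates the
wraparound-safe comparison idea behind the kernel's time_before(), and
should_reschedule() is a hypothetical helper name, not a function from the
patch.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Wraparound-safe "a is earlier than b" for a free-running tick counter.
 * Relies on the usual two's-complement conversion to signed long.
 */
static bool tick_before(unsigned long a, unsigned long b)
{
    return (long)(a - b) < 0;
}

/*
 * Re-arm the timer only if doing so makes it fire earlier than the
 * expiry that is already pending.
 */
static bool should_reschedule(unsigned long now, unsigned long new_delay,
                              unsigned long expires)
{
    return tick_before(now + new_delay, expires);
}
```

For instance, with the timer due 1000 ticks from now, a new delay of 10
ticks triggers a reschedule, while a new delay of 1500 ticks leaves the
pending timer alone.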

Signed-off-by: MyungJoo Ham <[email protected]>
Signed-off-by: Kyungmin Park <[email protected]>
---
drivers/cpufreq/cpufreq_ondemand.c | 58 +++++++++++++++++++++++++++++++++++-
1 files changed, 57 insertions(+), 1 deletions(-)

diff --git a/drivers/cpufreq/cpufreq_ondemand.c b/drivers/cpufreq/cpufreq_ondemand.c
index c3e0652..2d66649 100644
--- a/drivers/cpufreq/cpufreq_ondemand.c
+++ b/drivers/cpufreq/cpufreq_ondemand.c
@@ -257,6 +257,62 @@ show_one(sampling_down_factor, sampling_down_factor);
show_one(ignore_nice_load, ignore_nice);
show_one(powersave_bias, powersave_bias);

+/**
+ * update_sampling_rate - update sampling rate effective immediately if needed.
+ * @new_rate: new sampling rate
+ *
+ * If new rate is smaller than the old, simply updating
+ * dbs_tuners_ins.sampling_rate might not be appropriate. For example,
+ * if the original sampling_rate was 1 second and the requested new sampling
+ * rate is 10 ms because the user needs an immediate reaction from the
+ * ondemand governor, without knowing whether a higher frequency will be
+ * required, then the governor may apply the new rate too late: up to 1 second
+ * later. Thus, if we are reducing the sampling rate, we need to make the
+ * new value effective immediately.
+ */
+static void update_sampling_rate(unsigned int new_rate)
+{
+ int cpu;
+
+ dbs_tuners_ins.sampling_rate = new_rate
+ = max(new_rate, min_sampling_rate);
+
+ for_each_online_cpu(cpu) {
+ struct cpufreq_policy *policy;
+ struct cpu_dbs_info_s *dbs_info;
+ struct timer_list *timer;
+ unsigned long appointed_at;
+
+ policy = cpufreq_cpu_get(cpu);
+ if (!policy)
+ continue;
+ dbs_info = &per_cpu(od_cpu_dbs_info, policy->cpu);
+ cpufreq_cpu_put(policy);
+
+ mutex_lock(&dbs_info->timer_mutex);
+
+ if (!delayed_work_pending(&dbs_info->work))
+ goto next;
+
+ timer = &dbs_info->work.timer;
+ appointed_at = timer->expires;
+
+ if (time_before(jiffies + usecs_to_jiffies(new_rate),
+ appointed_at)) {
+
+ mutex_unlock(&dbs_info->timer_mutex);
+ cancel_delayed_work_sync(&dbs_info->work);
+ mutex_lock(&dbs_info->timer_mutex);
+
+ schedule_delayed_work_on(dbs_info->cpu, &dbs_info->work,
+ usecs_to_jiffies(new_rate));
+
+ }
+next:
+ mutex_unlock(&dbs_info->timer_mutex);
+ }
+}
+
static ssize_t store_sampling_rate(struct kobject *a, struct attribute *b,
const char *buf, size_t count)
{
@@ -265,7 +321,7 @@ static ssize_t store_sampling_rate(struct kobject *a, struct attribute *b,
ret = sscanf(buf, "%u", &input);
if (ret != 1)
return -EINVAL;
- dbs_tuners_ins.sampling_rate = max(input, min_sampling_rate);
+ update_sampling_rate(input);
return count;
}

--
1.7.4.1

2012-02-22 08:03:35

by MyungJoo Ham

Subject: [RFC PATCH 0/2] CPUFreq / Ondemand update

1. A new sampling rate should be effective immediately at update.

2. The sampling rate may be controlled through QoS requests.


MyungJoo Ham (2):
CPUfreq ondemand: update sampling rate without waiting for next
sampling
CPUfreq ondemand: handle QoS request on DVFS response latency

drivers/cpufreq/cpufreq_ondemand.c | 150 ++++++++++++++++++++++++++++++++++-
1 files changed, 145 insertions(+), 5 deletions(-)

--
1.7.4.1

2012-02-22 08:03:58

by MyungJoo Ham

Subject: [RFC PATCH 2/2] CPUfreq ondemand: handle QoS request on DVFS response latency

With the QoS class DVFS_RESPONSE_LATENCY, users (device drivers and
userspace processes) may express the desired maximum response latency
of DVFS mechanisms such as CPUfreq's ondemand governor. Based on such
QoS requests, the ondemand governor may adjust its sampling rate
accordingly, as long as it does not go below min_sampling_rate.

The benefit of DVFS_RESPONSE_LATENCY is a faster response
to user inputs (mouse clicks, keyboard inputs, touchscreen touches,
and others) without increasing the frequency unconditionally. Because some
input events may not require any performance increase, increasing the
frequency unconditionally on input may simply consume too much energy.
Adjusting the sampling rate based on user inputs lets the governor
increase the frequency with less latency when the load requires it, and
leave the frequency unchanged when it does not.
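
The clamping described above can be sketched as a small userspace function.
It mirrors the logic of the patch (QoS value halved, then combined with the
configured sampling rate and the minimum), but effective_rate() and its
parameter names are chosen for this example and are not the patch's
identifiers. All values are in microseconds.

```c
#include <assert.h>

static unsigned int min_u(unsigned int a, unsigned int b)
{
    return a < b ? a : b;
}

static unsigned int max_u(unsigned int a, unsigned int b)
{
    return a > b ? a : b;
}

/*
 * sampling_rate: interval configured by the user
 * qos_latency:   aggregated DVFS response latency request
 * min_rate:      lower bound on the sampling interval
 */
static unsigned int effective_rate(unsigned int sampling_rate,
                                   unsigned int qos_latency,
                                   unsigned int min_rate)
{
    /* The worst-case reaction takes two intervals, so halve the request. */
    unsigned int wants = qos_latency / 2;
    unsigned int effective;

    assert(min_rate > 0);

    if (wants)
        effective = min_u(wants, sampling_rate);
    else
        effective = sampling_rate;      /* no usable QoS constraint */

    return max_u(effective, min_rate);
}
```

With a configured rate of 100000 us and a minimum of 10000 us, a latency
request of 40000 us yields a 20000 us interval, while a request of 4000 us
is clamped to the 10000 us minimum.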

Signed-off-by: MyungJoo Ham <[email protected]>
Signed-off-by: Kyungmin Park <[email protected]>

--
This patch depends on the patch
"PM / QoS: Introduce new classes: DMA-Throughput and DVFS-Latency".
and the patch
"CPUfreq ondemand: update sampling rate without waiting for next
sampling"
---
drivers/cpufreq/cpufreq_ondemand.c | 108 ++++++++++++++++++++++++++++++++----
1 files changed, 96 insertions(+), 12 deletions(-)

diff --git a/drivers/cpufreq/cpufreq_ondemand.c b/drivers/cpufreq/cpufreq_ondemand.c
index 2d66649..b9188f1 100644
--- a/drivers/cpufreq/cpufreq_ondemand.c
+++ b/drivers/cpufreq/cpufreq_ondemand.c
@@ -22,6 +22,7 @@
#include <linux/tick.h>
#include <linux/ktime.h>
#include <linux/sched.h>
+#include <linux/pm_qos.h>

/*
* dbs is used in this file as a shortform for demandbased switching
@@ -93,6 +94,7 @@ struct cpu_dbs_info_s {
* when user is changing the governor or limits.
*/
struct mutex timer_mutex;
+ bool activated; /* dbs_timer_init is in effect */
};
static DEFINE_PER_CPU(struct cpu_dbs_info_s, od_cpu_dbs_info);

@@ -111,6 +113,8 @@ static struct dbs_tuners {
unsigned int sampling_down_factor;
unsigned int powersave_bias;
unsigned int io_is_busy;
+ struct notifier_block dvfs_lat_qos_db;
+ unsigned int dvfs_lat_qos_wants;
} dbs_tuners_ins = {
.up_threshold = DEF_FREQUENCY_UP_THRESHOLD,
.sampling_down_factor = DEF_SAMPLING_DOWN_FACTOR,
@@ -164,6 +168,23 @@ static inline cputime64_t get_cpu_iowait_time(unsigned int cpu, cputime64_t *wal
}

/*
+ * Find right sampling rate based on sampling_rate and
+ * QoS requests on dvfs latency.
+ */
+static unsigned int effective_sampling_rate(void)
+{
+ unsigned int effective;
+
+ if (dbs_tuners_ins.dvfs_lat_qos_wants)
+ effective = min(dbs_tuners_ins.dvfs_lat_qos_wants,
+ dbs_tuners_ins.sampling_rate);
+ else
+ effective = dbs_tuners_ins.sampling_rate;
+
+ return max(effective, min_sampling_rate);
+}
+
+/*
* Find right freq to be set now with powersave_bias on.
* Returns the freq_hi to be used right now and will set freq_hi_jiffies,
* freq_lo, and freq_lo_jiffies in percpu area for averaging freqs.
@@ -207,7 +228,7 @@ static unsigned int powersave_bias_target(struct cpufreq_policy *policy,
dbs_info->freq_lo_jiffies = 0;
return freq_lo;
}
- jiffies_total = usecs_to_jiffies(dbs_tuners_ins.sampling_rate);
+ jiffies_total = usecs_to_jiffies(effective_sampling_rate());
jiffies_hi = (freq_avg - freq_lo) * jiffies_total;
jiffies_hi += ((freq_hi - freq_lo) / 2);
jiffies_hi /= (freq_hi - freq_lo);
@@ -259,7 +280,8 @@ show_one(powersave_bias, powersave_bias);

/**
* update_sampling_rate - update sampling rate effective immediately if needed.
- * @new_rate: new sampling rate
+ * @new_rate: new sampling rate. If it is 0, the sampling rate is left
+ * unchanged and the QoS request value is assumed to have changed.
*
* If new rate is smaller than the old, simply updating
* dbs_tuners_ins.sampling_rate might not be appropriate. For example,
@@ -273,9 +295,13 @@ show_one(powersave_bias, powersave_bias);
static void update_sampling_rate(unsigned int new_rate)
{
int cpu;
+ unsigned int effective;
+
+
+ if (new_rate)
+ dbs_tuners_ins.sampling_rate = max(new_rate, min_sampling_rate);

- dbs_tuners_ins.sampling_rate = new_rate
- = max(new_rate, min_sampling_rate);
+ effective = effective_sampling_rate();

for_each_online_cpu(cpu) {
struct cpufreq_policy *policy;
@@ -283,21 +309,31 @@ static void update_sampling_rate(unsigned int new_rate)
struct timer_list *timer;
unsigned long appointed_at;

+ /*
+ * mutex_destroy(&dbs_info->timer_mutex) should not happen
+ * in this context.
+ */
+ mutex_lock(&dbs_mutex);
+
policy = cpufreq_cpu_get(cpu);
if (!policy)
- continue;
+ goto next;
dbs_info = &per_cpu(od_cpu_dbs_info, policy->cpu);
cpufreq_cpu_put(policy);

+ /* timer_mutex destroyed or will be destroyed soon */
+ if (!dbs_info->activated)
+ goto next;
+
mutex_lock(&dbs_info->timer_mutex);

if (!delayed_work_pending(&dbs_info->work))
- goto next;
+ goto next_timer_mutex;

timer = &dbs_info->work.timer;
appointed_at = timer->expires;

- if (time_before(jiffies + usecs_to_jiffies(new_rate),
+ if (time_before(jiffies + usecs_to_jiffies(effective),
appointed_at)) {

mutex_unlock(&dbs_info->timer_mutex);
@@ -305,12 +341,15 @@ static void update_sampling_rate(unsigned int new_rate)
mutex_lock(&dbs_info->timer_mutex);

schedule_delayed_work_on(dbs_info->cpu, &dbs_info->work,
- usecs_to_jiffies(new_rate));
+ usecs_to_jiffies(effective));

}
-next:
+next_timer_mutex:
mutex_unlock(&dbs_info->timer_mutex);
+next:
+ mutex_unlock(&dbs_mutex);
}
+
}

static ssize_t store_sampling_rate(struct kobject *a, struct attribute *b,
@@ -620,7 +659,7 @@ static void do_dbs_timer(struct work_struct *work)
/* We want all CPUs to do sampling nearly on
* same jiffy
*/
- delay = usecs_to_jiffies(dbs_tuners_ins.sampling_rate
+ delay = usecs_to_jiffies(effective_sampling_rate()
* dbs_info->rate_mult);

if (num_online_cpus() > 1)
@@ -638,7 +677,7 @@ static void do_dbs_timer(struct work_struct *work)
static inline void dbs_timer_init(struct cpu_dbs_info_s *dbs_info)
{
/* We want all CPUs to do sampling nearly on same jiffy */
- int delay = usecs_to_jiffies(dbs_tuners_ins.sampling_rate);
+ int delay = usecs_to_jiffies(effective_sampling_rate());

if (num_online_cpus() > 1)
delay -= jiffies % delay;
@@ -646,10 +685,12 @@ static inline void dbs_timer_init(struct cpu_dbs_info_s *dbs_info)
dbs_info->sample_type = DBS_NORMAL_SAMPLE;
INIT_DELAYED_WORK_DEFERRABLE(&dbs_info->work, do_dbs_timer);
schedule_delayed_work_on(dbs_info->cpu, &dbs_info->work, delay);
+ dbs_info->activated = true;
}

static inline void dbs_timer_exit(struct cpu_dbs_info_s *dbs_info)
{
+ dbs_info->activated = false;
cancel_delayed_work_sync(&dbs_info->work);
}

@@ -767,10 +808,39 @@ static int cpufreq_governor_dbs(struct cpufreq_policy *policy,
return 0;
}

+/**
+ * qos_dvfs_lat_notify - PM QoS Notifier for DVFS_LATENCY QoS Request
+ * @nb: notifier block struct
+ * @value: QoS value
+ * @dummy: unused
+ */
+static int qos_dvfs_lat_notify(struct notifier_block *nb, unsigned long value,
+ void *dummy)
+{
+ /*
+ * In the worst case, when the cpu load rises from up-threshold - e
+ * to a continuous up-threshold + e, the ondemand governor reacts
+ * within sampling_rate * 2.
+ *
+ * Thus, based on the worst case scenario, we use value / 2.
+ */
+ dbs_tuners_ins.dvfs_lat_qos_wants = value / 2;
+
+ /* Update sampling rate */
+ update_sampling_rate(0);
+
+ return NOTIFY_OK;
+}
+
+static struct notifier_block ondemand_qos_dvfs_lat_nb = {
+ .notifier_call = qos_dvfs_lat_notify,
+};
+
static int __init cpufreq_gov_dbs_init(void)
{
u64 idle_time;
int cpu = get_cpu();
+ int err = 0;

idle_time = get_cpu_idle_time_us(cpu, NULL);
put_cpu();
@@ -791,11 +861,25 @@ static int __init cpufreq_gov_dbs_init(void)
MIN_SAMPLING_RATE_RATIO * jiffies_to_usecs(10);
}

- return cpufreq_register_governor(&cpufreq_gov_ondemand);
+ err = pm_qos_add_notifier(PM_QOS_DVFS_RESPONSE_LATENCY,
+ &ondemand_qos_dvfs_lat_nb);
+ if (err)
+ return err;
+
+ err = cpufreq_register_governor(&cpufreq_gov_ondemand);
+ if (err) {
+ pm_qos_remove_notifier(PM_QOS_DVFS_RESPONSE_LATENCY,
+ &ondemand_qos_dvfs_lat_nb);
+ }
+
+ return err;
}

static void __exit cpufreq_gov_dbs_exit(void)
{
+ pm_qos_remove_notifier(PM_QOS_DVFS_RESPONSE_LATENCY,
+ &ondemand_qos_dvfs_lat_nb);
+
cpufreq_unregister_governor(&cpufreq_gov_ondemand);
}

--
1.7.4.1
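
As a hedged illustration of how the QoS value and the sampling-rate tunable combine, the following userspace sketch mirrors the arithmetic of qos_dvfs_lat_notify() and effective_sampling_rate() from the patch. The struct and function names (`tuners`, `qos_to_wanted_rate`, `effective_rate`) are made up for this example, all values are in microseconds, and this is not kernel code.

```c
#include <assert.h>

/* Hypothetical userspace model of the tunables touched by the patch. */
struct tuners {
	unsigned int sampling_rate;      /* user-set rate, usec */
	unsigned int min_sampling_rate;  /* hard lower bound, usec */
	unsigned int dvfs_lat_qos_wants; /* 0 means "no QoS request" */
};

/* Worst-case reaction time is assumed to be 2 * sampling rate, so halve. */
static unsigned int qos_to_wanted_rate(unsigned long qos_latency_us)
{
	return (unsigned int)(qos_latency_us / 2);
}

/* Pick the shorter of the two rates, then clamp to the lower bound. */
static unsigned int effective_rate(const struct tuners *t)
{
	unsigned int rate = t->sampling_rate;

	assert(t->min_sampling_rate > 0);
	if (t->dvfs_lat_qos_wants && t->dvfs_lat_qos_wants < rate)
		rate = t->dvfs_lat_qos_wants;
	return rate < t->min_sampling_rate ? t->min_sampling_rate : rate;
}
```

With a 100 ms sampling rate and a 10 ms floor, a 40 ms latency request gives a 20 ms effective rate, while a 4 ms request is clamped to the 10 ms floor.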

2012-02-25 11:30:13

by Pavel Machek

Subject: Re: [RFC PATCH 2/2] CPUfreq ondemand: handle QoS request on DVFS response latency

On Wed 2012-02-22 17:03:35, MyungJoo Ham wrote:
> With QoS class, DVFS_RESPONSE_LATENCY, users (device drivers and
> userspace processes) may express the desired maximum response latency
> from DVFS mechanisms such as CPUfreq's ondemand governors. Based on such
> QoS requests, the ondemand governor may flexibly adjust sampling rate
> accordingly unless it goes below the min_sampling_rate.
>
> The benefit of having DVFS_RESPONSE_LATENCY is to have faster response
> from user inputs (mouse clicks, keyboard inputs, touchscreen touches,
> and others) without increasing frequency unconditionally. Because some
> input events may not require any performance increases, increasing the
> frequency unconditionally for inputs may simply consume too much energy.
> Adjusting sampling rate based on user inputs enabled to increase
> frequency with less latency if it requires and not to increase frequency
> if it does not require.
>
> Signed-off-by: MyungJoo Ham <[email protected]>
> Signed-off-by: Kyungmin Park <[email protected]>
>
> --
> This patch depends on the patch
> "PM / QoS: Introduce new classes: DMA-Throughput and DVFS-Latency".
> and the patch
> "CPUfreq ondemand: update sampling rate without waiting for next
> sampling"
> ---
> drivers/cpufreq/cpufreq_ondemand.c | 108 ++++++++++++++++++++++++++++++++----
> 1 files changed, 96 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/cpufreq/cpufreq_ondemand.c b/drivers/cpufreq/cpufreq_ondemand.c
> index 2d66649..b9188f1 100644
> --- a/drivers/cpufreq/cpufreq_ondemand.c
> +++ b/drivers/cpufreq/cpufreq_ondemand.c
> @@ -22,6 +22,7 @@
> #include <linux/tick.h>
> #include <linux/ktime.h>
> #include <linux/sched.h>
> +#include <linux/pm_qos.h>
>
> /*
> * dbs is used in this file as a shortform for demandbased switching
> @@ -93,6 +94,7 @@ struct cpu_dbs_info_s {
> * when user is changing the governor or limits.
> */
> struct mutex timer_mutex;
> + bool activated; /* dbs_timer_init is in effect */
> };
> static DEFINE_PER_CPU(struct cpu_dbs_info_s, od_cpu_dbs_info);
>
> @@ -111,6 +113,8 @@ static struct dbs_tuners {
> unsigned int sampling_down_factor;
> unsigned int powersave_bias;
> unsigned int io_is_busy;
> + struct notifier_block dvfs_lat_qos_db;
> + unsigned int dvfs_lat_qos_wants;
> } dbs_tuners_ins = {
> .up_threshold = DEF_FREQUENCY_UP_THRESHOLD,
> .sampling_down_factor = DEF_SAMPLING_DOWN_FACTOR,
> @@ -164,6 +168,23 @@ static inline cputime64_t get_cpu_iowait_time(unsigned int cpu, cputime64_t *wal
> }
>
> /*
> + * Find right sampling rate based on sampling_rate and
> + * QoS requests on dvfs latency.
> + */
> +static unsigned int effective_sampling_rate(void)
> +{
> + unsigned int effective;
> +
> + if (dbs_tuners_ins.dvfs_lat_qos_wants)
> + effective = min(dbs_tuners_ins.dvfs_lat_qos_wants,
> + dbs_tuners_ins.sampling_rate);
> + else
> + effective = dbs_tuners_ins.sampling_rate;
> +
> + return max(effective, min_sampling_rate);
> +}
> +
> +/*
> * Find right freq to be set now with powersave_bias on.
> * Returns the freq_hi to be used right now and will set freq_hi_jiffies,
> * freq_lo, and freq_lo_jiffies in percpu area for averaging freqs.
> @@ -207,7 +228,7 @@ static unsigned int powersave_bias_target(struct cpufreq_policy *policy,
> dbs_info->freq_lo_jiffies = 0;
> return freq_lo;
> }
> - jiffies_total = usecs_to_jiffies(dbs_tuners_ins.sampling_rate);
> + jiffies_total = usecs_to_jiffies(effective_sampling_rate());
> jiffies_hi = (freq_avg - freq_lo) * jiffies_total;
> jiffies_hi += ((freq_hi - freq_lo) / 2);
> jiffies_hi /= (freq_hi - freq_lo);
> @@ -259,7 +280,8 @@ show_one(powersave_bias, powersave_bias);
>
> /**
> * update_sampling_rate - update sampling rate effective immediately if needed.
> - * @new_rate: new sampling rate
> + * @new_rate: new sampling rate. if it is 0, regard sampling rate is not
> + * changed and assume that qos request value is changed.
> *
> * If new rate is smaller than the old, simply updaing
> * dbs_tuners_int.sampling_rate might not be appropriate. For example,
> @@ -273,9 +295,13 @@ show_one(powersave_bias, powersave_bias);
> static void update_sampling_rate(unsigned int new_rate)
> {
> int cpu;
> + unsigned int effective;
> +
> +
> + if (new_rate)
> + dbs_tuners_ins.sampling_rate = max(new_rate, min_sampling_rate);
>
> - dbs_tuners_ins.sampling_rate = new_rate
> - = max(new_rate, min_sampling_rate);
> + effective = effective_sampling_rate();
>
> for_each_online_cpu(cpu) {
> struct cpufreq_policy *policy;
> @@ -283,21 +309,31 @@ static void update_sampling_rate(unsigned int new_rate)
> struct timer_list *timer;
> unsigned long appointed_at;
>
> + /*
> + * mutex_destory(&dbs_info->timer_mutex) should not happen
> + * in this context.
> + */
> + mutex_lock(&dbs_mutex);
> +
> policy = cpufreq_cpu_get(cpu);
> if (!policy)
> - continue;
> + goto next;
> dbs_info = &per_cpu(od_cpu_dbs_info, policy->cpu);
> cpufreq_cpu_put(policy);
>
> + /* timer_mutex destroyed or will be destoyed soon */
> + if (!dbs_info->activated)
> + goto next;
> +
> mutex_lock(&dbs_info->timer_mutex);
>
> if (!delayed_work_pending(&dbs_info->work))
> - goto next;
> + goto next_timer_mutex;
>
> timer = &dbs_info->work.timer;
> appointed_at = timer->expires;
>
> - if (time_before(jiffies + usecs_to_jiffies(new_rate),
> + if (time_before(jiffies + usecs_to_jiffies(effective),
> appointed_at)) {
>
> mutex_unlock(&dbs_info->timer_mutex);
> @@ -305,12 +341,15 @@ static void update_sampling_rate(unsigned int new_rate)
> mutex_lock(&dbs_info->timer_mutex);
>
> schedule_delayed_work_on(dbs_info->cpu, &dbs_info->work,
> - usecs_to_jiffies(new_rate));
> + usecs_to_jiffies(effective));
>
> }
> -next:
> +next_timer_mutex:
> mutex_unlock(&dbs_info->timer_mutex);
> +next:
> + mutex_unlock(&dbs_mutex);
> }
> +
> }

I don't think gotos are helpful here. Can you use normal program
structure or move it to subroutine...?
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2012-02-25 17:59:19

by mark gross

Subject: Re: [RFC PATCH] PM / QoS: Introduce new classes: DMA-Throughput and DVFS-Latency

FWIW this looks ok to me. --mark

On Tue, Feb 14, 2012 at 11:26:12AM +0900, MyungJoo Ham wrote:
> 1. CPU_DMA_THROUGHPUT
>
> This might look simliar to CPU_DMA_LATENCY. However, there are H/W
> blocks that creates QoS requirement based on DMA throughput, not
> latency, while their (those QoS requester H/W blocks) services are
> short-term bursts that cannot be effectively responsed by DVFS
> mechanisms (CPUFreq and Devfreq).
>
> In the Exynos4412 systems that are being tested, such H/W blocks include
> MFC (multi-function codec)'s decoding and enconding features, TV-out
> (including HDMI), and Cameras. When the display is operated at 60Hz,
> each chunk of task should be done within 16ms and the workload on DMA is
> not well spread and fluctuates between frames; some frame requires more
> and some do not and within a frame, the workload also fluctuates
> heavily and the tasks within a frame are usually not parallelized; they
> are processed through specific H/W blocks, not CPU cores. They often
> have PPMU capabilities; however, they need to be polled very frequently
> in order to let DVFS mechanisms react properly. (less than 5ms).
>
> For such specific tasks, allowing them to request QoS requirements seems
> adequete because DVFS mechanisms (as long as the polling rate is 5ms or
> longer) cannot follow up with them. Besides, the device drivers know
> when to request and cancel QoS exactly.
>
> 2. DVFS_LATENCY
>
> Both CPUFreq and Devfreq have response latency to a sudden workload
> increase. With near-100% (e.g., 95%) up-threshold, the average response
> latency is approximately 1.5 x polling-rate.
>
> A specific polling rate (e.g., 100ms) may generally fit for its system;
> however, there could be exceptions for that. For example,
> - When a user input suddenly starts: typing, clicking, moving cursors, and
> such, the user might need the full performance immediately. However,
> we do not know whether the full performance is actually needed or not
> until we calculate the utilization; thus, we need to calculate it
> faster with user inputs or any similar events. Specifying QoS on CPU
> processing power or Memory bandwidth at every user input is an
> overkill because there are many cases where such speed-up isn't
> necessary.
> - When a device driver needs a faster performance response from DVFS
> mechanism. This could be addressed by simply putting QoS requests.
> However, such QoS requests may keep the system running fast
> unnecessary in some cases, especially if a) the device's resource
> usage bursts with some duration (e.g., 100ms-long bursts) and
> b) the driver doesn't know when such burst come. MMC/WiFi often had
> such behaviors although there are possibilities that part (b) might
> be addressed with further efforts.
>
> The cases shown above can be tackled with putting QoS requests on the
> response time or latency of DVFS mechanism, which is directly related to
> its polling interval (if the DVFS mechanism is polling based).
>
> Signed-off-by: MyungJoo Ham <[email protected]>
> Signed-off-by: Kyungmin Park <[email protected]>
> ---
> include/linux/pm_qos.h | 6 +++++-
> kernel/power/qos.c | 31 ++++++++++++++++++++++++++++++-
> 2 files changed, 35 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/pm_qos.h b/include/linux/pm_qos.h
> index e5bbcba..f8ccb7b 100644
> --- a/include/linux/pm_qos.h
> +++ b/include/linux/pm_qos.h
> @@ -13,13 +13,17 @@
> #define PM_QOS_CPU_DMA_LATENCY 1
> #define PM_QOS_NETWORK_LATENCY 2
> #define PM_QOS_NETWORK_THROUGHPUT 3
> +#define PM_QOS_CPU_DMA_THROUGHPUT 4
> +#define PM_QOS_DVFS_RESPONSE_LATENCY 5
>
> -#define PM_QOS_NUM_CLASSES 4
> +#define PM_QOS_NUM_CLASSES 6
> #define PM_QOS_DEFAULT_VALUE -1
>
> #define PM_QOS_CPU_DMA_LAT_DEFAULT_VALUE (2000 * USEC_PER_SEC)
> #define PM_QOS_NETWORK_LAT_DEFAULT_VALUE (2000 * USEC_PER_SEC)
> #define PM_QOS_NETWORK_THROUGHPUT_DEFAULT_VALUE 0
> +#define PM_QOS_CPU_DMA_THROUGHPUT_DEFAULT_VALUE 0
> +#define PM_QOS_DVFS_LAT_DEFAULT_VALUE (2000 * USEC_PER_SEC)
> #define PM_QOS_DEV_LAT_DEFAULT_VALUE 0
>
> struct pm_qos_request {
> diff --git a/kernel/power/qos.c b/kernel/power/qos.c
> index 995e3bd..b15e0b7 100644
> --- a/kernel/power/qos.c
> +++ b/kernel/power/qos.c
> @@ -101,11 +101,40 @@ static struct pm_qos_object network_throughput_pm_qos = {
> };
>
>
> +static BLOCKING_NOTIFIER_HEAD(cpu_dma_throughput_notifier);
> +static struct pm_qos_constraints cpu_dma_tput_constraints = {
> + .list = PLIST_HEAD_INIT(cpu_dma_tput_constraints.list),
> + .target_value = PM_QOS_CPU_DMA_THROUGHPUT_DEFAULT_VALUE,
> + .default_value = PM_QOS_CPU_DMA_THROUGHPUT_DEFAULT_VALUE,
> + .type = PM_QOS_MAX,
> + .notifiers = &cpu_dma_throughput_notifier,
> +};
> +static struct pm_qos_object cpu_dma_throughput_pm_qos = {
> + .constraints = &cpu_dma_tput_constraints,
> + .name = "cpu_dma_throughput",
> +};
> +
> +
> +static BLOCKING_NOTIFIER_HEAD(dvfs_lat_notifier);
> +static struct pm_qos_constraints dvfs_lat_constraints = {
> + .list = PLIST_HEAD_INIT(dvfs_lat_constraints.list),
> + .target_value = PM_QOS_DVFS_LAT_DEFAULT_VALUE,
> + .default_value = PM_QOS_DVFS_LAT_DEFAULT_VALUE,
> + .type = PM_QOS_MIN,
> + .notifiers = &dvfs_lat_notifier,
> +};
> +static struct pm_qos_object dvfs_lat_pm_qos = {
> + .constraints = &dvfs_lat_constraints,
> + .name = "dvfs_latency",
> +};
> +
> static struct pm_qos_object *pm_qos_array[] = {
> &null_pm_qos,
> &cpu_dma_pm_qos,
> &network_lat_pm_qos,
> - &network_throughput_pm_qos
> + &network_throughput_pm_qos,
> + &cpu_dma_throughput_pm_qos,
> + &dvfs_lat_pm_qos,
> };
>
> static ssize_t pm_qos_power_write(struct file *filp, const char __user *buf,
> --
> 1.7.4.1
>
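
The two new classes differ in how concurrent requests are aggregated: `cpu_dma_throughput` is declared with PM_QOS_MAX and `dvfs_latency` with PM_QOS_MIN. A minimal sketch of that aggregation over a plain array follows. The kernel keeps requests in a plist; `aggregate`, `enum agg_type`, and the array interface are illustrative assumptions, not the pm_qos API.

```c
#include <assert.h>
#include <stddef.h>

enum agg_type { AGG_MIN, AGG_MAX };  /* stand-ins for PM_QOS_MIN / PM_QOS_MAX */

/*
 * Return the value that wins among the active requests, or default_value
 * when there are none.
 */
static int aggregate(enum agg_type type, const int *reqs, size_t n,
		     int default_value)
{
	int target;
	size_t i;

	if (n == 0)
		return default_value;

	assert(reqs != NULL);
	target = reqs[0];
	for (i = 1; i < n; i++) {
		if (type == AGG_MIN ? reqs[i] < target : reqs[i] > target)
			target = reqs[i];
	}
	return target;
}
```

For latency, the tightest (smallest) request wins; for throughput, the most demanding (largest) request wins.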

2012-02-25 23:39:53

by Rafael J. Wysocki

Subject: Re: [RFC PATCH] PM / QoS: Introduce new classes: DMA-Throughput and DVFS-Latency

On Tuesday, February 14, 2012, MyungJoo Ham wrote:
> On Tue, Feb 14, 2012 at 11:26 AM, MyungJoo Ham <[email protected]> wrote:
> > 1. CPU_DMA_THROUGHPUT
> >
> > This might look simliar to CPU_DMA_LATENCY. However, there are H/W
> > blocks that creates QoS requirement based on DMA throughput, not
> > latency, while their (those QoS requester H/W blocks) services are
> > short-term bursts that cannot be effectively responsed by DVFS
> > mechanisms (CPUFreq and Devfreq).
> >
> > In the Exynos4412 systems that are being tested, such H/W blocks include
> > MFC (multi-function codec)'s decoding and enconding features, TV-out
> > (including HDMI), and Cameras. When the display is operated at 60Hz,
> > each chunk of task should be done within 16ms and the workload on DMA is
> > not well spread and fluctuates between frames; some frame requires more
> > and some do not and within a frame, the workload also fluctuates
> > heavily and the tasks within a frame are usually not parallelized; they
> > are processed through specific H/W blocks, not CPU cores. They often
> > have PPMU capabilities; however, they need to be polled very frequently
> > in order to let DVFS mechanisms react properly. (less than 5ms).
> >
> > For such specific tasks, allowing them to request QoS requirements seems
> > adequete because DVFS mechanisms (as long as the polling rate is 5ms or
> > longer) cannot follow up with them. Besides, the device drivers know
> > when to request and cancel QoS exactly.
> >
> > 2. DVFS_LATENCY
> >
> > Both CPUFreq and Devfreq have response latency to a sudden workload
> > increase. With near-100% (e.g., 95%) up-threshold, the average response
> > latency is approximately 1.5 x polling-rate.
> >
> > A specific polling rate (e.g., 100ms) may generally fit for its system;
> > however, there could be exceptions for that. For example,
> > - When a user input suddenly starts: typing, clicking, moving cursors, and
> > such, the user might need the full performance immediately. However,
> > we do not know whether the full performance is actually needed or not
> > until we calculate the utilization; thus, we need to calculate it
> > faster with user inputs or any similar events. Specifying QoS on CPU
> > processing power or Memory bandwidth at every user input is an
> > overkill because there are many cases where such speed-up isn't
> > necessary.
> > - When a device driver needs a faster performance response from DVFS
> > mechanism. This could be addressed by simply putting QoS requests.
> > However, such QoS requests may keep the system running fast
> > unnecessary in some cases, especially if a) the device's resource
> > usage bursts with some duration (e.g., 100ms-long bursts) and
> > b) the driver doesn't know when such burst come. MMC/WiFi often had
> > such behaviors although there are possibilities that part (b) might
> > be addressed with further efforts.
> >
> > The cases shown above can be tackled with putting QoS requests on the
> > response time or latency of DVFS mechanism, which is directly related to
> > its polling interval (if the DVFS mechanism is polling based).
> >
> > Signed-off-by: MyungJoo Ham <[email protected]>
> > Signed-off-by: Kyungmin Park <[email protected]>
>
> In this PM-QoS patch, register_pm_qos_misc() for the new classes in
> pm_qos_power_init() is missing.
>
> Those will be included in the next version of the patch.

Has the new version been posted already? I seem to have missed it if so.

Thanks,
Rafael

2012-02-25 23:43:18

by Rafael J. Wysocki

Subject: Re: [RFC PATCH 2/2] CPUfreq ondemand: handle QoS request on DVFS response latency

On Saturday, February 25, 2012, Pavel Machek wrote:
> On Wed 2012-02-22 17:03:35, MyungJoo Ham wrote:
> > With QoS class, DVFS_RESPONSE_LATENCY, users (device drivers and
> > userspace processes) may express the desired maximum response latency
> > from DVFS mechanisms such as CPUfreq's ondemand governors. Based on such
> > QoS requests, the ondemand governor may flexibly adjust sampling rate
> > accordingly unless it goes below the min_sampling_rate.
> >
> > The benefit of having DVFS_RESPONSE_LATENCY is to have faster response
> > from user inputs (mouse clicks, keyboard inputs, touchscreen touches,
> > and others) without increasing frequency unconditionally. Because some
> > input events may not require any performance increases, increasing the
> > frequency unconditionally for inputs may simply consume too much energy.
> > Adjusting sampling rate based on user inputs enabled to increase
> > frequency with less latency if it requires and not to increase frequency
> > if it does not require.
> >
> > Signed-off-by: MyungJoo Ham <[email protected]>
> > Signed-off-by: Kyungmin Park <[email protected]>
> >
> > --
> > This patch depends on the patch
> > "PM / QoS: Introduce new classes: DMA-Throughput and DVFS-Latency".
> > and the patch
> > "CPUfreq ondemand: update sampling rate without waiting for next
> > sampling"
> > ---
> > drivers/cpufreq/cpufreq_ondemand.c | 108 ++++++++++++++++++++++++++++++++----
> > 1 files changed, 96 insertions(+), 12 deletions(-)
> >
> > diff --git a/drivers/cpufreq/cpufreq_ondemand.c b/drivers/cpufreq/cpufreq_ondemand.c
> > index 2d66649..b9188f1 100644
> > --- a/drivers/cpufreq/cpufreq_ondemand.c
> > +++ b/drivers/cpufreq/cpufreq_ondemand.c
> > @@ -22,6 +22,7 @@
> > #include <linux/tick.h>
> > #include <linux/ktime.h>
> > #include <linux/sched.h>
> > +#include <linux/pm_qos.h>
> >
> > /*
> > * dbs is used in this file as a shortform for demandbased switching
> > @@ -93,6 +94,7 @@ struct cpu_dbs_info_s {
> > * when user is changing the governor or limits.
> > */
> > struct mutex timer_mutex;
> > + bool activated; /* dbs_timer_init is in effect */
> > };
> > static DEFINE_PER_CPU(struct cpu_dbs_info_s, od_cpu_dbs_info);
> >
> > @@ -111,6 +113,8 @@ static struct dbs_tuners {
> > unsigned int sampling_down_factor;
> > unsigned int powersave_bias;
> > unsigned int io_is_busy;
> > + struct notifier_block dvfs_lat_qos_db;
> > + unsigned int dvfs_lat_qos_wants;
> > } dbs_tuners_ins = {
> > .up_threshold = DEF_FREQUENCY_UP_THRESHOLD,
> > .sampling_down_factor = DEF_SAMPLING_DOWN_FACTOR,
> > @@ -164,6 +168,23 @@ static inline cputime64_t get_cpu_iowait_time(unsigned int cpu, cputime64_t *wal
> > }
> >
> > /*
> > + * Find right sampling rate based on sampling_rate and
> > + * QoS requests on dvfs latency.
> > + */
> > +static unsigned int effective_sampling_rate(void)
> > +{
> > + unsigned int effective;
> > +
> > + if (dbs_tuners_ins.dvfs_lat_qos_wants)
> > + effective = min(dbs_tuners_ins.dvfs_lat_qos_wants,
> > + dbs_tuners_ins.sampling_rate);
> > + else
> > + effective = dbs_tuners_ins.sampling_rate;
> > +
> > + return max(effective, min_sampling_rate);
> > +}
> > +
> > +/*
> > * Find right freq to be set now with powersave_bias on.
> > * Returns the freq_hi to be used right now and will set freq_hi_jiffies,
> > * freq_lo, and freq_lo_jiffies in percpu area for averaging freqs.
> > @@ -207,7 +228,7 @@ static unsigned int powersave_bias_target(struct cpufreq_policy *policy,
> > dbs_info->freq_lo_jiffies = 0;
> > return freq_lo;
> > }
> > - jiffies_total = usecs_to_jiffies(dbs_tuners_ins.sampling_rate);
> > + jiffies_total = usecs_to_jiffies(effective_sampling_rate());
> > jiffies_hi = (freq_avg - freq_lo) * jiffies_total;
> > jiffies_hi += ((freq_hi - freq_lo) / 2);
> > jiffies_hi /= (freq_hi - freq_lo);
> > @@ -259,7 +280,8 @@ show_one(powersave_bias, powersave_bias);
> >
> > /**
> > * update_sampling_rate - update sampling rate effective immediately if needed.
> > - * @new_rate: new sampling rate
> > + * @new_rate: new sampling rate. if it is 0, regard sampling rate is not
> > + * changed and assume that qos request value is changed.
> > *
> > * If new rate is smaller than the old, simply updaing
> > * dbs_tuners_int.sampling_rate might not be appropriate. For example,
> > @@ -273,9 +295,13 @@ show_one(powersave_bias, powersave_bias);
> > static void update_sampling_rate(unsigned int new_rate)
> > {
> > int cpu;
> > + unsigned int effective;
> > +
> > +
> > + if (new_rate)
> > + dbs_tuners_ins.sampling_rate = max(new_rate, min_sampling_rate);
> >
> > - dbs_tuners_ins.sampling_rate = new_rate
> > - = max(new_rate, min_sampling_rate);
> > + effective = effective_sampling_rate();
> >
> > for_each_online_cpu(cpu) {
> > struct cpufreq_policy *policy;
> > @@ -283,21 +309,31 @@ static void update_sampling_rate(unsigned int new_rate)
> > struct timer_list *timer;
> > unsigned long appointed_at;
> >
> > + /*
> > + * mutex_destory(&dbs_info->timer_mutex) should not happen
> > + * in this context.
> > + */
> > + mutex_lock(&dbs_mutex);
> > +
> > policy = cpufreq_cpu_get(cpu);
> > if (!policy)
> > - continue;
> > + goto next;
> > dbs_info = &per_cpu(od_cpu_dbs_info, policy->cpu);
> > cpufreq_cpu_put(policy);
> >
> > + /* timer_mutex destroyed or will be destoyed soon */
> > + if (!dbs_info->activated)
> > + goto next;
> > +
> > mutex_lock(&dbs_info->timer_mutex);
> >
> > if (!delayed_work_pending(&dbs_info->work))
> > - goto next;
> > + goto next_timer_mutex;
> >
> > timer = &dbs_info->work.timer;
> > appointed_at = timer->expires;
> >
> > - if (time_before(jiffies + usecs_to_jiffies(new_rate),
> > + if (time_before(jiffies + usecs_to_jiffies(effective),
> > appointed_at)) {
> >
> > mutex_unlock(&dbs_info->timer_mutex);
> > @@ -305,12 +341,15 @@ static void update_sampling_rate(unsigned int new_rate)
> > mutex_lock(&dbs_info->timer_mutex);
> >
> > schedule_delayed_work_on(dbs_info->cpu, &dbs_info->work,
> > - usecs_to_jiffies(new_rate));
> > + usecs_to_jiffies(effective));
> >
> > }
> > -next:
> > +next_timer_mutex:
> > mutex_unlock(&dbs_info->timer_mutex);
> > +next:
> > + mutex_unlock(&dbs_mutex);
> > }
> > +
> > }
>
> I don't think gotos are helpful here. Can you use normal program
> structure or move it to subroutine...?

I agree with Pavel that gotos don't make that code particularly clear.

Thanks,
Rafael
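
One possible shape for a goto-free structure, shown as a userspace model rather than a kernel patch: the per-CPU body moves into a helper that returns early, so each lock is released in exactly one place. The `fake_mutex` counter and every name below are hypothetical stand-ins for dbs_mutex, timer_mutex, and the per-CPU dbs_info. The model also leaves out the step where the real code drops timer_mutex around cancel_delayed_work_sync().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lock model: depth counts outstanding lock() calls. */
struct fake_mutex { int depth; };
static void lock(struct fake_mutex *m)   { m->depth++; }
static void unlock(struct fake_mutex *m) { assert(m->depth > 0); m->depth--; }

struct cpu_info {
	bool activated;          /* timer initialised */
	bool work_pending;       /* delayed work queued */
	unsigned long expires;   /* tick at which the work fires */
	struct fake_mutex timer_mutex;
};

/* Per-CPU body; every exit path leaves timer_mutex unlocked. */
static bool reschedule_one(struct cpu_info *ci, unsigned long now,
			   unsigned long delay)
{
	bool moved = false;

	if (!ci->activated)
		return false;

	lock(&ci->timer_mutex);
	/* Wraparound-safe check that now + delay is before expires. */
	if (ci->work_pending && (long)(now + delay - ci->expires) < 0) {
		ci->expires = now + delay;
		moved = true;
	}
	unlock(&ci->timer_mutex);
	return moved;
}

/* Outer loop: one lock/unlock pair per iteration, no labels needed. */
static int reschedule_all(struct cpu_info *cpus, int n,
			  struct fake_mutex *global,
			  unsigned long now, unsigned long delay)
{
	int moved = 0;

	for (int i = 0; i < n; i++) {
		lock(global);
		moved += reschedule_one(&cpus[i], now, delay);
		unlock(global);
	}
	return moved;
}
```

The return value counts how many CPUs had their pending work moved earlier, which makes the lock balance easy to check in a test.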

2012-02-25 23:55:06

by Rafael J. Wysocki

Subject: Re: [RFC PATCH 1/2] CPUfreq ondemand: update sampling rate without waiting for next sampling

On Wednesday, February 22, 2012, MyungJoo Ham wrote:
> When a new sampling rate is shorter than the current one, (e.g., 1 sec
> --> 10 ms) regardless how short the new one is, the current ondemand
> mechanism wait for the previously set timer to be expired.
>
> For example, if the user has just expressed that the sampling rate
> should be 10 ms from now and the previous was 1000 ms, the new rate may
> become effective 999 ms later, which could be not acceptable for the
> user if the user has intended to speed up sampling because the system is
> expected to react to CPU load fluctuation quickly from __now__.
>
> In order to address this issue, we need to cancel the previously set
> timer (schedule_delayed_work) and reset the timer if resetting timer is
> expected to trigger the delayed_work ealier.
>
> Signed-off-by: MyungJoo Ham <[email protected]>
> Signed-off-by: Kyungmin Park <[email protected]>
> ---
> drivers/cpufreq/cpufreq_ondemand.c | 58 +++++++++++++++++++++++++++++++++++-
> 1 files changed, 57 insertions(+), 1 deletions(-)
>
> diff --git a/drivers/cpufreq/cpufreq_ondemand.c b/drivers/cpufreq/cpufreq_ondemand.c
> index c3e0652..2d66649 100644
> --- a/drivers/cpufreq/cpufreq_ondemand.c
> +++ b/drivers/cpufreq/cpufreq_ondemand.c
> @@ -257,6 +257,62 @@ show_one(sampling_down_factor, sampling_down_factor);
> show_one(ignore_nice_load, ignore_nice);
> show_one(powersave_bias, powersave_bias);
>
> +/**
> + * update_sampling_rate - update sampling rate effective immediately if needed.
> + * @new_rate: new sampling rate
> + *
> + * If new rate is smaller than the old, simply updaing
> + * dbs_tuners_int.sampling_rate might not be appropriate. For example,
> + * if the original sampling_rate was 1 second and the requested new sampling
> + * rate is 10 ms because the user needs immediate reaction from ondemand
> + * governor, but not sure if higher frequency will be required or not,
> + * then, the governor may change the sampling rate too late; up to 1 second
> + * later. Thus, if we are reducing the sampling rate, we need to make the
> + * new value effective immediately.
> + */
> +static void update_sampling_rate(unsigned int new_rate)
> +{
> + int cpu;
> +
> + dbs_tuners_ins.sampling_rate = new_rate
> + = max(new_rate, min_sampling_rate);
> +
> + for_each_online_cpu(cpu) {
> + struct cpufreq_policy *policy;
> + struct cpu_dbs_info_s *dbs_info;
> + struct timer_list *timer;
> + unsigned long appointed_at;
> +
> + policy = cpufreq_cpu_get(cpu);
> + if (!policy)
> + continue;
> + dbs_info = &per_cpu(od_cpu_dbs_info, policy->cpu);
> + cpufreq_cpu_put(policy);
> +
> + mutex_lock(&dbs_info->timer_mutex);
> +
> + if (!delayed_work_pending(&dbs_info->work))
> + goto next;
> +

What about doing

mutex_unlock(&dbs_info->timer_mutex);
continue;

here instead of the jump?


> + timer = &dbs_info->work.timer;
> + appointed_at = timer->expires;
> +

I would do

next_sampling = jiffies + usecs_to_jiffies(new_rate);

and compare that with timer->expires. Then, the if () below would look better.
Or perhaps use new_rate_jiffies = usecs_to_jiffies(new_rate) and use that here
and below?

> + if (time_before(jiffies + usecs_to_jiffies(new_rate),
> + appointed_at)) {
> +
> + mutex_unlock(&dbs_info->timer_mutex);

I'm not sure if this isn't going to be racy. Have you verified that?

> + cancel_delayed_work_sync(&dbs_info->work);
> + mutex_lock(&dbs_info->timer_mutex);
> +
> + schedule_delayed_work_on(dbs_info->cpu, &dbs_info->work,
> + usecs_to_jiffies(new_rate));
> +
> + }
> +next:
> + mutex_unlock(&dbs_info->timer_mutex);
> + }
> +}
> +
> static ssize_t store_sampling_rate(struct kobject *a, struct attribute *b,
> const char *buf, size_t count)
> {
> @@ -265,7 +321,7 @@ static ssize_t store_sampling_rate(struct kobject *a, struct attribute *b,
> ret = sscanf(buf, "%u", &input);
> if (ret != 1)
> return -EINVAL;
> - dbs_tuners_ins.sampling_rate = max(input, min_sampling_rate);
> + update_sampling_rate(input);
> return count;
> }

Thanks,
Rafael
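
The suggestion to compute the new expiry once and compare it with timer->expires can be sketched as below. `tick_before` is a userspace re-implementation of the idea behind the kernel's time_before() (a signed difference, so the comparison survives counter wraparound); `should_reschedule` and its parameters are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* True if tick a is earlier than tick b, even across wraparound. */
static bool tick_before(unsigned long a, unsigned long b)
{
	return (long)(a - b) < 0;
}

/*
 * Decide whether pending work should be moved earlier.
 * new_rate_ticks is computed once by the caller and can be reused
 * when the work is rescheduled.
 */
static bool should_reschedule(unsigned long now, unsigned long new_rate_ticks,
			      unsigned long expires)
{
	unsigned long next_sampling;

	/* The signed-difference trick only holds for deltas below half range. */
	assert(new_rate_ticks <= LONG_MAX);
	next_sampling = now + new_rate_ticks;

	return tick_before(next_sampling, expires);
}
```

Keeping the converted rate in one variable avoids calling the conversion twice and keeps the condition on a single line.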

2012-02-27 06:14:32

by MyungJoo Ham

Subject: Re: [RFC PATCH] PM / QoS: Introduce new classes: DMA-Throughput and DVFS-Latency

On Sun, Feb 26, 2012 at 8:43 AM, Rafael J. Wysocki <[email protected]> wrote:
> On Tuesday, February 14, 2012, MyungJoo Ham wrote:
>> On Tue, Feb 14, 2012 at 11:26 AM, MyungJoo Ham <[email protected]> wrote:
>> > 1. CPU_DMA_THROUGHPUT
>> >
>> > This might look simliar to CPU_DMA_LATENCY. However, there are H/W
>> > blocks that creates QoS requirement based on DMA throughput, not
>> > latency, while their (those QoS requester H/W blocks) services are
>> > short-term bursts that cannot be effectively responsed by DVFS
>> > mechanisms (CPUFreq and Devfreq).
>> >
>> > In the Exynos4412 systems that are being tested, such H/W blocks include
>> > MFC (multi-function codec)'s decoding and enconding features, TV-out
>> > (including HDMI), and Cameras. When the display is operated at 60Hz,
>> > each chunk of task should be done within 16ms and the workload on DMA is
>> > not well spread and fluctuates between frames; some frame requires more
>> > and some do not and within a frame, the workload also fluctuates
>> > heavily and the tasks within a frame are usually not parallelized; they
>> > are processed through specific H/W blocks, not CPU cores. They often
>> > have PPMU capabilities; however, they need to be polled very frequently
>> > in order to let DVFS mechanisms react properly. (less than 5ms).
>> >
>> > For such specific tasks, allowing them to request QoS requirements seems
>> > adequete because DVFS mechanisms (as long as the polling rate is 5ms or
>> > longer) cannot follow up with them. Besides, the device drivers know
>> > when to request and cancel QoS exactly.
>> >
>> > 2. DVFS_LATENCY
>> >
>> > Both CPUFreq and Devfreq have response latency to a sudden workload
>> > increase. With near-100% (e.g., 95%) up-threshold, the average response
>> > latency is approximately 1.5 x polling-rate.
>> >
>> > A specific polling rate (e.g., 100ms) may generally fit for its system;
>> > however, there could be exceptions for that. For example,
>> > - When a user input suddenly starts: typing, clicking, moving cursors, and
>> >  such, the user might need the full performance immediately. However,
>> >  we do not know whether the full performance is actually needed or not
>> >  until we calculate the utilization; thus, we need to calculate it
>> >  faster with user inputs or any similar events. Specifying QoS on CPU
>> >  processing power or Memory bandwidth at every user input is an
>> >  overkill because there are many cases where such speed-up isn't
>> >  necessary.
>> > - When a device driver needs a faster performance response from DVFS
>> >  mechanism. This could be addressed by simply putting QoS requests.
>> >  However, such QoS requests may keep the system running fast
>> >  unnecessary in some cases, especially if a) the device's resource
>> >  usage bursts with some duration (e.g., 100ms-long bursts) and
>> >  b) the driver doesn't know when such burst come. MMC/WiFi often had
>> >  such behaviors although there are possibilities that part (b) might
>> >  be addressed with further efforts.
>> >
>> > The cases shown above can be tackled with putting QoS requests on the
>> > response time or latency of DVFS mechanism, which is directly related to
>> > its polling interval (if the DVFS mechanism is polling based).
>> >
>> > Signed-off-by: MyungJoo Ham <[email protected]>
>> > Signed-off-by: Kyungmin Park <[email protected]>
>>
>> In this PM-QoS patch, register_pm_qos_misc() for the new classes in
>> pm_qos_power_init() is missing.
>>
>> Those will be included in the next version of the patch.
>
> Has the new version been posted already? I seem to have missed it if so.
>
> Thanks,
> Rafael

No, I have not posted the new version yet; I was waiting for other
comments on this patch. I'll send a V2 patch with the omitted part
included today.

Thanks.


Cheers!
MyungJoo.


--
MyungJoo Ham, Ph.D.
System S/W Lab, S/W Center, Samsung Electronics

2012-02-27 06:22:25

by MyungJoo Ham

Subject: [PATCH v2] PM / QoS: Introduce new classes: DMA-Throughput and DVFS-Latency

1. CPU_DMA_THROUGHPUT

This might look similar to CPU_DMA_LATENCY. However, there are H/W
blocks that create QoS requirements based on DMA throughput, not
latency, while their (those QoS requester H/W blocks) services are
short-term bursts that DVFS mechanisms (CPUFreq and Devfreq) cannot
respond to effectively.

In the Exynos4412 systems that are being tested, such H/W blocks include
the decoding and encoding features of MFC (multi-format codec), TV-out
(including HDMI), and cameras. When the display is operated at 60Hz,
each chunk of a task should be done within 16ms. The workload on DMA is
not well spread and fluctuates between frames; some frames require more
and some less. Within a frame, the workload also fluctuates heavily, and
the tasks within a frame are usually not parallelized; they are
processed through specific H/W blocks, not CPU cores. These blocks often
have PPMU capabilities; however, they need to be polled very frequently
(at intervals of less than 5ms) in order to let DVFS mechanisms react
properly.

For such specific tasks, allowing them to request QoS requirements seems
adequate because DVFS mechanisms (as long as the polling interval is 5ms
or longer) cannot keep up with them. Besides, the device drivers know
exactly when to request and cancel QoS.

2. DVFS_LATENCY

Both CPUFreq and Devfreq have a response latency to a sudden workload
increase. With a near-100% (e.g., 95%) up-threshold, the average response
latency is approximately 1.5 x polling-rate.
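
As a rough illustration of where the 1.5 x figure comes from, the sketch
below models a single polling-based governor in plain userspace C. It
assumes (these are simplifications, not taken from the patch) that the CPU
is fully idle before the load starts and fully busy afterwards, and that
the governor speeds up at the first sample whose measured utilization
exceeds the up-threshold. response_latency_us() and its parameters are
hypothetical names.

```c
/*
 * Hypothetical model: time from load onset until the governor notices.
 * period_us:        polling interval
 * offset_us:        how long after the previous sample the load starts
 *                   (must be <= period_us)
 * up_threshold_pct: utilization (in percent) that triggers a speed-up
 */
unsigned int response_latency_us(unsigned int period_us,
				 unsigned int offset_us,
				 unsigned int up_threshold_pct)
{
	/* Busy time seen by the first sample after the load starts. */
	unsigned int busy_us = period_us - offset_us;

	/* The first sample already sees utilization above the threshold. */
	if ((unsigned long long)busy_us * 100 >
	    (unsigned long long)period_us * up_threshold_pct)
		return busy_us;

	/* Otherwise the next sample covers a fully busy interval. */
	return busy_us + period_us;
}
```

With a 100ms polling interval and a 95% up-threshold, a load that starts
50ms into an interval is only noticed at the second sample, 150ms later.
Averaged over all starting offsets, this model gives a little under 1.5
times the interval, and close to 2 times the interval in the worst case.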

A specific polling rate (e.g., 100ms) may generally fit a system;
however, there can be exceptions. For example,
- When user input suddenly starts (typing, clicking, moving cursors, and
  such), the user might need the full performance immediately. However,
  we do not know whether the full performance is actually needed
  until we calculate the utilization; thus, we need to calculate it
  faster upon user inputs or any similar events. Specifying QoS on CPU
  processing power or memory bandwidth at every user input is
  overkill because there are many cases where such a speed-up isn't
  necessary.
- When a device driver needs a faster performance response from the DVFS
  mechanism. This could be addressed by simply putting QoS requests.
  However, such QoS requests may keep the system running fast
  unnecessarily in some cases, especially if a) the device's resource
  usage bursts with some duration (e.g., 100ms-long bursts) and
  b) the driver doesn't know when such bursts come. MMC/WiFi often
  behave this way, although part (b) might be addressed with further
  effort.

The cases shown above can be tackled by putting QoS requests on the
response time or latency of the DVFS mechanism, which is directly related
to its polling interval (if the DVFS mechanism is polling based).

Signed-off-by: MyungJoo Ham <[email protected]>
Signed-off-by: Kyungmin Park <[email protected]>

--
Changes from RFC(v1)
- Added omitted part (registering new classes)
---
include/linux/pm_qos.h | 6 +++++-
kernel/power/qos.c | 48 +++++++++++++++++++++++++++++++++++++++++++-----
2 files changed, 48 insertions(+), 6 deletions(-)

diff --git a/include/linux/pm_qos.h b/include/linux/pm_qos.h
index e5bbcba..f8ccb7b 100644
--- a/include/linux/pm_qos.h
+++ b/include/linux/pm_qos.h
@@ -13,13 +13,17 @@
#define PM_QOS_CPU_DMA_LATENCY 1
#define PM_QOS_NETWORK_LATENCY 2
#define PM_QOS_NETWORK_THROUGHPUT 3
+#define PM_QOS_CPU_DMA_THROUGHPUT 4
+#define PM_QOS_DVFS_RESPONSE_LATENCY 5

-#define PM_QOS_NUM_CLASSES 4
+#define PM_QOS_NUM_CLASSES 6
#define PM_QOS_DEFAULT_VALUE -1

#define PM_QOS_CPU_DMA_LAT_DEFAULT_VALUE (2000 * USEC_PER_SEC)
#define PM_QOS_NETWORK_LAT_DEFAULT_VALUE (2000 * USEC_PER_SEC)
#define PM_QOS_NETWORK_THROUGHPUT_DEFAULT_VALUE 0
+#define PM_QOS_CPU_DMA_THROUGHPUT_DEFAULT_VALUE 0
+#define PM_QOS_DVFS_LAT_DEFAULT_VALUE (2000 * USEC_PER_SEC)
#define PM_QOS_DEV_LAT_DEFAULT_VALUE 0

struct pm_qos_request {
diff --git a/kernel/power/qos.c b/kernel/power/qos.c
index 995e3bd..c149af3 100644
--- a/kernel/power/qos.c
+++ b/kernel/power/qos.c
@@ -101,11 +101,40 @@ static struct pm_qos_object network_throughput_pm_qos = {
};


+static BLOCKING_NOTIFIER_HEAD(cpu_dma_throughput_notifier);
+static struct pm_qos_constraints cpu_dma_tput_constraints = {
+ .list = PLIST_HEAD_INIT(cpu_dma_tput_constraints.list),
+ .target_value = PM_QOS_CPU_DMA_THROUGHPUT_DEFAULT_VALUE,
+ .default_value = PM_QOS_CPU_DMA_THROUGHPUT_DEFAULT_VALUE,
+ .type = PM_QOS_MAX,
+ .notifiers = &cpu_dma_throughput_notifier,
+};
+static struct pm_qos_object cpu_dma_throughput_pm_qos = {
+ .constraints = &cpu_dma_tput_constraints,
+ .name = "cpu_dma_throughput",
+};
+
+
+static BLOCKING_NOTIFIER_HEAD(dvfs_lat_notifier);
+static struct pm_qos_constraints dvfs_lat_constraints = {
+ .list = PLIST_HEAD_INIT(dvfs_lat_constraints.list),
+ .target_value = PM_QOS_DVFS_LAT_DEFAULT_VALUE,
+ .default_value = PM_QOS_DVFS_LAT_DEFAULT_VALUE,
+ .type = PM_QOS_MIN,
+ .notifiers = &dvfs_lat_notifier,
+};
+static struct pm_qos_object dvfs_lat_pm_qos = {
+ .constraints = &dvfs_lat_constraints,
+ .name = "dvfs_latency",
+};
+
static struct pm_qos_object *pm_qos_array[] = {
&null_pm_qos,
&cpu_dma_pm_qos,
&network_lat_pm_qos,
- &network_throughput_pm_qos
+ &network_throughput_pm_qos,
+ &cpu_dma_throughput_pm_qos,
+ &dvfs_lat_pm_qos,
};

static ssize_t pm_qos_power_write(struct file *filp, const char __user *buf,
@@ -472,18 +501,27 @@ static int __init pm_qos_power_init(void)

ret = register_pm_qos_misc(&cpu_dma_pm_qos);
if (ret < 0) {
- printk(KERN_ERR "pm_qos_param: cpu_dma_latency setup failed\n");
+ pr_err("pm_qos_param: cpu_dma_latency setup failed\n");
return ret;
}
ret = register_pm_qos_misc(&network_lat_pm_qos);
if (ret < 0) {
- printk(KERN_ERR "pm_qos_param: network_latency setup failed\n");
+ pr_err("pm_qos_param: network_latency setup failed\n");
return ret;
}
ret = register_pm_qos_misc(&network_throughput_pm_qos);
+ if (ret < 0) {
+ pr_err("pm_qos_param: network_throughput setup failed\n");
+ return ret;
+ }
+ ret = register_pm_qos_misc(&cpu_dma_throughput_pm_qos);
+ if (ret < 0) {
+ pr_err("pm_qos_param: cpu_dma_throughput setup failed\n");
+ return ret;
+ }
+ ret = register_pm_qos_misc(&dvfs_lat_pm_qos);
if (ret < 0)
- printk(KERN_ERR
- "pm_qos_param: network_throughput setup failed\n");
+ pr_err("pm_qos_param: dvfs_latency setup failed\n");

return ret;
}
--
1.7.4.1
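
To illustrate how the two new classes would aggregate competing requests,
here is a minimal userspace sketch, not kernel code. It assumes the
semantics implied by the .type fields in the patch above: PM_QOS_MAX for
cpu_dma_throughput (the largest request wins) and PM_QOS_MIN for
dvfs_latency (the smallest request wins), with the class default used when
no request is active. qos_target() and enum qos_type are hypothetical
names introduced only for this example.

```c
#include <stddef.h>

/* Hypothetical stand-ins for PM_QOS_MIN / PM_QOS_MAX. */
enum qos_type { QOS_MIN, QOS_MAX };

/*
 * Return the aggregate target of n active requests, or default_value
 * when there are none.
 */
int qos_target(enum qos_type type, const int *reqs, size_t n,
	       int default_value)
{
	size_t i;
	int target;

	if (n == 0)
		return default_value;

	target = reqs[0];
	for (i = 1; i < n; i++) {
		/* QOS_MIN keeps the smallest value, QOS_MAX the largest. */
		if (type == QOS_MIN ? reqs[i] < target : reqs[i] > target)
			target = reqs[i];
	}
	return target;
}
```

For example, three drivers asking for DVFS response latencies of 100000,
20000 and 50000 usec would yield a target of 20000 usec, while the same
numbers treated as throughput requests would yield 100000.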

2012-02-28 00:39:20

by MyungJoo Ham

Subject: Re: [RFC PATCH 2/2] CPUfreq ondemand: handle QoS request on DVFS response latency

On Sun, Feb 26, 2012 at 8:47 AM, Rafael J. Wysocki <[email protected]> wrote:
> On Saturday, February 25, 2012, Pavel Machek wrote:
>> On Wed 2012-02-22 17:03:35, MyungJoo Ham wrote:
>> > With QoS class, DVFS_RESPONSE_LATENCY, users (device drivers and
>> > userspace processes) may express the desired maximum response latency
>> > from DVFS mechanisms such as CPUfreq's ondemand governors. Based on such
>> > QoS requests, the ondemand governor may flexibly adjust sampling rate
>> > accordingly unless it goes below the min_sampling_rate.
>> >
>> > The benefit of having DVFS_RESPONSE_LATENCY is to have faster response
>> > from user inputs (mouse clicks, keyboard inputs, touchscreen touches,
>> > and others) without increasing frequency unconditionally. Because some
>> > input events may not require any performance increases, increasing the
>> > frequency unconditionally for inputs may simply consume too much energy.
>> > Adjusting sampling rate based on user inputs enabled to increase
>> > frequency with less latency if it requires and not to increase frequency
>> > if it does not require.
>> >
>> > Signed-off-by: MyungJoo Ham <[email protected]>
>> > Signed-off-by: Kyungmin Park <[email protected]>
>> >
>> > --
>> > This patch depends on the patch
>> > "PM / QoS: Introduce new classes: DMA-Throughput and DVFS-Latency".
>> > and the patch
>> > "CPUfreq ondemand: update sampling rate without waiting for next
>> > sampling"
>> > ---
>> >  drivers/cpufreq/cpufreq_ondemand.c |  108 ++++++++++++++++++++++++++++++++----
>> >  1 files changed, 96 insertions(+), 12 deletions(-)
>> >
>> > diff --git a/drivers/cpufreq/cpufreq_ondemand.c b/drivers/cpufreq/cpufreq_ondemand.c
>> > index 2d66649..b9188f1 100644
>> > --- a/drivers/cpufreq/cpufreq_ondemand.c
>> > +++ b/drivers/cpufreq/cpufreq_ondemand.c
>> > @@ -22,6 +22,7 @@
>> >  #include <linux/tick.h>
>> >  #include <linux/ktime.h>
>> >  #include <linux/sched.h>
>> > +#include <linux/pm_qos.h>
>> >
>> >  /*
>> >   * dbs is used in this file as a shortform for demandbased switching
>> > @@ -93,6 +94,7 @@ struct cpu_dbs_info_s {
>> >      * when user is changing the governor or limits.
>> >      */
>> >     struct mutex timer_mutex;
>> > +   bool activated; /* dbs_timer_init is in effect */
>> >  };
>> >  static DEFINE_PER_CPU(struct cpu_dbs_info_s, od_cpu_dbs_info);
>> >
>> > @@ -111,6 +113,8 @@ static struct dbs_tuners {
>> >     unsigned int sampling_down_factor;
>> >     unsigned int powersave_bias;
>> >     unsigned int io_is_busy;
>> > +   struct notifier_block dvfs_lat_qos_db;
>> > +   unsigned int dvfs_lat_qos_wants;
>> >  } dbs_tuners_ins = {
>> >     .up_threshold = DEF_FREQUENCY_UP_THRESHOLD,
>> >     .sampling_down_factor = DEF_SAMPLING_DOWN_FACTOR,
>> > @@ -164,6 +168,23 @@ static inline cputime64_t get_cpu_iowait_time(unsigned int cpu, cputime64_t *wal
>> >  }
>> >
>> >  /*
>> > + * Find right sampling rate based on sampling_rate and
>> > + * QoS requests on dvfs latency.
>> > + */
>> > +static unsigned int effective_sampling_rate(void)
>> > +{
>> > +   unsigned int effective;
>> > +
>> > +   if (dbs_tuners_ins.dvfs_lat_qos_wants)
>> > +           effective = min(dbs_tuners_ins.dvfs_lat_qos_wants,
>> > +                           dbs_tuners_ins.sampling_rate);
>> > +   else
>> > +           effective = dbs_tuners_ins.sampling_rate;
>> > +
>> > +   return max(effective, min_sampling_rate);
>> > +}
>> > +
>> > +/*
>> >   * Find right freq to be set now with powersave_bias on.
>> >   * Returns the freq_hi to be used right now and will set freq_hi_jiffies,
>> >   * freq_lo, and freq_lo_jiffies in percpu area for averaging freqs.
>> > @@ -207,7 +228,7 @@ static unsigned int powersave_bias_target(struct cpufreq_policy *policy,
>> >             dbs_info->freq_lo_jiffies = 0;
>> >             return freq_lo;
>> >     }
>> > -   jiffies_total = usecs_to_jiffies(dbs_tuners_ins.sampling_rate);
>> > +   jiffies_total = usecs_to_jiffies(effective_sampling_rate());
>> >     jiffies_hi = (freq_avg - freq_lo) * jiffies_total;
>> >     jiffies_hi += ((freq_hi - freq_lo) / 2);
>> >     jiffies_hi /= (freq_hi - freq_lo);
>> > @@ -259,7 +280,8 @@ show_one(powersave_bias, powersave_bias);
>> >
>> >  /**
>> >   * update_sampling_rate - update sampling rate effective immediately if needed.
>> > - * @new_rate: new sampling rate
>> > + * @new_rate: new sampling rate. if it is 0, regard sampling rate is not
>> > + *         changed and assume that qos request value is changed.
>> >   *
>> >   * If new rate is smaller than the old, simply updaing
>> >   * dbs_tuners_int.sampling_rate might not be appropriate. For example,
>> > @@ -273,9 +295,13 @@ show_one(powersave_bias, powersave_bias);
>> >  static void update_sampling_rate(unsigned int new_rate)
>> >  {
>> >     int cpu;
>> > +   unsigned int effective;
>> > +
>> > +
>> > +   if (new_rate)
>> > +           dbs_tuners_ins.sampling_rate = max(new_rate, min_sampling_rate);
>> >
>> > -   dbs_tuners_ins.sampling_rate = new_rate
>> > -                                = max(new_rate, min_sampling_rate);
>> > +   effective = effective_sampling_rate();
>> >
>> >     for_each_online_cpu(cpu) {
>> >             struct cpufreq_policy *policy;
>> > @@ -283,21 +309,31 @@ static void update_sampling_rate(unsigned int new_rate)
>> >             struct timer_list *timer;
>> >             unsigned long appointed_at;
>> >
>> > +           /*
>> > +            * mutex_destory(&dbs_info->timer_mutex) should not happen
>> > +            * in this context.
>> > +            */
>> > +           mutex_lock(&dbs_mutex);
>> > +
>> >             policy = cpufreq_cpu_get(cpu);
>> >             if (!policy)
>> > -                   continue;
>> > +                   goto next;
>> >             dbs_info = &per_cpu(od_cpu_dbs_info, policy->cpu);
>> >             cpufreq_cpu_put(policy);
>> >
>> > +           /* timer_mutex destroyed or will be destoyed soon */
>> > +           if (!dbs_info->activated)
>> > +                   goto next;
>> > +
>> >             mutex_lock(&dbs_info->timer_mutex);
>> >
>> >             if (!delayed_work_pending(&dbs_info->work))
>> > -                   goto next;
>> > +                   goto next_timer_mutex;
>> >
>> >             timer = &dbs_info->work.timer;
>> >             appointed_at = timer->expires;
>> >
>> > -           if (time_before(jiffies + usecs_to_jiffies(new_rate),
>> > +           if (time_before(jiffies + usecs_to_jiffies(effective),
>> >                             appointed_at)) {
>> >
>> >                     mutex_unlock(&dbs_info->timer_mutex);
>> > @@ -305,12 +341,15 @@ static void update_sampling_rate(unsigned int new_rate)
>> >                     mutex_lock(&dbs_info->timer_mutex);
>> >
>> >                     schedule_delayed_work_on(dbs_info->cpu, &dbs_info->work,
>> > -                                            usecs_to_jiffies(new_rate));
>> > +                                            usecs_to_jiffies(effective));
>> >
>> >             }
>> > -next:
>> > +next_timer_mutex:
>> >             mutex_unlock(&dbs_info->timer_mutex);
>> > +next:
>> > +           mutex_unlock(&dbs_mutex);
>> >     }
>> > +
>> >  }
>>
>> I don't think gotos are helpful here. Can you use normal program
>> structure or move it to subroutine...?
>
> I agree with Pavel that gotos don't make that code particularly clear.
>
> Thanks,
> Rafael

OK, I'll use a normal if/else structure there.

Thanks.


Cheers!
MyungJoo.

--
MyungJoo Ham, Ph.D.
System S/W Lab, S/W Center, Samsung Electronics

2012-02-28 02:19:26

by MyungJoo Ham

Subject: Re: [RFC PATCH 1/2] CPUfreq ondemand: update sampling rate without waiting for next sampling

2012/2/26 Rafael J. Wysocki <[email protected]>:
> On Wednesday, February 22, 2012, MyungJoo Ham wrote:
>> When a new sampling rate is shorter than the current one, (e.g., 1 sec
>> --> 10 ms) regardless how short the new one is, the current ondemand
>> mechanism wait for the previously set timer to be expired.
>>
>> For example, if the user has just expressed that the sampling rate
>> should be 10 ms from now and the previous was 1000 ms, the new rate may
>> become effective 999 ms later, which could be not acceptable for the
>> user if the user has intended to speed up sampling because the system is
>> expected to react to CPU load fluctuation quickly from __now__.
>>
>> In order to address this issue, we need to cancel the previously set
>> timer (schedule_delayed_work) and reset the timer if resetting timer is
>> expected to trigger the delayed_work ealier.
>>
>> Signed-off-by: MyungJoo Ham <[email protected]>
>> Signed-off-by: Kyungmin Park <[email protected]>
>> ---
[]
>> +
>> +             if (!delayed_work_pending(&dbs_info->work))
>> +                     goto next;
>> +
>
> What about doing
>
>                         mutex_unlock(&dbs_info->timer_mutex);
>                         continue;
>
> here instead of the jump?
>

As in patch 2/2 of this patchset, I'll remove the goto from the loop.

>
>> +             timer = &dbs_info->work.timer;
>> +             appointed_at = timer->expires;
>> +
>
> I would do
>
>                next_sampling = jiffies + usecs_to_jiffies(new_rate);
>
> and compare that with timer->expires.  Then, the if () below would look better.
> Or perhaps use new_rate_jiffies = usecs_to_jiffies(new_rate) and use that here
> and below?
>

Oh, yes, this surely looks ugly. I'll update this.

>> +             if (time_before(jiffies + usecs_to_jiffies(new_rate),
>> +                             appointed_at)) {
>> +
>> +                     mutex_unlock(&dbs_info->timer_mutex);
>
> I'm not sure if this isn't going to be racy.  Have you verified that?

This unlock is added to avoid a race condition against do_dbs_timer().
Unless the delay is 0 (polling_interval = 0ms?) or do_dbs_timer() or
mutex_lock() takes several ms, do_dbs_timer() won't be running again
and holding the mutex after cancel_delayed_work_sync().

I've only tested a few cases, on a backport to kernel 3.0; however, I'll
do more extensive testing of this part before submitting the next
iteration of the patchset, and will try to run this code on 3.3-rc5.

>
>> +                     cancel_delayed_work_sync(&dbs_info->work);
>> +                     mutex_lock(&dbs_info->timer_mutex);
>> +
>> +                     schedule_delayed_work_on(dbs_info->cpu, &dbs_info->work,
>> +                                              usecs_to_jiffies(new_rate));
>> +
>> +             }
>> +next:
>> +             mutex_unlock(&dbs_info->timer_mutex);
>> +     }
>> +}
>> +
>>  static ssize_t store_sampling_rate(struct kobject *a, struct attribute *b,
>>                                  const char *buf, size_t count)
>>  {
>> @@ -265,7 +321,7 @@ static ssize_t store_sampling_rate(struct kobject *a, struct attribute *b,
>>       ret = sscanf(buf, "%u", &input);
>>       if (ret != 1)
>>               return -EINVAL;
>> -     dbs_tuners_ins.sampling_rate = max(input, min_sampling_rate);
>> +     update_sampling_rate(input);
>>       return count;
>>  }
>
> Thanks,
> Rafael

Thank you.


Cheers!
MyungJoo.

--
MyungJoo Ham, Ph.D.
Mobile Software Platform Lab, DMC Business, Samsung Electronics

2012-02-29 08:54:36

by MyungJoo Ham

Subject: [PATCH v2 0/2] CPUFreq / Ondemand update

1. A new sampling rate should be effective immediately at update.

2. The sampling rate may be controlled through QoS requests.

Summary of changes since v1 patchset (RFC)
- Style changes
- Avoid the possibility that a destroyed mutex may be used with QoS request.


MyungJoo Ham (2):
CPUfreq ondemand: update sampling rate without waiting for next
sampling
CPUfreq ondemand: handle QoS request on DVFS response latency

drivers/cpufreq/cpufreq_ondemand.c | 166 ++++++++++++++++++++++++++++++++++-
1 files changed, 161 insertions(+), 5 deletions(-)

--
1.7.4.1

2012-02-29 08:54:35

by MyungJoo Ham

Subject: [PATCH v2 1/2] CPUfreq ondemand: update sampling rate without waiting for next sampling

When a new sampling rate is shorter than the current one (e.g., 1 sec
--> 10 ms), regardless of how short the new one is, the current ondemand
mechanism waits for the previously set timer to expire.

For example, if the user has just expressed that the sampling rate
should be 10 ms from now and the previous one was 1000 ms, the new rate
may become effective 999 ms later, which may not be acceptable if the
user intended to speed up sampling because the system is expected to
react to CPU load fluctuation quickly from __now__.

In order to address this issue, we need to cancel the previously set
timer (schedule_delayed_work) and reset the timer if resetting the timer
is expected to trigger the delayed_work earlier.

Signed-off-by: MyungJoo Ham <[email protected]>
Signed-off-by: Kyungmin Park <[email protected]>

---
Changes from v1(RFC)
- Style updates
---
drivers/cpufreq/cpufreq_ondemand.c | 58 +++++++++++++++++++++++++++++++++++-
1 files changed, 57 insertions(+), 1 deletions(-)

diff --git a/drivers/cpufreq/cpufreq_ondemand.c b/drivers/cpufreq/cpufreq_ondemand.c
index c3e0652..836e9b0 100644
--- a/drivers/cpufreq/cpufreq_ondemand.c
+++ b/drivers/cpufreq/cpufreq_ondemand.c
@@ -257,6 +257,62 @@ show_one(sampling_down_factor, sampling_down_factor);
show_one(ignore_nice_load, ignore_nice);
show_one(powersave_bias, powersave_bias);

+/**
+ * update_sampling_rate - update sampling rate effective immediately if needed.
+ * @new_rate: new sampling rate
+ *
+ * If new rate is smaller than the old, simply updaing
+ * dbs_tuners_int.sampling_rate might not be appropriate. For example,
+ * if the original sampling_rate was 1 second and the requested new sampling
+ * rate is 10 ms because the user needs immediate reaction from ondemand
+ * governor, but not sure if higher frequency will be required or not,
+ * then, the governor may change the sampling rate too late; up to 1 second
+ * later. Thus, if we are reducing the sampling rate, we need to make the
+ * new value effective immediately.
+ */
+static void update_sampling_rate(unsigned int new_rate)
+{
+ int cpu;
+
+ dbs_tuners_ins.sampling_rate = new_rate
+ = max(new_rate, min_sampling_rate);
+
+ for_each_online_cpu(cpu) {
+ struct cpufreq_policy *policy;
+ struct cpu_dbs_info_s *dbs_info;
+ unsigned long next_sampling, appointed_at;
+
+ policy = cpufreq_cpu_get(cpu);
+ if (!policy)
+ continue;
+ dbs_info = &per_cpu(od_cpu_dbs_info, policy->cpu);
+ cpufreq_cpu_put(policy);
+
+ mutex_lock(&dbs_info->timer_mutex);
+
+ if (!delayed_work_pending(&dbs_info->work)) {
+ mutex_unlock(&dbs_info->timer_mutex);
+ continue;
+ }
+
+ next_sampling = jiffies + usecs_to_jiffies(new_rate);
+ appointed_at = dbs_info->work.timer.expires;
+
+
+ if (time_before(next_sampling, appointed_at)) {
+
+ mutex_unlock(&dbs_info->timer_mutex);
+ cancel_delayed_work_sync(&dbs_info->work);
+ mutex_lock(&dbs_info->timer_mutex);
+
+ schedule_delayed_work_on(dbs_info->cpu, &dbs_info->work,
+ usecs_to_jiffies(new_rate));
+
+ }
+ mutex_unlock(&dbs_info->timer_mutex);
+ }
+}
+
static ssize_t store_sampling_rate(struct kobject *a, struct attribute *b,
const char *buf, size_t count)
{
@@ -265,7 +321,7 @@ static ssize_t store_sampling_rate(struct kobject *a, struct attribute *b,
ret = sscanf(buf, "%u", &input);
if (ret != 1)
return -EINVAL;
- dbs_tuners_ins.sampling_rate = max(input, min_sampling_rate);
+ update_sampling_rate(input);
return count;
}

--
1.7.4.1
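
The rescheduling decision made in update_sampling_rate() can be modelled
outside the kernel. The sketch below is a simplified userspace
illustration: time_before_ul() follows the wraparound-safe comparison
style of the kernel's time_before() macro, while should_reschedule() and
its parameters are hypothetical names that stand in for jiffies, the new
delay, and the pending timer's expiry. Locking and the actual cancel and
re-arm steps are left out.

```c
#include <stdbool.h>

/* Wraparound-safe "a is earlier than b", in the style of time_before(). */
bool time_before_ul(unsigned long a, unsigned long b)
{
	return (long)(a - b) < 0;
}

/*
 * Decide whether a pending timer should be cancelled and re-armed.
 * now:          current tick count (stands in for jiffies)
 * new_delay:    new sampling period in ticks
 * appointed_at: tick at which the pending timer will fire
 */
bool should_reschedule(unsigned long now, unsigned long new_delay,
		       unsigned long appointed_at)
{
	/* Re-arm only if the new deadline is strictly earlier. */
	return time_before_ul(now + new_delay, appointed_at);
}
```

With now = 1000 and a timer due at tick 2000, switching to a 10-tick
period re-arms the timer, whereas switching to a 1000-tick period leaves
the pending timer alone.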

2012-02-29 08:55:01

by MyungJoo Ham

Subject: [PATCH v2 2/2] CPUfreq ondemand: handle QoS request on DVFS response latency

With the QoS class DVFS_RESPONSE_LATENCY, users (device drivers and
userspace processes) may express the desired maximum response latency
of DVFS mechanisms such as CPUfreq's ondemand governor. Based on such
QoS requests, the ondemand governor may flexibly adjust its sampling
rate, as long as it does not go below min_sampling_rate.

The benefit of having DVFS_RESPONSE_LATENCY is a faster response
to user inputs (mouse clicks, keyboard inputs, touchscreen touches,
and others) without increasing the frequency unconditionally. Because
some input events may not require any performance increase, increasing
the frequency unconditionally for inputs may simply consume too much
energy. Adjusting the sampling rate based on user inputs makes it
possible to increase the frequency with less latency when it is needed
and not to increase it when it is not.

Signed-off-by: MyungJoo Ham <[email protected]>
Signed-off-by: Kyungmin Park <[email protected]>

--
This patch depends on the patch
"PM / QoS: Introduce new classes: DMA-Throughput and DVFS-Latency".
and the patch
"CPUfreq ondemand: update sampling rate without waiting for next
sampling"

Changes from v1(RFC)
- Style updates
- Avoid the possibility that a destroyed mutex may be used.
---
drivers/cpufreq/cpufreq_ondemand.c | 120 +++++++++++++++++++++++++++++++++---
1 files changed, 110 insertions(+), 10 deletions(-)

diff --git a/drivers/cpufreq/cpufreq_ondemand.c b/drivers/cpufreq/cpufreq_ondemand.c
index 836e9b0..f0df66d 100644
--- a/drivers/cpufreq/cpufreq_ondemand.c
+++ b/drivers/cpufreq/cpufreq_ondemand.c
@@ -22,6 +22,7 @@
#include <linux/tick.h>
#include <linux/ktime.h>
#include <linux/sched.h>
+#include <linux/pm_qos.h>

/*
* dbs is used in this file as a shortform for demandbased switching
@@ -93,6 +94,7 @@ struct cpu_dbs_info_s {
* when user is changing the governor or limits.
*/
struct mutex timer_mutex;
+ bool activated; /* dbs_timer_init is in effect */
};
static DEFINE_PER_CPU(struct cpu_dbs_info_s, od_cpu_dbs_info);

@@ -111,6 +113,8 @@ static struct dbs_tuners {
unsigned int sampling_down_factor;
unsigned int powersave_bias;
unsigned int io_is_busy;
+ struct notifier_block dvfs_lat_qos_db;
+ unsigned int dvfs_lat_qos_wants;
} dbs_tuners_ins = {
.up_threshold = DEF_FREQUENCY_UP_THRESHOLD,
.sampling_down_factor = DEF_SAMPLING_DOWN_FACTOR,
@@ -164,6 +168,23 @@ static inline cputime64_t get_cpu_iowait_time(unsigned int cpu, cputime64_t *wal
}

/*
+ * Find right sampling rate based on sampling_rate and
+ * QoS requests on dvfs latency.
+ */
+static unsigned int effective_sampling_rate(void)
+{
+ unsigned int effective;
+
+ if (dbs_tuners_ins.dvfs_lat_qos_wants)
+ effective = min(dbs_tuners_ins.dvfs_lat_qos_wants,
+ dbs_tuners_ins.sampling_rate);
+ else
+ effective = dbs_tuners_ins.sampling_rate;
+
+ return max(effective, min_sampling_rate);
+}
+
+/*
* Find right freq to be set now with powersave_bias on.
* Returns the freq_hi to be used right now and will set freq_hi_jiffies,
* freq_lo, and freq_lo_jiffies in percpu area for averaging freqs.
@@ -207,7 +228,7 @@ static unsigned int powersave_bias_target(struct cpufreq_policy *policy,
dbs_info->freq_lo_jiffies = 0;
return freq_lo;
}
- jiffies_total = usecs_to_jiffies(dbs_tuners_ins.sampling_rate);
+ jiffies_total = usecs_to_jiffies(effective_sampling_rate());
jiffies_hi = (freq_avg - freq_lo) * jiffies_total;
jiffies_hi += ((freq_hi - freq_lo) / 2);
jiffies_hi /= (freq_hi - freq_lo);
@@ -259,7 +280,8 @@ show_one(powersave_bias, powersave_bias);

/**
* update_sampling_rate - update sampling rate effective immediately if needed.
- * @new_rate: new sampling rate
+ * @new_rate: new sampling rate. If it is 0, regard sampling rate is not
+ * changed and assume that qos request value is changed.
*
* If new rate is smaller than the old, simply updaing
* dbs_tuners_int.sampling_rate might not be appropriate. For example,
@@ -273,32 +295,51 @@ show_one(powersave_bias, powersave_bias);
static void update_sampling_rate(unsigned int new_rate)
{
int cpu;
+ unsigned int effective;
+

- dbs_tuners_ins.sampling_rate = new_rate
- = max(new_rate, min_sampling_rate);
+ if (new_rate)
+ dbs_tuners_ins.sampling_rate = max(new_rate, min_sampling_rate);
+
+ effective = effective_sampling_rate();

for_each_online_cpu(cpu) {
struct cpufreq_policy *policy;
struct cpu_dbs_info_s *dbs_info;
unsigned long next_sampling, appointed_at;

+ /*
+ * mutex_destroy(&dbs_info->timer_mutex) should not happen
+ * in this context. Other than here, dbs_mutex is locked/unlocked
+ * in the GOV_START and GOV_STOP contexts only.
+ */
+ mutex_lock(&dbs_mutex);
+
policy = cpufreq_cpu_get(cpu);
- if (!policy)
+ if (!policy) {
+ mutex_unlock(&dbs_mutex);
continue;
+ }
dbs_info = &per_cpu(od_cpu_dbs_info, policy->cpu);
cpufreq_cpu_put(policy);

+ /* timer_mutex is destroyed or will be destroyed soon */
+ if (!dbs_info->activated) {
+ mutex_unlock(&dbs_mutex);
+ continue;
+ }
+
mutex_lock(&dbs_info->timer_mutex);

if (!delayed_work_pending(&dbs_info->work)) {
mutex_unlock(&dbs_info->timer_mutex);
+ mutex_unlock(&dbs_mutex);
continue;
}

next_sampling = jiffies + usecs_to_jiffies(new_rate);
appointed_at = dbs_info->work.timer.expires;

-
if (time_before(next_sampling, appointed_at)) {

mutex_unlock(&dbs_info->timer_mutex);
@@ -306,10 +347,24 @@ static void update_sampling_rate(unsigned int new_rate)
mutex_lock(&dbs_info->timer_mutex);

schedule_delayed_work_on(dbs_info->cpu, &dbs_info->work,
- usecs_to_jiffies(new_rate));
+ usecs_to_jiffies(effective));

}
mutex_unlock(&dbs_info->timer_mutex);
+
+ /*
+ * For the small possibility that dbs_timer_exit() has been
+ * called after checking dbs_info->activated above.
+ * If cancel_delayed_work_sync() has been called by
+ * dbs_timer_exit() before schedule_delayed_work_on() of this
+ * function, it should be revoked by calling cancel again
+ * before releasing dbs_mutex, which will trigger mutex_destroy
+ * to be called.
+ */
+ if (!dbs_info->activated)
+ cancel_delayed_work_sync(&dbs_info->work);
+
+ mutex_unlock(&dbs_mutex);
}
}

@@ -620,7 +675,7 @@ static void do_dbs_timer(struct work_struct *work)
/* We want all CPUs to do sampling nearly on
* same jiffy
*/
- delay = usecs_to_jiffies(dbs_tuners_ins.sampling_rate
+ delay = usecs_to_jiffies(effective_sampling_rate()
* dbs_info->rate_mult);

if (num_online_cpus() > 1)
@@ -638,7 +693,7 @@ static void do_dbs_timer(struct work_struct *work)
static inline void dbs_timer_init(struct cpu_dbs_info_s *dbs_info)
{
/* We want all CPUs to do sampling nearly on same jiffy */
- int delay = usecs_to_jiffies(dbs_tuners_ins.sampling_rate);
+ int delay = usecs_to_jiffies(effective_sampling_rate());

if (num_online_cpus() > 1)
delay -= jiffies % delay;
@@ -646,10 +701,12 @@ static inline void dbs_timer_init(struct cpu_dbs_info_s *dbs_info)
dbs_info->sample_type = DBS_NORMAL_SAMPLE;
INIT_DELAYED_WORK_DEFERRABLE(&dbs_info->work, do_dbs_timer);
schedule_delayed_work_on(dbs_info->cpu, &dbs_info->work, delay);
+ dbs_info->activated = true;
}

static inline void dbs_timer_exit(struct cpu_dbs_info_s *dbs_info)
{
+ dbs_info->activated = false;
cancel_delayed_work_sync(&dbs_info->work);
}

@@ -767,10 +824,39 @@ static int cpufreq_governor_dbs(struct cpufreq_policy *policy,
return 0;
}

+/**
+ * qos_dvfs_lat_notify - PM QoS notifier for DVFS_LATENCY QoS requests
+ * @nb: notifier block struct
+ * @value: QoS value
+ * @dummy: unused
+ */
+static int qos_dvfs_lat_notify(struct notifier_block *nb, unsigned long value,
+ void *dummy)
+{
+ /*
+ * In the worst case, when the CPU load rises from
+ * (up-threshold - e) to a continuous (up-threshold + e), the
+ * ondemand governor reacts after sampling_rate * 2.
+ *
+ * Thus, based on the worst case scenario, we use value / 2.
+ */
+ dbs_tuners_ins.dvfs_lat_qos_wants = value / 2;
+
+ /* Update sampling rate */
+ update_sampling_rate(0);
+
+ return NOTIFY_OK;
+}
+
+static struct notifier_block ondemand_qos_dvfs_lat_nb = {
+ .notifier_call = qos_dvfs_lat_notify,
+};
+
static int __init cpufreq_gov_dbs_init(void)
{
u64 idle_time;
int cpu = get_cpu();
+ int err = 0;

idle_time = get_cpu_idle_time_us(cpu, NULL);
put_cpu();
@@ -791,11 +877,25 @@ static int __init cpufreq_gov_dbs_init(void)
MIN_SAMPLING_RATE_RATIO * jiffies_to_usecs(10);
}

- return cpufreq_register_governor(&cpufreq_gov_ondemand);
+ err = pm_qos_add_notifier(PM_QOS_DVFS_RESPONSE_LATENCY,
+ &ondemand_qos_dvfs_lat_nb);
+ if (err)
+ return err;
+
+ err = cpufreq_register_governor(&cpufreq_gov_ondemand);
+ if (err) {
+ pm_qos_remove_notifier(PM_QOS_DVFS_RESPONSE_LATENCY,
+ &ondemand_qos_dvfs_lat_nb);
+ }
+
+ return err;
}

static void __exit cpufreq_gov_dbs_exit(void)
{
+ pm_qos_remove_notifier(PM_QOS_DVFS_RESPONSE_LATENCY,
+ &ondemand_qos_dvfs_lat_nb);
+
cpufreq_unregister_governor(&cpufreq_gov_ondemand);
}

--
1.7.4.1
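
How a DVFS_RESPONSE_LATENCY request turns into a sampling period can be
summarized in a few lines. The following is a userspace sketch of the
arithmetic in qos_dvfs_lat_notify() and effective_sampling_rate() above,
not a drop-in replacement; effective_rate_us(), min_u() and max_u() are
hypothetical names, and all values are in microseconds.

```c
/* Smaller and larger of two unsigned values. */
unsigned int min_u(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

unsigned int max_u(unsigned int a, unsigned int b)
{
	return a > b ? a : b;
}

/*
 * qos_latency_us:       aggregate DVFS response-latency request
 * sampling_rate_us:     sampling period configured by the user
 * min_sampling_rate_us: lower bound enforced by the governor
 */
unsigned int effective_rate_us(unsigned int qos_latency_us,
			       unsigned int sampling_rate_us,
			       unsigned int min_sampling_rate_us)
{
	/* The worst-case reaction takes two periods, so halve the request. */
	unsigned int wants = qos_latency_us / 2;
	unsigned int effective = sampling_rate_us;

	/* A zero result is treated as "no QoS constraint". */
	if (wants)
		effective = min_u(wants, sampling_rate_us);

	/* Never sample faster than the governor's minimum period. */
	return max_u(effective, min_sampling_rate_us);
}
```

With a 100000 usec sampling rate and a 10000 usec minimum, a 40000 usec
latency request gives a 20000 usec period, and a 4000 usec request is
clamped to the 10000 usec minimum. Because update_sampling_rate(0) is used
for QoS changes, any deadline comparison would presumably need this
effective value rather than the raw new_rate argument.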