2011-02-10 01:20:36

by Tim Chen

[permalink] [raw]
Subject: [Patch] idle governor: Avoid lock acquisition to read pm_qos before entering idle

I noticed that before entering idle state, the menu idle governor will
look up the current pm_qos value according to the list of qos request
received. This look up currently needs the acquisition of a lock to go
down a list of qos requests to find the qos value, slowing down the
entrance into idle state due to contention by multiple cpus to traverse
this list. The contention is severe when there are a lot of cpus waking
and going into idle. For example, for a simple workload that has 32
pair of processes ping ponging messages to each other, where 64 cores
cores are active in test system, I see the following profile:

- 37.82% swapper [kernel.kallsyms] [k]
_raw_spin_lock_irqsave
- _raw_spin_lock_irqsave
- 95.65% pm_qos_request
menu_select
cpuidle_idle_call
- cpu_idle
99.98% start_secondary

Perhaps a better approach will be to cache the updated pm_qos value so
reading it does not require lock acquisition as in the patch below.

Tim

Signed-off-by: Tim Chen <[email protected]>
diff --git a/include/linux/pm_qos_params.h b/include/linux/pm_qos_params.h
index 77cbddb..a7d87f9 100644
--- a/include/linux/pm_qos_params.h
+++ b/include/linux/pm_qos_params.h
@@ -16,6 +16,10 @@
#define PM_QOS_NUM_CLASSES 4
#define PM_QOS_DEFAULT_VALUE -1

+#define PM_QOS_CPU_DMA_LAT_DEFAULT_VALUE (2000 * USEC_PER_SEC)
+#define PM_QOS_NETWORK_LAT_DEFAULT_VALUE (2000 * USEC_PER_SEC)
+#define PM_QOS_NETWORK_THROUGHPUT_DEFAULT_VALUE 0
+
struct pm_qos_request_list {
struct plist_node list;
int pm_qos_class;
diff --git a/kernel/pm_qos_params.c b/kernel/pm_qos_params.c
index aeaa7f8..b6310d1 100644
--- a/kernel/pm_qos_params.c
+++ b/kernel/pm_qos_params.c
@@ -58,6 +58,7 @@ struct pm_qos_object {
struct blocking_notifier_head *notifiers;
struct miscdevice pm_qos_power_miscdev;
char *name;
+ s32 value;
s32 default_value;
enum pm_qos_type type;
};
@@ -70,7 +71,8 @@ static struct pm_qos_object cpu_dma_pm_qos = {
.requests = PLIST_HEAD_INIT(cpu_dma_pm_qos.requests, pm_qos_lock),
.notifiers = &cpu_dma_lat_notifier,
.name = "cpu_dma_latency",
- .default_value = 2000 * USEC_PER_SEC,
+ .value = PM_QOS_CPU_DMA_LAT_DEFAULT_VALUE,
+ .default_value = PM_QOS_CPU_DMA_LAT_DEFAULT_VALUE,
.type = PM_QOS_MIN,
};

@@ -79,7 +81,8 @@ static struct pm_qos_object network_lat_pm_qos = {
.requests = PLIST_HEAD_INIT(network_lat_pm_qos.requests, pm_qos_lock),
.notifiers = &network_lat_notifier,
.name = "network_latency",
- .default_value = 2000 * USEC_PER_SEC,
+ .value = PM_QOS_NETWORK_LAT_DEFAULT_VALUE,
+ .default_value = PM_QOS_NETWORK_LAT_DEFAULT_VALUE,
.type = PM_QOS_MIN
};

@@ -89,7 +92,8 @@ static struct pm_qos_object network_throughput_pm_qos = {
.requests = PLIST_HEAD_INIT(network_throughput_pm_qos.requests, pm_qos_lock),
.notifiers = &network_throughput_notifier,
.name = "network_throughput",
- .default_value = 0,
+ .value = PM_QOS_NETWORK_THROUGHPUT_DEFAULT_VALUE,
+ .default_value = PM_QOS_NETWORK_THROUGHPUT_DEFAULT_VALUE,
.type = PM_QOS_MAX,
};

@@ -132,6 +136,16 @@ static inline int pm_qos_get_value(struct pm_qos_object *o)
}
}

+static inline s32 pm_qos_read_value(struct pm_qos_object *o)
+{
+ return o->value;
+}
+
+static inline int pm_qos_set_value(struct pm_qos_object *o, s32 value)
+{
+ o->value = value;
+}
+
static void update_target(struct pm_qos_object *o, struct plist_node *node,
int del, int value)
{
@@ -156,6 +170,7 @@ static void update_target(struct pm_qos_object *o, struct plist_node *node,
plist_add(node, &o->requests);
}
curr_value = pm_qos_get_value(o);
+ pm_qos_set_value(o, curr_value);
spin_unlock_irqrestore(&pm_qos_lock, flags);

if (prev_value != curr_value)
@@ -190,18 +205,11 @@ static int find_pm_qos_object_by_minor(int minor)
* pm_qos_request - returns current system wide qos expectation
* @pm_qos_class: identification of which qos value is requested
*
- * This function returns the current target value in an atomic manner.
+ * This function returns the current target value.
*/
int pm_qos_request(int pm_qos_class)
{
- unsigned long flags;
- int value;
-
- spin_lock_irqsave(&pm_qos_lock, flags);
- value = pm_qos_get_value(pm_qos_array[pm_qos_class]);
- spin_unlock_irqrestore(&pm_qos_lock, flags);
-
- return value;
+ return pm_qos_read_value(pm_qos_array[pm_qos_class]);
}
EXPORT_SYMBOL_GPL(pm_qos_request);



2011-02-10 03:46:27

by Andi Kleen

[permalink] [raw]
Subject: Re: [Patch] idle governor: Avoid lock acquisition to read pm_qos before entering idle

> Signed-off-by: Tim Chen <[email protected]>

Patch looks good to me. Thanks Tim.

I expect this will also improve other pm-qos users, but of course they
are likely not as frequently used as the idle governour.

(and before anyone makes jokes about optimizing the idle loop: idle
latency is very important for workloads with lots of wakeups out of
idle)

-Andi

2011-02-10 05:10:53

by mark gross

[permalink] [raw]
Subject: Re: [Patch] idle governor: Avoid lock acquisition to read pm_qos before entering idle

On Wed, Feb 09, 2011 at 05:21:04PM -0800, Tim Chen wrote:
> I noticed that before entering idle state, the menu idle governor will
> look up the current pm_qos value according to the list of qos request
> received. This look up currently needs the acquisition of a lock to go
> down a list of qos requests to find the qos value, slowing down the

wait a second... It gets the target_value (that is an atomic variable)
humpf. I was looking at 2.6.35, where this is true. If you want to put
back a target_value why not put it back the way it was?

> entrance into idle state due to contention by multiple cpus to traverse
> this list. The contention is severe when there are a lot of cpus waking
> and going into idle. For example, for a simple workload that has 32
> pair of processes ping ponging messages to each other, where 64 cores
> cores are active in test system, I see the following profile:
>
> - 37.82% swapper [kernel.kallsyms] [k]
> _raw_spin_lock_irqsave
> - _raw_spin_lock_irqsave
> - 95.65% pm_qos_request
> menu_select
> cpuidle_idle_call
> - cpu_idle
> 99.98% start_secondary
>

I'm surprised by this as the last update to the pm_qos replaced the
lists with a O(1) data structure so there was no more walking of pending
requests.

What is the profile after the patch the Plist should be only one
dereference and an if instruction slower than a cached value.

Does your patch remove the need for the locks because if it doesn't I
don't see how it will make much of a difference?

> Perhaps a better approach will be to cache the updated pm_qos value so
> reading it does not require lock acquisition as in the patch below.

See v2.6.35 for an possible instance of the better approach.

>
> Tim
>
> Signed-off-by: Tim Chen <[email protected]>
> diff --git a/include/linux/pm_qos_params.h b/include/linux/pm_qos_params.h
> index 77cbddb..a7d87f9 100644
> --- a/include/linux/pm_qos_params.h
> +++ b/include/linux/pm_qos_params.h
> @@ -16,6 +16,10 @@
> #define PM_QOS_NUM_CLASSES 4
> #define PM_QOS_DEFAULT_VALUE -1
>
> +#define PM_QOS_CPU_DMA_LAT_DEFAULT_VALUE (2000 * USEC_PER_SEC)
> +#define PM_QOS_NETWORK_LAT_DEFAULT_VALUE (2000 * USEC_PER_SEC)
> +#define PM_QOS_NETWORK_THROUGHPUT_DEFAULT_VALUE 0
> +
> struct pm_qos_request_list {
> struct plist_node list;
> int pm_qos_class;
> diff --git a/kernel/pm_qos_params.c b/kernel/pm_qos_params.c
> index aeaa7f8..b6310d1 100644
> --- a/kernel/pm_qos_params.c
> +++ b/kernel/pm_qos_params.c
> @@ -58,6 +58,7 @@ struct pm_qos_object {
> struct blocking_notifier_head *notifiers;
> struct miscdevice pm_qos_power_miscdev;
> char *name;
> + s32 value;
> s32 default_value;
> enum pm_qos_type type;
> };
> @@ -70,7 +71,8 @@ static struct pm_qos_object cpu_dma_pm_qos = {
> .requests = PLIST_HEAD_INIT(cpu_dma_pm_qos.requests, pm_qos_lock),
> .notifiers = &cpu_dma_lat_notifier,
> .name = "cpu_dma_latency",
> - .default_value = 2000 * USEC_PER_SEC,
> + .value = PM_QOS_CPU_DMA_LAT_DEFAULT_VALUE,
> + .default_value = PM_QOS_CPU_DMA_LAT_DEFAULT_VALUE,
> .type = PM_QOS_MIN,
> };
>
> @@ -79,7 +81,8 @@ static struct pm_qos_object network_lat_pm_qos = {
> .requests = PLIST_HEAD_INIT(network_lat_pm_qos.requests, pm_qos_lock),
> .notifiers = &network_lat_notifier,
> .name = "network_latency",
> - .default_value = 2000 * USEC_PER_SEC,
> + .value = PM_QOS_NETWORK_LAT_DEFAULT_VALUE,
> + .default_value = PM_QOS_NETWORK_LAT_DEFAULT_VALUE,
> .type = PM_QOS_MIN
> };
>
> @@ -89,7 +92,8 @@ static struct pm_qos_object network_throughput_pm_qos = {
> .requests = PLIST_HEAD_INIT(network_throughput_pm_qos.requests, pm_qos_lock),
> .notifiers = &network_throughput_notifier,
> .name = "network_throughput",
> - .default_value = 0,
> + .value = PM_QOS_NETWORK_THROUGHPUT_DEFAULT_VALUE,
> + .default_value = PM_QOS_NETWORK_THROUGHPUT_DEFAULT_VALUE,
> .type = PM_QOS_MAX,
> };
>
> @@ -132,6 +136,16 @@ static inline int pm_qos_get_value(struct pm_qos_object *o)
> }
> }
>
> +static inline s32 pm_qos_read_value(struct pm_qos_object *o)
> +{
> + return o->value;
> +}
> +
> +static inline int pm_qos_set_value(struct pm_qos_object *o, s32 value)
> +{
> + o->value = value;
> +}
> +
> static void update_target(struct pm_qos_object *o, struct plist_node *node,
> int del, int value)
> {
> @@ -156,6 +170,7 @@ static void update_target(struct pm_qos_object *o, struct plist_node *node,
> plist_add(node, &o->requests);
> }
> curr_value = pm_qos_get_value(o);
> + pm_qos_set_value(o, curr_value);
> spin_unlock_irqrestore(&pm_qos_lock, flags);
>
> if (prev_value != curr_value)
> @@ -190,18 +205,11 @@ static int find_pm_qos_object_by_minor(int minor)
> * pm_qos_request - returns current system wide qos expectation
> * @pm_qos_class: identification of which qos value is requested
> *
> - * This function returns the current target value in an atomic manner.
> + * This function returns the current target value.
> */
> int pm_qos_request(int pm_qos_class)
> {
> - unsigned long flags;
> - int value;
> -
> - spin_lock_irqsave(&pm_qos_lock, flags);
> - value = pm_qos_get_value(pm_qos_array[pm_qos_class]);
Hmmm, I was looking at 2.6.35 where we had the return of an
atomic value here.

what happened to the atomic value in the 2.6.35 kernel?

> - spin_unlock_irqrestore(&pm_qos_lock, flags);
> -
> - return value;
> + return pm_qos_read_value(pm_qos_array[pm_qos_class]);

why does this not need a lock? This value is getting updated by
multiple CPU's concurrently It probably should at least be an atomic
read like it used to be in 2.6.35.

it looks to me that your change is really just remove the locks around
the current QoS target. The caching of the target to avoid hitting the
plist is really close to a noop.

--mark
> }
> EXPORT_SYMBOL_GPL(pm_qos_request);
>
>
>

2011-02-10 16:20:25

by Andi Kleen

[permalink] [raw]
Subject: Re: [Patch] idle governor: Avoid lock acquisition to read pm_qos before entering idle

On Wed, Feb 09, 2011 at 09:10:42PM -0800, mark gross wrote:
> On Wed, Feb 09, 2011 at 05:21:04PM -0800, Tim Chen wrote:
> > I noticed that before entering idle state, the menu idle governor will
> > look up the current pm_qos value according to the list of qos request
> > received. This look up currently needs the acquisition of a lock to go
> > down a list of qos requests to find the qos value, slowing down the
>
> wait a second... It gets the target_value (that is an atomic variable)
> humpf. I was looking at 2.6.35, where this is true. If you want to put
> back a target_value why not put it back the way it was?

I don't think the goal is to make the code look like it was before,
just to have clean and scalable code going forwards.

> I'm surprised by this as the last update to the pm_qos replaced the
> lists with a O(1) data structure so there was no more walking of pending
> requests.
>
> What is the profile after the patch the Plist should be only one
> dereference and an if instruction slower than a cached value.

The problem with the plist is that you need to take a lock for reading
the first value. The lock is a performance problem on servers.
The reference doesn't matter at all.

> Does your patch remove the need for the locks because if it doesn't I
> don't see how it will make much of a difference?

The value itself doesn't need a lock, just the list access.

>
> > Perhaps a better approach will be to cache the updated pm_qos value so
> > reading it does not require lock acquisition as in the patch below.
>
> See v2.6.35 for an possible instance of the better approach.

Tim's new code seems simpler and cleaner than what was in .35.

-Andi

2011-02-10 17:26:54

by Tim Chen

[permalink] [raw]
Subject: Re: [Patch] idle governor: Avoid lock acquisition to read pm_qos before entering idle

On Wed, 2011-02-09 at 21:10 -0800, mark gross wrote:

>
> I'm surprised by this as the last update to the pm_qos replaced the
> lists with a O(1) data structure so there was no more walking of pending
> requests.

But you need to acquire a lock before you can read the value on the list
within the function pm_qos_request. This is a problem if there are a
lot of cpus doing so.

>
> What is the profile after the patch the Plist should be only one
> dereference and an if instruction slower than a cached value.

After the patch, the acquisition of the lock on plist go away from the
profile, and I see a 12% improvement in throughput to the message
passing benchmark I was running.

>
> Does your patch remove the need for the locks because if it doesn't I
> don't see how it will make much of a difference?

We still need the lock to update/remove/insert values in the plist and
to update the cached value. The intention of the patch is to avoid lock
acquisition by reading from a cached value that is up to date. Lock
acquisition is needed --every time-- when a cpu go into idle, which is
bad as you want to let a cpu go to idle ASAP.


Tim

2011-02-10 17:45:22

by James Bottomley

[permalink] [raw]
Subject: Re: [Patch] idle governor: Avoid lock acquisition to read pm_qos before entering idle

On Wed, 2011-02-09 at 17:21 -0800, Tim Chen wrote:
> I noticed that before entering idle state, the menu idle governor will
> look up the current pm_qos value according to the list of qos request
> received. This look up currently needs the acquisition of a lock to go
> down a list of qos requests to find the qos value, slowing down the
> entrance into idle state due to contention by multiple cpus to traverse
> this list. The contention is severe when there are a lot of cpus waking
> and going into idle. For example, for a simple workload that has 32
> pair of processes ping ponging messages to each other, where 64 cores
> cores are active in test system, I see the following profile:
>
> - 37.82% swapper [kernel.kallsyms] [k]
> _raw_spin_lock_irqsave
> - _raw_spin_lock_irqsave
> - 95.65% pm_qos_request
> menu_select
> cpuidle_idle_call
> - cpu_idle
> 99.98% start_secondary
>
> Perhaps a better approach will be to cache the updated pm_qos value so
> reading it does not require lock acquisition as in the patch below.
>
> Tim
>
> Signed-off-by: Tim Chen <[email protected]>

What you say is true as long as the value is 32 bits ... perhaps a note
of that should be made somewhere?

Otherwise, it looks like a good lockless optimisation on the read path,
so you can add my ack.

James

2011-02-10 17:56:11

by mark gross

[permalink] [raw]
Subject: Re: [Patch] idle governor: Avoid lock acquisition to read pm_qos before entering idle

On Wed, Feb 09, 2011 at 05:21:04PM -0800, Tim Chen wrote:
> I noticed that before entering idle state, the menu idle governor will
> look up the current pm_qos value according to the list of qos request
> received. This look up currently needs the acquisition of a lock to go
> down a list of qos requests to find the qos value, slowing down the
> entrance into idle state due to contention by multiple cpus to traverse
> this list. The contention is severe when there are a lot of cpus waking
> and going into idle. For example, for a simple workload that has 32
> pair of processes ping ponging messages to each other, where 64 cores
> cores are active in test system, I see the following profile:
>
> - 37.82% swapper [kernel.kallsyms] [k]
> _raw_spin_lock_irqsave
> - _raw_spin_lock_irqsave
> - 95.65% pm_qos_request
> menu_select
> cpuidle_idle_call
> - cpu_idle
> 99.98% start_secondary
>
> Perhaps a better approach will be to cache the updated pm_qos value so
> reading it does not require lock acquisition as in the patch below.
>
> Tim
>
> Signed-off-by: Tim Chen <[email protected]>
> diff --git a/include/linux/pm_qos_params.h b/include/linux/pm_qos_params.h
> index 77cbddb..a7d87f9 100644
> --- a/include/linux/pm_qos_params.h
> +++ b/include/linux/pm_qos_params.h
> @@ -16,6 +16,10 @@
> #define PM_QOS_NUM_CLASSES 4
> #define PM_QOS_DEFAULT_VALUE -1
>
> +#define PM_QOS_CPU_DMA_LAT_DEFAULT_VALUE (2000 * USEC_PER_SEC)
> +#define PM_QOS_NETWORK_LAT_DEFAULT_VALUE (2000 * USEC_PER_SEC)
> +#define PM_QOS_NETWORK_THROUGHPUT_DEFAULT_VALUE 0
> +
> struct pm_qos_request_list {
> struct plist_node list;
> int pm_qos_class;
> diff --git a/kernel/pm_qos_params.c b/kernel/pm_qos_params.c
> index aeaa7f8..b6310d1 100644
> --- a/kernel/pm_qos_params.c
> +++ b/kernel/pm_qos_params.c
> @@ -58,6 +58,7 @@ struct pm_qos_object {
> struct blocking_notifier_head *notifiers;
> struct miscdevice pm_qos_power_miscdev;
> char *name;
> + s32 value;
> s32 default_value;
> enum pm_qos_type type;
> };
> @@ -70,7 +71,8 @@ static struct pm_qos_object cpu_dma_pm_qos = {
> .requests = PLIST_HEAD_INIT(cpu_dma_pm_qos.requests, pm_qos_lock),
> .notifiers = &cpu_dma_lat_notifier,
> .name = "cpu_dma_latency",
> - .default_value = 2000 * USEC_PER_SEC,
> + .value = PM_QOS_CPU_DMA_LAT_DEFAULT_VALUE,
> + .default_value = PM_QOS_CPU_DMA_LAT_DEFAULT_VALUE,
> .type = PM_QOS_MIN,
> };
>
> @@ -79,7 +81,8 @@ static struct pm_qos_object network_lat_pm_qos = {
> .requests = PLIST_HEAD_INIT(network_lat_pm_qos.requests, pm_qos_lock),
> .notifiers = &network_lat_notifier,
> .name = "network_latency",
> - .default_value = 2000 * USEC_PER_SEC,
> + .value = PM_QOS_NETWORK_LAT_DEFAULT_VALUE,
> + .default_value = PM_QOS_NETWORK_LAT_DEFAULT_VALUE,
> .type = PM_QOS_MIN
> };
>
> @@ -89,7 +92,8 @@ static struct pm_qos_object network_throughput_pm_qos = {
> .requests = PLIST_HEAD_INIT(network_throughput_pm_qos.requests, pm_qos_lock),
> .notifiers = &network_throughput_notifier,
> .name = "network_throughput",
> - .default_value = 0,
> + .value = PM_QOS_NETWORK_THROUGHPUT_DEFAULT_VALUE,
> + .default_value = PM_QOS_NETWORK_THROUGHPUT_DEFAULT_VALUE,
> .type = PM_QOS_MAX,
> };
>
> @@ -132,6 +136,16 @@ static inline int pm_qos_get_value(struct pm_qos_object *o)
> }
> }
>
> +static inline s32 pm_qos_read_value(struct pm_qos_object *o)
> +{
> + return o->value;
> +}
> +
> +static inline int pm_qos_set_value(struct pm_qos_object *o, s32 value)
> +{
> + o->value = value;
> +}
> +
> static void update_target(struct pm_qos_object *o, struct plist_node *node,
> int del, int value)
> {
> @@ -156,6 +170,7 @@ static void update_target(struct pm_qos_object *o, struct plist_node *node,
> plist_add(node, &o->requests);
> }
> curr_value = pm_qos_get_value(o);
> + pm_qos_set_value(o, curr_value);
> spin_unlock_irqrestore(&pm_qos_lock, flags);
>
> if (prev_value != curr_value)
> @@ -190,18 +205,11 @@ static int find_pm_qos_object_by_minor(int minor)
> * pm_qos_request - returns current system wide qos expectation
> * @pm_qos_class: identification of which qos value is requested
> *
> - * This function returns the current target value in an atomic manner.
> + * This function returns the current target value.
> */
> int pm_qos_request(int pm_qos_class)
> {
> - unsigned long flags;
> - int value;
> -
> - spin_lock_irqsave(&pm_qos_lock, flags);
> - value = pm_qos_get_value(pm_qos_array[pm_qos_class]);
> - spin_unlock_irqrestore(&pm_qos_lock, flags);
> -
> - return value;
> + return pm_qos_read_value(pm_qos_array[pm_qos_class]);
> }
> EXPORT_SYMBOL_GPL(pm_qos_request);
>
>
Never mind my silliness lastnight. This looks like a good change.

Signed-off-by: mark gross <[email protected]>


Ack

-- mgross

>

2011-02-10 18:50:54

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [Patch] idle governor: Avoid lock acquisition to read pm_qos before entering idle

On Thursday, February 10, 2011, James Bottomley wrote:
> On Wed, 2011-02-09 at 17:21 -0800, Tim Chen wrote:
> > I noticed that before entering idle state, the menu idle governor will
> > look up the current pm_qos value according to the list of qos request
> > received. This look up currently needs the acquisition of a lock to go
> > down a list of qos requests to find the qos value, slowing down the
> > entrance into idle state due to contention by multiple cpus to traverse
> > this list. The contention is severe when there are a lot of cpus waking
> > and going into idle. For example, for a simple workload that has 32
> > pair of processes ping ponging messages to each other, where 64 cores
> > cores are active in test system, I see the following profile:
> >
> > - 37.82% swapper [kernel.kallsyms] [k]
> > _raw_spin_lock_irqsave
> > - _raw_spin_lock_irqsave
> > - 95.65% pm_qos_request
> > menu_select
> > cpuidle_idle_call
> > - cpu_idle
> > 99.98% start_secondary
> >
> > Perhaps a better approach will be to cache the updated pm_qos value so
> > reading it does not require lock acquisition as in the patch below.
> >
> > Tim
> >
> > Signed-off-by: Tim Chen <[email protected]>
>
> What you say is true as long as the value is 32 bits ... perhaps a note
> of that should be made somewhere?
>
> Otherwise, it looks like a good lockless optimisation on the read path,
> so you can add my ack.

Thanks James, I'm going to take the patch.

Rafael

2011-02-10 18:59:21

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [Patch] idle governor: Avoid lock acquisition to read pm_qos before entering idle

On Thursday, February 10, 2011, Andi Kleen wrote:
> > Signed-off-by: Tim Chen <[email protected]>
>
> Patch looks good to me. Thanks Tim.

I'm taking that as an ACK.

Thanks,
Rafael

2011-02-10 19:32:35

by Tim Chen

[permalink] [raw]
Subject: Re: [Patch] idle governor: Avoid lock acquisition to read pm_qos before entering idle

On Thu, 2011-02-10 at 11:45 -0600, James Bottomley wrote:

>
> What you say is true as long as the value is 32 bits ... perhaps a note
> of that should be made somewhere?
>

It seems like pm_qos_add_request, pm_qos_update_request only uses 32
bits value. Did I miss some cases where the target value for
pm_qos_object be non-32 bit under some situation?

Tim

2011-02-10 20:37:51

by James Bottomley

[permalink] [raw]
Subject: Re: [Patch] idle governor: Avoid lock acquisition to read pm_qos before entering idle

On Thu, 2011-02-10 at 11:33 -0800, Tim Chen wrote:
> On Thu, 2011-02-10 at 11:45 -0600, James Bottomley wrote:
>
> >
> > What you say is true as long as the value is 32 bits ... perhaps a note
> > of that should be made somewhere?
> >
>
> It seems like pm_qos_add_request, pm_qos_update_request only uses 32
> bits value. Did I miss some cases where the target value for
> pm_qos_object be non-32 bit under some situation?

No ... I'm thinking of the future. On the face of it, if we needed
larger quantities (like s64), the obvious approach would be to change
s32 to s64 which would fail ... horribly. I was just suggesting a nice
reminder comment for this.

James

2011-02-10 23:17:38

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [Patch] idle governor: Avoid lock acquisition to read pm_qos before entering idle

On Thursday, February 10, 2011, Tim Chen wrote:
> I noticed that before entering idle state, the menu idle governor will
> look up the current pm_qos value according to the list of qos request
> received. This look up currently needs the acquisition of a lock to go
> down a list of qos requests to find the qos value, slowing down the
> entrance into idle state due to contention by multiple cpus to traverse
> this list. The contention is severe when there are a lot of cpus waking
> and going into idle. For example, for a simple workload that has 32
> pair of processes ping ponging messages to each other, where 64 cores
> cores are active in test system, I see the following profile:
>
> - 37.82% swapper [kernel.kallsyms] [k]
> _raw_spin_lock_irqsave
> - _raw_spin_lock_irqsave
> - 95.65% pm_qos_request
> menu_select
> cpuidle_idle_call
> - cpu_idle
> 99.98% start_secondary
>
> Perhaps a better approach will be to cache the updated pm_qos value so
> reading it does not require lock acquisition as in the patch below.
>
> Tim
>
> Signed-off-by: Tim Chen <[email protected]>
> diff --git a/include/linux/pm_qos_params.h b/include/linux/pm_qos_params.h
> index 77cbddb..a7d87f9 100644
> --- a/include/linux/pm_qos_params.h
> +++ b/include/linux/pm_qos_params.h
> @@ -16,6 +16,10 @@
> #define PM_QOS_NUM_CLASSES 4
> #define PM_QOS_DEFAULT_VALUE -1
>
> +#define PM_QOS_CPU_DMA_LAT_DEFAULT_VALUE (2000 * USEC_PER_SEC)
> +#define PM_QOS_NETWORK_LAT_DEFAULT_VALUE (2000 * USEC_PER_SEC)
> +#define PM_QOS_NETWORK_THROUGHPUT_DEFAULT_VALUE 0
> +
> struct pm_qos_request_list {
> struct plist_node list;
> int pm_qos_class;
> diff --git a/kernel/pm_qos_params.c b/kernel/pm_qos_params.c
> index aeaa7f8..b6310d1 100644
> --- a/kernel/pm_qos_params.c
> +++ b/kernel/pm_qos_params.c
> @@ -58,6 +58,7 @@ struct pm_qos_object {
> struct blocking_notifier_head *notifiers;
> struct miscdevice pm_qos_power_miscdev;
> char *name;
> + s32 value;
> s32 default_value;
> enum pm_qos_type type;
> };
> @@ -70,7 +71,8 @@ static struct pm_qos_object cpu_dma_pm_qos = {
> .requests = PLIST_HEAD_INIT(cpu_dma_pm_qos.requests, pm_qos_lock),
> .notifiers = &cpu_dma_lat_notifier,
> .name = "cpu_dma_latency",
> - .default_value = 2000 * USEC_PER_SEC,
> + .value = PM_QOS_CPU_DMA_LAT_DEFAULT_VALUE,
> + .default_value = PM_QOS_CPU_DMA_LAT_DEFAULT_VALUE,
> .type = PM_QOS_MIN,
> };
>
> @@ -79,7 +81,8 @@ static struct pm_qos_object network_lat_pm_qos = {
> .requests = PLIST_HEAD_INIT(network_lat_pm_qos.requests, pm_qos_lock),
> .notifiers = &network_lat_notifier,
> .name = "network_latency",
> - .default_value = 2000 * USEC_PER_SEC,
> + .value = PM_QOS_NETWORK_LAT_DEFAULT_VALUE,
> + .default_value = PM_QOS_NETWORK_LAT_DEFAULT_VALUE,
> .type = PM_QOS_MIN
> };
>
> @@ -89,7 +92,8 @@ static struct pm_qos_object network_throughput_pm_qos = {
> .requests = PLIST_HEAD_INIT(network_throughput_pm_qos.requests, pm_qos_lock),
> .notifiers = &network_throughput_notifier,
> .name = "network_throughput",
> - .default_value = 0,
> + .value = PM_QOS_NETWORK_THROUGHPUT_DEFAULT_VALUE,
> + .default_value = PM_QOS_NETWORK_THROUGHPUT_DEFAULT_VALUE,
> .type = PM_QOS_MAX,
> };
>
> @@ -132,6 +136,16 @@ static inline int pm_qos_get_value(struct pm_qos_object *o)
> }
> }
>
> +static inline s32 pm_qos_read_value(struct pm_qos_object *o)
> +{
> + return o->value;
> +}
> +
> +static inline int pm_qos_set_value(struct pm_qos_object *o, s32 value)
> +{
> + o->value = value;
> +}

This requires a fixup, the above function is supposed to return an int.

Also, please add a comment about the requisite 32-bitness as suggested by James.

Thanks,
Rafael

2011-02-11 00:49:50

by Tim Chen

[permalink] [raw]
Subject: Re: [Patch] idle governor: Avoid lock acquisition to read pm_qos before entering idle

On Fri, 2011-02-11 at 00:17 +0100, Rafael J. Wysocki wrote:

> > +static inline int pm_qos_set_value(struct pm_qos_object *o, s32 value)
> > +{
> > + o->value = value;
> > +}
>
> This requires a fixup, the above function is supposed to return an int.
>
> Also, please add a comment about the requisite 32-bitness as suggested by James.
>
> Thanks,
> Rafael

I've fixed the above function, and also included a comment warning
people not to change target value to 64 bit.

Thanks.

Tim

Signed-off-by: Tim Chen <[email protected]>
diff --git a/include/linux/pm_qos_params.h b/include/linux/pm_qos_params.h
index 77cbddb..a7d87f9 100644
--- a/include/linux/pm_qos_params.h
+++ b/include/linux/pm_qos_params.h
@@ -16,6 +16,10 @@
#define PM_QOS_NUM_CLASSES 4
#define PM_QOS_DEFAULT_VALUE -1

+#define PM_QOS_CPU_DMA_LAT_DEFAULT_VALUE (2000 * USEC_PER_SEC)
+#define PM_QOS_NETWORK_LAT_DEFAULT_VALUE (2000 * USEC_PER_SEC)
+#define PM_QOS_NETWORK_THROUGHPUT_DEFAULT_VALUE 0
+
struct pm_qos_request_list {
struct plist_node list;
int pm_qos_class;
diff --git a/kernel/pm_qos_params.c b/kernel/pm_qos_params.c
index aeaa7f8..7b6333b 100644
--- a/kernel/pm_qos_params.c
+++ b/kernel/pm_qos_params.c
@@ -58,6 +58,7 @@ struct pm_qos_object {
struct blocking_notifier_head *notifiers;
struct miscdevice pm_qos_power_miscdev;
char *name;
+ s32 target_value; /* Do not change this to 64 bit */
s32 default_value;
enum pm_qos_type type;
};
@@ -70,7 +71,8 @@ static struct pm_qos_object cpu_dma_pm_qos = {
.requests = PLIST_HEAD_INIT(cpu_dma_pm_qos.requests, pm_qos_lock),
.notifiers = &cpu_dma_lat_notifier,
.name = "cpu_dma_latency",
- .default_value = 2000 * USEC_PER_SEC,
+ .target_value = PM_QOS_CPU_DMA_LAT_DEFAULT_VALUE,
+ .default_value = PM_QOS_CPU_DMA_LAT_DEFAULT_VALUE,
.type = PM_QOS_MIN,
};

@@ -79,7 +81,8 @@ static struct pm_qos_object network_lat_pm_qos = {
.requests = PLIST_HEAD_INIT(network_lat_pm_qos.requests, pm_qos_lock),
.notifiers = &network_lat_notifier,
.name = "network_latency",
- .default_value = 2000 * USEC_PER_SEC,
+ .target_value = PM_QOS_NETWORK_LAT_DEFAULT_VALUE,
+ .default_value = PM_QOS_NETWORK_LAT_DEFAULT_VALUE,
.type = PM_QOS_MIN
};

@@ -89,7 +92,8 @@ static struct pm_qos_object network_throughput_pm_qos = {
.requests = PLIST_HEAD_INIT(network_throughput_pm_qos.requests, pm_qos_lock),
.notifiers = &network_throughput_notifier,
.name = "network_throughput",
- .default_value = 0,
+ .target_value = PM_QOS_NETWORK_THROUGHPUT_DEFAULT_VALUE,
+ .default_value = PM_QOS_NETWORK_THROUGHPUT_DEFAULT_VALUE,
.type = PM_QOS_MAX,
};

@@ -132,6 +136,16 @@ static inline int pm_qos_get_value(struct pm_qos_object *o)
}
}

+static inline s32 pm_qos_read_value(struct pm_qos_object *o)
+{
+ return o->target_value;
+}
+
+static inline void pm_qos_set_value(struct pm_qos_object *o, s32 value)
+{
+ o->value = target_value;
+}
+
static void update_target(struct pm_qos_object *o, struct plist_node *node,
int del, int value)
{
@@ -156,6 +170,7 @@ static void update_target(struct pm_qos_object *o, struct plist_node *node,
plist_add(node, &o->requests);
}
curr_value = pm_qos_get_value(o);
+ pm_qos_set_value(o, curr_value);
spin_unlock_irqrestore(&pm_qos_lock, flags);

if (prev_value != curr_value)
@@ -190,18 +205,11 @@ static int find_pm_qos_object_by_minor(int minor)
* pm_qos_request - returns current system wide qos expectation
* @pm_qos_class: identification of which qos value is requested
*
- * This function returns the current target value in an atomic manner.
+ * This function returns the current target value.
*/
int pm_qos_request(int pm_qos_class)
{
- unsigned long flags;
- int value;
-
- spin_lock_irqsave(&pm_qos_lock, flags);
- value = pm_qos_get_value(pm_qos_array[pm_qos_class]);
- spin_unlock_irqrestore(&pm_qos_lock, flags);
-
- return value;
+ return pm_qos_read_value(pm_qos_array[pm_qos_class]);
}
EXPORT_SYMBOL_GPL(pm_qos_request);


2011-02-11 19:23:04

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [Patch] idle governor: Avoid lock acquisition to read pm_qos before entering idle

On Friday, February 11, 2011, Tim Chen wrote:
> On Fri, 2011-02-11 at 00:17 +0100, Rafael J. Wysocki wrote:
>
> > > +static inline int pm_qos_set_value(struct pm_qos_object *o, s32 value)
> > > +{
> > > + o->value = value;
> > > +}
> >
> > This requires a fixup, the above function is supposed to return an int.
> >
> > Also, please add a comment about the requisite 32-bitness as suggested by James.
> >
> > Thanks,
> > Rafael
>
> I've fixed the above function, and also included a comment warning
> people not to change target value to 64 bit.

Thanks, but I'd put that comment before the structure definition with the
"NOTE:" prefix and I think you should explain the reason why that field
shouldn't be changed to 64-bits.

Thanks,
Rafael


> Signed-off-by: Tim Chen <[email protected]>
> diff --git a/include/linux/pm_qos_params.h b/include/linux/pm_qos_params.h
> index 77cbddb..a7d87f9 100644
> --- a/include/linux/pm_qos_params.h
> +++ b/include/linux/pm_qos_params.h
> @@ -16,6 +16,10 @@
> #define PM_QOS_NUM_CLASSES 4
> #define PM_QOS_DEFAULT_VALUE -1
>
> +#define PM_QOS_CPU_DMA_LAT_DEFAULT_VALUE (2000 * USEC_PER_SEC)
> +#define PM_QOS_NETWORK_LAT_DEFAULT_VALUE (2000 * USEC_PER_SEC)
> +#define PM_QOS_NETWORK_THROUGHPUT_DEFAULT_VALUE 0
> +
> struct pm_qos_request_list {
> struct plist_node list;
> int pm_qos_class;
> diff --git a/kernel/pm_qos_params.c b/kernel/pm_qos_params.c
> index aeaa7f8..7b6333b 100644
> --- a/kernel/pm_qos_params.c
> +++ b/kernel/pm_qos_params.c
> @@ -58,6 +58,7 @@ struct pm_qos_object {
> struct blocking_notifier_head *notifiers;
> struct miscdevice pm_qos_power_miscdev;
> char *name;
> + s32 target_value; /* Do not change this to 64 bit */
> s32 default_value;
> enum pm_qos_type type;
> };
> @@ -70,7 +71,8 @@ static struct pm_qos_object cpu_dma_pm_qos = {
> .requests = PLIST_HEAD_INIT(cpu_dma_pm_qos.requests, pm_qos_lock),
> .notifiers = &cpu_dma_lat_notifier,
> .name = "cpu_dma_latency",
> - .default_value = 2000 * USEC_PER_SEC,
> + .target_value = PM_QOS_CPU_DMA_LAT_DEFAULT_VALUE,
> + .default_value = PM_QOS_CPU_DMA_LAT_DEFAULT_VALUE,
> .type = PM_QOS_MIN,
> };
>
> @@ -79,7 +81,8 @@ static struct pm_qos_object network_lat_pm_qos = {
> .requests = PLIST_HEAD_INIT(network_lat_pm_qos.requests, pm_qos_lock),
> .notifiers = &network_lat_notifier,
> .name = "network_latency",
> - .default_value = 2000 * USEC_PER_SEC,
> + .target_value = PM_QOS_NETWORK_LAT_DEFAULT_VALUE,
> + .default_value = PM_QOS_NETWORK_LAT_DEFAULT_VALUE,
> .type = PM_QOS_MIN
> };
>
> @@ -89,7 +92,8 @@ static struct pm_qos_object network_throughput_pm_qos = {
> .requests = PLIST_HEAD_INIT(network_throughput_pm_qos.requests, pm_qos_lock),
> .notifiers = &network_throughput_notifier,
> .name = "network_throughput",
> - .default_value = 0,
> + .target_value = PM_QOS_NETWORK_THROUGHPUT_DEFAULT_VALUE,
> + .default_value = PM_QOS_NETWORK_THROUGHPUT_DEFAULT_VALUE,
> .type = PM_QOS_MAX,
> };
>
> @@ -132,6 +136,16 @@ static inline int pm_qos_get_value(struct pm_qos_object *o)
> }
> }
>
> +static inline s32 pm_qos_read_value(struct pm_qos_object *o)
> +{
> + return o->target_value;
> +}
> +
> +static inline void pm_qos_set_value(struct pm_qos_object *o, s32 value)
> +{
> + o->value = target_value;
> +}
> +
> static void update_target(struct pm_qos_object *o, struct plist_node *node,
> int del, int value)
> {
> @@ -156,6 +170,7 @@ static void update_target(struct pm_qos_object *o, struct plist_node *node,
> plist_add(node, &o->requests);
> }
> curr_value = pm_qos_get_value(o);
> + pm_qos_set_value(o, curr_value);
> spin_unlock_irqrestore(&pm_qos_lock, flags);
>
> if (prev_value != curr_value)
> @@ -190,18 +205,11 @@ static int find_pm_qos_object_by_minor(int minor)
> * pm_qos_request - returns current system wide qos expectation
> * @pm_qos_class: identification of which qos value is requested
> *
> - * This function returns the current target value in an atomic manner.
> + * This function returns the current target value.
> */
> int pm_qos_request(int pm_qos_class)
> {
> - unsigned long flags;
> - int value;
> -
> - spin_lock_irqsave(&pm_qos_lock, flags);
> - value = pm_qos_get_value(pm_qos_array[pm_qos_class]);
> - spin_unlock_irqrestore(&pm_qos_lock, flags);
> -
> - return value;
> + return pm_qos_read_value(pm_qos_array[pm_qos_class]);
> }
> EXPORT_SYMBOL_GPL(pm_qos_request);
>
>
>
>
>

2011-02-11 19:27:40

by Tim Chen

[permalink] [raw]
Subject: Re: [Patch] idle governor: Avoid lock acquisition to read pm_qos before entering idle

On Fri, 2011-02-11 at 20:22 +0100, Rafael J. Wysocki wrote:

>
> Thanks, but I'd put that comment before the structure definition with the
> "NOTE:" prefix and I think you should explain the reason why that field
> shouldn't be changed to 64-bits.
>

Either you or James want to suggest a comment on why the field shouldn't
be changed to 64-bits to be placed there?

Tim

2011-02-11 19:33:44

by James Bottomley

[permalink] [raw]
Subject: Re: [Patch] idle governor: Avoid lock acquisition to read pm_qos before entering idle

On Fri, 2011-02-11 at 11:27 -0800, Tim Chen wrote:
> On Fri, 2011-02-11 at 20:22 +0100, Rafael J. Wysocki wrote:
>
> >
> > Thanks, but I'd put that comment before the structure definition with the
> > "NOTE:" prefix and I think you should explain the reason why that field
> > shouldn't be changed to 64-bits.
> >
>
> Either you or James want to suggest a comment on why the field shouldn't
> be changed to 64-bits to be placed there?

/*
* Note: The lockless read read path depends on the CPU accessing
* target_value atomically. Atomic access is only guaranteed on all CPU
* types linux supports for 32 bit quantites
*/

James

2011-02-11 20:03:32

by Tim Chen

[permalink] [raw]
Subject: Re: [Patch] idle governor: Avoid lock acquisition to read pm_qos before entering idle

On Fri, 2011-02-11 at 13:33 -0600, James Bottomley wrote:
> On Fri, 2011-02-11 at 11:27 -0800, Tim Chen wrote:
> > On Fri, 2011-02-11 at 20:22 +0100, Rafael J. Wysocki wrote:
> >
> > >
> > > Thanks, but I'd put that comment before the structure definition with the
> > > "NOTE:" prefix and I think you should explain the reason why that field
> > > shouldn't be changed to 64-bits.
> > >
> >
> > Either you or James want to suggest a comment on why the field shouldn't
> > be changed to 64-bits to be placed there?
>
> /*
> * Note: The lockless read read path depends on the CPU accessing
> * target_value atomically. Atomic access is only guaranteed on all CPU
> * types linux supports for 32 bit quantites
> */
>
> James
>
>

Thanks. I will incorporate your suggested comment in an updated patch.

Tim