2006-08-24 17:41:43

by Arjan van de Ven

Subject: [RFC] maximum latency tracking infrastructure

Subject: [RFC] maximum latency tracking infrastructure
From: Arjan van de Ven <[email protected]>

The patch below adds infrastructure to track "maximum allowable latency" for power
saving policies.

The reason for adding this infrastructure is that power management in the
idle loop needs to make a tradeoff between latency and power savings (deeper
power save modes have a longer latency to running code again).
The code that today makes this tradeoff uses a rather simple algorithm;
however this is not good enough: there are devices and use cases where a
lower latency is required than the deeper power saving states can provide.
An example would be audio playback, but another example is the ipw2100
wireless driver that right now has a very direct and ugly acpi hook to
disable some higher power states randomly when it gets certain types of
error.

The proposed solution is to have an interface where drivers can
* announce the maximum latency (in microseconds) that they can deal with
* modify this latency
* give up their constraint
and a function where the code that decides on power saving strategy can query
the current global desired maximum.
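
The semantics of this interface can be sketched in user space. This is only a rough model, not the kernel code from the patch: the model_* names, the fixed-size table and the pointer-based identifier storage are assumptions made for illustration. It shows that the value handed to the power saving code is the smallest of all announced maximums.

```c
#include <stddef.h>
#include <string.h>

#define INFINITE_LATENCY 1000000   /* same value as in the patch's latency.h */
#define MAX_CONSTRAINTS  16        /* hypothetical table size for this model */

struct constraint {
    const char *identifier;        /* NULL marks a free slot */
    int usecs;
};

static struct constraint table[MAX_CONSTRAINTS];

static struct constraint *find(const char *identifier)
{
    for (size_t i = 0; i < MAX_CONSTRAINTS; i++)
        if (table[i].identifier &&
            strcmp(table[i].identifier, identifier) == 0)
            return &table[i];
    return NULL;
}

/* Announce a constraint, or overwrite an existing one with the same name.
 * The identifier pointer is stored as-is, so it must outlive the entry
 * (string literals are fine). Returns 0 on success, -1 if the table is full.
 */
int model_set(const char *identifier, int usecs)
{
    struct constraint *c = find(identifier);

    if (c) {
        c->usecs = usecs;
        return 0;
    }
    for (size_t i = 0; i < MAX_CONSTRAINTS; i++) {
        if (!table[i].identifier) {
            table[i].identifier = identifier;
            table[i].usecs = usecs;
            return 0;
        }
    }
    return -1;
}

/* Change an existing constraint; unknown identifiers are silently ignored. */
void model_modify(const char *identifier, int usecs)
{
    struct constraint *c = find(identifier);

    if (c)
        c->usecs = usecs;
}

/* Give up a constraint; unknown identifiers are silently ignored. */
void model_remove(const char *identifier)
{
    struct constraint *c = find(identifier);

    if (c)
        c->identifier = NULL;
}

/* The global maximum is the minimum over all announced maximums. */
int model_get(void)
{
    int max = INFINITE_LATENCY;

    for (size_t i = 0; i < MAX_CONSTRAINTS; i++)
        if (table[i].identifier && table[i].usecs < max)
            max = table[i].usecs;
    return max;
}
```

With no constraints announced, model_get() yields INFINITE_LATENCY; once a driver announces 150 usec, that becomes the global value until the driver relaxes or removes it.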

This patch has a user of each side: on the consumer side, ACPI is patched to use this,
on the producer side the ipw2100 driver is patched.

A generic maximum latency of 2 timer ticks is also registered (any more and
you lose accurate time tracking after all).
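
As a worked example of that default, the patch computes the limit as 2*1000000/HZ microseconds. The helper below is a hypothetical stand-alone version of that arithmetic, taking the tick rate as a parameter instead of using the kernel's HZ constant.

```c
/* Length of two timer ticks in microseconds for a given tick rate in Hz.
 * Mirrors the expression 2*1000000/HZ used in latency_init().
 */
int two_ticks_usecs(int hz)
{
    return 2 * 1000000 / hz;
}
```

For the common tick rates this gives 20000 usec at HZ=100, 8000 usec at HZ=250 and 2000 usec at HZ=1000.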

While the existing users of the patch are x86 specific, the infrastructure
is not. I'd like to ask the arch maintainers of other architectures if the
infrastructure is generic enough for their use (assuming the architecture
has such a tradeoff as concept at all), and the sound/multimedia driver
owners to look at the driver facing API to see if this is something they can
use.

A sysrq key is registered to dump the list of registered latencies so that
bugreports about too high latency / too high power usage can be analyzed.

Signed-off-by: Arjan van de Ven <[email protected]>

---
drivers/acpi/processor_idle.c | 8 +
drivers/net/wireless/ipw2100.c | 10 +
include/linux/latency.h | 18 +++
kernel/Makefile | 2
kernel/latency.c | 245 +++++++++++++++++++++++++++++++++++++++++
5 files changed, 280 insertions(+), 3 deletions(-)

Index: linux-2.6.18-rc4-latency/drivers/acpi/processor_idle.c
===================================================================
--- linux-2.6.18-rc4-latency.orig/drivers/acpi/processor_idle.c
+++ linux-2.6.18-rc4-latency/drivers/acpi/processor_idle.c
@@ -38,6 +38,7 @@
#include <linux/dmi.h>
#include <linux/moduleparam.h>
#include <linux/sched.h> /* need_resched() */
+#include <linux/latency.h>

#include <asm/io.h>
#include <asm/uaccess.h>
@@ -453,7 +454,8 @@ static void acpi_processor_idle(void)
*/
if (cx->promotion.state &&
((cx->promotion.state - pr->power.states) <= max_cstate)) {
- if (sleep_ticks > cx->promotion.threshold.ticks) {
+ if (sleep_ticks > cx->promotion.threshold.ticks &&
+ cx->promotion.state->latency <= get_acceptable_latency()) {
cx->promotion.count++;
cx->demotion.count = 0;
if (cx->promotion.count >=
@@ -494,8 +496,10 @@ static void acpi_processor_idle(void)
end:
/*
* Demote if current state exceeds max_cstate
+ * or if the latency of the current state is unacceptable
*/
- if ((pr->power.state - pr->power.states) > max_cstate) {
+ if ((pr->power.state - pr->power.states) > max_cstate ||
+ pr->power.state->latency > get_acceptable_latency()) {
if (cx->demotion.state)
next_state = cx->demotion.state;
}
Index: linux-2.6.18-rc4-latency/include/linux/latency.h
===================================================================
--- /dev/null
+++ linux-2.6.18-rc4-latency/include/linux/latency.h
@@ -0,0 +1,18 @@
+/*
+ * latency.h: Explicit system-wise latency-expectation infrastructure
+ *
+ * (C) Copyright 2006 Intel Corporation
+ * Author: Arjan van de Ven <[email protected]>
+ *
+ */
+
+#ifdef __KERNEL__
+
+void set_acceptable_latency(char *identifier, int usecs);
+void modify_acceptable_latency(char *identifier, int usecs);
+void remove_acceptable_latency(char *identifier);
+int get_acceptable_latency(void);
+
+#define INFINITE_LATENCY 1000000
+
+#endif
Index: linux-2.6.18-rc4-latency/kernel/Makefile
===================================================================
--- linux-2.6.18-rc4-latency.orig/kernel/Makefile
+++ linux-2.6.18-rc4-latency/kernel/Makefile
@@ -8,7 +8,7 @@ obj-y = sched.o fork.o exec_domain.o
signal.o sys.o kmod.o workqueue.o pid.o \
rcupdate.o extable.o params.o posix-timers.o \
kthread.o wait.o kfifo.o sys_ni.o posix-cpu-timers.o mutex.o \
- hrtimer.o rwsem.o
+ hrtimer.o rwsem.o latency.o

obj-$(CONFIG_STACKTRACE) += stacktrace.o
obj-y += time/
Index: linux-2.6.18-rc4-latency/kernel/latency.c
===================================================================
--- /dev/null
+++ linux-2.6.18-rc4-latency/kernel/latency.c
@@ -0,0 +1,245 @@
+/*
+ * latency.c: Explicit system-wise latency-expectation infrastructure
+ *
+ * The purpose of this infrastructure is to allow device drivers to set
+ * latency requirements they have and to collect and summarize these
+ * expectations globally. The accumulated result can then be used by
+ * power management and similar users to make decisions that have
+ * tradeoffs with a latency component.
+ *
+ * An example user of this are the x86 C-states; each higher C state saves
+ * more power, but has a higher exit latency. For the idle loop power
+ * code to make a good decision which C-state to use, information about
+ * acceptable latencies is required.
+ *
+ * An example announcer of latency is an audio driver that knows it
+ * will get an interrupt when the hardware has 200 usec of samples
+ * left in the DMA buffer; in that case the driver can set a latency
+ * requirement of, say, 150 usec.
+ *
+ * Multiple drivers can each announce their maximum accepted latency;
+ * to keep these apart, a string based identifier is used.
+ *
+ *
+ * (C) Copyright 2006 Intel Corporation
+ * Author: Arjan van de Ven <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; version 2
+ * of the License.
+ */
+
+#include <linux/latency.h>
+#include <linux/list.h>
+#include <linux/spinlock.h>
+#include <linux/slab.h>
+#include <linux/module.h>
+#include <linux/sysrq.h>
+#include <asm/atomic.h>
+
+struct latency_info {
+ struct list_head list;
+ int usecs;
+ char *identifier;
+};
+
+static atomic_t current_max_latency;
+static DEFINE_SPINLOCK(latency_lock);
+
+static LIST_HEAD(latency_list);
+
+
+
+/*
+ * This function returns the maximum latency allowed, which
+ * happens to be the minimum of all maximum latencies on the
+ * list.
+ */
+static int __find_max_latency(void)
+{
+ int max = INFINITE_LATENCY;
+ struct latency_info *info;
+ list_for_each_entry(info, &latency_list, list) {
+ if (info->usecs < max) /* new minimum */
+ max = info->usecs;
+ }
+ return max;
+}
+
+/**
+ * set_acceptable_latency - sets the maximum latency acceptable
+ * @identifier: string that identifies this driver
+ * @usecs: maximum acceptable latency for this driver
+ *
+ * This function informs the kernel that this device(driver)
+ * can accept at most usecs latency. This setting is used for
+ * power management and similar tradeoffs.
+ *
+ * This function sleeps and can only be called from process
+ * context.
+ * Calling this function with an existing identifier is valid
+ * and will cause the existing latency setting to be changed.
+ */
+void set_acceptable_latency(char *identifier, int usecs)
+{
+ struct latency_info *info, *iter;
+ unsigned long flags;
+ int found_old = 0;
+ int newmax;
+
+ info = kmalloc(sizeof(struct latency_info), GFP_KERNEL);
+ if (!info)
+ return;
+ memset(info, 0, sizeof(struct latency_info));
+ info->usecs = usecs;
+ info->identifier = kstrdup(identifier, GFP_KERNEL);
+ if (!info->identifier)
+ goto free_info;
+
+ spin_lock_irqsave(&latency_lock, flags);
+ list_for_each_entry(iter, &latency_list, list) {
+ if (strcmp(iter->identifier, identifier)==0) {
+ found_old = 1;
+ iter->usecs = usecs;
+ }
+ }
+ if (!found_old)
+ list_add(&info->list, &latency_list);
+
+ newmax = __find_max_latency();
+ atomic_set(&current_max_latency, newmax);
+
+ spin_unlock_irqrestore(&latency_lock, flags);
+
+ /* if we inserted the new one, we're done; otherwise there was
+ * an existing one so we need to free the redundant data
+ */
+ if (!found_old)
+ return;
+
+ kfree(info->identifier);
+free_info:
+ kfree(info);
+}
+EXPORT_SYMBOL_GPL(set_acceptable_latency);
+
+/**
+ * modify_acceptable_latency - changes the maximum latency acceptable
+ * @identifier: string that identifies this driver
+ * @usecs: maximum acceptable latency for this driver
+ *
+ * This function informs the kernel that this device(driver)
+ * can accept at most usecs latency. This setting is used for
+ * power management and similar tradeoffs.
+ *
+ * This function does not sleep and can be called in any context.
+ * Trying to use a non-existing identifier silently gets ignored.
+ */
+void modify_acceptable_latency(char *identifier, int usecs)
+{
+ struct latency_info *iter;
+ unsigned long flags;
+ int newmax;
+
+ spin_lock_irqsave(&latency_lock, flags);
+ list_for_each_entry(iter, &latency_list, list) {
+ if (strcmp(iter->identifier, identifier)==0)
+ iter->usecs = usecs;
+ }
+ newmax = __find_max_latency();
+ atomic_set(&current_max_latency, newmax);
+ spin_unlock_irqrestore(&latency_lock, flags);
+}
+EXPORT_SYMBOL_GPL(modify_acceptable_latency);
+
+/**
+ * remove_acceptable_latency - removes the maximum latency acceptable
+ * @identifier: string that identifies this driver
+ *
+ * This function removes a previously set maximum latency setting
+ * for the driver and frees up any resources associated with the
+ * bookkeeping needed for this.
+ *
+ * This function does not sleep and can be called in any context.
+ * Trying to use a non-existing identifier silently gets ignored.
+ */
+
+void remove_acceptable_latency(char *identifier)
+{
+ unsigned long flags;
+ int newmax = 0;
+ struct latency_info *iter, *temp;
+
+ spin_lock_irqsave(&latency_lock, flags);
+ list_for_each_entry_safe(iter, temp, &latency_list, list) {
+ if (strcmp(iter->identifier, identifier)==0) {
+ list_del(&iter->list);
+ newmax = iter->usecs;
+ kfree(iter->identifier);
+ kfree(iter);
+ }
+ }
+
+ /* If we just deleted the system wide value, we need to
+ * recalculate
+ */
+ if (newmax == atomic_read(&current_max_latency)) {
+ newmax = __find_max_latency();
+ atomic_set(&current_max_latency, newmax);
+ }
+ spin_unlock_irqrestore(&latency_lock, flags);
+}
+EXPORT_SYMBOL_GPL(remove_acceptable_latency);
+
+/**
+ * get_acceptable_latency - queries the system wide latency maximum
+ *
+ * This function returns the system wide maximum latency in
+ * microseconds.
+ *
+ * This function does not sleep and can be called in any context.
+ */
+int get_acceptable_latency(void)
+{
+ return atomic_read(&current_max_latency);
+}
+EXPORT_SYMBOL_GPL(get_acceptable_latency);
+
+
+#ifdef CONFIG_MAGIC_SYSRQ
+
+static void sysrq_handle_latlist(int key, struct pt_regs *pt_regs,
+ struct tty_struct *tty)
+{
+ unsigned long flags;
+ struct latency_info *info;
+
+ spin_lock_irqsave(&latency_lock, flags);
+ printk(KERN_INFO "Latency restrictions list\n");
+ printk(KERN_INFO "-------------------------\n");
+ printk(KERN_INFO "Current minimum\t: %i\n", get_acceptable_latency());
+ list_for_each_entry(info, &latency_list, list) {
+ printk(KERN_INFO "%s\t\t: %i\n", info->identifier, info->usecs);
+ }
+ printk(KERN_INFO "-------------------------\n");
+ spin_unlock_irqrestore(&latency_lock, flags);
+}
+static struct sysrq_key_op sysrq_latlist_op = {
+ .handler = sysrq_handle_latlist,
+ .help_msg = "Latencylist",
+ .action_msg = "Printing latency list",
+};
+#endif
+
+static int latency_init(void)
+{
+ register_sysrq_key('l', &sysrq_latlist_op);
+ /* we don't want by default to have longer latencies than 2 ticks,
+ * since that would cause lost ticks
+ */
+ set_acceptable_latency("kernel", 2*1000000/HZ);
+ return 0;
+}
+
+module_init(latency_init);
Index: linux-2.6.18-rc4-latency/drivers/net/wireless/ipw2100.c
===================================================================
--- linux-2.6.18-rc4-latency.orig/drivers/net/wireless/ipw2100.c
+++ linux-2.6.18-rc4-latency/drivers/net/wireless/ipw2100.c
@@ -163,6 +163,7 @@ that only one external action is invoked
#include <linux/firmware.h>
#include <linux/acpi.h>
#include <linux/ctype.h>
+#include <linux/latency.h>

#include "ipw2100.h"

@@ -1697,6 +1698,11 @@ static int ipw2100_up(struct ipw2100_pri
return 0;
}

+ /* the ipw2100 hardware really doesn't want power management delays
+ * longer than 500usec
+ */
+ modify_acceptable_latency("ipw2100", 500);
+
/* If the interrupt is enabled, turn it off... */
spin_lock_irqsave(&priv->low_lock, flags);
ipw2100_disable_interrupts(priv);
@@ -1849,6 +1855,8 @@ static void ipw2100_down(struct ipw2100_
ipw2100_disable_interrupts(priv);
spin_unlock_irqrestore(&priv->low_lock, flags);

+ modify_acceptable_latency("ipw2100", INFINITE_LATENCY);
+
#ifdef ACPI_CSTATE_LIMIT_DEFINED
if (priv->config & CFG_C3_DISABLED) {
IPW_DEBUG_INFO(": Resetting C3 transitions.\n");
@@ -6533,6 +6541,7 @@ static int __init ipw2100_init(void)

ret = pci_module_init(&ipw2100_pci_driver);

+ set_acceptable_latency("ipw2100", INFINITE_LATENCY);
#ifdef CONFIG_IPW2100_DEBUG
ipw2100_debug_level = debug;
driver_create_file(&ipw2100_pci_driver.driver,
@@ -6553,6 +6562,7 @@ static void __exit ipw2100_exit(void)
&driver_attr_debug_level);
#endif
pci_unregister_driver(&ipw2100_pci_driver);
+ remove_acceptable_latency("ipw2100");
}

module_init(ipw2100_init);


2006-08-24 19:16:39

by Brown, Len

Subject: Re: [RFC] maximum latency tracking infrastructure

On Thursday 24 August 2006 13:41, Arjan van de Ven wrote:
> Subject: [RFC] maximum latency tracking infrastructure
> From: Arjan van de Ven <[email protected]>
>
> The patch below adds infrastructure to track "maximum allowable latency" for power
> saving policies.

I like it.
Stating the constraints is much nicer than today's hack where
ipw2100 reaches into ACPI and disables C3 when it notices
that DMA overflows.

everything in usecs -- good.

-Len

2006-08-24 21:07:34

by Jesse Barnes

Subject: Re: [RFC] maximum latency tracking infrastructure

On Thursday, August 24, 2006 10:41 am, Arjan van de Ven wrote:
> The reason for adding this infrastructure is that power management in
> the idle loop needs to make a tradeoff between latency and power
> savings (deeper power save modes have a longer latency to running code
> again).

What if a processor was already in a sleep state when a call to
set_acceptable_latency() occurs? Should there be a callback so
they can be woken up? A callback would also allow ACPI to tell the
user "disabling C3 because of device <foo>" or somesuch, which might be
nice.

Also, should subsystems have the ability to set a lower bound on
latency? That would mean set_acceptable_latency() could fail,
indicating that the user should buy a better device or a system with
better realtime guarantees, which is also valuable info.

Comments aside, this is a nice interface, should help clarify things for
devices with response time limits.

Thanks,
Jesse

2006-08-24 21:20:28

by Arjan van de Ven

Subject: Re: [RFC] maximum latency tracking infrastructure

Jesse Barnes wrote:
> On Thursday, August 24, 2006 10:41 am, Arjan van de Ven wrote:
>> The reason for adding this infrastructure is that power management in
>> the idle loop needs to make a tradeoff between latency and power
>> savings (deeper power save modes have a longer latency to running code
>> again).
>
> What if a processor was already in a sleep state when a call to
> set_acceptable_latency() latency occurs?

there's nothing sane that can be done in that case; any wake up already will cause the unwanted latency!
A premature wakeup is only making it happen *now*, but now is as inconvenient a time as any...
(in fact it may be a worst case time scenario, say, an audio interrupt...)

> Should there be a callback so
> they can be woken up? A callback would also allow ACPI to tell the
> user "disabling C3 because of device <foo>" or somesuch, which might be
> nice.

printk'ing would be evil, changes like this will be "semi frequent", like every time you start
or stop playing audio. What ACPI could easily do is indicate in /proc/acpi/processor/*/power
that a state will not be reachable because it violates the latency constraints. That would
be entirely reasonable.

> Also, should subsystems have the ability to set a lower bound on
> latency? That would mean set_acceptable_latency() could fail,
> indicating that the user should buy a better device or a system with
> better realtime guarantees, which is also valuable info.

While it's valuable info, there is nothing you can DO about it...
The kernel can even do a latency of 1us by just not going into C1 at all, so the kernel
CAN honor it, even if it thinks it might not be a good idea. Can you give a more concrete example
of a situation where you think your idea would be useful?

Greetings,
Arjan van de Ven

2006-08-24 21:29:20

by Jesse Barnes

Subject: Re: [RFC] maximum latency tracking infrastructure

On Thursday, August 24, 2006 2:20 pm, Arjan van de Ven wrote:
> Jesse Barnes wrote:
> > On Thursday, August 24, 2006 10:41 am, Arjan van de Ven wrote:
> >> The reason for adding this infrastructure is that power management
> >> in the idle loop needs to make a tradeoff between latency and power
> >> savings (deeper power save modes have a longer latency to running
> >> code again).
> >
> > What if a processor was already in a sleep state when a call to
> > set_acceptable_latency() latency occurs?
>
> there's nothing sane that can be done in that case; any wake up
> already will cause the unwanted latency! A premature wakeup is only
> making it happen *now*, but now is as inconvenient a time as any...
> (in fact it may be a worst case time scenario, say, an audio
> interrupt...)

Depends on what's going on. What if you have a two socket machine, and
one CPU is in C3 when the latency setting occurs? Shouldn't you wake it
up and prevent it from going that deep again? But you're right, you
won't necessarily improve anything...

>
> > Should there be a callback so
> > they can be woken up? A callback would also allow ACPI to tell the
> > user "disabling C3 because of device <foo>" or somesuch, which might
> > be nice.
>
> printk'ing would be evil, changes like this will be "semi frequent",
> like every time you start or stop playing audio. What ACPI could
> easily do is indicate in /proc/acpi/processor/*/power that a state
> will not be reachable because it violates the latency constraints.
> That would be entirely reasonable.

Right, that's the idea. A printk may be overkill but some kind of
notification might be nice, for similar reasons to the below.

> > Also, should subsystems have the ability to set a lower bound on
> > latency? That would mean set_acceptable_latency() could fail,
> > indicating that the user should buy a better device or a system with
> > better realtime guarantees, which is also valuable info.
>
> While it's valuable info.. there is nothing you can DO about it...
> While the kernel can even do a latency of 1us by just not going into
> C1 even... so the kernel CAN honor it, even if it thinks it might not
> be a good idea. Can you give a more concrete example of a situation
> where you think your idea would be useful?

Well, I was imagining a scenario where you didn't really want to disallow
C3 for whatever reason (maybe some standard you're following requires
it), so your minimum latency would be N usec. You'd definitely want to
know about any device that required < N usecs of latency (at boot or
driver init time) so that you'd know you had a bad device or system.

Jesse
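
The lower-bound idea above could be modelled roughly as follows. This is a hypothetical sketch, not part of the posted patch: set_latency_floor(), check_latency_request() and the choice of -ERANGE as the error code are all assumptions made for illustration. The point is only that a request below the platform's floor N would be reported to the caller instead of being silently accepted.

```c
#include <errno.h>

/* Hypothetical platform floor in microseconds; 0 means no floor is set. */
static int latency_floor_usecs;

/* Record the smallest latency the platform is willing to guarantee. */
void set_latency_floor(int usecs)
{
    latency_floor_usecs = usecs;
}

/* Returns 0 if the requested maximum latency can be honored, or -ERANGE
 * if the request is below the configured floor, so that the driver (or
 * the user) learns at init time that the device and system do not match.
 */
int check_latency_request(int usecs)
{
    if (usecs < latency_floor_usecs)
        return -ERANGE;
    return 0;
}
```

A driver requesting exactly the floor value is still accepted in this sketch; only requests strictly below it fail.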

2006-08-24 21:50:37

by Andi Kleen

Subject: Re: [RFC] maximum latency tracking infrastructure

Jesse Barnes <[email protected]> writes:

> On Thursday, August 24, 2006 2:20 pm, Arjan van de Ven wrote:
> > Jesse Barnes wrote:
> > > On Thursday, August 24, 2006 10:41 am, Arjan van de Ven wrote:
> > >> The reason for adding this infrastructure is that power management
> > >> in the idle loop needs to make a tradeoff between latency and power
> > >> savings (deeper power save modes have a longer latency to running
> > >> code again).
> > >
> > > What if a processor was already in a sleep state when a call to
> > > set_acceptable_latency() latency occurs?
> >
> > there's nothing sane that can be done in that case; any wake up
> > already will cause the unwanted latency! A premature wakeup is only
> > making it happen *now*, but now is as inconvenient a time as any...
> > (in fact it may be a worst case time scenario, say, an audio
> > interrupt...)
>
> Depends on what's going on. What if you have a two socket machine, and
> one CPU is in C3 when the latency setting occurs?

I didn't think there were currently any multi socket machines with C3
support? The best you get is dual core.

> Shouldn't you wake it
> up and prevent it from going that deep again? But you're right, you
> won't necessarily improve anything...

Generally there are so many events that wake up CPUs that the case is pretty
academic -- all CPUs will eventually wake up in a reasonable time
(before your driver initialization finished likely) and then follow
the new latency settings.

Maybe at some point if all the idle breaking events in Linux have been
fixed up it might be a problem, but I think that's a long time off.

-Andi

2006-08-24 21:52:32

by Daniel Walker

Subject: Re: [RFC] maximum latency tracking infrastructure

On Thu, 2006-08-24 at 19:41 +0200, Arjan van de Ven wrote:
> Subject: [RFC] maximum latency tracking infrastructure
> From: Arjan van de Ven <[email protected]>
>
> The patch below adds infrastructure to track "maximum allowable latency" for power
> saving policies.
>
> The reason for adding this infrastructure is that power management in the
> idle loop needs to make a tradeoff between latency and power savings (deeper
> power save modes have a longer latency to running code again).
> The code that today makes this tradeoff just does a rather simple algorithm;

I was just thinking that it might be cleaner to register a structure
instead of tracking identifiers to usecs. You might get a speed up on
some of the operations, like unregister.

Another thing I was thinking about is that this seems somewhat contrary
to the idea of using dynamic tick (assuming it was in mainline) to
heuristically pick a power state. Do you have any thoughts on how you
would combine the two?

Daniel

2006-08-24 21:57:43

by Arjan van de Ven

Subject: Re: [RFC] maximum latency tracking infrastructure

On Thu, 2006-08-24 at 14:52 -0700, Daniel Walker wrote:
> On Thu, 2006-08-24 at 19:41 +0200, Arjan van de Ven wrote:
> > Subject: [RFC] maximum latency tracking infrastructure
> > From: Arjan van de Ven <[email protected]>
> >
> > The patch below adds infrastructure to track "maximum allowable latency" for power
> > saving policies.
> >
> > The reason for adding this infrastructure is that power management in the
> > idle loop needs to make a tradeoff between latency and power savings (deeper
> > power save modes have a longer latency to running code again).
> > The code that today makes this tradeoff just does a rather simple algorithm;
>
> I was just thinking that it might be cleaner to register a structure
> instead of tracking identifiers to usecs. You might get a speed up on
> some of the operations, like unregister.

it makes things a lot more complex for both the user and the
infrastructure though, and I doubt it's going to be a performance gain;
you need to walk all registered items anyway to decide the new minimum
value if you unregister one for example.


> Another thing I was thinking about is that this seems somewhat contrary
> to the idea of using dynamic tick (assuming it was in mainline) to
> heuristically pick a power state. Do you have any thoughts on how you
> would combine the two?

Actually it's designed in part FOR this case!
So how that will work (thought experiment, I don't have the code yet)

In idle, determine the time the next scheduled event is.
Then given that time go over the C-states and pick the deepest C-state
that
1) satisfies the requested latency
2) has a latency that is a small enough fraction of the total time

(2 is needed to not pick a 1 msec-latency C state for a 1ms idle, that
won't save you power most likely, so you need to have enough time in
"real" idle)

so when you know your latency requirements, you now can pick a DEEPER
sleepstate than you could before (or at least the right one)... dynticks
needs this more than anything :)
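
The thought experiment above can be sketched in plain C. No such code exists in the patch, so everything here is an assumption: the struct cstate layout, the pick_cstate() name, and in particular fraction_divisor, a hypothetical tuning knob standing in for "a small enough fraction of the total time".

```c
#include <stddef.h>

/* One idle state; only the exit latency matters for this sketch. */
struct cstate {
    int exit_latency_usecs;
};

/* states[] is ordered from shallowest to deepest. Returns the index of the
 * deepest state that passes both checks, or -1 if none does (or if
 * fraction_divisor is not positive).
 */
int pick_cstate(const struct cstate *states, size_t n,
                int acceptable_latency_usecs, int idle_usecs,
                int fraction_divisor)
{
    int best = -1;

    if (fraction_divisor <= 0)
        return -1;

    for (size_t i = 0; i < n; i++) {
        int lat = states[i].exit_latency_usecs;

        if (lat > acceptable_latency_usecs)
            continue;   /* 1) violates the requested latency */
        if (lat > idle_usecs / fraction_divisor)
            continue;   /* 2) too large a share of the idle period */
        best = (int)i;
    }
    return best;
}
```

With a divisor of 4 and a 1 ms idle period, a state with a 1 msec exit latency is rejected by check 2 even when the global latency limit would allow it.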

2006-08-24 22:16:37

by Daniel Walker

Subject: Re: [RFC] maximum latency tracking infrastructure

On Thu, 2006-08-24 at 23:57 +0200, Arjan van de Ven wrote:
> > I was just thinking that it might be cleaner to register a structure
> > instead of tracking identifiers to usecs. You might get a speed up on
> > some of the operations, like unregister.
>
> it makes things a lot more complex for both the user and the
> infrastructure though, and I doubt it's going to be a performance gain;
> you need to walk all registered items anyway to decide the new minimum
> value if you unregister one for example.

Might be time for a priority list (lib/plist.c), but that might be like
swatting a gnat with a sledge hammer.

> > Another thing I was thinking about is that this seems somewhat contrary
> > to the idea of using dynamic tick (assuming it was in mainline) to
> > heuristically pick a power state. Do you have any thoughts on how you
> > would combine the two?
>
> Actually it's designed in part FOR this case!
> So how that will work (thought experiment, I don't have the code yet)
>
> In idle, determine the time the next scheduled event is.
> Then given that time go over the C-states and pick the deepest C-state
> that
> 1) satisfies the requested latency
> 2) has a latency that is a small enough fraction of the total time
>
> (2 is needed to not pick a 1 msec-latency C state for a 1ms idle, that
> won't save you power most likely, so you need to have enough time in
> "real" idle)
>
> so when you know your latency requirements, you now can pick a DEEPER
> sleepstate than you could before (or at least the right one)... dynticks
> needs this more than anything :)

Sounds pretty good. Since dynamic tick tracks timer events one could
also add a method to track interrupts in general if they are regular
enough to do so. That's just thinking while typing, so it might not be
sane.

Daniel

2006-08-24 22:24:23

by Matthew Garrett

Subject: Re: [RFC] maximum latency tracking infrastructure

On Thu, Aug 24, 2006 at 07:41:35PM +0200, Arjan van de Ven wrote:

> + /* the ipw2100 hardware really doesn't want power management delays
> + * longer than 500usec
> + */
> + modify_acceptable_latency("ipw2100", 500);
> +

Hm. My BIOS claims that the C3 transition period is 85usec (and even my
C4 is 185) , but I've hit the error path where C3 gets disabled. Is this
really adequate? Also, by the looks of it, the C3 disabling path is
still present - is it still theoretically necessary with the above, or
is this just a belt and braces approach?

--
Matthew Garrett | [email protected]

2006-08-24 22:53:57

by Matt Mackall

Subject: Re: [RFC] maximum latency tracking infrastructure

On Thu, Aug 24, 2006 at 07:41:35PM +0200, Arjan van de Ven wrote:
> Subject: [RFC] maximum latency tracking infrastructure
> From: Arjan van de Ven <[email protected]>
>
> The patch below adds infrastructure to track "maximum allowable latency" for power
> saving policies.

Looks good. But it will also be important to have a user-level way to
report who is constraining us from power saving and by how much of a
margin.

--
Mathematics is the supreme nostalgia of our time.

2006-08-25 04:55:31

by Nick Piggin

Subject: Re: [RFC] maximum latency tracking infrastructure

Arjan van de Ven wrote:
> Jesse Barnes wrote:
>
>> On Thursday, August 24, 2006 10:41 am, Arjan van de Ven wrote:
>>
>>> The reason for adding this infrastructure is that power management in
>>> the idle loop needs to make a tradeoff between latency and power
>>> savings (deeper power save modes have a longer latency to running code
>>> again).
>>
>>
>> What if a processor was already in a sleep state when a call to
>> set_acceptable_latency() latency occurs?
>
>
> there's nothing sane that can be done in that case; any wake up already
> will cause the unwanted latency!
> A premature wakeup is only making it happen *now*, but now is as
> inconvenient a time as any...
> (in fact it may be a worst case time scenario, say, an audio interrupt...)

Surely you would call set_acceptable_latency() *before* running such
operation that requires the given latency? And that set_acceptable_latency
would block the caller until all CPUs are set to wake within this latency.

That would be the API semantics I would expect, anyway.

--
SUSE Labs, Novell Inc.
Send instant messages to your online friends http://au.messenger.yahoo.com

2006-08-25 07:56:49

by Arjan van de Ven

Subject: Re: [RFC] maximum latency tracking infrastructure

Matt Mackall wrote:
> On Thu, Aug 24, 2006 at 07:41:35PM +0200, Arjan van de Ven wrote:
>> Subject: [RFC] maximum latency tracking infrastructure
>> From: Arjan van de Ven <[email protected]>
>>
>> The patch below adds infrastructure to track "maximum allowable latency" for power
>> saving policies.
>
> Looks good. But it will also be important to have a user-level way to
> report who is constraining us from power saving and by how much of a
> margin.
>

there is in the patch:

echo l > /proc/sysrq-trigger

2006-08-25 07:58:15

by Arjan van de Ven

Subject: Re: [RFC] maximum latency tracking infrastructure

Nick Piggin wrote:
> Arjan van de Ven wrote:
>> Jesse Barnes wrote:
>>
>>> On Thursday, August 24, 2006 10:41 am, Arjan van de Ven wrote:
>>>
>>>> The reason for adding this infrastructure is that power management in
>>>> the idle loop needs to make a tradeoff between latency and power
>>>> savings (deeper power save modes have a longer latency to running code
>>>> again).
>>>
>>>
>>> What if a processor was already in a sleep state when a call to
>>> set_acceptable_latency() latency occurs?
>>
>>
>> there's nothing sane that can be done in that case; any wake up
>> already will cause the unwanted latency!
>> A premature wakeup is only making it happen *now*, but now is as
>> inconvenient a time as any...
>> (in fact it may be a worst case time scenario, say, an audio
>> interrupt...)
>
> Surely you would call set_acceptable_latency() *before* running such
> operation that requires the given latency? And that set_acceptable_latency
> would block the caller until all CPUs are set to wake within this latency.
>
> That would be the API semantics I would expect, anyway.

but that means it blocks, and thus can't be used in irq context

(the usage model I imagine happens most is a set_acceptable_latency() which can block during device init,
with either no or a very coarse limit, and a modify_acceptable_latency(), which cannot block, from irq context or
device open)

2006-08-25 08:20:09

by Arjan van de Ven

Subject: Re: [RFC] maximum latency tracking infrastructure

On Thu, 2006-08-24 at 23:24 +0100, Matthew Garrett wrote:
> On Thu, Aug 24, 2006 at 07:41:35PM +0200, Arjan van de Ven wrote:
>
> > + /* the ipw2100 hardware really doesn't want power management delays
> > + * longer than 500usec
> > + */
> > + modify_acceptable_latency("ipw2100", 500);
> > +
>
> Hm. My BIOS claims that the C3 transition period is 85usec (and even my
> C4 is 185), but I've hit the error path where C3 gets disabled. Is this
> really adequate?

first of all, that 500 is a bit of a guess on my side; James (the Intel
wireless guy) is on holiday so I couldn't get real numbers from him.
But as a proof of concept it's pretty ok :)

> Also, by the looks of it, the C3 disabling path is
> still present - is it still theoretically necessary with the above, or
> is this just a belt and braces approach?

the "problem" is that bioses lie about these numbers all the time as
well ;( (it's getting better but still).


Those numbers you gave, were those on battery or on AC? (apparently for
the problem machines C3 latency goes WAY up when on battery, and then
the problem hits)
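
To make the question concrete, here is a rough sketch of how the consumer side might pick a C-state from a latency limit. The struct, the function name and the shallow-to-deep table layout are assumptions for illustration and are not taken from the ACPI patch.

```c
#include <assert.h>
#include <stddef.h>

struct cstate {
    const char *name;
    int exit_latency_usecs;   /* as reported by the BIOS, which may understate it */
};

/* Return the index of the deepest state whose reported exit latency fits
 * within limit_usecs. The table is ordered shallow to deep, and index 0 is
 * always considered usable. */
size_t deepest_allowed_state(const struct cstate *states, size_t count,
                             int limit_usecs)
{
    assert(states != NULL && count > 0);
    size_t best = 0;
    for (size_t i = 1; i < count; i++) {
        if (states[i].exit_latency_usecs <= limit_usecs)
            best = i;
    }
    return best;
}
```

With the reported 85 usec for C3 and 185 usec for C4, a 500 usec limit leaves both states allowed, so a check like this only helps on the problem machines if the reported latencies are accurate.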

2006-08-25 08:26:48

by Nick Piggin

Subject: Re: [RFC] maximum latency tracking infrastructure

Arjan van de Ven wrote:
> Nick Piggin wrote:

>> Surely you would call set_acceptable_latency() *before* running such
>> operation that requires the given latency? And that
>> set_acceptable_latency
>> would block the caller until all CPUs are set to wake within this
>> latency.
>>
>> That would be the API semantics I would expect, anyway.
>
>
> but that means it blocks, and thus can't be used in irq context

Is that a problem? I guess it could be, but you don't want to
give a false sense of security either. Having an explicit _nosync
version may make that clear?

>
> (the usage model I imagine happens most is a set_acceptable_latency()
> which can block during device init,
> with either no or a very coarse limit, and a
> modify_acceptable_latency(), which cannot block, from irq context or
> device open)

OK. You'd know more about that than I ;)

--
SUSE Labs, Novell Inc.

2006-08-25 08:31:09

by Arjan van de Ven

Subject: Re: [RFC] maximum latency tracking infrastructure

On Fri, 2006-08-25 at 18:26 +1000, Nick Piggin wrote:
> Arjan van de Ven wrote:
> > Nick Piggin wrote:
>
> >> Surely you would call set_acceptable_latency() *before* running such
> >> operation that requires the given latency? And that
> >> set_acceptable_latency
> >> would block the caller until all CPUs are set to wake within this
> >> latency.
> >>
> >> That would be the API semantics I would expect, anyway.
> >
> >
> > but that means it blocks, and thus can't be used in irq context
>
> Is that a problem? I guess it could be, but you don't want to
> give a false sense of security either. Having an explicit _nosync
> version may make that clear?

well the api is already split between blocking and non-blocking so in
principle that's easy. The problem is that I suspect most users will use
the non-blocking variant.

Also the "what to do" can be treacherous; it'll need a callback list
simply because many places can be using the latency values, more than
just idle. (I can see pstate code, for example, also using it to limit
which states to use, and skipping the ones that take too long to get out of)

I'll investigate what it'll take to get the callback in place; for the
C-state case it's not THAT critical (after all the cpu you are running
on when making this call is not in a deep C state.. :-)
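
A minimal sketch of what such a callback list could look like, in userspace C. Every name here (latency_listener, register_latency_listener, notify_latency_change, record_limit) is hypothetical; a kernel version would more likely build on an existing notifier mechanism and would need locking.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_LISTENERS 8

/* Called with the new global limit whenever it changes. */
typedef void (*latency_listener)(int new_limit_usecs, void *data);

struct listener_slot {
    latency_listener fn;
    void *data;
};

static struct listener_slot listeners[MAX_LISTENERS];
static size_t listener_count;

/* Add a consumer (idle code, pstate code, ...). Returns 0, or -1 if full. */
int register_latency_listener(latency_listener fn, void *data)
{
    assert(fn != NULL);
    if (listener_count == MAX_LISTENERS)
        return -1;
    listeners[listener_count].fn = fn;
    listeners[listener_count].data = data;
    listener_count++;
    return 0;
}

/* Tell every registered consumer about the new limit, in registration order. */
void notify_latency_change(int new_limit_usecs)
{
    for (size_t i = 0; i < listener_count; i++)
        listeners[i].fn(new_limit_usecs, listeners[i].data);
}

/* Example listener: stores the new limit in the int that data points to. */
void record_limit(int new_limit_usecs, void *data)
{
    *(int *)data = new_limit_usecs;
}
```

Each consumer decides for itself what a new limit means, which keeps the tracking code independent of C-state and pstate details.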

2006-08-25 14:56:17

by Matt Mackall

Subject: Re: [RFC] maximum latency tracking infrastructure

On Fri, Aug 25, 2006 at 09:56:37AM +0200, Arjan van de Ven wrote:
> Matt Mackall wrote:
> >On Thu, Aug 24, 2006 at 07:41:35PM +0200, Arjan van de Ven wrote:
> >>Subject: [RFC] maximum latency tracking infrastructure
> >>From: Arjan van de Ven <[email protected]>
> >>
> >>The patch below adds infrastructure to track "maximum allowable latency"
> >>for power
> >>saving policies.
> >
> >Looks good. But it will also be important to have a user-level way to
> >report who is constraining us from power saving and by how much of a
> >margin.
> >
>
> there is in the patch:
>
> echo l > /proc/sysrq-trigger

Ahh, missed that. I suppose that'll suffice.

--
Mathematics is the supreme nostalgia of our time.