2008-07-14 21:02:24

by Woodruff, Richard

[permalink] [raw]
Subject: [RFC-PATCH] Improve Menu Governor Prediction of interrupted C states.

Hi,

Any comments on the following?

I'm finding with high interrupt rates for some USB devices the menu governor guesses wrong enough that throughput drops.

With the below tweak tests with fixed data sizes which were taking 12s to complete drop back to 9s. Also my standby idle doesn't change before and after activity with our with out the patch.

What it does is simply timestamp incoming irqs, then in the menu governor will use current & last irq time delta to refine its guess as to how long sleep will happen.

Regards,
Richard W.

Signed-off-by: Richard Woodruff <[email protected]>

diff -purN img/2.6_kernel/arch/arm/kernel/irq.c 2.6_kernel/arch/arm/kernel/irq.c
--- img/2.6_kernel/arch/arm/kernel/irq.c 2008-04-03 22:43:09.000000000 -0500
+++ 2.6_kernel/arch/arm/kernel/irq.c 2008-07-14 15:09:48.000000000 -0500
@@ -122,6 +131,8 @@ asmlinkage void __exception asm_do_IRQ(u

irq_enter();

+ kstat_irq_stamp();
+
desc_handle_irq(irq, desc);

/* AT91 specific workaround */
diff -purN img/2.6_kernel/drivers/cpuidle/governors/menu.c 2.6_kernel/drivers/cpuidle/governors/menu.c
--- img/2.6_kernel/drivers/cpuidle/governors/menu.c 2008-07-04 15:31:23.000000000 -0500
+++ 2.6_kernel/drivers/cpuidle/governors/menu.c 2008-07-14 15:22:55.000000000 -0500
@@ -7,6 +7,7 @@
*/

#include <linux/kernel.h>
+#include <linux/kernel_stat.h>
#include <linux/cpuidle.h>
#include <linux/latency.h>
#include <linux/time.h>
@@ -70,6 +71,7 @@ static void menu_reflect(struct cpuidle_
unsigned int measured_us =
cpuidle_get_last_residency(dev) + data->elapsed_us;
struct cpuidle_state *target = &dev->states[last_idx];
+ const unsigned int cpu = smp_processor_id();

/*
* Ugh, this idle state doesn't support residency measurements, so we
@@ -92,6 +94,10 @@ static void menu_reflect(struct cpuidle_
data->elapsed_us = -1;
data->predicted_us = max(measured_us, data->last_measured_us);
}
+
+ /* Guess & factor in interrupt wake rate */
+ data->predicted_us = min((s64)data->predicted_us, ktime_to_us(
+ ktime_sub(kstat_cpu(cpu).irq_now, kstat_cpu(cpu).irq_last)));
}

/**
diff -purN img/2.6_kernel/include/linux/kernel_stat.h 2.6_kernel/include/linux/kernel_stat.h
--- img/2.6_kernel/include/linux/kernel_stat.h 2008-04-03 21:51:50.000000000 -0500
+++ 2.6_kernel/include/linux/kernel_stat.h 2008-07-14 13:11:09.000000000 -0500
@@ -7,6 +7,7 @@
#include <linux/percpu.h>
#include <linux/cpumask.h>
#include <asm/cputime.h>
+#include <linux/hrtimer.h>

/*
* 'kernel_stat.h' contains the definitions needed for doing
@@ -29,6 +30,8 @@ struct cpu_usage_stat {
struct kernel_stat {
struct cpu_usage_stat cpustat;
unsigned int irqs[NR_IRQS];
+ ktime_t irq_now;
+ ktime_t irq_last;
};

DECLARE_PER_CPU(struct kernel_stat, kstat);
@@ -52,6 +55,16 @@ static inline int kstat_irqs(int irq)
return sum;
}

+/*
+ * Provide hook to try and understand interrupt rate on a processor
+ */
+static inline void kstat_irq_stamp(void)
+{
+ const unsigned int cpu = smp_processor_id();
+ kstat_cpu(cpu).irq_last = kstat_cpu(cpu).irq_now;
+ kstat_cpu(cpu).irq_now = ktime_get();
+}
+
extern void account_user_time(struct task_struct *, cputime_t);
extern void account_user_time_scaled(struct task_struct *, cputime_t);
extern void account_system_time(struct task_struct *, int, cputime_t);


Attachments:
menu_gov_irq_guess.diff (2.66 kB)
menu_gov_irq_guess.diff

2008-07-22 07:59:28

by Wei, Gang

[permalink] [raw]
Subject: RE: [linux-pm] [RFC-PATCH] Improve Menu Governor Prediction of interrupted C states.

I also found current cpuidle Menu governor should have some problems
while predicting next non-expected break-event after some expected
break-events. The measured_us/elapsed_us/predicted_us will become larger
and larger once (measured_us + BREAK_FUZZ >= data->expected_us -
target->exit_latency). The major point is that it should be
last_residency, not measured_us, that need to be used to do comparing
and distinguish between expected & non-expected events.

Below is my draft patch (not tested) for this. It is simple and should
also be effective for high interrupt rates case.

Jimmy

Signed-off-by: Wei Gang <[email protected]>

diff --git a/drivers/cpuidle/governors/menu.c
b/drivers/cpuidle/governors/menu.c
index 78d77c5..dea5250 100644
--- a/drivers/cpuidle/governors/menu.c
+++ b/drivers/cpuidle/governors/menu.c
@@ -80,18 +80,19 @@ static void menu_reflect(struct cpuidle_device *dev)
if (!(target->flags & CPUIDLE_FLAG_TIME_VALID))
measured_us = USEC_PER_SEC / HZ;

+ /* if wrapping, set to max uint (-1) */
+ measured_us = (data->elapsed_us < measured_us) ? measured_us :
-1;
+
/* Predict time remaining until next break event */
- if (measured_us + BREAK_FUZZ < data->expected_us -
target->exit_latency) {
- data->predicted_us = max(measured_us,
data->last_measured_us);
+ data->predicted_us = max(measured_us, data->last_measured_us);
+
+ /* Distinguish between expected & non-expected events */
+ if (cpuidle_get_last_residency(dev) + BREAK_FUZZ
+ < data->expected_us - target->exit_latency) {
data->last_measured_us = measured_us;
data->elapsed_us = 0;
- } else {
- if (data->elapsed_us < data->elapsed_us + measured_us)
- data->elapsed_us = measured_us;
- else
- data->elapsed_us = -1;
- data->predicted_us = max(measured_us,
data->last_measured_us);
- }
+ } else
+ data->elapsed_us = measured_us;
}

/**


On Tuesday, July 15, 2008 5:02 AM, Woodruff, Richard wrote:
> Hi,
>
> Any comments on the following?
>
> I'm finding with high interrupt rates for some USB devices the menu
governor
> guesses wrong enough that throughput drops.
>
> With the below tweak tests with fixed data sizes which were taking 12s
to
> complete drop back to 9s. Also my standby idle doesn't change before
and
> after activity with our with out the patch.
>
> What it does is simply timestamp incoming irqs, then in the menu
governor
> will use current & last irq time delta to refine its guess as to how
long
> sleep will happen.
>
> Regards,
> Richard W.

2008-07-23 19:00:49

by Woodruff, Richard

[permalink] [raw]
Subject: RE: [linux-pm] [RFC-PATCH] Improve Menu Governor Prediction of interrupted C states.

Hi Wei,

I'll give it a try and make some measurements. It will probable be a week before I can give any feed back. Its good to see others are looking at this.

Thanks,
Richard W.

> I also found current cpuidle Menu governor should have some problems
> while predicting next non-expected break-event after some expected
> break-events. The measured_us/elapsed_us/predicted_us will become
> larger
> and larger once (measured_us + BREAK_FUZZ >= data->expected_us -
> target->exit_latency). The major point is that it should be
> last_residency, not measured_us, that need to be used to do comparing
> and distinguish between expected & non-expected events.
>
> Below is my draft patch (not tested) for this. It is simple and should
> also be effective for high interrupt rates case.
>
> Jimmy
>
> Signed-off-by: Wei Gang <[email protected]>
>
> diff --git a/drivers/cpuidle/governors/menu.c
> b/drivers/cpuidle/governors/menu.c
> index 78d77c5..dea5250 100644
> --- a/drivers/cpuidle/governors/menu.c
> +++ b/drivers/cpuidle/governors/menu.c
> @@ -80,18 +80,19 @@ static void menu_reflect(struct cpuidle_device
> *dev)
> if (!(target->flags & CPUIDLE_FLAG_TIME_VALID))
> measured_us = USEC_PER_SEC / HZ;
>
> + /* if wrapping, set to max uint (-1) */
> + measured_us = (data->elapsed_us < measured_us) ? measured_us :
> -1;
> +
> /* Predict time remaining until next break event */
> - if (measured_us + BREAK_FUZZ < data->expected_us -
> target->exit_latency) {
> - data->predicted_us = max(measured_us,
> data->last_measured_us);
> + data->predicted_us = max(measured_us, data->last_measured_us);
> +
> + /* Distinguish between expected & non-expected events */
> + if (cpuidle_get_last_residency(dev) + BREAK_FUZZ
> + < data->expected_us - target->exit_latency) {
> data->last_measured_us = measured_us;
> data->elapsed_us = 0;
> - } else {
> - if (data->elapsed_us < data->elapsed_us + measured_us)
> - data->elapsed_us = measured_us;
> - else
> - data->elapsed_us = -1;
> - data->predicted_us = max(measured_us,
> data->last_measured_us);
> - }
> + } else
> + data->elapsed_us = measured_us;
> }
>
> /**
>
>
> On Tuesday, July 15, 2008 5:02 AM, Woodruff, Richard wrote:
> > Hi,
> >
> > Any comments on the following?
> >
> > I'm finding with high interrupt rates for some USB devices the menu
> governor
> > guesses wrong enough that throughput drops.
> >
> > With the below tweak tests with fixed data sizes which were taking
> 12s
> to
> > complete drop back to 9s. Also my standby idle doesn't change
> before
> and
> > after activity with our with out the patch.
> >
> > What it does is simply timestamp incoming irqs, then in the menu
> governor
> > will use current & last irq time delta to refine its guess as to how
> long
> > sleep will happen.
> >
> > Regards,
> > Richard W.

2008-08-06 01:25:12

by Woodruff, Richard

[permalink] [raw]
Subject: RE: [linux-pm] [RFC-PATCH] Improve Menu Governor Prediction of interrupted C states.

Hi,

> I also found current cpuidle Menu governor should have some problems
> while predicting next non-expected break-event after some expected
> break-events. The measured_us/elapsed_us/predicted_us will become
> larger
> and larger once (measured_us + BREAK_FUZZ >= data->expected_us -
> target->exit_latency). The major point is that it should be
> last_residency, not measured_us, that need to be used to do comparing
> and distinguish between expected & non-expected events.
>
> Below is my draft patch (not tested) for this. It is simple and should
> also be effective for high interrupt rates case.

This does seem to give better guesses. However, first tests are not showing as well as the irq time stamp version I posted.

I'll need to grab some hardware trace to see what is going on.

Thanks,
Richard W.

2008-08-06 01:52:45

by Wei, Gang

[permalink] [raw]
Subject: RE: [linux-pm] [RFC-PATCH] Improve Menu Governor Prediction of interrupted C states.

> Hi,
>
>> I also found current cpuidle Menu governor should have some problems
>> while predicting next non-expected break-event after some expected
>> break-events. The measured_us/elapsed_us/predicted_us will become
larger
>> and larger once (measured_us + BREAK_FUZZ >= data->expected_us -
>> target->exit_latency). The major point is that it should be
>> last_residency, not measured_us, that need to be used to do comparing
>> and distinguish between expected & non-expected events.
>>
>> Below is my draft patch (not tested) for this. It is simple and
should
>> also be effective for high interrupt rates case.
>
> This does seem to give better guesses. However, first tests are not
showing
> as well as the irq time stamp version I posted.
>
> I'll need to grab some hardware trace to see what is going on.

Thanks for the trial. My original patch has already been merged into
another patch "[patch 2/3] cpuidle: Menu governor fix wrong usage of
measured_us" sent out by Venkatesh. The new patch should be more intact.

Jimmy