From: Rafael J. Wysocki <[email protected]>
Commit 87c9fe6ee495 (cpuidle: menu: Avoid selecting shallow states
with stopped tick) missed the case when the target residencies of
deep idle states of CPUs are above the tick boundary which may cause
the CPU to get stuck in a shallow idle state for a long time.
Say there are two CPU idle states available: one shallow, with the
target residency much below the tick boundary and one deep, with
the target residency significantly above the tick boundary. In
that case, if the tick has been stopped already and the expected
next timer event is relatively far in the future, the governor will
assume the idle duration to be equal to TICK_USEC and it will select
the idle state for the CPU accordingly. However, that will cause the
shallow state to be selected even though it would have been more
energy-efficient to select the deep one.
To address this issue, modify the governor to always assume idle
duration to be equal to the time till the closest timer event if
the tick is not running which will cause the selected idle states
to always match the known CPU wakeup time.
Also make it always indicate that the tick should be stopped in
that case for consistency.
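To make the problem concrete, here is a standalone sketch with made-up residency and timer numbers (the real selection logic lives in menu_select(); this is only an illustration) contrasting the old TICK_USEC cap with the new delta_next-based prediction when the tick is already stopped:

```c
/* Worked example with invented numbers: a shallow state with a 10 us
 * target residency, a deep state with a 5000 us target residency, and
 * TICK_USEC = 4000 (HZ=250). The next timer event is 100000 us away
 * and the tick has already been stopped. */
#include <assert.h>

#define TICK_USEC   4000u
#define SHALLOW_RES   10u   /* target residency of the shallow state */
#define DEEP_RES    5000u   /* target residency of the deep state */

/* Pick the deepest state whose target residency fits predicted_us. */
static unsigned int pick_state(unsigned int predicted_us)
{
    return predicted_us >= DEEP_RES ? DEEP_RES : SHALLOW_RES;
}

/* Old behaviour: with the tick stopped, predicted_us was capped at
 * min(TICK_USEC, delta_next), so the shallow state won. */
static unsigned int old_predicted(unsigned int delta_next_us)
{
    return delta_next_us < TICK_USEC ? delta_next_us : TICK_USEC;
}

/* New behaviour: use the full known time till the next timer event. */
static unsigned int new_predicted(unsigned int delta_next_us)
{
    return delta_next_us;
}
```

With delta_next = 100000 us, old_predicted() yields 4000 us and hence the shallow state, while new_predicted() yields 100000 us and the deep state, matching the known wakeup time.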
Fixes: 87c9fe6ee495 (cpuidle: menu: Avoid selecting shallow states with stopped tick)
Reported-by: Leo Yan <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
---
drivers/cpuidle/governors/menu.c | 49 ++++++++++++++++++---------------------
1 file changed, 23 insertions(+), 26 deletions(-)
Index: linux-pm/drivers/cpuidle/governors/menu.c
===================================================================
--- linux-pm.orig/drivers/cpuidle/governors/menu.c
+++ linux-pm/drivers/cpuidle/governors/menu.c
@@ -307,6 +307,18 @@ static int menu_select(struct cpuidle_dr
/* determine the expected residency time, round up */
data->next_timer_us = ktime_to_us(tick_nohz_get_sleep_length(&delta_next));
+ /*
+ * If the tick is already stopped, the cost of possible short idle
+ * duration misprediction is much higher, because the CPU may be stuck
+ * in a shallow idle state for a long time as a result of it. In that
+ * case say we might mispredict and use the known time till the closest
+ * timer event for the idle state selection.
+ */
+ if (tick_nohz_tick_stopped()) {
+ data->predicted_us = ktime_to_us(delta_next);
+ goto select;
+ }
+
get_iowait_load(&nr_iowaiters, &cpu_load);
data->bucket = which_bucket(data->next_timer_us, nr_iowaiters);
@@ -344,29 +356,15 @@ static int menu_select(struct cpuidle_dr
*/
data->predicted_us = min(data->predicted_us, expected_interval);
- if (tick_nohz_tick_stopped()) {
- /*
- * If the tick is already stopped, the cost of possible short
- * idle duration misprediction is much higher, because the CPU
- * may be stuck in a shallow idle state for a long time as a
- * result of it. In that case say we might mispredict and try
- * to force the CPU into a state for which we would have stopped
- * the tick, unless a timer is going to expire really soon
- * anyway.
- */
- if (data->predicted_us < TICK_USEC)
- data->predicted_us = min_t(unsigned int, TICK_USEC,
- ktime_to_us(delta_next));
- } else {
- /*
- * Use the performance multiplier and the user-configurable
- * latency_req to determine the maximum exit latency.
- */
- interactivity_req = data->predicted_us / performance_multiplier(nr_iowaiters, cpu_load);
- if (latency_req > interactivity_req)
- latency_req = interactivity_req;
- }
+ /*
+ * Use the performance multiplier and the user-configurable latency_req
+ * to determine the maximum exit latency.
+ */
+ interactivity_req = data->predicted_us / performance_multiplier(nr_iowaiters, cpu_load);
+ if (latency_req > interactivity_req)
+ latency_req = interactivity_req;
+select:
expected_interval = data->predicted_us;
/*
* Find the idle state with the lowest power while satisfying
@@ -403,14 +401,13 @@ static int menu_select(struct cpuidle_dr
* Don't stop the tick if the selected state is a polling one or if the
* expected idle duration is shorter than the tick period length.
*/
- if ((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) ||
- expected_interval < TICK_USEC) {
+ if (((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) ||
+ expected_interval < TICK_USEC) && !tick_nohz_tick_stopped()) {
unsigned int delta_next_us = ktime_to_us(delta_next);
*stop_tick = false;
- if (!tick_nohz_tick_stopped() && idx > 0 &&
- drv->states[idx].target_residency > delta_next_us) {
+ if (idx > 0 && drv->states[idx].target_residency > delta_next_us) {
/*
* The tick is not going to be stopped and the target
* residency of the state to be returned is not within
From: Rafael J. Wysocki <[email protected]>
Subject: [PATCH] cpuidle: menu: Handle stopped tick more aggressively
Commit 87c9fe6ee495 (cpuidle: menu: Avoid selecting shallow states
with stopped tick) missed the case when the target residencies of
deep idle states of CPUs are above the tick boundary which may cause
the CPU to get stuck in a shallow idle state for a long time.
Say there are two CPU idle states available: one shallow, with the
target residency much below the tick boundary and one deep, with
the target residency significantly above the tick boundary. In
that case, if the tick has been stopped already and the expected
next timer event is relatively far in the future, the governor will
assume the idle duration to be equal to TICK_USEC and it will select
the idle state for the CPU accordingly. However, that will cause the
shallow state to be selected even though it would have been more
energy-efficient to select the deep one.
To address this issue, modify the governor to always assume idle
duration to be equal to the time till the closest timer event if
the tick is not running which will cause the selected idle states
to always match the known CPU wakeup time.
Also make it always indicate that the tick should be stopped in
that case for consistency.
Fixes: 87c9fe6ee495 (cpuidle: menu: Avoid selecting shallow states with stopped tick)
Reported-by: Leo Yan <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
---
-> v2: Initialize first_idx properly in the stopped tick case.
---
drivers/cpuidle/governors/menu.c | 55 +++++++++++++++++----------------------
1 file changed, 25 insertions(+), 30 deletions(-)
Index: linux-pm/drivers/cpuidle/governors/menu.c
===================================================================
--- linux-pm.orig/drivers/cpuidle/governors/menu.c
+++ linux-pm/drivers/cpuidle/governors/menu.c
@@ -285,9 +285,8 @@ static int menu_select(struct cpuidle_dr
{
struct menu_device *data = this_cpu_ptr(&menu_devices);
int latency_req = cpuidle_governor_latency_req(dev->cpu);
- int i;
- int first_idx;
- int idx;
+ int first_idx = 0;
+ int idx, i;
unsigned int interactivity_req;
unsigned int expected_interval;
unsigned long nr_iowaiters, cpu_load;
@@ -307,6 +306,18 @@ static int menu_select(struct cpuidle_dr
/* determine the expected residency time, round up */
data->next_timer_us = ktime_to_us(tick_nohz_get_sleep_length(&delta_next));
+ /*
+ * If the tick is already stopped, the cost of possible short idle
+ * duration misprediction is much higher, because the CPU may be stuck
+ * in a shallow idle state for a long time as a result of it. In that
+ * case say we might mispredict and use the known time till the closest
+ * timer event for the idle state selection.
+ */
+ if (tick_nohz_tick_stopped()) {
+ data->predicted_us = ktime_to_us(delta_next);
+ goto select;
+ }
+
get_iowait_load(&nr_iowaiters, &cpu_load);
data->bucket = which_bucket(data->next_timer_us, nr_iowaiters);
@@ -322,7 +333,6 @@ static int menu_select(struct cpuidle_dr
expected_interval = get_typical_interval(data);
expected_interval = min(expected_interval, data->next_timer_us);
- first_idx = 0;
if (drv->states[0].flags & CPUIDLE_FLAG_POLLING) {
struct cpuidle_state *s = &drv->states[1];
unsigned int polling_threshold;
@@ -344,29 +354,15 @@ static int menu_select(struct cpuidle_dr
*/
data->predicted_us = min(data->predicted_us, expected_interval);
- if (tick_nohz_tick_stopped()) {
- /*
- * If the tick is already stopped, the cost of possible short
- * idle duration misprediction is much higher, because the CPU
- * may be stuck in a shallow idle state for a long time as a
- * result of it. In that case say we might mispredict and try
- * to force the CPU into a state for which we would have stopped
- * the tick, unless a timer is going to expire really soon
- * anyway.
- */
- if (data->predicted_us < TICK_USEC)
- data->predicted_us = min_t(unsigned int, TICK_USEC,
- ktime_to_us(delta_next));
- } else {
- /*
- * Use the performance multiplier and the user-configurable
- * latency_req to determine the maximum exit latency.
- */
- interactivity_req = data->predicted_us / performance_multiplier(nr_iowaiters, cpu_load);
- if (latency_req > interactivity_req)
- latency_req = interactivity_req;
- }
+ /*
+ * Use the performance multiplier and the user-configurable latency_req
+ * to determine the maximum exit latency.
+ */
+ interactivity_req = data->predicted_us / performance_multiplier(nr_iowaiters, cpu_load);
+ if (latency_req > interactivity_req)
+ latency_req = interactivity_req;
+select:
expected_interval = data->predicted_us;
/*
* Find the idle state with the lowest power while satisfying
@@ -403,14 +399,13 @@ static int menu_select(struct cpuidle_dr
* Don't stop the tick if the selected state is a polling one or if the
* expected idle duration is shorter than the tick period length.
*/
- if ((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) ||
- expected_interval < TICK_USEC) {
+ if (((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) ||
+ expected_interval < TICK_USEC) && !tick_nohz_tick_stopped()) {
unsigned int delta_next_us = ktime_to_us(delta_next);
*stop_tick = false;
- if (!tick_nohz_tick_stopped() && idx > 0 &&
- drv->states[idx].target_residency > delta_next_us) {
+ if (idx > 0 && drv->states[idx].target_residency > delta_next_us) {
/*
* The tick is not going to be stopped and the target
* residency of the state to be returned is not within
On Fri, Aug 10, 2018 at 09:57:18AM +0200, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <[email protected]>
> Subject: [PATCH] cpuidle: menu: Handle stopped tick more aggressively
>
> Commit 87c9fe6ee495 (cpuidle: menu: Avoid selecting shallow states
> with stopped tick) missed the case when the target residencies of
> deep idle states of CPUs are above the tick boundary which may cause
> the CPU to get stuck in a shallow idle state for a long time.
>
> Say there are two CPU idle states available: one shallow, with the
> target residency much below the tick boundary and one deep, with
> the target residency significantly above the tick boundary. In
> that case, if the tick has been stopped already and the expected
> next timer event is relatively far in the future, the governor will
> assume the idle duration to be equal to TICK_USEC and it will select
> the idle state for the CPU accordingly. However, that will cause the
> shallow state to be selected even though it would have been more
> energy-efficient to select the deep one.
>
> To address this issue, modify the governor to always assume idle
> duration to be equal to the time till the closest timer event if
> the tick is not running which will cause the selected idle states
> to always match the known CPU wakeup time.
>
> Also make it always indicate that the tick should be stopped in
> that case for consistency.
>
> Fixes: 87c9fe6ee495 (cpuidle: menu: Avoid selecting shallow states with stopped tick)
> Reported-by: Leo Yan <[email protected]>
> Signed-off-by: Rafael J. Wysocki <[email protected]>
> ---
>
> -> v2: Initialize first_idx properly in the stopped tick case.
>
> ---
> drivers/cpuidle/governors/menu.c | 55 +++++++++++++++++----------------------
> 1 file changed, 25 insertions(+), 30 deletions(-)
>
> Index: linux-pm/drivers/cpuidle/governors/menu.c
> ===================================================================
> --- linux-pm.orig/drivers/cpuidle/governors/menu.c
> +++ linux-pm/drivers/cpuidle/governors/menu.c
> @@ -285,9 +285,8 @@ static int menu_select(struct cpuidle_dr
> {
> struct menu_device *data = this_cpu_ptr(&menu_devices);
> int latency_req = cpuidle_governor_latency_req(dev->cpu);
> - int i;
> - int first_idx;
> - int idx;
> + int first_idx = 0;
> + int idx, i;
> unsigned int interactivity_req;
> unsigned int expected_interval;
> unsigned long nr_iowaiters, cpu_load;
> @@ -307,6 +306,18 @@ static int menu_select(struct cpuidle_dr
> /* determine the expected residency time, round up */
> data->next_timer_us = ktime_to_us(tick_nohz_get_sleep_length(&delta_next));
>
> + /*
> + * If the tick is already stopped, the cost of possible short idle
> + * duration misprediction is much higher, because the CPU may be stuck
> + * in a shallow idle state for a long time as a result of it. In that
> + * case say we might mispredict and use the known time till the closest
> + * timer event for the idle state selection.
> + */
> + if (tick_nohz_tick_stopped()) {
> + data->predicted_us = ktime_to_us(delta_next);
> + goto select;
> + }
> +
This introduces two potential issues:
- This will completely ignore the typical interval pattern in the idle
  loop; I observed that the mmc driver can trigger multiple wakeups
  (> 10 times) with a consistent interval; but I have no strong
  objection to using the next timer event for this case.
- Will this break the correction factors when the CPU exits from idle?
  data->bucket is a stale value at that point.
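Leo's first point refers to the governor's repeating-interval detection. As a rough illustration only (this is not the kernel's get_typical_interval() algorithm; the variance threshold here is invented), the kind of pattern the stopped-tick shortcut no longer consults looks like this:

```c
/* Simplified sketch of repeating-wakeup detection: if the recent idle
 * durations cluster tightly around their mean, predict the mean
 * instead of the next timer event. Thresholds are illustrative. */
#include <assert.h>

#define NR_SAMPLES 8

static unsigned int typical_interval(const unsigned int s[NR_SAMPLES],
                                     unsigned int next_timer_us)
{
    unsigned long long sum = 0, sqsum = 0;
    unsigned int i, avg;
    long long d;

    for (i = 0; i < NR_SAMPLES; i++)
        sum += s[i];
    avg = sum / NR_SAMPLES;

    for (i = 0; i < NR_SAMPLES; i++) {
        d = (long long)s[i] - avg;
        sqsum += d * d;
    }
    /* "Consistent" here: standard deviation below a quarter of the mean. */
    if (sqsum / NR_SAMPLES <= ((unsigned long long)avg * avg) / 16)
        return avg;

    return next_timer_us;   /* no pattern: fall back to the timer */
}
```

A steady stream of interrupts around 100 us apart would predict ~100 us even with the next timer far away, whereas scattered samples fall back to the timer distance; taking the tick-stopped shortcut bypasses this entirely.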
> get_iowait_load(&nr_iowaiters, &cpu_load);
> data->bucket = which_bucket(data->next_timer_us, nr_iowaiters);
>
> @@ -322,7 +333,6 @@ static int menu_select(struct cpuidle_dr
> expected_interval = get_typical_interval(data);
> expected_interval = min(expected_interval, data->next_timer_us);
>
> - first_idx = 0;
> if (drv->states[0].flags & CPUIDLE_FLAG_POLLING) {
> struct cpuidle_state *s = &drv->states[1];
> unsigned int polling_threshold;
> @@ -344,29 +354,15 @@ static int menu_select(struct cpuidle_dr
> */
> data->predicted_us = min(data->predicted_us, expected_interval);
>
> - if (tick_nohz_tick_stopped()) {
> - /*
> - * If the tick is already stopped, the cost of possible short
> - * idle duration misprediction is much higher, because the CPU
> - * may be stuck in a shallow idle state for a long time as a
> - * result of it. In that case say we might mispredict and try
> - * to force the CPU into a state for which we would have stopped
> - * the tick, unless a timer is going to expire really soon
> - * anyway.
> - */
> - if (data->predicted_us < TICK_USEC)
> - data->predicted_us = min_t(unsigned int, TICK_USEC,
> - ktime_to_us(delta_next));
> - } else {
> - /*
> - * Use the performance multiplier and the user-configurable
> - * latency_req to determine the maximum exit latency.
> - */
> - interactivity_req = data->predicted_us / performance_multiplier(nr_iowaiters, cpu_load);
> - if (latency_req > interactivity_req)
> - latency_req = interactivity_req;
> - }
> + /*
> + * Use the performance multiplier and the user-configurable latency_req
> + * to determine the maximum exit latency.
> + */
> + interactivity_req = data->predicted_us / performance_multiplier(nr_iowaiters, cpu_load);
> + if (latency_req > interactivity_req)
> + latency_req = interactivity_req;
>
> +select:
> expected_interval = data->predicted_us;
> /*
> * Find the idle state with the lowest power while satisfying
> @@ -403,14 +399,13 @@ static int menu_select(struct cpuidle_dr
> * Don't stop the tick if the selected state is a polling one or if the
> * expected idle duration is shorter than the tick period length.
> */
> - if ((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) ||
> - expected_interval < TICK_USEC) {
> + if (((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) ||
> + expected_interval < TICK_USEC) && !tick_nohz_tick_stopped()) {
I am not sure this logic is right... Why not use the check below, so
that for a POLLING state we never ask to stop the tick?
if (drv->states[idx].flags & CPUIDLE_FLAG_POLLING ||
    (expected_interval < TICK_USEC && !tick_nohz_tick_stopped())) {
> unsigned int delta_next_us = ktime_to_us(delta_next);
>
> *stop_tick = false;
>
> - if (!tick_nohz_tick_stopped() && idx > 0 &&
> - drv->states[idx].target_residency > delta_next_us) {
> + if (idx > 0 && drv->states[idx].target_residency > delta_next_us) {
> /*
> * The tick is not going to be stopped and the target
> * residency of the state to be returned is not within
>
On Fri, Aug 10, 2018 at 11:20 AM <[email protected]> wrote:
>
> On Fri, Aug 10, 2018 at 09:57:18AM +0200, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <[email protected]>
> > Subject: [PATCH] cpuidle: menu: Handle stopped tick more aggressively
> >
> > Commit 87c9fe6ee495 (cpuidle: menu: Avoid selecting shallow states
> > with stopped tick) missed the case when the target residencies of
> > deep idle states of CPUs are above the tick boundary which may cause
> > the CPU to get stuck in a shallow idle state for a long time.
> >
> > Say there are two CPU idle states available: one shallow, with the
> > target residency much below the tick boundary and one deep, with
> > the target residency significantly above the tick boundary. In
> > that case, if the tick has been stopped already and the expected
> > next timer event is relatively far in the future, the governor will
> > assume the idle duration to be equal to TICK_USEC and it will select
> > the idle state for the CPU accordingly. However, that will cause the
> > shallow state to be selected even though it would have been more
> > energy-efficient to select the deep one.
> >
> > To address this issue, modify the governor to always assume idle
> > duration to be equal to the time till the closest timer event if
> > the tick is not running which will cause the selected idle states
> > to always match the known CPU wakeup time.
> >
> > Also make it always indicate that the tick should be stopped in
> > that case for consistency.
> >
> > Fixes: 87c9fe6ee495 (cpuidle: menu: Avoid selecting shallow states with stopped tick)
> > Reported-by: Leo Yan <[email protected]>
> > Signed-off-by: Rafael J. Wysocki <[email protected]>
> > ---
> >
> > -> v2: Initialize first_idx properly in the stopped tick case.
> >
> > ---
> > drivers/cpuidle/governors/menu.c | 55 +++++++++++++++++----------------------
> > 1 file changed, 25 insertions(+), 30 deletions(-)
> >
> > Index: linux-pm/drivers/cpuidle/governors/menu.c
> > ===================================================================
> > --- linux-pm.orig/drivers/cpuidle/governors/menu.c
> > +++ linux-pm/drivers/cpuidle/governors/menu.c
> > @@ -285,9 +285,8 @@ static int menu_select(struct cpuidle_dr
> > {
> > struct menu_device *data = this_cpu_ptr(&menu_devices);
> > int latency_req = cpuidle_governor_latency_req(dev->cpu);
> > - int i;
> > - int first_idx;
> > - int idx;
> > + int first_idx = 0;
> > + int idx, i;
> > unsigned int interactivity_req;
> > unsigned int expected_interval;
> > unsigned long nr_iowaiters, cpu_load;
> > @@ -307,6 +306,18 @@ static int menu_select(struct cpuidle_dr
> > /* determine the expected residency time, round up */
> > data->next_timer_us = ktime_to_us(tick_nohz_get_sleep_length(&delta_next));
> >
> > + /*
> > + * If the tick is already stopped, the cost of possible short idle
> > + * duration misprediction is much higher, because the CPU may be stuck
> > + * in a shallow idle state for a long time as a result of it. In that
> > + * case say we might mispredict and use the known time till the closest
> > + * timer event for the idle state selection.
> > + */
> > + if (tick_nohz_tick_stopped()) {
> > + data->predicted_us = ktime_to_us(delta_next);
> > + goto select;
> > + }
> > +
>
> This introduces two potential issues:
>
> - This will completely ignore the typical interval pattern in the idle
>   loop; I observed that the mmc driver can trigger multiple wakeups
>   (> 10 times) with a consistent interval;
I'm not sure what you mean by "ignore".
> but I have no strong objection to using the next timer event for this case.
OK
> - Will this break the correction factors when the CPU exits from idle?
>   data->bucket is a stale value at that point.
Good point.
I'll send a v3 with this addressed.
>
> > get_iowait_load(&nr_iowaiters, &cpu_load);
> > data->bucket = which_bucket(data->next_timer_us, nr_iowaiters);
> >
> > @@ -322,7 +333,6 @@ static int menu_select(struct cpuidle_dr
> > expected_interval = get_typical_interval(data);
> > expected_interval = min(expected_interval, data->next_timer_us);
> >
> > - first_idx = 0;
> > if (drv->states[0].flags & CPUIDLE_FLAG_POLLING) {
> > struct cpuidle_state *s = &drv->states[1];
> > unsigned int polling_threshold;
> > @@ -344,29 +354,15 @@ static int menu_select(struct cpuidle_dr
> > */
> > data->predicted_us = min(data->predicted_us, expected_interval);
> >
> > - if (tick_nohz_tick_stopped()) {
> > - /*
> > - * If the tick is already stopped, the cost of possible short
> > - * idle duration misprediction is much higher, because the CPU
> > - * may be stuck in a shallow idle state for a long time as a
> > - * result of it. In that case say we might mispredict and try
> > - * to force the CPU into a state for which we would have stopped
> > - * the tick, unless a timer is going to expire really soon
> > - * anyway.
> > - */
> > - if (data->predicted_us < TICK_USEC)
> > - data->predicted_us = min_t(unsigned int, TICK_USEC,
> > - ktime_to_us(delta_next));
> > - } else {
> > - /*
> > - * Use the performance multiplier and the user-configurable
> > - * latency_req to determine the maximum exit latency.
> > - */
> > - interactivity_req = data->predicted_us / performance_multiplier(nr_iowaiters, cpu_load);
> > - if (latency_req > interactivity_req)
> > - latency_req = interactivity_req;
> > - }
> > + /*
> > + * Use the performance multiplier and the user-configurable latency_req
> > + * to determine the maximum exit latency.
> > + */
> > + interactivity_req = data->predicted_us / performance_multiplier(nr_iowaiters, cpu_load);
> > + if (latency_req > interactivity_req)
> > + latency_req = interactivity_req;
> >
> > +select:
> > expected_interval = data->predicted_us;
> > /*
> > * Find the idle state with the lowest power while satisfying
> > @@ -403,14 +399,13 @@ static int menu_select(struct cpuidle_dr
> > * Don't stop the tick if the selected state is a polling one or if the
> > * expected idle duration is shorter than the tick period length.
> > */
> > - if ((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) ||
> > - expected_interval < TICK_USEC) {
> > + if (((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) ||
> > + expected_interval < TICK_USEC) && !tick_nohz_tick_stopped()) {
>
> I am not sure this logic is right... Why not use the check below, so
> that for a POLLING state we never ask to stop the tick?
>
> if (drv->states[idx].flags & CPUIDLE_FLAG_POLLING ||
> (expected_interval < TICK_USEC && !tick_nohz_tick_stopped())) {
>
The only effect of it would be setting stop_tick to false, but why
would that matter?
> > unsigned int delta_next_us = ktime_to_us(delta_next);
> >
> > *stop_tick = false;
> >
> > - if (!tick_nohz_tick_stopped() && idx > 0 &&
> > - drv->states[idx].target_residency > delta_next_us) {
> > + if (idx > 0 && drv->states[idx].target_residency > delta_next_us) {
> > /*
> > * The tick is not going to be stopped and the target
> > * residency of the state to be returned is not within
> >
From: Rafael J. Wysocki <[email protected]>
Commit 87c9fe6ee495 (cpuidle: menu: Avoid selecting shallow states
with stopped tick) missed the case when the target residencies of
deep idle states of CPUs are above the tick boundary which may cause
the CPU to get stuck in a shallow idle state for a long time.
Say there are two CPU idle states available: one shallow, with the
target residency much below the tick boundary and one deep, with
the target residency significantly above the tick boundary. In
that case, if the tick has been stopped already and the expected
next timer event is relatively far in the future, the governor will
assume the idle duration to be equal to TICK_USEC and it will select
the idle state for the CPU accordingly. However, that will cause the
shallow state to be selected even though it would have been more
energy-efficient to select the deep one.
To address this issue, modify the governor to always assume idle
duration to be equal to the time till the closest timer event if
the tick is not running which will cause the selected idle states
to always match the known CPU wakeup time.
Also make it always indicate that the tick should be stopped in
that case for consistency.
Fixes: 87c9fe6ee495 (cpuidle: menu: Avoid selecting shallow states with stopped tick)
Reported-by: Leo Yan <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
---
-> v2: Initialize first_idx properly in the stopped tick case.
-> v3: Compute data->bucket before checking whether or not the tick has been
stopped already to prevent it from becoming stale.
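The stale-bucket concern this v3 reordering addresses can be sketched roughly as follows. This is not the kernel's exact correction-factor code (RESOLUTION, the bucket count, and the averaging are illustrative): the point is that the bucket chosen at selection time is also the one whose factor gets trained on wakeup, so it must be computed from fresh inputs rather than skipped.

```c
/* Simplified sketch of menu-style correction factors. RESOLUTION plays
 * the role of a 1.0 fixed-point ratio; all values are illustrative. */
#include <assert.h>

#define NR_BUCKETS 6
#define RESOLUTION 1024u

static unsigned int correction[NR_BUCKETS] = {
    RESOLUTION, RESOLUTION, RESOLUTION,
    RESOLUTION, RESOLUTION, RESOLUTION,
};

/* Selection: scale the time till the next timer by the bucket's ratio. */
static unsigned int predict(unsigned int next_timer_us, int bucket)
{
    return (unsigned int)((unsigned long long)next_timer_us *
                          correction[bucket] / RESOLUTION);
}

/* Wakeup: move the bucket's ratio halfway towards measured/expected.
 * If the bucket recorded at selection time was stale, the wrong
 * factor gets trained here. */
static void update(int bucket, unsigned int measured_us,
                   unsigned int next_timer_us)
{
    unsigned int new_f = (unsigned int)((unsigned long long)measured_us *
                                        RESOLUTION / next_timer_us);

    correction[bucket] = (correction[bucket] + new_f) / 2;
}
```

For instance, waking after 500 us when 1000 us was expected pulls that bucket's factor from 1024 towards 512, so the next 1000 us prediction for the same bucket comes out shorter, while the other buckets stay untouched.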
---
drivers/cpuidle/governors/menu.c | 55 +++++++++++++++++----------------------
1 file changed, 25 insertions(+), 30 deletions(-)
Index: linux-pm/drivers/cpuidle/governors/menu.c
===================================================================
--- linux-pm.orig/drivers/cpuidle/governors/menu.c
+++ linux-pm/drivers/cpuidle/governors/menu.c
@@ -285,9 +285,8 @@ static int menu_select(struct cpuidle_dr
{
struct menu_device *data = this_cpu_ptr(&menu_devices);
int latency_req = cpuidle_governor_latency_req(dev->cpu);
- int i;
- int first_idx;
- int idx;
+ int first_idx = 0;
+ int idx, i;
unsigned int interactivity_req;
unsigned int expected_interval;
unsigned long nr_iowaiters, cpu_load;
@@ -311,6 +310,18 @@ static int menu_select(struct cpuidle_dr
data->bucket = which_bucket(data->next_timer_us, nr_iowaiters);
/*
+ * If the tick is already stopped, the cost of possible short idle
+ * duration misprediction is much higher, because the CPU may be stuck
+ * in a shallow idle state for a long time as a result of it. In that
+ * case say we might mispredict and use the known time till the closest
+ * timer event for the idle state selection.
+ */
+ if (tick_nohz_tick_stopped()) {
+ data->predicted_us = ktime_to_us(delta_next);
+ goto select;
+ }
+
+ /*
* Force the result of multiplication to be 64 bits even if both
* operands are 32 bits.
* Make sure to round up for half microseconds.
@@ -322,7 +333,6 @@ static int menu_select(struct cpuidle_dr
expected_interval = get_typical_interval(data);
expected_interval = min(expected_interval, data->next_timer_us);
- first_idx = 0;
if (drv->states[0].flags & CPUIDLE_FLAG_POLLING) {
struct cpuidle_state *s = &drv->states[1];
unsigned int polling_threshold;
@@ -344,29 +354,15 @@ static int menu_select(struct cpuidle_dr
*/
data->predicted_us = min(data->predicted_us, expected_interval);
- if (tick_nohz_tick_stopped()) {
- /*
- * If the tick is already stopped, the cost of possible short
- * idle duration misprediction is much higher, because the CPU
- * may be stuck in a shallow idle state for a long time as a
- * result of it. In that case say we might mispredict and try
- * to force the CPU into a state for which we would have stopped
- * the tick, unless a timer is going to expire really soon
- * anyway.
- */
- if (data->predicted_us < TICK_USEC)
- data->predicted_us = min_t(unsigned int, TICK_USEC,
- ktime_to_us(delta_next));
- } else {
- /*
- * Use the performance multiplier and the user-configurable
- * latency_req to determine the maximum exit latency.
- */
- interactivity_req = data->predicted_us / performance_multiplier(nr_iowaiters, cpu_load);
- if (latency_req > interactivity_req)
- latency_req = interactivity_req;
- }
+ /*
+ * Use the performance multiplier and the user-configurable latency_req
+ * to determine the maximum exit latency.
+ */
+ interactivity_req = data->predicted_us / performance_multiplier(nr_iowaiters, cpu_load);
+ if (latency_req > interactivity_req)
+ latency_req = interactivity_req;
+select:
expected_interval = data->predicted_us;
/*
* Find the idle state with the lowest power while satisfying
@@ -403,14 +399,13 @@ static int menu_select(struct cpuidle_dr
* Don't stop the tick if the selected state is a polling one or if the
* expected idle duration is shorter than the tick period length.
*/
- if ((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) ||
- expected_interval < TICK_USEC) {
+ if (((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) ||
+ expected_interval < TICK_USEC) && !tick_nohz_tick_stopped()) {
unsigned int delta_next_us = ktime_to_us(delta_next);
*stop_tick = false;
- if (!tick_nohz_tick_stopped() && idx > 0 &&
- drv->states[idx].target_residency > delta_next_us) {
+ if (idx > 0 && drv->states[idx].target_residency > delta_next_us) {
/*
* The tick is not going to be stopped and the target
* residency of the state to be returned is not within
On Fri, Aug 10, 2018 at 01:04:22PM +0200, Rafael J. Wysocki wrote:
> On Fri, Aug 10, 2018 at 11:20 AM <[email protected]> wrote:
> >
> > On Fri, Aug 10, 2018 at 09:57:18AM +0200, Rafael J. Wysocki wrote:
> > > From: Rafael J. Wysocki <[email protected]>
> > > Subject: [PATCH] cpuidle: menu: Handle stopped tick more aggressively
> > >
> > > Commit 87c9fe6ee495 (cpuidle: menu: Avoid selecting shallow states
> > > with stopped tick) missed the case when the target residencies of
> > > deep idle states of CPUs are above the tick boundary which may cause
> > > the CPU to get stuck in a shallow idle state for a long time.
> > >
> > > Say there are two CPU idle states available: one shallow, with the
> > > target residency much below the tick boundary and one deep, with
> > > the target residency significantly above the tick boundary. In
> > > that case, if the tick has been stopped already and the expected
> > > next timer event is relatively far in the future, the governor will
> > > assume the idle duration to be equal to TICK_USEC and it will select
> > > the idle state for the CPU accordingly. However, that will cause the
> > > shallow state to be selected even though it would have been more
> > > energy-efficient to select the deep one.
> > >
> > > To address this issue, modify the governor to always assume idle
> > > duration to be equal to the time till the closest timer event if
> > > the tick is not running which will cause the selected idle states
> > > to always match the known CPU wakeup time.
> > >
> > > Also make it always indicate that the tick should be stopped in
> > > that case for consistency.
> > >
> > > Fixes: 87c9fe6ee495 (cpuidle: menu: Avoid selecting shallow states with stopped tick)
> > > Reported-by: Leo Yan <[email protected]>
> > > Signed-off-by: Rafael J. Wysocki <[email protected]>
> > > ---
> > >
> > > -> v2: Initialize first_idx properly in the stopped tick case.
> > >
> > > ---
> > > drivers/cpuidle/governors/menu.c | 55 +++++++++++++++++----------------------
> > > 1 file changed, 25 insertions(+), 30 deletions(-)
> > >
> > > Index: linux-pm/drivers/cpuidle/governors/menu.c
> > > ===================================================================
> > > --- linux-pm.orig/drivers/cpuidle/governors/menu.c
> > > +++ linux-pm/drivers/cpuidle/governors/menu.c
> > > @@ -285,9 +285,8 @@ static int menu_select(struct cpuidle_dr
> > > {
> > > struct menu_device *data = this_cpu_ptr(&menu_devices);
> > > int latency_req = cpuidle_governor_latency_req(dev->cpu);
> > > - int i;
> > > - int first_idx;
> > > - int idx;
> > > + int first_idx = 0;
> > > + int idx, i;
> > > unsigned int interactivity_req;
> > > unsigned int expected_interval;
> > > unsigned long nr_iowaiters, cpu_load;
> > > @@ -307,6 +306,18 @@ static int menu_select(struct cpuidle_dr
> > > /* determine the expected residency time, round up */
> > > data->next_timer_us = ktime_to_us(tick_nohz_get_sleep_length(&delta_next));
> > >
> > > + /*
> > > + * If the tick is already stopped, the cost of possible short idle
> > > + * duration misprediction is much higher, because the CPU may be stuck
> > > + * in a shallow idle state for a long time as a result of it. In that
> > > + * case say we might mispredict and use the known time till the closest
> > > + * timer event for the idle state selection.
> > > + */
> > > + if (tick_nohz_tick_stopped()) {
> > > + data->predicted_us = ktime_to_us(delta_next);
> > > + goto select;
> > > + }
> > > +
> >
> > This introduces two potential issues:
> >
> > - This will totally ignore the typical wakeup pattern in the idle loop;
> > I observed that the mmc driver can trigger interrupts multiple times
> > (> 10 times) with a consistent interval;
>
> I'm not sure what you mean by "ignore".
After moving the code from below to this position, the typical interval
pattern is no longer taken into account; so if a bunch of interrupts
with a fixed pattern arrives in the middle of the idle periods, the code
above can no longer detect that pattern.
[...]
> > > - if ((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) ||
> > > - expected_interval < TICK_USEC) {
> > > + if (((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) ||
> > > + expected_interval < TICK_USEC) && !tick_nohz_tick_stopped()) {
> >
> > I am not sure this logic is right... Why not use the check below, so
> > that for a POLLING state we will never ask to stop the tick?
> >
> > if (drv->states[idx].flags & CPUIDLE_FLAG_POLLING ||
> > (expected_interval < TICK_USEC && !tick_nohz_tick_stopped())) {
> >
>
> The only effect of it would be setting stop_tick to false, but why
> would that matter?
Please consider the situation below (I am not sure whether this case can
actually happen):
step 1: the first time, enter an idle state and stop the tick;
step 2: the second time, select a POLLING state while
tick_nohz_tick_stopped() is true;
In step 2, stop_tick cannot be set to false by the statement below.
> > > unsigned int delta_next_us = ktime_to_us(delta_next);
> > >
> > > *stop_tick = false;
[...]
Thanks,
Leo Yan
Hi Rafael,
I love your patch! Perhaps something to improve:
[auto build test WARNING on pm/linux-next]
[also build test WARNING on v4.18-rc8 next-20180810]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
url: https://github.com/0day-ci/linux/commits/Rafael-J-Wysocki/cpuidle-menu-Handle-stopped-tick-more-aggressively/20180811-191914
base: https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git linux-next
config: mips-allyesconfig (attached as .config)
compiler: mips-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
GCC_VERSION=7.2.0 make.cross ARCH=mips
Note: it may well be a FALSE warning. FWIW you are at least aware of it now.
http://gcc.gnu.org/wiki/Better_Uninitialized_Warnings
All warnings (new ones prefixed by >>):
drivers/cpuidle/governors/menu.c: In function 'menu_select':
>> drivers/cpuidle/governors/menu.c:289:6: warning: 'first_idx' may be used uninitialized in this function [-Wmaybe-uninitialized]
int first_idx;
^~~~~~~~~
vim +/first_idx +289 drivers/cpuidle/governors/menu.c
1f85f87d4 Arjan van de Ven 2010-05-24 276
4f86d3a8e Len Brown 2007-10-03 277 /**
4f86d3a8e Len Brown 2007-10-03 278 * menu_select - selects the next idle state to enter
46bcfad7a Deepthi Dharwar 2011-10-28 279 * @drv: cpuidle driver containing state data
4f86d3a8e Len Brown 2007-10-03 280 * @dev: the CPU
45f1ff59e Rafael J. Wysocki 2018-03-22 281 * @stop_tick: indication on whether or not to stop the tick
4f86d3a8e Len Brown 2007-10-03 282 */
45f1ff59e Rafael J. Wysocki 2018-03-22 283 static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
45f1ff59e Rafael J. Wysocki 2018-03-22 284 bool *stop_tick)
4f86d3a8e Len Brown 2007-10-03 285 {
229b6863b Christoph Lameter 2014-08-17 286 struct menu_device *data = this_cpu_ptr(&menu_devices);
0fc784fb0 Rafael J. Wysocki 2018-05-30 287 int latency_req = cpuidle_governor_latency_req(dev->cpu);
4f86d3a8e Len Brown 2007-10-03 288 int i;
3ed09c945 Nicholas Piggin 2017-06-26 @289 int first_idx;
3ed09c945 Nicholas Piggin 2017-06-26 290 int idx;
96e95182e [email protected] 2014-02-24 291 unsigned int interactivity_req;
e132b9b3b Rik van Riel 2016-03-16 292 unsigned int expected_interval;
372ba8cb4 Mel Gorman 2014-08-06 293 unsigned long nr_iowaiters, cpu_load;
296bb1e51 Rafael J. Wysocki 2018-04-05 294 ktime_t delta_next;
4f86d3a8e Len Brown 2007-10-03 295
672917dcc Corrado Zoccolo 2009-09-21 296 if (data->needs_update) {
46bcfad7a Deepthi Dharwar 2011-10-28 297 menu_update(drv, dev);
672917dcc Corrado Zoccolo 2009-09-21 298 data->needs_update = 0;
672917dcc Corrado Zoccolo 2009-09-21 299 }
672917dcc Corrado Zoccolo 2009-09-21 300
69d25870f Arjan van de Ven 2009-09-21 301 /* Special case when user has set very strict latency requirement */
45f1ff59e Rafael J. Wysocki 2018-03-22 302 if (unlikely(latency_req == 0)) {
45f1ff59e Rafael J. Wysocki 2018-03-22 303 *stop_tick = false;
a2bd92023 [email protected] 2008-07-30 304 return 0;
45f1ff59e Rafael J. Wysocki 2018-03-22 305 }
a2bd92023 [email protected] 2008-07-30 306
69d25870f Arjan van de Ven 2009-09-21 307 /* determine the expected residency time, round up */
296bb1e51 Rafael J. Wysocki 2018-04-05 308 data->next_timer_us = ktime_to_us(tick_nohz_get_sleep_length(&delta_next));
69d25870f Arjan van de Ven 2009-09-21 309
5f9f09809 Rafael J. Wysocki 2018-08-10 310 /*
5f9f09809 Rafael J. Wysocki 2018-08-10 311 * If the tick is already stopped, the cost of possible short idle
5f9f09809 Rafael J. Wysocki 2018-08-10 312 * duration misprediction is much higher, because the CPU may be stuck
5f9f09809 Rafael J. Wysocki 2018-08-10 313 * in a shallow idle state for a long time as a result of it. In that
5f9f09809 Rafael J. Wysocki 2018-08-10 314 * case say we might mispredict and use the known time till the closest
5f9f09809 Rafael J. Wysocki 2018-08-10 315 * timer event for the idle state selection.
5f9f09809 Rafael J. Wysocki 2018-08-10 316 */
5f9f09809 Rafael J. Wysocki 2018-08-10 317 if (tick_nohz_tick_stopped()) {
5f9f09809 Rafael J. Wysocki 2018-08-10 318 data->predicted_us = ktime_to_us(delta_next);
5f9f09809 Rafael J. Wysocki 2018-08-10 319 goto select;
5f9f09809 Rafael J. Wysocki 2018-08-10 320 }
5f9f09809 Rafael J. Wysocki 2018-08-10 321
372ba8cb4 Mel Gorman 2014-08-06 322 get_iowait_load(&nr_iowaiters, &cpu_load);
64b4ca5cb Mel Gorman 2014-08-06 323 data->bucket = which_bucket(data->next_timer_us, nr_iowaiters);
69d25870f Arjan van de Ven 2009-09-21 324
69d25870f Arjan van de Ven 2009-09-21 325 /*
51f245b89 Tuukka Tikkanen 2013-08-14 326 * Force the result of multiplication to be 64 bits even if both
51f245b89 Tuukka Tikkanen 2013-08-14 327 * operands are 32 bits.
51f245b89 Tuukka Tikkanen 2013-08-14 328 * Make sure to round up for half microseconds.
51f245b89 Tuukka Tikkanen 2013-08-14 329 */
ee3c86f35 Javi Merino 2015-04-16 330 data->predicted_us = DIV_ROUND_CLOSEST_ULL((uint64_t)data->next_timer_us *
51f245b89 Tuukka Tikkanen 2013-08-14 331 data->correction_factor[data->bucket],
69d25870f Arjan van de Ven 2009-09-21 332 RESOLUTION * DECAY);
69d25870f Arjan van de Ven 2009-09-21 333
e132b9b3b Rik van Riel 2016-03-16 334 expected_interval = get_typical_interval(data);
e132b9b3b Rik van Riel 2016-03-16 335 expected_interval = min(expected_interval, data->next_timer_us);
96e95182e [email protected] 2014-02-24 336
dc2251bf9 Rafael J. Wysocki 2017-08-23 337 first_idx = 0;
dc2251bf9 Rafael J. Wysocki 2017-08-23 338 if (drv->states[0].flags & CPUIDLE_FLAG_POLLING) {
dc2251bf9 Rafael J. Wysocki 2017-08-23 339 struct cpuidle_state *s = &drv->states[1];
0c313cb20 Rafael J. Wysocki 2016-03-20 340 unsigned int polling_threshold;
0c313cb20 Rafael J. Wysocki 2016-03-20 341
96e95182e [email protected] 2014-02-24 342 /*
69d25870f Arjan van de Ven 2009-09-21 343 * We want to default to C1 (hlt), not to busy polling
e132b9b3b Rik van Riel 2016-03-16 344 * unless the timer is happening really really soon, or
e132b9b3b Rik van Riel 2016-03-16 345 * C1's exit latency exceeds the user configured limit.
69d25870f Arjan van de Ven 2009-09-21 346 */
0c313cb20 Rafael J. Wysocki 2016-03-20 347 polling_threshold = max_t(unsigned int, 20, s->target_residency);
0c313cb20 Rafael J. Wysocki 2016-03-20 348 if (data->next_timer_us > polling_threshold &&
0c313cb20 Rafael J. Wysocki 2016-03-20 349 latency_req > s->exit_latency && !s->disabled &&
dc2251bf9 Rafael J. Wysocki 2017-08-23 350 !dev->states_usage[1].disable)
dc2251bf9 Rafael J. Wysocki 2017-08-23 351 first_idx = 1;
9c4b2867e Rafael J. Wysocki 2016-01-14 352 }
4f86d3a8e Len Brown 2007-10-03 353
71abbbf85 Ai Li 2010-08-09 354 /*
e132b9b3b Rik van Riel 2016-03-16 355 * Use the lowest expected idle interval to pick the idle state.
e132b9b3b Rik van Riel 2016-03-16 356 */
e132b9b3b Rik van Riel 2016-03-16 357 data->predicted_us = min(data->predicted_us, expected_interval);
e132b9b3b Rik van Riel 2016-03-16 358
e132b9b3b Rik van Riel 2016-03-16 359 /*
5f9f09809 Rafael J. Wysocki 2018-08-10 360 * Use the performance multiplier and the user-configurable latency_req
5f9f09809 Rafael J. Wysocki 2018-08-10 361 * to determine the maximum exit latency.
e132b9b3b Rik van Riel 2016-03-16 362 */
e132b9b3b Rik van Riel 2016-03-16 363 interactivity_req = data->predicted_us / performance_multiplier(nr_iowaiters, cpu_load);
e132b9b3b Rik van Riel 2016-03-16 364 if (latency_req > interactivity_req)
e132b9b3b Rik van Riel 2016-03-16 365 latency_req = interactivity_req;
e132b9b3b Rik van Riel 2016-03-16 366
5f9f09809 Rafael J. Wysocki 2018-08-10 367 select:
45f1ff59e Rafael J. Wysocki 2018-03-22 368 expected_interval = data->predicted_us;
e132b9b3b Rik van Riel 2016-03-16 369 /*
71abbbf85 Ai Li 2010-08-09 370 * Find the idle state with the lowest power while satisfying
71abbbf85 Ai Li 2010-08-09 371 * our constraints.
71abbbf85 Ai Li 2010-08-09 372 */
3ed09c945 Nicholas Piggin 2017-06-26 373 idx = -1;
3ed09c945 Nicholas Piggin 2017-06-26 374 for (i = first_idx; i < drv->state_count; i++) {
46bcfad7a Deepthi Dharwar 2011-10-28 375 struct cpuidle_state *s = &drv->states[i];
dc7fd275a ShuoX Liu 2012-07-03 376 struct cpuidle_state_usage *su = &dev->states_usage[i];
4f86d3a8e Len Brown 2007-10-03 377
cbc9ef028 Rafael J. Wysocki 2012-07-03 378 if (s->disabled || su->disable)
3a53396b0 ShuoX Liu 2012-03-28 379 continue;
3ed09c945 Nicholas Piggin 2017-06-26 380 if (idx == -1)
3ed09c945 Nicholas Piggin 2017-06-26 381 idx = i; /* first enabled state */
148519120 Rafael J. Wysocki 2013-07-27 382 if (s->target_residency > data->predicted_us)
8e37e1a2a Alex Shi 2017-01-12 383 break;
45f1ff59e Rafael J. Wysocki 2018-03-22 384 if (s->exit_latency > latency_req) {
45f1ff59e Rafael J. Wysocki 2018-03-22 385 /*
45f1ff59e Rafael J. Wysocki 2018-03-22 386 * If we break out of the loop for latency reasons, use
45f1ff59e Rafael J. Wysocki 2018-03-22 387 * the target residency of the selected state as the
45f1ff59e Rafael J. Wysocki 2018-03-22 388 * expected idle duration so that the tick is retained
45f1ff59e Rafael J. Wysocki 2018-03-22 389 * as long as that target residency is low enough.
45f1ff59e Rafael J. Wysocki 2018-03-22 390 */
45f1ff59e Rafael J. Wysocki 2018-03-22 391 expected_interval = drv->states[idx].target_residency;
8e37e1a2a Alex Shi 2017-01-12 392 break;
45f1ff59e Rafael J. Wysocki 2018-03-22 393 }
3ed09c945 Nicholas Piggin 2017-06-26 394 idx = i;
71abbbf85 Ai Li 2010-08-09 395 }
4f86d3a8e Len Brown 2007-10-03 396
3ed09c945 Nicholas Piggin 2017-06-26 397 if (idx == -1)
3ed09c945 Nicholas Piggin 2017-06-26 398 idx = 0; /* No states enabled. Must use 0. */
3ed09c945 Nicholas Piggin 2017-06-26 399
45f1ff59e Rafael J. Wysocki 2018-03-22 400 /*
45f1ff59e Rafael J. Wysocki 2018-03-22 401 * Don't stop the tick if the selected state is a polling one or if the
45f1ff59e Rafael J. Wysocki 2018-03-22 402 * expected idle duration is shorter than the tick period length.
45f1ff59e Rafael J. Wysocki 2018-03-22 403 */
5f9f09809 Rafael J. Wysocki 2018-08-10 404 if (((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) ||
5f9f09809 Rafael J. Wysocki 2018-08-10 405 expected_interval < TICK_USEC) && !tick_nohz_tick_stopped()) {
296bb1e51 Rafael J. Wysocki 2018-04-05 406 unsigned int delta_next_us = ktime_to_us(delta_next);
296bb1e51 Rafael J. Wysocki 2018-04-05 407
45f1ff59e Rafael J. Wysocki 2018-03-22 408 *stop_tick = false;
45f1ff59e Rafael J. Wysocki 2018-03-22 409
5f9f09809 Rafael J. Wysocki 2018-08-10 410 if (idx > 0 && drv->states[idx].target_residency > delta_next_us) {
296bb1e51 Rafael J. Wysocki 2018-04-05 411 /*
296bb1e51 Rafael J. Wysocki 2018-04-05 412 * The tick is not going to be stopped and the target
296bb1e51 Rafael J. Wysocki 2018-04-05 413 * residency of the state to be returned is not within
296bb1e51 Rafael J. Wysocki 2018-04-05 414 * the time until the next timer event including the
296bb1e51 Rafael J. Wysocki 2018-04-05 415 * tick, so try to correct that.
296bb1e51 Rafael J. Wysocki 2018-04-05 416 */
296bb1e51 Rafael J. Wysocki 2018-04-05 417 for (i = idx - 1; i >= 0; i--) {
296bb1e51 Rafael J. Wysocki 2018-04-05 418 if (drv->states[i].disabled ||
296bb1e51 Rafael J. Wysocki 2018-04-05 419 dev->states_usage[i].disable)
296bb1e51 Rafael J. Wysocki 2018-04-05 420 continue;
296bb1e51 Rafael J. Wysocki 2018-04-05 421
296bb1e51 Rafael J. Wysocki 2018-04-05 422 idx = i;
296bb1e51 Rafael J. Wysocki 2018-04-05 423 if (drv->states[i].target_residency <= delta_next_us)
296bb1e51 Rafael J. Wysocki 2018-04-05 424 break;
296bb1e51 Rafael J. Wysocki 2018-04-05 425 }
296bb1e51 Rafael J. Wysocki 2018-04-05 426 }
296bb1e51 Rafael J. Wysocki 2018-04-05 427 }
296bb1e51 Rafael J. Wysocki 2018-04-05 428
3ed09c945 Nicholas Piggin 2017-06-26 429 data->last_state_idx = idx;
3ed09c945 Nicholas Piggin 2017-06-26 430
69d25870f Arjan van de Ven 2009-09-21 431 return data->last_state_idx;
4f86d3a8e Len Brown 2007-10-03 432 }
4f86d3a8e Len Brown 2007-10-03 433
:::::: The code at line 289 was first introduced by commit
:::::: 3ed09c94580de9d5b18cc35d1f97e9f24cd9233b cpuidle: menu: allow state 0 to be disabled
:::::: TO: Nicholas Piggin <[email protected]>
:::::: CC: Rafael J. Wysocki <[email protected]>
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
On Fri, Aug 10, 2018 at 2:31 PM <[email protected]> wrote:
>
> On Fri, Aug 10, 2018 at 01:04:22PM +0200, Rafael J. Wysocki wrote:
> > On Fri, Aug 10, 2018 at 11:20 AM <[email protected]> wrote:
> > >
> > > On Fri, Aug 10, 2018 at 09:57:18AM +0200, Rafael J . Wysocki wrote:
> > > > From: Rafael J. Wysocki <[email protected]>
> > > > Subject: [PATCH] cpuidle: menu: Handle stopped tick more aggressively
> > > >
> > > > Commit 87c9fe6ee495 (cpuidle: menu: Avoid selecting shallow states
> > > > with stopped tick) missed the case when the target residencies of
> > > > deep idle states of CPUs are above the tick boundary which may cause
> > > > the CPU to get stuck in a shallow idle state for a long time.
> > > >
> > > > Say there are two CPU idle states available: one shallow, with the
> > > > target residency much below the tick boundary and one deep, with
> > > > the target residency significantly above the tick boundary. In
> > > > that case, if the tick has been stopped already and the expected
> > > > next timer event is relatively far in the future, the governor will
> > > > assume the idle duration to be equal to TICK_USEC and it will select
> > > > the idle state for the CPU accordingly. However, that will cause the
> > > > shallow state to be selected even though it would have been more
> > > > energy-efficient to select the deep one.
> > > >
> > > > To address this issue, modify the governor to always assume idle
> > > > duration to be equal to the time till the closest timer event if
> > > > the tick is not running which will cause the selected idle states
> > > > to always match the known CPU wakeup time.
> > > >
> > > > Also make it always indicate that the tick should be stopped in
> > > > that case for consistency.
> > > >
> > > > Fixes: 87c9fe6ee495 (cpuidle: menu: Avoid selecting shallow states with stopped tick)
> > > > Reported-by: Leo Yan <[email protected]>
> > > > Signed-off-by: Rafael J. Wysocki <[email protected]>
> > > > ---
> > > >
> > > > -> v2: Initialize first_idx properly in the stopped tick case.
> > > >
> > > > ---
> > > > drivers/cpuidle/governors/menu.c | 55 +++++++++++++++++----------------------
> > > > 1 file changed, 25 insertions(+), 30 deletions(-)
> > > >
> > > > Index: linux-pm/drivers/cpuidle/governors/menu.c
> > > > ===================================================================
> > > > --- linux-pm.orig/drivers/cpuidle/governors/menu.c
> > > > +++ linux-pm/drivers/cpuidle/governors/menu.c
> > > > @@ -285,9 +285,8 @@ static int menu_select(struct cpuidle_dr
> > > > {
> > > > struct menu_device *data = this_cpu_ptr(&menu_devices);
> > > > int latency_req = cpuidle_governor_latency_req(dev->cpu);
> > > > - int i;
> > > > - int first_idx;
> > > > - int idx;
> > > > + int first_idx = 0;
> > > > + int idx, i;
> > > > unsigned int interactivity_req;
> > > > unsigned int expected_interval;
> > > > unsigned long nr_iowaiters, cpu_load;
> > > > @@ -307,6 +306,18 @@ static int menu_select(struct cpuidle_dr
> > > > /* determine the expected residency time, round up */
> > > > data->next_timer_us = ktime_to_us(tick_nohz_get_sleep_length(&delta_next));
> > > >
> > > > + /*
> > > > + * If the tick is already stopped, the cost of possible short idle
> > > > + * duration misprediction is much higher, because the CPU may be stuck
> > > > + * in a shallow idle state for a long time as a result of it. In that
> > > > + * case say we might mispredict and use the known time till the closest
> > > > + * timer event for the idle state selection.
> > > > + */
> > > > + if (tick_nohz_tick_stopped()) {
> > > > + data->predicted_us = ktime_to_us(delta_next);
> > > > + goto select;
> > > > + }
> > > > +
> > >
> > > This introduces two potential issues:
> > >
> > > - This will totally ignore the typical wakeup pattern in the idle loop;
> > > I observed that the mmc driver can trigger interrupts multiple times
> > > (> 10 times) with a consistent interval;
> >
> > I'm not sure what you mean by "ignore".
>
> After moving the code from below to this position, the typical interval
> pattern is no longer taken into account; so if a bunch of interrupts
> with a fixed pattern arrives in the middle of the idle periods, the code
> above can no longer detect that pattern.
I'm not really following you here.
The part of the code skipped for tick_nohz_tick_stopped() doesn't
update the data at all AFAICS. It only computes some values that
would be discarded later anyway, so I'm not sure what the point of
running that computation is.
The statistics are updated by menu_update() and that still runs and it
will take the actual wakeup events into account, won't it?
> [...]
>
> > > > - if ((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) ||
> > > > - expected_interval < TICK_USEC) {
> > > > + if (((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) ||
> > > > + expected_interval < TICK_USEC) && !tick_nohz_tick_stopped()) {
> > >
> > > I am not sure this logic is right... Why not use the check below, so
> > > that for a POLLING state we will never ask to stop the tick?
> > >
> > > if (drv->states[idx].flags & CPUIDLE_FLAG_POLLING ||
> > > (expected_interval < TICK_USEC && !tick_nohz_tick_stopped())) {
> > >
> >
> > The only effect of it would be setting stop_tick to false, but why
> > would that matter?
>
> Please consider the situation below (I am not sure whether this case
> can actually happen):
>
> step 1: the first time, enter an idle state and stop the tick;
> step 2: the second time, select a POLLING state while
> tick_nohz_tick_stopped() is true;
>
> In step 2, stop_tick cannot be set to false by the statement below.
>
> > > > unsigned int delta_next_us = ktime_to_us(delta_next);
> > > >
> > > > *stop_tick = false;
But setting *stop_tick has no effect as far as the current code is
concerned (up to the bug fixed by the other patch).
Also, the POLLING state can only be selected if there are no other
states matching delta_next available in that case, which means that
there will be a timer to break the polling loop soon enough (and BTW
the polling state has a built-in timeout too), so I don't really see a
problem here.
On Sun, Aug 12, 2018 at 12:07:45PM +0200, Rafael J. Wysocki wrote:
[...]
> > > > > --- linux-pm.orig/drivers/cpuidle/governors/menu.c
> > > > > +++ linux-pm/drivers/cpuidle/governors/menu.c
> > > > > @@ -285,9 +285,8 @@ static int menu_select(struct cpuidle_dr
> > > > > {
> > > > > struct menu_device *data = this_cpu_ptr(&menu_devices);
> > > > > int latency_req = cpuidle_governor_latency_req(dev->cpu);
> > > > > - int i;
> > > > > - int first_idx;
> > > > > - int idx;
> > > > > + int first_idx = 0;
> > > > > + int idx, i;
> > > > > unsigned int interactivity_req;
> > > > > unsigned int expected_interval;
> > > > > unsigned long nr_iowaiters, cpu_load;
> > > > > @@ -307,6 +306,18 @@ static int menu_select(struct cpuidle_dr
> > > > > /* determine the expected residency time, round up */
> > > > > data->next_timer_us = ktime_to_us(tick_nohz_get_sleep_length(&delta_next));
> > > > >
> > > > > + /*
> > > > > + * If the tick is already stopped, the cost of possible short idle
> > > > > + * duration misprediction is much higher, because the CPU may be stuck
> > > > > + * in a shallow idle state for a long time as a result of it. In that
> > > > > + * case say we might mispredict and use the known time till the closest
> > > > > + * timer event for the idle state selection.
> > > > > + */
> > > > > + if (tick_nohz_tick_stopped()) {
> > > > > + data->predicted_us = ktime_to_us(delta_next);
> > > > > + goto select;
> > > > > + }
> > > > > +
> > > >
> > > > This introduces two potential issues:
> > > >
> > > > - This will totally ignore the typical wakeup pattern in the idle loop;
> > > > I observed that the mmc driver can trigger interrupts multiple times
> > > > (> 10 times) with a consistent interval;
> > >
> > > I'm not sure what you mean by "ignore".
> >
> > After moving the code from below to this position, the typical interval
> > pattern is no longer taken into account; so if a bunch of interrupts
> > with a fixed pattern arrives in the middle of the idle periods, the code
> > above can no longer detect that pattern.
>
> I'm not really following you here.
>
> The part of the code skipped for tick_nohz_tick_stopped() doesn't
> update the data at all AFAICS. It only computes some values that
> would be discarded later anyway, so I'm not sure what the point of
> running that computation is.
Sorry, I did not explain clearly, so let me rephrase:
With your patch, in the tick-stopped case the governor directly uses the
timer delta value as the prediction and jumps to the 'select' tag. It
therefore skips the code below, which provides a minor improvement for
typical wakeup patterns occurring in the middle of idle periods. For
example, the mmc driver triggers 16 interrupts with a ~1500us interval,
all handled within the idle loop; the typical-interval detection can
recognize the mmc interrupt pattern and help the idle governor select a
shallower idle state, avoiding a broken residency.
You mentioned that these computed values would be discarded later; this
is true for most cases, but not always. Without your patch, the governor
discards the computed values only when 'data->predicted_us < TICK_USEC';
otherwise the interval pattern is still applied to the prediction.
expected_interval = get_typical_interval(data);
expected_interval = min(expected_interval, data->next_timer_us);
[...]
/*
* Use the lowest expected idle interval to pick the idle state.
*/
data->predicted_us = min(data->predicted_us, expected_interval);
> The statistics are updated by menu_update() and that still runs and it
> will take the actual wakeup events into account, won't it?
Yes.
> > [...]
> >
> > > > > - if ((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) ||
> > > > > - expected_interval < TICK_USEC) {
> > > > > + if (((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) ||
> > > > > + expected_interval < TICK_USEC) && !tick_nohz_tick_stopped()) {
> > > >
> > > > I am not sure this logic is right... Why not use the check below, so
> > > > that for a POLLING state we will never ask to stop the tick?
> > > >
> > > > if (drv->states[idx].flags & CPUIDLE_FLAG_POLLING ||
> > > > (expected_interval < TICK_USEC && !tick_nohz_tick_stopped())) {
> > > >
> > >
> > > The only effect of it would be setting stop_tick to false, but why
> > > would that matter?
> >
> > Please consider the situation below (I am not sure whether this case
> > can actually happen):
> >
> > step 1: the first time, enter an idle state and stop the tick;
> > step 2: the second time, select a POLLING state while
> > tick_nohz_tick_stopped() is true;
> >
> > In step 2, stop_tick cannot be set to false by the statement below.
> >
> > > > > unsigned int delta_next_us = ktime_to_us(delta_next);
> > > > >
> > > > > *stop_tick = false;
>
> But setting *stop_tick has no effect as far as the current code is
> concerned (up to the bug fixed by the other patch).
Yeah.
> Also the POLLING state can only be selected if there are no other
> states matching delta_next available in that case which means that
> there will be a timer to break the polling loop soon enough (and BTW
> the polling has a built-in timeout too), so I don't really see a
> problem here.
Ah, now I understand the logic; I had misunderstood the POLLING mode
before. I agree with this now. Sorry for the noise.
Thanks,
Leo Yan
On Fri, Aug 10, 2018 at 01:15:58PM +0200, Rafael J . Wysocki wrote:
> From: Rafael J. Wysocki <[email protected]>
>
> Commit 87c9fe6ee495 (cpuidle: menu: Avoid selecting shallow states
> with stopped tick) missed the case when the target residencies of
> deep idle states of CPUs are above the tick boundary which may cause
> the CPU to get stuck in a shallow idle state for a long time.
>
> Say there are two CPU idle states available: one shallow, with the
> target residency much below the tick boundary and one deep, with
> the target residency significantly above the tick boundary. In
> that case, if the tick has been stopped already and the expected
> next timer event is relatively far in the future, the governor will
> assume the idle duration to be equal to TICK_USEC and it will select
> the idle state for the CPU accordingly. However, that will cause the
> shallow state to be selected even though it would have been more
> energy-efficient to select the deep one.
>
> To address this issue, modify the governor to always assume idle
> duration to be equal to the time till the closest timer event if
> the tick is not running which will cause the selected idle states
> to always match the known CPU wakeup time.
>
> Also make it always indicate that the tick should be stopped in
> that case for consistency.
>
> Fixes: 87c9fe6ee495 (cpuidle: menu: Avoid selecting shallow states with stopped tick)
> Reported-by: Leo Yan <[email protected]>
> Signed-off-by: Rafael J. Wysocki <[email protected]>
> ---
>
> -> v2: Initialize first_idx properly in the stopped tick case.
>
> -> v3: Compute data->bucket before checking whether or not the tick has been
> stopped already to prevent it from becoming stale.
>
> ---
> drivers/cpuidle/governors/menu.c | 55 +++++++++++++++++----------------------
> 1 file changed, 25 insertions(+), 30 deletions(-)
>
> Index: linux-pm/drivers/cpuidle/governors/menu.c
> ===================================================================
> --- linux-pm.orig/drivers/cpuidle/governors/menu.c
> +++ linux-pm/drivers/cpuidle/governors/menu.c
> @@ -285,9 +285,8 @@ static int menu_select(struct cpuidle_dr
> {
> struct menu_device *data = this_cpu_ptr(&menu_devices);
> int latency_req = cpuidle_governor_latency_req(dev->cpu);
> - int i;
> - int first_idx;
> - int idx;
> + int first_idx = 0;
> + int idx, i;
> unsigned int interactivity_req;
> unsigned int expected_interval;
> unsigned long nr_iowaiters, cpu_load;
> @@ -311,6 +310,18 @@ static int menu_select(struct cpuidle_dr
> data->bucket = which_bucket(data->next_timer_us, nr_iowaiters);
>
> /*
> + * If the tick is already stopped, the cost of possible short idle
> + * duration misprediction is much higher, because the CPU may be stuck
> + * in a shallow idle state for a long time as a result of it. In that
> + * case say we might mispredict and use the known time till the closest
> + * timer event for the idle state selection.
> + */
> + if (tick_nohz_tick_stopped()) {
> + data->predicted_us = ktime_to_us(delta_next);
> + goto select;
> + }
I tried this patch on my side. First, to clarify: the patch is fine with
me, but there are other underlying issues where I observed the CPU
staying in a shallow idle state with the tick stopped, so I am noting
them here.
From my understanding, the rationale for this patch is that we only use
timer events as the reliable wakeup source; if there is a timer with a
short expiry we can select a shallow state, and otherwise we can select
the deepest idle state for a timer expiring far in the future.
This means the idle governor needs reliable information about the timer
events; so far I have observed at least two issues that make the timer
event delta value untrustworthy.
The first issue is caused by timer cancellation. I wrote a test case in
which CPU_0 starts a pinned-mode hrtimer with a short expiry time; when
CPU_0 goes to sleep, this short-timeout timer makes the idle governor
select a shallow state. In the meantime, CPU_1 tries to cancel the
timer; my purpose is to cheat CPU_0 so that it stays in the shallow
state for a long time. The cancellation only succeeds a small
percentage of the time, but I do occasionally see the timer canceled
successfully, after which CPU_0 stays idle for a long time (I cannot
explain why the timer cannot be canceled successfully every time; this
might be another issue?). This case is tricky, but it can happen in
drivers that cancel timers.
Another issue is caused by spurious interrupts. If we review the
function tick_nohz_get_sleep_length(), it uses 'ts->idle_entrytime' to
calculate the tick or timer delta, so every time the CPU exits an
interrupt and before it enters the idle governor, 'ts->idle_entrytime'
needs to be updated; but spurious interrupts do not go through the
irq_enter() and irq_exit() pair, so the following flow is not invoked:
irq_exit()
`->tick_irq_exit()
`->tick_nohz_irq_exit()
`->tick_nohz_start_idle()
As a result, after handling a spurious interrupt, the idle loop does not
update ts->idle_entrytime, so the governor might read back a stale
value. I have not fully located this issue, but I can see the CPU being
woken up without any interrupt handler running and then going directly
back to sleep; the menu governor selects a shallow state, so the CPU
stays in that shallow state for a long time.
> +
> + /*
> * Force the result of multiplication to be 64 bits even if both
> * operands are 32 bits.
> * Make sure to round up for half microseconds.
> @@ -322,7 +333,6 @@ static int menu_select(struct cpuidle_dr
> expected_interval = get_typical_interval(data);
> expected_interval = min(expected_interval, data->next_timer_us);
>
> - first_idx = 0;
> if (drv->states[0].flags & CPUIDLE_FLAG_POLLING) {
> struct cpuidle_state *s = &drv->states[1];
> unsigned int polling_threshold;
> @@ -344,29 +354,15 @@ static int menu_select(struct cpuidle_dr
> */
> data->predicted_us = min(data->predicted_us, expected_interval);
>
> - if (tick_nohz_tick_stopped()) {
> - /*
> - * If the tick is already stopped, the cost of possible short
> - * idle duration misprediction is much higher, because the CPU
> - * may be stuck in a shallow idle state for a long time as a
> - * result of it. In that case say we might mispredict and try
> - * to force the CPU into a state for which we would have stopped
> - * the tick, unless a timer is going to expire really soon
> - * anyway.
> - */
> - if (data->predicted_us < TICK_USEC)
> - data->predicted_us = min_t(unsigned int, TICK_USEC,
> - ktime_to_us(delta_next));
> - } else {
> - /*
> - * Use the performance multiplier and the user-configurable
> - * latency_req to determine the maximum exit latency.
> - */
> - interactivity_req = data->predicted_us / performance_multiplier(nr_iowaiters, cpu_load);
> - if (latency_req > interactivity_req)
> - latency_req = interactivity_req;
> - }
> + /*
> + * Use the performance multiplier and the user-configurable latency_req
> + * to determine the maximum exit latency.
> + */
> + interactivity_req = data->predicted_us / performance_multiplier(nr_iowaiters, cpu_load);
> + if (latency_req > interactivity_req)
> + latency_req = interactivity_req;
>
> +select:
> expected_interval = data->predicted_us;
> /*
> * Find the idle state with the lowest power while satisfying
> @@ -403,14 +399,13 @@ static int menu_select(struct cpuidle_dr
> * Don't stop the tick if the selected state is a polling one or if the
> * expected idle duration is shorter than the tick period length.
> */
> - if ((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) ||
> - expected_interval < TICK_USEC) {
> + if (((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) ||
> + expected_interval < TICK_USEC) && !tick_nohz_tick_stopped()) {
> unsigned int delta_next_us = ktime_to_us(delta_next);
>
> *stop_tick = false;
>
> - if (!tick_nohz_tick_stopped() && idx > 0 &&
> - drv->states[idx].target_residency > delta_next_us) {
> + if (idx > 0 && drv->states[idx].target_residency > delta_next_us) {
> /*
> * The tick is not going to be stopped and the target
> * residency of the state to be returned is not within
>
Hi Rafael,
I love your patch! Perhaps something to improve:
url: https://github.com/0day-ci/linux/commits/Rafael-J-Wysocki/cpuidle-menu-Handle-stopped-tick-more-aggressively/20180811-191914
base: https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git linux-next
smatch warnings:
drivers/cpuidle/governors/menu.c:374 menu_select() error: uninitialized symbol 'first_idx'.
# https://github.com/0day-ci/linux/commit/5f9f09809ebd1b4f7820c9925a0cbd417bd3a823
git remote add linux-review https://github.com/0day-ci/linux
git remote update linux-review
git checkout 5f9f09809ebd1b4f7820c9925a0cbd417bd3a823
vim +/first_idx +374 drivers/cpuidle/governors/menu.c
1f85f87d4 Arjan van de Ven 2010-05-24 276
4f86d3a8e Len Brown 2007-10-03 277 /**
4f86d3a8e Len Brown 2007-10-03 278 * menu_select - selects the next idle state to enter
46bcfad7a Deepthi Dharwar 2011-10-28 279 * @drv: cpuidle driver containing state data
4f86d3a8e Len Brown 2007-10-03 280 * @dev: the CPU
45f1ff59e Rafael J. Wysocki 2018-03-22 281 * @stop_tick: indication on whether or not to stop the tick
4f86d3a8e Len Brown 2007-10-03 282 */
45f1ff59e Rafael J. Wysocki 2018-03-22 283 static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
45f1ff59e Rafael J. Wysocki 2018-03-22 284 bool *stop_tick)
4f86d3a8e Len Brown 2007-10-03 285 {
229b6863b Christoph Lameter 2014-08-17 286 struct menu_device *data = this_cpu_ptr(&menu_devices);
0fc784fb0 Rafael J. Wysocki 2018-05-30 287 int latency_req = cpuidle_governor_latency_req(dev->cpu);
4f86d3a8e Len Brown 2007-10-03 288 int i;
3ed09c945 Nicholas Piggin 2017-06-26 289 int first_idx;
3ed09c945 Nicholas Piggin 2017-06-26 290 int idx;
96e95182e [email protected] 2014-02-24 291 unsigned int interactivity_req;
e132b9b3b Rik van Riel 2016-03-16 292 unsigned int expected_interval;
372ba8cb4 Mel Gorman 2014-08-06 293 unsigned long nr_iowaiters, cpu_load;
296bb1e51 Rafael J. Wysocki 2018-04-05 294 ktime_t delta_next;
4f86d3a8e Len Brown 2007-10-03 295
672917dcc Corrado Zoccolo 2009-09-21 296 if (data->needs_update) {
46bcfad7a Deepthi Dharwar 2011-10-28 297 menu_update(drv, dev);
672917dcc Corrado Zoccolo 2009-09-21 298 data->needs_update = 0;
672917dcc Corrado Zoccolo 2009-09-21 299 }
672917dcc Corrado Zoccolo 2009-09-21 300
69d25870f Arjan van de Ven 2009-09-21 301 /* Special case when user has set very strict latency requirement */
45f1ff59e Rafael J. Wysocki 2018-03-22 302 if (unlikely(latency_req == 0)) {
45f1ff59e Rafael J. Wysocki 2018-03-22 303 *stop_tick = false;
a2bd92023 [email protected] 2008-07-30 304 return 0;
45f1ff59e Rafael J. Wysocki 2018-03-22 305 }
a2bd92023 [email protected] 2008-07-30 306
69d25870f Arjan van de Ven 2009-09-21 307 /* determine the expected residency time, round up */
296bb1e51 Rafael J. Wysocki 2018-04-05 308 data->next_timer_us = ktime_to_us(tick_nohz_get_sleep_length(&delta_next));
69d25870f Arjan van de Ven 2009-09-21 309
5f9f09809 Rafael J. Wysocki 2018-08-10 310 /*
5f9f09809 Rafael J. Wysocki 2018-08-10 311 * If the tick is already stopped, the cost of possible short idle
5f9f09809 Rafael J. Wysocki 2018-08-10 312 * duration misprediction is much higher, because the CPU may be stuck
5f9f09809 Rafael J. Wysocki 2018-08-10 313 * in a shallow idle state for a long time as a result of it. In that
5f9f09809 Rafael J. Wysocki 2018-08-10 314 * case say we might mispredict and use the known time till the closest
5f9f09809 Rafael J. Wysocki 2018-08-10 315 * timer event for the idle state selection.
5f9f09809 Rafael J. Wysocki 2018-08-10 316 */
5f9f09809 Rafael J. Wysocki 2018-08-10 317 if (tick_nohz_tick_stopped()) {
5f9f09809 Rafael J. Wysocki 2018-08-10 318 data->predicted_us = ktime_to_us(delta_next);
5f9f09809 Rafael J. Wysocki 2018-08-10 319 goto select;
^^^^^^^^^^^
We hit this goto
5f9f09809 Rafael J. Wysocki 2018-08-10 320 }
5f9f09809 Rafael J. Wysocki 2018-08-10 321
372ba8cb4 Mel Gorman 2014-08-06 322 get_iowait_load(&nr_iowaiters, &cpu_load);
64b4ca5cb Mel Gorman 2014-08-06 323 data->bucket = which_bucket(data->next_timer_us, nr_iowaiters);
69d25870f Arjan van de Ven 2009-09-21 324
69d25870f Arjan van de Ven 2009-09-21 325 /*
51f245b89 Tuukka Tikkanen 2013-08-14 326 * Force the result of multiplication to be 64 bits even if both
51f245b89 Tuukka Tikkanen 2013-08-14 327 * operands are 32 bits.
51f245b89 Tuukka Tikkanen 2013-08-14 328 * Make sure to round up for half microseconds.
51f245b89 Tuukka Tikkanen 2013-08-14 329 */
ee3c86f35 Javi Merino 2015-04-16 330 data->predicted_us = DIV_ROUND_CLOSEST_ULL((uint64_t)data->next_timer_us *
51f245b89 Tuukka Tikkanen 2013-08-14 331 data->correction_factor[data->bucket],
69d25870f Arjan van de Ven 2009-09-21 332 RESOLUTION * DECAY);
69d25870f Arjan van de Ven 2009-09-21 333
e132b9b3b Rik van Riel 2016-03-16 334 expected_interval = get_typical_interval(data);
e132b9b3b Rik van Riel 2016-03-16 335 expected_interval = min(expected_interval, data->next_timer_us);
96e95182e [email protected] 2014-02-24 336
dc2251bf9 Rafael J. Wysocki 2017-08-23 337 first_idx = 0;
dc2251bf9 Rafael J. Wysocki 2017-08-23 338 if (drv->states[0].flags & CPUIDLE_FLAG_POLLING) {
dc2251bf9 Rafael J. Wysocki 2017-08-23 339 struct cpuidle_state *s = &drv->states[1];
0c313cb20 Rafael J. Wysocki 2016-03-20 340 unsigned int polling_threshold;
0c313cb20 Rafael J. Wysocki 2016-03-20 341
96e95182e [email protected] 2014-02-24 342 /*
69d25870f Arjan van de Ven 2009-09-21 343 * We want to default to C1 (hlt), not to busy polling
e132b9b3b Rik van Riel 2016-03-16 344 * unless the timer is happening really really soon, or
e132b9b3b Rik van Riel 2016-03-16 345 * C1's exit latency exceeds the user configured limit.
69d25870f Arjan van de Ven 2009-09-21 346 */
0c313cb20 Rafael J. Wysocki 2016-03-20 347 polling_threshold = max_t(unsigned int, 20, s->target_residency);
0c313cb20 Rafael J. Wysocki 2016-03-20 348 if (data->next_timer_us > polling_threshold &&
0c313cb20 Rafael J. Wysocki 2016-03-20 349 latency_req > s->exit_latency && !s->disabled &&
dc2251bf9 Rafael J. Wysocki 2017-08-23 350 !dev->states_usage[1].disable)
dc2251bf9 Rafael J. Wysocki 2017-08-23 351 first_idx = 1;
9c4b2867e Rafael J. Wysocki 2016-01-14 352 }
4f86d3a8e Len Brown 2007-10-03 353
71abbbf85 Ai Li 2010-08-09 354 /*
e132b9b3b Rik van Riel 2016-03-16 355 * Use the lowest expected idle interval to pick the idle state.
e132b9b3b Rik van Riel 2016-03-16 356 */
e132b9b3b Rik van Riel 2016-03-16 357 data->predicted_us = min(data->predicted_us, expected_interval);
e132b9b3b Rik van Riel 2016-03-16 358
e132b9b3b Rik van Riel 2016-03-16 359 /*
5f9f09809 Rafael J. Wysocki 2018-08-10 360 * Use the performance multiplier and the user-configurable latency_req
5f9f09809 Rafael J. Wysocki 2018-08-10 361 * to determine the maximum exit latency.
e132b9b3b Rik van Riel 2016-03-16 362 */
e132b9b3b Rik van Riel 2016-03-16 363 interactivity_req = data->predicted_us / performance_multiplier(nr_iowaiters, cpu_load);
e132b9b3b Rik van Riel 2016-03-16 364 if (latency_req > interactivity_req)
e132b9b3b Rik van Riel 2016-03-16 365 latency_req = interactivity_req;
e132b9b3b Rik van Riel 2016-03-16 366
5f9f09809 Rafael J. Wysocki 2018-08-10 367 select:
45f1ff59e Rafael J. Wysocki 2018-03-22 368 expected_interval = data->predicted_us;
e132b9b3b Rik van Riel 2016-03-16 369 /*
71abbbf85 Ai Li 2010-08-09 370 * Find the idle state with the lowest power while satisfying
71abbbf85 Ai Li 2010-08-09 371 * our constraints.
71abbbf85 Ai Li 2010-08-09 372 */
3ed09c945 Nicholas Piggin 2017-06-26 373 idx = -1;
3ed09c945 Nicholas Piggin 2017-06-26 @374 for (i = first_idx; i < drv->state_count; i++) {
^^^^^^^^^^^^^^
This is uninitialized
46bcfad7a Deepthi Dharwar 2011-10-28 375 struct cpuidle_state *s = &drv->states[i];
dc7fd275a ShuoX Liu 2012-07-03 376 struct cpuidle_state_usage *su = &dev->states_usage[i];
4f86d3a8e Len Brown 2007-10-03 377
cbc9ef028 Rafael J. Wysocki 2012-07-03 378 if (s->disabled || su->disable)
3a53396b0 ShuoX Liu 2012-03-28 379 continue;
3ed09c945 Nicholas Piggin 2017-06-26 380 if (idx == -1)
3ed09c945 Nicholas Piggin 2017-06-26 381 idx = i; /* first enabled state */
148519120 Rafael J. Wysocki 2013-07-27 382 if (s->target_residency > data->predicted_us)
8e37e1a2a Alex Shi 2017-01-12 383 break;
45f1ff59e Rafael J. Wysocki 2018-03-22 384 if (s->exit_latency > latency_req) {
45f1ff59e Rafael J. Wysocki 2018-03-22 385 /*
45f1ff59e Rafael J. Wysocki 2018-03-22 386 * If we break out of the loop for latency reasons, use
45f1ff59e Rafael J. Wysocki 2018-03-22 387 * the target residency of the selected state as the
45f1ff59e Rafael J. Wysocki 2018-03-22 388 * expected idle duration so that the tick is retained
45f1ff59e Rafael J. Wysocki 2018-03-22 389 * as long as that target residency is low enough.
45f1ff59e Rafael J. Wysocki 2018-03-22 390 */
45f1ff59e Rafael J. Wysocki 2018-03-22 391 expected_interval = drv->states[idx].target_residency;
8e37e1a2a Alex Shi 2017-01-12 392 break;
45f1ff59e Rafael J. Wysocki 2018-03-22 393 }
3ed09c945 Nicholas Piggin 2017-06-26 394 idx = i;
71abbbf85 Ai Li 2010-08-09 395 }
4f86d3a8e Len Brown 2007-10-03 396
3ed09c945 Nicholas Piggin 2017-06-26 397 if (idx == -1)
3ed09c945 Nicholas Piggin 2017-06-26 398 idx = 0; /* No states enabled. Must use 0. */
3ed09c945 Nicholas Piggin 2017-06-26 399
45f1ff59e Rafael J. Wysocki 2018-03-22 400 /*
45f1ff59e Rafael J. Wysocki 2018-03-22 401 * Don't stop the tick if the selected state is a polling one or if the
45f1ff59e Rafael J. Wysocki 2018-03-22 402 * expected idle duration is shorter than the tick period length.
45f1ff59e Rafael J. Wysocki 2018-03-22 403 */
5f9f09809 Rafael J. Wysocki 2018-08-10 404 if (((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) ||
5f9f09809 Rafael J. Wysocki 2018-08-10 405 expected_interval < TICK_USEC) && !tick_nohz_tick_stopped()) {
296bb1e51 Rafael J. Wysocki 2018-04-05 406 unsigned int delta_next_us = ktime_to_us(delta_next);
296bb1e51 Rafael J. Wysocki 2018-04-05 407
45f1ff59e Rafael J. Wysocki 2018-03-22 408 *stop_tick = false;
45f1ff59e Rafael J. Wysocki 2018-03-22 409
5f9f09809 Rafael J. Wysocki 2018-08-10 410 if (idx > 0 && drv->states[idx].target_residency > delta_next_us) {
296bb1e51 Rafael J. Wysocki 2018-04-05 411 /*
296bb1e51 Rafael J. Wysocki 2018-04-05 412 * The tick is not going to be stopped and the target
296bb1e51 Rafael J. Wysocki 2018-04-05 413 * residency of the state to be returned is not within
296bb1e51 Rafael J. Wysocki 2018-04-05 414 * the time until the next timer event including the
296bb1e51 Rafael J. Wysocki 2018-04-05 415 * tick, so try to correct that.
296bb1e51 Rafael J. Wysocki 2018-04-05 416 */
296bb1e51 Rafael J. Wysocki 2018-04-05 417 for (i = idx - 1; i >= 0; i--) {
296bb1e51 Rafael J. Wysocki 2018-04-05 418 if (drv->states[i].disabled ||
296bb1e51 Rafael J. Wysocki 2018-04-05 419 dev->states_usage[i].disable)
296bb1e51 Rafael J. Wysocki 2018-04-05 420 continue;
296bb1e51 Rafael J. Wysocki 2018-04-05 421
296bb1e51 Rafael J. Wysocki 2018-04-05 422 idx = i;
296bb1e51 Rafael J. Wysocki 2018-04-05 423 if (drv->states[i].target_residency <= delta_next_us)
296bb1e51 Rafael J. Wysocki 2018-04-05 424 break;
296bb1e51 Rafael J. Wysocki 2018-04-05 425 }
296bb1e51 Rafael J. Wysocki 2018-04-05 426 }
296bb1e51 Rafael J. Wysocki 2018-04-05 427 }
296bb1e51 Rafael J. Wysocki 2018-04-05 428
3ed09c945 Nicholas Piggin 2017-06-26 429 data->last_state_idx = idx;
3ed09c945 Nicholas Piggin 2017-06-26 430
69d25870f Arjan van de Ven 2009-09-21 431 return data->last_state_idx;
4f86d3a8e Len Brown 2007-10-03 432 }
4f86d3a8e Len Brown 2007-10-03 433
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
On Sun, Aug 12, 2018 at 4:55 PM <[email protected]> wrote:
>
> On Fri, Aug 10, 2018 at 01:15:58PM +0200, Rafael J . Wysocki wrote:
> > From: Rafael J. Wysocki <[email protected]>
> >
[cut]
>
> I tried this patch at my side; first, just to clarify, this patch is
> okay for me, but there are other underlying issues where I observed
> the CPU staying in a shallow idle state with the tick stopped, so I'm
> just noting them here.
Thanks for testing!
> From my understanding, the rationale for this patch is that we only
> use the timer event as the reliable wakeup source; if there is a
> short timer event then we can select a shallow state, otherwise we
> can select the deepest idle state for a timer expiring far in the
> future.
>
> This means the idle governor needs reliable info for the timer event;
> so far I have observed at least two issues where the timer event
> delta value cannot be trusted.
>
> The first issue is caused by timer cancellation. I wrote one test
> case where CPU_0 starts a pinned-mode hrtimer with a short expiry
> time, so that when CPU_0 goes to sleep this short-timeout timer lets
> the idle governor select a shallow state; meanwhile CPU_1 tries to
> cancel the timer. My purpose is to cheat CPU_0 so that we can see
> CPU_0 staying in a shallow state for a long time. The cancellation
> only succeeds a small percentage of the time, but I do occasionally
> see the timer canceled successfully, so CPU_0 stays idle for a long
> time (I cannot explain why the timer cannot be canceled successfully
> every time; this might be another issue?). This case is tricky, but
> it can happen in drivers that cancel timers.
Yes, it can potentially happen, but I'm not worried about it. If it
happens, it will only be occasional and will have no measurable effect
on the total energy usage of the system.
> Another issue is caused by spurious interrupts. If we review the
> function tick_nohz_get_sleep_length(), it uses 'ts->idle_entrytime'
> to calculate the tick or timer delta, so every time we exit from an
> interrupt and before entering the idle governor, 'ts->idle_entrytime'
> needs to be updated; but spurious interrupts do not go through the
> irq_enter() and irq_exit() pair, so the following flow is not invoked:
>
> irq_exit()
> `->tick_irq_exit()
> `->tick_nohz_irq_exit()
> `->tick_nohz_start_idle()
>
> As a result, after a spurious interrupt is handled, the idle loop
> doesn't update ts->idle_entrytime, so the governor might read back a
> stale value. I haven't fully root-caused this issue, but I can see
> the CPU being woken up without any interrupt handling and then going
> directly back to sleep; the menu governor selects a shallow state, so
> the CPU stays in the shallow state for a long time.
This sounds buggy, but again, spurious interrupts are not expected to
occur too often and if they do, they are a serious enough issue by
themselves.
On Sun, Aug 12, 2018 at 3:44 PM <[email protected]> wrote:
>
> On Sun, Aug 12, 2018 at 12:07:45PM +0200, Rafael J. Wysocki wrote:
>
> [...]
>
> > > > > > --- linux-pm.orig/drivers/cpuidle/governors/menu.c
> > > > > > +++ linux-pm/drivers/cpuidle/governors/menu.c
> > > > > > @@ -285,9 +285,8 @@ static int menu_select(struct cpuidle_dr
> > > > > > {
> > > > > > struct menu_device *data = this_cpu_ptr(&menu_devices);
> > > > > > int latency_req = cpuidle_governor_latency_req(dev->cpu);
> > > > > > - int i;
> > > > > > - int first_idx;
> > > > > > - int idx;
> > > > > > + int first_idx = 0;
> > > > > > + int idx, i;
> > > > > > unsigned int interactivity_req;
> > > > > > unsigned int expected_interval;
> > > > > > unsigned long nr_iowaiters, cpu_load;
> > > > > > @@ -307,6 +306,18 @@ static int menu_select(struct cpuidle_dr
> > > > > > /* determine the expected residency time, round up */
> > > > > > data->next_timer_us = ktime_to_us(tick_nohz_get_sleep_length(&delta_next));
> > > > > >
> > > > > > + /*
> > > > > > + * If the tick is already stopped, the cost of possible short idle
> > > > > > + * duration misprediction is much higher, because the CPU may be stuck
> > > > > > + * in a shallow idle state for a long time as a result of it. In that
> > > > > > + * case say we might mispredict and use the known time till the closest
> > > > > > + * timer event for the idle state selection.
> > > > > > + */
> > > > > > + if (tick_nohz_tick_stopped()) {
> > > > > > + data->predicted_us = ktime_to_us(delta_next);
> > > > > > + goto select;
> > > > > > + }
> > > > > > +
> > > > >
> > > > > This introduces two potential issues:
> > > > >
> > > > > - This will totally ignore the typical pattern in the idle
> > > > > loop; I observed the mmc driver can trigger multiple times
> > > > > (> 10 times) with a consistent interval;
> > > >
> > > > I'm not sure what you mean by "ignore".
> > >
> > > You can see that after moving the code from below to this
> > > position, the typical pattern interval will not be accounted for;
> > > so if in the middle of idle there is a bunch of interrupts with a
> > > fixed pattern, the code above cannot detect this pattern anymore.
> >
> > I'm not really following you here.
> >
> > The part of the code skipped for tick_nohz_tick_stopped() doesn't
> > update the data at all AFAICS. It only computes some values that
> > would be discarded later anyway, so I'm not sure what the point of
> > running that computation is.
>
> Sorry, I didn't explain it clearly, so let me try to rephrase:
>
> With your patch, for the tick-stopped case, the governor directly
> uses the tick delta value as the prediction and goes to the 'select'
> tag. So it skips the code below; that code provides a minor
> improvement for a typical pattern occurring in the middle of idle.
> For example, the mmc driver triggers 16 interrupts with a ~1500us
> interval; these interrupts are all handled within the idle loop, so
> the typical-pattern detection can recognize the mmc interrupt pattern
> and help the idle governor select a shallower idle state, avoiding
> breaking the residency.
>
> You mentioned these computed values would be discarded later; this is
> true for most cases, but it isn't always true. Without your patch,
> the governor will discard the computed values only when
> 'data->predicted_us < TICK_USEC'; otherwise the interval pattern is
> still applied in the prediction.
OK, right.
I'll fix that up in v4, thanks!
From: Rafael J. Wysocki <[email protected]>
Commit 87c9fe6ee495 (cpuidle: menu: Avoid selecting shallow states
with stopped tick) missed the case when the target residencies of
deep idle states of CPUs are above the tick boundary which may cause
the CPU to get stuck in a shallow idle state for a long time.
Say there are two CPU idle states available: one shallow, with the
target residency much below the tick boundary and one deep, with
the target residency significantly above the tick boundary. In
that case, if the tick has been stopped already and the expected
next timer event is relatively far in the future, the governor will
assume the idle duration to be equal to TICK_USEC and it will select
the idle state for the CPU accordingly. However, that will cause the
shallow state to be selected even though it would have been more
energy-efficient to select the deep one.
To address this issue, modify the governor to always use the time
till the closest timer event instead of the predicted idle duration
if the latter is less than the tick period length and the tick has
been stopped already. Also make it extend the search for a matching
idle state if the tick is stopped to avoid settling on a shallow
state if deep states with target residencies above the tick period
length are available.
In addition, make it always indicate that the tick should be stopped
if it has been stopped already for consistency.
Fixes: 87c9fe6ee495 (cpuidle: menu: Avoid selecting shallow states with stopped tick)
Reported-by: Leo Yan <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
---
-> v2: Initialize first_idx properly in the stopped tick case.
v2 -> v3: Compute data->bucket before checking whether or not the tick has been
stopped already to prevent it from becoming stale.
v3 -> v4: Allow the usual state selection to be carried out if the tick has been
stopped in case the predicted idle duration is greater than the tick
period length and a matching state can be found without overriding
the prediction result.
---
drivers/cpuidle/governors/menu.c | 33 +++++++++++++++++++++------------
1 file changed, 21 insertions(+), 12 deletions(-)
Index: linux-pm/drivers/cpuidle/governors/menu.c
===================================================================
--- linux-pm.orig/drivers/cpuidle/governors/menu.c
+++ linux-pm/drivers/cpuidle/governors/menu.c
@@ -349,14 +349,12 @@ static int menu_select(struct cpuidle_dr
* If the tick is already stopped, the cost of possible short
* idle duration misprediction is much higher, because the CPU
* may be stuck in a shallow idle state for a long time as a
- * result of it. In that case say we might mispredict and try
- * to force the CPU into a state for which we would have stopped
- * the tick, unless a timer is going to expire really soon
- * anyway.
+ * result of it. In that case say we might mispredict and use
+ * the known time till the closest timer event for the idle
+ * state selection.
*/
if (data->predicted_us < TICK_USEC)
- data->predicted_us = min_t(unsigned int, TICK_USEC,
- ktime_to_us(delta_next));
+ data->predicted_us = ktime_to_us(delta_next);
} else {
/*
* Use the performance multiplier and the user-configurable
@@ -381,8 +379,20 @@ static int menu_select(struct cpuidle_dr
continue;
if (idx == -1)
idx = i; /* first enabled state */
- if (s->target_residency > data->predicted_us)
- break;
+ if (s->target_residency > data->predicted_us) {
+ if (!tick_nohz_tick_stopped() ||
+ drv->states[idx].target_residency >= TICK_USEC)
+ break;
+
+ /*
+ * If the state selected so far is shallow and the time
+ * till the closest timer event matches this state's
+ * target residency, select this one to avoid getting
+ * stuck in the shallow one for too long.
+ */
+ if (s->target_residency > ktime_to_us(delta_next))
+ break;
+ }
if (s->exit_latency > latency_req) {
/*
* If we break out of the loop for latency reasons, use
@@ -403,14 +413,13 @@ static int menu_select(struct cpuidle_dr
* Don't stop the tick if the selected state is a polling one or if the
* expected idle duration is shorter than the tick period length.
*/
- if ((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) ||
- expected_interval < TICK_USEC) {
+ if (((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) ||
+ expected_interval < TICK_USEC) && !tick_nohz_tick_stopped()) {
unsigned int delta_next_us = ktime_to_us(delta_next);
*stop_tick = false;
- if (!tick_nohz_tick_stopped() && idx > 0 &&
- drv->states[idx].target_residency > delta_next_us) {
+ if (idx > 0 && drv->states[idx].target_residency > delta_next_us) {
/*
* The tick is not going to be stopped and the target
* residency of the state to be returned is not within
From: Rafael J. Wysocki <[email protected]>
Commit 87c9fe6ee495 (cpuidle: menu: Avoid selecting shallow states
with stopped tick) missed the case when the target residencies of
deep idle states of CPUs are above the tick boundary which may cause
the CPU to get stuck in a shallow idle state for a long time.
Say there are two CPU idle states available: one shallow, with the
target residency much below the tick boundary and one deep, with
the target residency significantly above the tick boundary. In
that case, if the tick has been stopped already and the expected
next timer event is relatively far in the future, the governor will
assume the idle duration to be equal to TICK_USEC and it will select
the idle state for the CPU accordingly. However, that will cause the
shallow state to be selected even though it would have been more
energy-efficient to select the deep one.
To address this issue, modify the governor to always use the time
till the closest timer event instead of the predicted idle duration
if the latter is less than the tick period length and the tick has
been stopped already. Also make it extend the search for a matching
idle state if the tick is stopped to avoid settling on a shallow
state if deep states with target residencies above the tick period
length are available.
In addition, make it always indicate that the tick should be stopped
if it has been stopped already for consistency.
Fixes: 87c9fe6ee495 (cpuidle: menu: Avoid selecting shallow states with stopped tick)
Reported-by: Leo Yan <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
---
-> v2: Initialize first_idx properly in the stopped tick case.
v2 -> v3: Compute data->bucket before checking whether or not the tick has been
stopped already to prevent it from becoming stale.
v3 -> v4: Allow the usual state selection to be carried out if the tick has been
stopped in case the predicted idle duration is greater than the tick
period length and a matching state can be found without overriding
the prediction result.
v4 -> v5: Rework code to be more straightforward. Functionally, it should
behave like the v4.
---
drivers/cpuidle/governors/menu.c | 36 ++++++++++++++++++++++++------------
1 file changed, 24 insertions(+), 12 deletions(-)
Index: linux-pm/drivers/cpuidle/governors/menu.c
===================================================================
--- linux-pm.orig/drivers/cpuidle/governors/menu.c
+++ linux-pm/drivers/cpuidle/governors/menu.c
@@ -349,14 +349,12 @@ static int menu_select(struct cpuidle_dr
* If the tick is already stopped, the cost of possible short
* idle duration misprediction is much higher, because the CPU
* may be stuck in a shallow idle state for a long time as a
- * result of it. In that case say we might mispredict and try
- * to force the CPU into a state for which we would have stopped
- * the tick, unless a timer is going to expire really soon
- * anyway.
+ * result of it. In that case say we might mispredict and use
+ * the known time till the closest timer event for the idle
+ * state selection.
*/
if (data->predicted_us < TICK_USEC)
- data->predicted_us = min_t(unsigned int, TICK_USEC,
- ktime_to_us(delta_next));
+ data->predicted_us = ktime_to_us(delta_next);
} else {
/*
* Use the performance multiplier and the user-configurable
@@ -381,8 +379,22 @@ static int menu_select(struct cpuidle_dr
continue;
if (idx == -1)
idx = i; /* first enabled state */
- if (s->target_residency > data->predicted_us)
- break;
+ if (s->target_residency > data->predicted_us) {
+ if (!tick_nohz_tick_stopped())
+ break;
+
+ /*
+ * If the state selected so far is shallow and this
+ * state's target residency matches the time till the
+ * closest timer event, select this one to avoid getting
+ * stuck in the shallow one for too long.
+ */
+ if (drv->states[idx].target_residency < TICK_USEC &&
+ s->target_residency <= ktime_to_us(delta_next))
+ idx = i;
+
+ goto out;
+ }
if (s->exit_latency > latency_req) {
/*
* If we break out of the loop for latency reasons, use
@@ -403,14 +415,13 @@ static int menu_select(struct cpuidle_dr
* Don't stop the tick if the selected state is a polling one or if the
* expected idle duration is shorter than the tick period length.
*/
- if ((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) ||
- expected_interval < TICK_USEC) {
+ if (((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) ||
+ expected_interval < TICK_USEC) && !tick_nohz_tick_stopped()) {
unsigned int delta_next_us = ktime_to_us(delta_next);
*stop_tick = false;
- if (!tick_nohz_tick_stopped() && idx > 0 &&
- drv->states[idx].target_residency > delta_next_us) {
+ if (idx > 0 && drv->states[idx].target_residency > delta_next_us) {
/*
* The tick is not going to be stopped and the target
* residency of the state to be returned is not within
@@ -429,6 +440,7 @@ static int menu_select(struct cpuidle_dr
}
}
+out:
data->last_state_idx = idx;
return data->last_state_idx;
On Tue, Aug 14, 2018 at 12:34:40PM +0200, Rafael J . Wysocki wrote:
> From: Rafael J. Wysocki <[email protected]>
>
> Commit 87c9fe6ee495 (cpuidle: menu: Avoid selecting shallow states
> with stopped tick) missed the case when the target residencies of
> deep idle states of CPUs are above the tick boundary which may cause
> the CPU to get stuck in a shallow idle state for a long time.
>
> Say there are two CPU idle states available: one shallow, with the
> target residency much below the tick boundary and one deep, with
> the target residency significantly above the tick boundary. In
> that case, if the tick has been stopped already and the expected
> next timer event is relatively far in the future, the governor will
> assume the idle duration to be equal to TICK_USEC and it will select
> the idle state for the CPU accordingly. However, that will cause the
> shallow state to be selected even though it would have been more
> energy-efficient to select the deep one.
>
> To address this issue, modify the governor to always use the time
> till the closest timer event instead of the predicted idle duration
> if the latter is less than the tick period length and the tick has
> been stopped already. Also make it extend the search for a matching
> idle state if the tick is stopped to avoid settling on a shallow
> state if deep states with target residencies above the tick period
> length are available.
>
> In addition, make it always indicate that the tick should be stopped
> if it has been stopped already for consistency.
>
> Fixes: 87c9fe6ee495 (cpuidle: menu: Avoid selecting shallow states with stopped tick)
> Reported-by: Leo Yan <[email protected]>
> Signed-off-by: Rafael J. Wysocki <[email protected]>
> ---
> -> v2: Initialize first_idx properly in the stopped tick case.
>
> v2 -> v3: Compute data->bucket before checking whether or not the tick has been
> stopped already to prevent it from becoming stale.
>
> v3 -> v4: Allow the usual state selection to be carried out if the tick has been
> stopped in case the predicted idle duration is greater than the tick
> period length and a matching state can be found without overriding
> the prediction result.
>
> v4 -> v5: Rework code to be more straightforward. Functionally, it should
> behave like the v4.
> ---
> drivers/cpuidle/governors/menu.c | 36 ++++++++++++++++++++++++------------
> 1 file changed, 24 insertions(+), 12 deletions(-)
>
> Index: linux-pm/drivers/cpuidle/governors/menu.c
> ===================================================================
> --- linux-pm.orig/drivers/cpuidle/governors/menu.c
> +++ linux-pm/drivers/cpuidle/governors/menu.c
> @@ -349,14 +349,12 @@ static int menu_select(struct cpuidle_dr
> * If the tick is already stopped, the cost of possible short
> * idle duration misprediction is much higher, because the CPU
> * may be stuck in a shallow idle state for a long time as a
> - * result of it. In that case say we might mispredict and try
> - * to force the CPU into a state for which we would have stopped
> - * the tick, unless a timer is going to expire really soon
> - * anyway.
> + * result of it. In that case say we might mispredict and use
> + * the known time till the closest timer event for the idle
> + * state selection.
> */
> if (data->predicted_us < TICK_USEC)
> - data->predicted_us = min_t(unsigned int, TICK_USEC,
> - ktime_to_us(delta_next));
> + data->predicted_us = ktime_to_us(delta_next);
> } else {
> /*
> * Use the performance multiplier and the user-configurable
> @@ -381,8 +379,22 @@ static int menu_select(struct cpuidle_dr
> continue;
> if (idx == -1)
> idx = i; /* first enabled state */
> - if (s->target_residency > data->predicted_us)
> - break;
> + if (s->target_residency > data->predicted_us) {
> + if (!tick_nohz_tick_stopped())
> + break;
> +
> + /*
> + * If the state selected so far is shallow and this
> + * state's target residency matches the time till the
> + * closest timer event, select this one to avoid getting
> + * stuck in the shallow one for too long.
> + */
> + if (drv->states[idx].target_residency < TICK_USEC &&
> + s->target_residency <= ktime_to_us(delta_next))
> + idx = i;
I took some time to understand v4 and v5; now I understand that you want
to promote to a deeper idle state if the prediction falls in the range
[TICK_USEC..max_target_residency). This seems to me to be a heuristic
that is not general enough, and IMHO it makes the code harder to read.
I agree this patch can resolve the issue you mentioned in the commit
log, but it will be fragile for the case below, where it will select
state1 rather than state2. What do you think about this case?
> Idle_state0:
target_residency TICK_USEC Prediction
| | /
V V /
-----------------------------------------------------> Time length
^ ^
| |
Idle_state1: Idle_state2:
target_residency target_residency
> + goto out;
> + }
> if (s->exit_latency > latency_req) {
> /*
> * If we break out of the loop for latency reasons, use
> @@ -403,14 +415,13 @@ static int menu_select(struct cpuidle_dr
> * Don't stop the tick if the selected state is a polling one or if the
> * expected idle duration is shorter than the tick period length.
> */
> - if ((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) ||
> - expected_interval < TICK_USEC) {
> + if (((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) ||
> + expected_interval < TICK_USEC) && !tick_nohz_tick_stopped()) {
> unsigned int delta_next_us = ktime_to_us(delta_next);
>
> *stop_tick = false;
>
> - if (!tick_nohz_tick_stopped() && idx > 0 &&
> - drv->states[idx].target_residency > delta_next_us) {
> + if (idx > 0 && drv->states[idx].target_residency > delta_next_us) {
> /*
> * The tick is not going to be stopped and the target
> * residency of the state to be returned is not within
> @@ -429,6 +440,7 @@ static int menu_select(struct cpuidle_dr
> }
> }
>
> +out:
> data->last_state_idx = idx;
>
> return data->last_state_idx;
>
On Tue, Aug 14, 2018 at 5:44 PM <[email protected]> wrote:
>
> I took some time to understand v4 and v5; now I understand that you want
> to promote to a deeper idle state if the prediction falls in the range
> [TICK_USEC..max_target_residency). This seems to me to be a heuristic
> that is not general enough, and IMHO it makes the code harder to read.
>
> I agree this patch can resolve the issue you mentioned in the commit
> log, but it will be fragile for the case below, where it will select
> state1 rather than state2. What do you think about this case?
It is OK to select state1 in this case IMO, because if the tick had
not been stopped already, it would have been stopped after selecting
state1 anyway (it is not a "polling" state and expected_interval is
not below TICK_USEC).
> Idle_state0:
> target_residency TICK_USEC Prediction
> | | /
> V V /
> -----------------------------------------------------> Time length
> ^ ^
> | |
> Idle_state1: Idle_state2:
> target_residency target_residency
>
Basically, the idea is to avoid getting stuck in the idle states for
which the tick would not have been stopped, had it been still running.
On Sun, Aug 12, 2018 at 10:55:15PM +0800, [email protected] wrote:
> The first issue is caused by timer cancellation. I wrote one test case
> in which CPU_0 starts a pinned-mode hrtimer with a short expiry time,
> so that when CPU_0 goes to sleep this short timer leads the idle
> governor to select a shallow state; meanwhile CPU_1 tries to cancel
> the timer. My purpose is to cheat CPU_0 so that I can see it staying
> in the shallow state for a long time. The cancellation has a low
> success rate, but I do occasionally see the timer canceled
> successfully, and then CPU_0 stays idle for a long time. (I cannot
> explain why the timer cannot be canceled successfully every time;
> this might be another issue?) This case is tricky, but it can happen
> in drivers that cancel timers.
So this is really difficult to make happen, I think; you first need the
CPU to go deep idle, such that it disables the tick. Then you have to
start the hrtimer there (using an IPI, I suppose), which will then force
the governor to pick a shallow idle state, and then you have to cancel
the timer before it gets triggered.
And then, if the CPU stays perfectly idle, it will be stuck in that
shallow state... forever more.
_However_ IIRC when we (remotely) cancel an hrtimer, we do not in fact
reprogram the timer hardware. So the timer _will_ trigger.
hrtimer_interrupt() will observe nothing to do and reprogram the
hardware for the next timer (if there is one).
This should be enough to cycle through the idle loop and re-select an
idle state and avoid this whole problem.
If that is not happening, then something is busted and we need to figure
out what.
On Tue, Aug 14, 2018 at 12:34:40PM +0200, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <[email protected]>
>
> Fixes: 87c9fe6ee495 (cpuidle: menu: Avoid selecting shallow states with stopped tick)
> Reported-by: Leo Yan <[email protected]>
> Signed-off-by: Rafael J. Wysocki <[email protected]>
Seems OK,
Acked-by: Peter Zijlstra (Intel) <[email protected]>
On Tue, Aug 14, 2018 at 11:44:41PM +0800, [email protected] wrote:
> I agree this patch can resolve the issue you mentioned in the commit
> log, but it will be fragile for the case below, where it will select
> state1 rather than state2. What do you think about this case?
>
> Idle_state0:
> target_residency TICK_USEC Prediction
> | | /
> V V /
> -----------------------------------------------------> Time length
> ^ ^
> | |
> Idle_state1: Idle_state2:
> target_residency target_residency
If someone can demonstrate this is an actual problem, then we need to
fix it, I suppose. But it would be very good to have a real-world
need/reproducer for it first.