2019-03-22 07:32:26

by Abhishek Goel

[permalink] [raw]
Subject: [PATCH 0/2] Auto-promotion logic for cpuidle states

Currently, the cpuidle governors (menu/ladder) determine what idle state a
idling CPU should enter into based on heuristics that depend on the idle
history on that CPU. Given that no predictive heuristic is perfect, there
are cases where the governor predicts a shallow idle state, hoping that
the CPU will be busy soon. However, if no new workload is scheduled on
that CPU in the near future, the CPU will end up in the shallow state.

Motivation
----------
In case of POWER, this is problematic, when the predicted state in the
aforementioned scenario is a lite stop state, as such lite states will
inhibit SMT folding, thereby depriving the other threads in the core from
using the core resources.

To address this, such lite states need to be autopromoted. The cpuidle-core
can queue timer to correspond with the residency value of the next
available state. Thus leading to auto-promotion to a deeper idle state as
soon as possible.

Experiment
----------
Without this patch -
It was seen that for a idle system, a cpu may remain in stop0_lite for few
seconds and then directly goes to a deeper state such as stop2.

With this patch -
A cpu will not remain in stop0_lite for more than the residency of next
available state, and thus it will go to a deeper state in conservative
fashion. Using this, we may spent even less than 20 milliseconds if
susbsequent stop states are enabled. In the worst case, we may end up
spending more than a second, as was the case without this patch. The
worst case will occur in the scenario when no other shallow states are
enbaled, and only deep states are available for auto-promotion.

Abhishek Goel (2):
cpuidle : auto-promotion for cpuidle states
cpuidle : Add auto-promotion flag to cpuidle flags

arch/powerpc/include/asm/opal-api.h | 1 +
drivers/cpuidle/Kconfig | 4 ++
drivers/cpuidle/cpuidle-powernv.c | 13 ++++-
drivers/cpuidle/cpuidle.c | 79 ++++++++++++++++++++++++++++-
drivers/cpuidle/governors/ladder.c | 3 +-
drivers/cpuidle/governors/menu.c | 23 ++++++++-
include/linux/cpuidle.h | 12 +++--
7 files changed, 127 insertions(+), 8 deletions(-)

--
2.17.1



2019-03-22 07:32:46

by Abhishek Goel

[permalink] [raw]
Subject: [PATCH 1/2] cpuidle : auto-promotion for cpuidle states

Currently, the cpuidle governors (menu /ladder) determine what idle state
an idling CPU should enter into based on heuristics that depend on the
idle history on that CPU. Given that no predictive heuristic is perfect,
there are cases where the governor predicts a shallow idle state, hoping
that the CPU will be busy soon. However, if no new workload is scheduled
on that CPU in the near future, the CPU will end up in the shallow state.

In case of POWER, this is problematic, when the predicted state in the
aforementioned scenario is a lite stop state, as such lite states will
inhibit SMT folding, thereby depriving the other threads in the core from
using the core resources.

To address this, such lite states need to be autopromoted. The cpuidle-
core can queue timer to correspond with the residency value of the next
available state. Thus leading to auto-promotion to a deeper idle state as
soon as possible.

Signed-off-by: Abhishek Goel <[email protected]>
---
drivers/cpuidle/cpuidle.c | 79 +++++++++++++++++++++++++++++-
drivers/cpuidle/governors/ladder.c | 3 +-
drivers/cpuidle/governors/menu.c | 23 ++++++++-
include/linux/cpuidle.h | 12 +++--
4 files changed, 111 insertions(+), 6 deletions(-)

diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
index 7f108309e..c4d1c1b38 100644
--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -36,6 +36,12 @@ static int enabled_devices;
static int off __read_mostly;
static int initialized __read_mostly;

+struct auto_promotion {
+ struct hrtimer hrtimer;
+ int timeout;
+ bool timeout_needed;
+};
+
int cpuidle_disabled(void)
{
return off;
@@ -188,6 +194,64 @@ int cpuidle_enter_s2idle(struct cpuidle_driver *drv, struct cpuidle_device *dev)
}
#endif /* CONFIG_SUSPEND */

+enum hrtimer_restart auto_promotion_hrtimer_callback(struct hrtimer *hrtimer)
+{
+ return HRTIMER_NORESTART;
+}
+
+#ifdef CONFIG_CPU_IDLE_AUTO_PROMOTION
+DEFINE_PER_CPU(struct auto_promotion, ap);
+
+static void cpuidle_auto_promotion_start(struct cpuidle_state *state, int cpu)
+{
+ struct auto_promotion *this_ap = &per_cpu(ap, cpu);
+
+ if (this_ap->timeout_needed && (state->flags &
+ CPUIDLE_FLAG_AUTO_PROMOTION))
+ hrtimer_start(&this_ap->hrtimer, ns_to_ktime(this_ap->timeout
+ * 1000), HRTIMER_MODE_REL_PINNED);
+}
+
+static void cpuidle_auto_promotion_cancel(int cpu)
+{
+ struct hrtimer *hrtimer;
+
+ hrtimer = &per_cpu(ap, cpu).hrtimer;
+ if (hrtimer_is_queued(hrtimer))
+ hrtimer_cancel(hrtimer);
+}
+
+static void cpuidle_auto_promotion_update(int time, int cpu)
+{
+ per_cpu(ap, cpu).timeout = time;
+}
+
+static void cpuidle_auto_promotion_init(struct cpuidle_driver *drv, int cpu)
+{
+ int i;
+ struct auto_promotion *this_ap = &per_cpu(ap, cpu);
+
+ this_ap->timeout_needed = 0;
+
+ for (i = 0; i < drv->state_count; i++) {
+ if (drv->states[i].flags & CPUIDLE_FLAG_AUTO_PROMOTION) {
+ this_ap->timeout_needed = 1;
+ break;
+ }
+ }
+
+ hrtimer_init(&this_ap->hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
+ this_ap->hrtimer.function = auto_promotion_hrtimer_callback;
+}
+#else
+static inline void cpuidle_auto_promotion_start(struct cpuidle_state *state,
+ int cpu) { }
+static inline void cpuidle_auto_promotion_cancel(int cpu) { }
+static inline void cpuidle_auto_promotion_update(int timeout, int cpu) { }
+static inline void cpuidle_auto_promotion_init(struct cpuidle_driver *drv,
+ int cpu) { }
+#endif
+
/**
* cpuidle_enter_state - enter the state and update stats
* @dev: cpuidle device for this cpu
@@ -225,12 +289,17 @@ int cpuidle_enter_state(struct cpuidle_device *dev, struct cpuidle_driver *drv,
trace_cpu_idle_rcuidle(index, dev->cpu);
time_start = ns_to_ktime(local_clock());

+ cpuidle_auto_promotion_start(target_state, dev->cpu);
+
stop_critical_timings();
entered_state = target_state->enter(dev, drv, index);
start_critical_timings();

sched_clock_idle_wakeup_event();
time_end = ns_to_ktime(local_clock());
+
+ cpuidle_auto_promotion_cancel(dev->cpu);
+
trace_cpu_idle_rcuidle(PWR_EVENT_EXIT, dev->cpu);

/* The cpu is no longer idle or about to enter idle. */
@@ -312,7 +381,13 @@ int cpuidle_enter_state(struct cpuidle_device *dev, struct cpuidle_driver *drv,
int cpuidle_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
bool *stop_tick)
{
- return cpuidle_curr_governor->select(drv, dev, stop_tick);
+ int timeout, ret;
+
+ timeout = INT_MAX;
+ ret = cpuidle_curr_governor->select(drv, dev, stop_tick, &timeout);
+ cpuidle_auto_promotion_update(timeout, dev->cpu);
+
+ return ret;
}

/**
@@ -658,6 +733,8 @@ int cpuidle_register(struct cpuidle_driver *drv,
device = &per_cpu(cpuidle_dev, cpu);
device->cpu = cpu;

+ cpuidle_auto_promotion_init(drv, cpu);
+
#ifdef CONFIG_ARCH_NEEDS_CPU_IDLE_COUPLED
/*
* On multiplatform for ARM, the coupled idle states could be
diff --git a/drivers/cpuidle/governors/ladder.c b/drivers/cpuidle/governors/ladder.c
index f0dddc66a..3fcae235e 100644
--- a/drivers/cpuidle/governors/ladder.c
+++ b/drivers/cpuidle/governors/ladder.c
@@ -64,7 +64,8 @@ static inline void ladder_do_selection(struct ladder_device *ldev,
* @dummy: not used
*/
static int ladder_select_state(struct cpuidle_driver *drv,
- struct cpuidle_device *dev, bool *dummy)
+ struct cpuidle_device *dev, bool *dummy,
+ int *unused)
{
struct ladder_device *ldev = this_cpu_ptr(&ladder_devices);
struct ladder_device_state *last_state;
diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
index 61316fc51..25fbe2a99 100644
--- a/drivers/cpuidle/governors/menu.c
+++ b/drivers/cpuidle/governors/menu.c
@@ -276,7 +276,7 @@ static unsigned int get_typical_interval(struct menu_device *data,
* @stop_tick: indication on whether or not to stop the tick
*/
static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
- bool *stop_tick)
+ bool *stop_tick, int *timeout)
{
struct menu_device *data = this_cpu_ptr(&menu_devices);
int latency_req = cpuidle_governor_latency_req(dev->cpu);
@@ -442,6 +442,27 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
}
}

+#ifdef CPUIDLE_FLAG_AUTO_PROMOTION
+ if (drv->states[idx].flags & CPUIDLE_FLAG_AUTO_PROMOTION) {
+ /* Timeout is intended to be defined as sum of target residency
+ * of next available state, entry latency and exit latency. If
+ * time interval equal to timeout is spent in current state,
+ * and if it is a shallow lite state, we may want to auto-
+ * promote from such state.
+ */
+ *timeout = drv->states[idx].target_residency +
+ 2 * drv->states[idx].exit_latency;
+ for (i = idx+1; i < drv->state_count; i++) {
+ if (drv->states[i].disabled ||
+ dev->states_usage[i].disable)
+ continue;
+ *timeout = drv->states[i].target_residency +
+ 2 * drv->states[i].exit_latency;
+ break;
+ }
+ }
+#endif
+
return idx;
}

diff --git a/include/linux/cpuidle.h b/include/linux/cpuidle.h
index 4dff74f48..223f089bc 100644
--- a/include/linux/cpuidle.h
+++ b/include/linux/cpuidle.h
@@ -68,10 +68,16 @@ struct cpuidle_state {
};

/* Idle State Flags */
-#define CPUIDLE_FLAG_NONE (0x00)
+#define CPUIDLE_FLAG_NONE (0x00)
#define CPUIDLE_FLAG_POLLING (0x01) /* polling state */
#define CPUIDLE_FLAG_COUPLED (0x02) /* state applies to multiple cpus */
-#define CPUIDLE_FLAG_TIMER_STOP (0x04) /* timer is stopped on this state */
+#define CPUIDLE_FLAG_TIMER_STOP (0x04) /* timer is stopped on this state */
+/* State with only and only fast state bit set don't even lose user context.
+ * But such states prevent other sibling threads from thread folding benefits.
+ * And hence we don't want to stay for too long in such states and want to
+ * auto-promote from it.
+ */
+#define CPUIDLE_FLAG_AUTO_PROMOTION (0x08)

#define CPUIDLE_DRIVER_FLAGS_MASK (0xFFFF0000)

@@ -245,7 +251,7 @@ struct cpuidle_governor {

int (*select) (struct cpuidle_driver *drv,
struct cpuidle_device *dev,
- bool *stop_tick);
+ bool *stop_tick, int *timeout);
void (*reflect) (struct cpuidle_device *dev, int index);
};

--
2.17.1


2019-03-22 07:34:54

by Abhishek Goel

[permalink] [raw]
Subject: [PATCH 2/2] cpuidle : Add auto-promotion flag to cpuidle flags

This patch sets up flags for the state which needs to be auto-promoted.
For powernv systems, lite states do not even lose user context. That
information has been used to set the flag for lite states.

Signed-off-by: Abhishek Goel <[email protected]>
---
arch/powerpc/include/asm/opal-api.h | 1 +
drivers/cpuidle/Kconfig | 4 ++++
drivers/cpuidle/cpuidle-powernv.c | 13 +++++++++++--
3 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/opal-api.h b/arch/powerpc/include/asm/opal-api.h
index 870fb7b23..735dec731 100644
--- a/arch/powerpc/include/asm/opal-api.h
+++ b/arch/powerpc/include/asm/opal-api.h
@@ -226,6 +226,7 @@
*/

#define OPAL_PM_TIMEBASE_STOP 0x00000002
+#define OPAL_PM_LOSE_USER_CONTEXT 0x00001000
#define OPAL_PM_LOSE_HYP_CONTEXT 0x00002000
#define OPAL_PM_LOSE_FULL_CONTEXT 0x00004000
#define OPAL_PM_NAP_ENABLED 0x00010000
diff --git a/drivers/cpuidle/Kconfig b/drivers/cpuidle/Kconfig
index 7e48eb5bf..0ece62684 100644
--- a/drivers/cpuidle/Kconfig
+++ b/drivers/cpuidle/Kconfig
@@ -26,6 +26,10 @@ config CPU_IDLE_GOV_MENU
config DT_IDLE_STATES
bool

+config CPU_IDLE_AUTO_PROMOTION
+ bool
+ default y if PPC_POWERNV
+
menu "ARM CPU Idle Drivers"
depends on ARM || ARM64
source "drivers/cpuidle/Kconfig.arm"
diff --git a/drivers/cpuidle/cpuidle-powernv.c b/drivers/cpuidle/cpuidle-powernv.c
index 84b1ebe21..e351f5f9c 100644
--- a/drivers/cpuidle/cpuidle-powernv.c
+++ b/drivers/cpuidle/cpuidle-powernv.c
@@ -299,6 +299,7 @@ static int powernv_add_idle_states(void)
for (i = 0; i < dt_idle_states; i++) {
unsigned int exit_latency, target_residency;
bool stops_timebase = false;
+ bool lose_user_context = false;
struct pnv_idle_states_t *state = &pnv_idle_states[i];

/*
@@ -324,6 +325,9 @@ static int powernv_add_idle_states(void)
if (has_stop_states && !(state->valid))
continue;

+ if (state->flags & OPAL_PM_LOSE_USER_CONTEXT)
+ lose_user_context = true;
+
if (state->flags & OPAL_PM_TIMEBASE_STOP)
stops_timebase = true;

@@ -332,12 +336,17 @@ static int powernv_add_idle_states(void)
add_powernv_state(nr_idle_states, "Nap",
CPUIDLE_FLAG_NONE, nap_loop,
target_residency, exit_latency, 0, 0);
+ } else if (has_stop_states & !lose_user_context) {
+ add_powernv_state(nr_idle_states, state->name,
+ CPUIDLE_FLAG_AUTO_PROMOTION,
+ stop_loop, target_residency,
+ exit_latency, state->psscr_val,
+ state->psscr_mask);
} else if (has_stop_states && !stops_timebase) {
add_powernv_state(nr_idle_states, state->name,
CPUIDLE_FLAG_NONE, stop_loop,
target_residency, exit_latency,
- state->psscr_val,
- state->psscr_mask);
+ state->psscr_val, state->psscr_mask);
}

/*
--
2.17.1


2019-03-22 09:47:15

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH 1/2] cpuidle : auto-promotion for cpuidle states

On Fri, Mar 22, 2019 at 8:31 AM Abhishek Goel
<[email protected]> wrote:
>
> Currently, the cpuidle governors (menu /ladder) determine what idle state
> an idling CPU should enter into based on heuristics that depend on the
> idle history on that CPU. Given that no predictive heuristic is perfect,
> there are cases where the governor predicts a shallow idle state, hoping
> that the CPU will be busy soon. However, if no new workload is scheduled
> on that CPU in the near future, the CPU will end up in the shallow state.
>
> In case of POWER, this is problematic, when the predicted state in the
> aforementioned scenario is a lite stop state, as such lite states will
> inhibit SMT folding, thereby depriving the other threads in the core from
> using the core resources.
>
> To address this, such lite states need to be autopromoted. The cpuidle-
> core can queue timer to correspond with the residency value of the next
> available state. Thus leading to auto-promotion to a deeper idle state as
> soon as possible.

Isn't the tick stopping avoidance sufficient for that?

2019-03-22 13:27:40

by Daniel Lezcano

[permalink] [raw]
Subject: Re: [PATCH 1/2] cpuidle : auto-promotion for cpuidle states

On 22/03/2019 10:45, Rafael J. Wysocki wrote:
> On Fri, Mar 22, 2019 at 8:31 AM Abhishek Goel
> <[email protected]> wrote:
>>
>> Currently, the cpuidle governors (menu /ladder) determine what idle state
>> an idling CPU should enter into based on heuristics that depend on the
>> idle history on that CPU. Given that no predictive heuristic is perfect,
>> there are cases where the governor predicts a shallow idle state, hoping
>> that the CPU will be busy soon. However, if no new workload is scheduled
>> on that CPU in the near future, the CPU will end up in the shallow state.
>>
>> In case of POWER, this is problematic, when the predicted state in the
>> aforementioned scenario is a lite stop state, as such lite states will
>> inhibit SMT folding, thereby depriving the other threads in the core from
>> using the core resources.
>>
>> To address this, such lite states need to be autopromoted. The cpuidle-
>> core can queue timer to correspond with the residency value of the next
>> available state. Thus leading to auto-promotion to a deeper idle state as
>> soon as possible.
>
> Isn't the tick stopping avoidance sufficient for that?

I was about to ask the same :)




--
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog


2019-04-01 05:12:49

by Abhishek Goel

[permalink] [raw]
Subject: Re: [PATCH 1/2] cpuidle : auto-promotion for cpuidle states



On 03/22/2019 06:56 PM, Daniel Lezcano wrote:
> On 22/03/2019 10:45, Rafael J. Wysocki wrote:
>> On Fri, Mar 22, 2019 at 8:31 AM Abhishek Goel
>> <[email protected]> wrote:
>>> Currently, the cpuidle governors (menu /ladder) determine what idle state
>>> an idling CPU should enter into based on heuristics that depend on the
>>> idle history on that CPU. Given that no predictive heuristic is perfect,
>>> there are cases where the governor predicts a shallow idle state, hoping
>>> that the CPU will be busy soon. However, if no new workload is scheduled
>>> on that CPU in the near future, the CPU will end up in the shallow state.
>>>
>>> In case of POWER, this is problematic, when the predicted state in the
>>> aforementioned scenario is a lite stop state, as such lite states will
>>> inhibit SMT folding, thereby depriving the other threads in the core from
>>> using the core resources.
>>>
>>> To address this, such lite states need to be autopromoted. The cpuidle-
>>> core can queue timer to correspond with the residency value of the next
>>> available state. Thus leading to auto-promotion to a deeper idle state as
>>> soon as possible.
>> Isn't the tick stopping avoidance sufficient for that?
> I was about to ask the same :)
>
>
>
>
Thanks for the review.
I performed experiments for three scenarios to collect some data.

case 1 :
Without this patch and without tick retained, i.e. in a upstream kernel,
It would spend more than even a second to get out of stop0_lite.

case 2 : With tick retained(as suggested) -

Generally, we have a sched tick at 4ms(CONF_HZ = 250). Ideally I expected
it to take 8 sched tick to get out of stop0_lite. Experimentally,
observation was

===================================
min            max            99percentile
4ms            12ms          4ms
===================================
*ms = milliseconds

It would take atleast one sched tick to get out of stop0_lite.

case 2 :  With this patch (not stopping tick, but explicitly queuing a
timer)

min            max              99.5percentile
===============================
144us       192us              144us
===============================
*us = microseconds

In this patch, we queue a timer just before entering into a stop0_lite
state. The timer fires at (residency of next available state + exit
latency of next available state * 2).
Let's say if next state(stop0) is available which has residency of 20us, it
should get out in as low as (20+2*2)*8 [Based on the forumla (residency +
2xlatency)*history length] microseconds = 192us. Ideally we would expect 8
iterations, it was observed to get out in 6-7 iterations.
Even if let's say stop2 is next available state(stop0 and stop1 both are
unavailable), it would take (100+2*10)*8 = 960us to get into stop2.

So, We are able to get out of stop0_lite generally in 150us(with this
patch) as
compared to 4ms(with tick retained). As stated earlier, we do not want
to get
stuck into stop0_lite as it inhibits SMT folding for other sibling
threads, depriving
them of core resources. Current patch is using auto-promotion only for
stop0_lite,
as it gives performance benefit(primary reason) along with lowering down
power
consumption. We may extend this model for other states in future.

--Abhishek

2019-04-04 10:22:40

by Daniel Lezcano

[permalink] [raw]
Subject: Re: [PATCH 1/2] cpuidle : auto-promotion for cpuidle states


Hi Abhishek,

thanks for taking the time to test the different scenario and give us
the numbers.

On 01/04/2019 07:11, Abhishek wrote:
>
>
> On 03/22/2019 06:56 PM, Daniel Lezcano wrote:
>> On 22/03/2019 10:45, Rafael J. Wysocki wrote:
>>> On Fri, Mar 22, 2019 at 8:31 AM Abhishek Goel
>>> <[email protected]> wrote:
>>>> Currently, the cpuidle governors (menu /ladder) determine what idle
>>>> state
>>>> an idling CPU should enter into based on heuristics that depend on the
>>>> idle history on that CPU. Given that no predictive heuristic is
>>>> perfect,
>>>> there are cases where the governor predicts a shallow idle state,
>>>> hoping
>>>> that the CPU will be busy soon. However, if no new workload is
>>>> scheduled
>>>> on that CPU in the near future, the CPU will end up in the shallow
>>>> state.
>>>>
>>>> In case of POWER, this is problematic, when the predicted state in the
>>>> aforementioned scenario is a lite stop state, as such lite states will
>>>> inhibit SMT folding, thereby depriving the other threads in the core
>>>> from
>>>> using the core resources.

I can understand an idle state can prevent other threads to use the core
resources. But why a deeper idle state does not prevent this also?


>>>> To address this, such lite states need to be autopromoted. The cpuidle-
>>>> core can queue timer to correspond with the residency value of the next
>>>> available state. Thus leading to auto-promotion to a deeper idle
>>>> state as
>>>> soon as possible.
>>> Isn't the tick stopping avoidance sufficient for that?
>> I was about to ask the same :)
>>
>>
>>
>>
> Thanks for the review.
> I performed experiments for three scenarios to collect some data.
>
> case 1 :
> Without this patch and without tick retained, i.e. in a upstream kernel,
> It would spend more than even a second to get out of stop0_lite.
>
> case 2 : With tick retained(as suggested) -
>
> Generally, we have a sched tick at 4ms(CONF_HZ = 250). Ideally I expected
> it to take 8 sched tick to get out of stop0_lite. Experimentally,
> observation was
>
> ===================================
> min            max            99percentile
> 4ms            12ms          4ms
> ===================================
> *ms = milliseconds
>
> It would take atleast one sched tick to get out of stop0_lite.
>
> case 2 :  With this patch (not stopping tick, but explicitly queuing a
> timer)
>
> min            max              99.5percentile
> ===============================
> 144us       192us              144us
> ===============================
> *us = microseconds
>
> In this patch, we queue a timer just before entering into a stop0_lite
> state. The timer fires at (residency of next available state + exit
> latency of next available state * 2).

So for the context, we have a similar issue but from the power
management point of view where a CPU can stay in a shallow state for a
long period, thus consuming a lot of energy.

The window was reduced by preventing stopping the tick when a shallow
state is selected. Unfortunately, if the tick is stopped and we
exit/enter again and we select a shallow state, the situation is the same.

A solution was previously proposed with a timer some years ago, like
this patch does, and merged but there were complains about bad
performance impact, so it has been reverted.

> Let's say if next state(stop0) is available which has residency of 20us, it
> should get out in as low as (20+2*2)*8 [Based on the forumla (residency +
> 2xlatency)*history length] microseconds = 192us. Ideally we would expect 8
> iterations, it was observed to get out in 6-7 iterations.

Can you explain the formula? I don't get the rational. Why using the
exit latency and why multiply it by 2?

Why the timer is not set to the next state's target residency value ?

> Even if let's say stop2 is next available state(stop0 and stop1 both are
> unavailable), it would take (100+2*10)*8 = 960us to get into stop2.
>
> So, We are able to get out of stop0_lite generally in 150us(with this
> patch) as
> compared to 4ms(with tick retained). As stated earlier, we do not want
> to get
> stuck into stop0_lite as it inhibits SMT folding for other sibling
> threads, depriving
> them of core resources. Current patch is using auto-promotion only for
> stop0_lite,
> as it gives performance benefit(primary reason) along with lowering down
> power
> consumption. We may extend this model for other states in future.
>
> --Abhishek
>


--
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog

2019-04-04 10:55:44

by Gautham R Shenoy

[permalink] [raw]
Subject: Re: [PATCH 1/2] cpuidle : auto-promotion for cpuidle states

Hello Daniel,

On Thu, Apr 4, 2019 at 3:52 PM Daniel Lezcano <[email protected]> wrote:
>
>
> Hi Abhishek,
>
> thanks for taking the time to test the different scenario and give us
> the numbers.
>
> On 01/04/2019 07:11, Abhishek wrote:
> >

[.. snip..]

>
> >>>> In case of POWER, this is problematic, when the predicted state in the
> >>>> aforementioned scenario is a lite stop state, as such lite states will
> >>>> inhibit SMT folding, thereby depriving the other threads in the core
> >>>> from
> >>>> using the core resources.
>
> I can understand an idle state can prevent other threads to use the core
> resources. But why a deeper idle state does not prevent this also?

On POWER9, we have the following classes of platform idle states
(called stop states)

lite : These do not lose any context including the user context. In
this state
GPRs are also preserved (stop0_lite)

shallow : These lose user context,but not the hypervisor context. So
GPRs are lost but
not SPRs. (stop0, stop1, stop2)

deep: These lose hypervisor context. (stop4, stop5)

In the case of lite stop state, only instruction dispatch on the CPU thread
is paused. The thread does not give up its registers set in this state for the
use of its busy sibling threads in the core. Hence, SMT folding does not
happen in this state.

With respect to shallow and deep states, not only is the instruction dispatch
paused, it also gives up its registers set for the other siblings to use
These stop states are beneficial for SMT folding.

Hence, if a CPU thread remains in a lite state for too long,
its siblings in the core would not be able to utilize the full resources
of the core for that duration, thereby inhibiting single
thread performance. This is not the case with non-lite states.

--
Thanks and Regards
gautham.

2019-04-04 11:11:53

by Abhishek Goel

[permalink] [raw]
Subject: Re: [PATCH 1/2] cpuidle : auto-promotion for cpuidle states



On 04/04/2019 03:51 PM, Daniel Lezcano wrote:
> Hi Abhishek,
>
> thanks for taking the time to test the different scenario and give us
> the numbers.
>
> On 01/04/2019 07:11, Abhishek wrote:
>>
>> On 03/22/2019 06:56 PM, Daniel Lezcano wrote:
>>> On 22/03/2019 10:45, Rafael J. Wysocki wrote:
>>>> On Fri, Mar 22, 2019 at 8:31 AM Abhishek Goel
>>>> <[email protected]> wrote:
>>>>> Currently, the cpuidle governors (menu /ladder) determine what idle
>>>>> state
>>>>> an idling CPU should enter into based on heuristics that depend on the
>>>>> idle history on that CPU. Given that no predictive heuristic is
>>>>> perfect,
>>>>> there are cases where the governor predicts a shallow idle state,
>>>>> hoping
>>>>> that the CPU will be busy soon. However, if no new workload is
>>>>> scheduled
>>>>> on that CPU in the near future, the CPU will end up in the shallow
>>>>> state.
>>>>>
>>>>> In case of POWER, this is problematic, when the predicted state in the
>>>>> aforementioned scenario is a lite stop state, as such lite states will
>>>>> inhibit SMT folding, thereby depriving the other threads in the core
>>>>> from
>>>>> using the core resources.
> I can understand an idle state can prevent other threads to use the core
> resources. But why a deeper idle state does not prevent this also?
>
>
>>>>> To address this, such lite states need to be autopromoted. The cpuidle-
>>>>> core can queue timer to correspond with the residency value of the next
>>>>> available state. Thus leading to auto-promotion to a deeper idle
>>>>> state as
>>>>> soon as possible.
>>>> Isn't the tick stopping avoidance sufficient for that?
>>> I was about to ask the same :)
>>>
>>>
>>>
>>>
>> Thanks for the review.
>> I performed experiments for three scenarios to collect some data.
>>
>> case 1 :
>> Without this patch and without tick retained, i.e. in a upstream kernel,
>> It would spend more than even a second to get out of stop0_lite.
>>
>> case 2 : With tick retained(as suggested) -
>>
>> Generally, we have a sched tick at 4ms(CONF_HZ = 250). Ideally I expected
>> it to take 8 sched tick to get out of stop0_lite. Experimentally,
>> observation was
>>
>> ===================================
>> min            max            99percentile
>> 4ms            12ms          4ms
>> ===================================
>> *ms = milliseconds
>>
>> It would take atleast one sched tick to get out of stop0_lite.
>>
>> case 2 :  With this patch (not stopping tick, but explicitly queuing a
>> timer)
>>
>> min            max              99.5percentile
>> ===============================
>> 144us       192us              144us
>> ===============================
>> *us = microseconds
>>
>> In this patch, we queue a timer just before entering into a stop0_lite
>> state. The timer fires at (residency of next available state + exit
>> latency of next available state * 2).
> So for the context, we have a similar issue but from the power
> management point of view where a CPU can stay in a shallow state for a
> long period, thus consuming a lot of energy.
>
> The window was reduced by preventing stopping the tick when a shallow
> state is selected. Unfortunately, if the tick is stopped and we
> exit/enter again and we select a shallow state, the situation is the same.
>
> A solution was previously proposed with a timer some years ago, like
> this patch does, and merged but there were complains about bad
> performance impact, so it has been reverted.
>
>> Let's say if next state(stop0) is available which has residency of 20us, it
>> should get out in as low as (20+2*2)*8 [Based on the forumla (residency +
>> 2xlatency)*history length] microseconds = 192us. Ideally we would expect 8
>> iterations, it was observed to get out in 6-7 iterations.
> Can you explain the formula? I don't get the rational. Why using the
> exit latency and why multiply it by 2?
>
> Why the timer is not set to the next state's target residency value ?
>
The idea behind multiplying by 2 is, entry latency + exit latency = 2*
exit latency, i.e.,
using exit latency = entry latency
So in effect, we are using target residency + 2 * exit latency for
timeout of timer.
Latency is generally <=10% of residency. I have tried to be conservative
by including latency
factor in computation for timeout. Thus, this formula will give slightly
greater value compared
to directly using residency of target state.

--Abhishek