From: Rafael J. Wysocki <[email protected]>
Make teo_select() avoid calling tick_nohz_get_sleep_length() if the
candidate idle state to return is state 0 or if state 0 is a polling
one and the target residency of the current candidate one is below
a certain threshold, in which cases it may be assumed that the CPU will
be woken up immediately by a non-timer wakeup source and the timers
are not likely to matter.
Signed-off-by: Rafael J. Wysocki <[email protected]>
---
v1 -> v2: No changes
---
drivers/cpuidle/governors/teo.c | 22 ++++++++++++++++++++++
1 file changed, 22 insertions(+)
Index: linux-pm/drivers/cpuidle/governors/teo.c
===================================================================
--- linux-pm.orig/drivers/cpuidle/governors/teo.c
+++ linux-pm/drivers/cpuidle/governors/teo.c
@@ -166,6 +166,12 @@
*/
#define NR_RECENT 9
+/*
+ * Idle state target residency threshold used for deciding whether or not to
+ * check the time till the closest expected timer event.
+ */
+#define RESIDENCY_THRESHOLD_NS (15 * NSEC_PER_USEC)
+
/**
* struct teo_bin - Metrics used by the TEO cpuidle governor.
* @intercepts: The "intercepts" metric.
@@ -542,6 +548,22 @@ static int teo_select(struct cpuidle_dri
idx = i;
}
+ /*
+ * Skip the timers check if state 0 is the current candidate one,
+ * because an immediate non-timer wakeup is expected in that case.
+ */
+ if (!idx)
+ goto out_tick;
+
+ /*
+ * If state 0 is a polling one, check if the target residency of
+ * the current candidate state is low enough and skip the timers
+ * check in that case too.
+ */
+ if ((drv->states[0].flags & CPUIDLE_FLAG_POLLING) &&
+ drv->states[idx].target_residency_ns < RESIDENCY_THRESHOLD_NS)
+ goto out_tick;
+
duration_ns = tick_nohz_get_sleep_length(&delta_tick);
cpu_data->sleep_length_ns = duration_ns;
Hi Rafael,
On Thu, 3 Aug 2023, Rafael J. Wysocki wrote:
> Index: linux-pm/drivers/cpuidle/governors/teo.c
> ===================================================================
> --- linux-pm.orig/drivers/cpuidle/governors/teo.c
> +++ linux-pm/drivers/cpuidle/governors/teo.c
> @@ -166,6 +166,12 @@
> */
> #define NR_RECENT 9
>
> +/*
> + * Idle state target residency threshold used for deciding whether or not to
> + * check the time till the closest expected timer event.
> + */
> +#define RESIDENCY_THRESHOLD_NS (15 * NSEC_PER_USEC)
> +
I would like to understand why this residency threshold is a fixed value
and not related to TICK_NSEC. I'm sure there is a reason for it - but for
me it is not obvious. Can you please explain it to me?
Thanks,
Anna-Maria
Hi,
On Fri, Aug 11, 2023 at 10:52 AM Anna-Maria Behnsen
<[email protected]> wrote:
>
> Hi Rafael,
>
> On Thu, 3 Aug 2023, Rafael J. Wysocki wrote:
>
> > Index: linux-pm/drivers/cpuidle/governors/teo.c
> > ===================================================================
> > --- linux-pm.orig/drivers/cpuidle/governors/teo.c
> > +++ linux-pm/drivers/cpuidle/governors/teo.c
> > @@ -166,6 +166,12 @@
> > */
> > #define NR_RECENT 9
> >
> > +/*
> > + * Idle state target residency threshold used for deciding whether or not to
> > + * check the time till the closest expected timer event.
> > + */
> > +#define RESIDENCY_THRESHOLD_NS (15 * NSEC_PER_USEC)
> > +
>
> I would like to understand why this residency threshold is a fixed value
> and not related to TICK_NSEC. I'm sure there is a reason for it - but for
> me it is not obvious. Can you please explain it to me?
First off, I'm not convinced that there is any direct connection
between the TICK_NSEC value and which idle states can be regarded as
shallow. HZ may vary between 100 and 1000 (an order of magnitude) and
this doesn't affect the target residencies of idle states in any way.
Next, the checks involving this value don't influence the
tick-stopping decision in any way, so I don't see a reason why they
should depend on TICK_NSEC.
Finally, it can be observed that ideally, the return value of
tick_nohz_get_sleep_length() should always be taken into
consideration, because it may influence the idle state selection in
any case. However, if the target residency of the idle state selected
so far is really small, calling it may be skipped in case it is
costly, because its contribution is not likely to be significant.
Worst-case we would end up with a CPU wakeup before the target
residency of the selected idle state has elapsed, so some energy will
be lost and some exit latency will be incurred in vain, so this really
should be done when the numbers involved are small enough.
Now, what does "small enough" mean? My answer is 15 us.