2020-07-07 11:12:37

by Gautham R Shenoy

[permalink] [raw]
Subject: [PATCH 4/5] cpuidle-pseries : Include extended CEDE states in cpuidle framework

From: "Gautham R. Shenoy" <[email protected]>

This patch exposes those extended CEDE states to the cpuidle framework
which are responsive to external interrupts and do not need an H_PROD.

Since as per the PAPR, all the extended CEDE states are non-responsive
to timers, we indicate this to the cpuidle subsystem via the
CPUIDLE_FLAG_TIMER_STOP flag for all those extende CEDE states which
can wake up on external interrupts.

With the patch, we are able to see the extended CEDE state with
latency hint = 1 exposed via the cpuidle framework.

$ cpupower idle-info
CPUidle driver: pseries_idle
CPUidle governor: menu
analyzing CPU 0:

Number of idle states: 3
Available idle states: snooze CEDE XCEDE1
snooze:
Flags/Description: snooze
Latency: 0
Usage: 33429446
Duration: 27006062
CEDE:
Flags/Description: CEDE
Latency: 1
Usage: 10272
Duration: 110786770
XCEDE1:
Flags/Description: XCEDE1
Latency: 12
Usage: 26445
Duration: 1436433815

Benchmark results:
TLDR: Over all we do not see any additional benefit from having XCEDE1 over
CEDE.

ebizzy :
2 threads bound to a big-core. With this patch, we see a 3.39%
regression compared to with only CEDE0 latency fixup.
x With only CEDE0 latency fixup
* With CEDE0 latency fixup + CEDE1
N Min Max Median Avg Stddev
x 10 2893813 5834474 5832448 5327281.3 1055941.4
* 10 2907329 5834923 5831398 5146614.6 1193874.8

context_switch2:
With the context_switch2 there are no observable regressions in the
results.

context_switch2 CPU0 CPU1 (Same Big-core, different small-cores).
No difference with and without patch.
x without_patch
* with_patch
N Min Max Median Avg Stddev
x 500 343644 348778 345444 345584.02 1035.1658
* 500 344310 347646 345776 345877.22 802.19501

context_switch2 CPU0 CPU8 (different big-cores). Minor 0.05% improvement
with patch
x without_patch
* with_patch
N Min Max Median Avg Stddev
x 500 287562 288756 288162 288134.76 262.24328
* 500 287874 288960 288306 288274.66 187.57034

schbench:
No regressions observed with schbench

Without Patch:
Latency percentiles (usec)
50.0th: 29
75.0th: 40
90.0th: 50
95.0th: 61
*99.0th: 13648
99.5th: 14768
99.9th: 15664
min=0, max=29812

With Patch:
Latency percentiles (usec)
50.0th: 30
75.0th: 40
90.0th: 51
95.0th: 59
*99.0th: 13616
99.5th: 14512
99.9th: 15696
min=0, max=15996

Signed-off-by: Gautham R. Shenoy <[email protected]>
---
drivers/cpuidle/cpuidle-pseries.c | 50 +++++++++++++++++++++++++++++++++++++++
1 file changed, 50 insertions(+)

diff --git a/drivers/cpuidle/cpuidle-pseries.c b/drivers/cpuidle/cpuidle-pseries.c
index 502f906..6f893cd 100644
--- a/drivers/cpuidle/cpuidle-pseries.c
+++ b/drivers/cpuidle/cpuidle-pseries.c
@@ -362,9 +362,59 @@ static int add_pseries_idle_states(void)
for (i = 0; i < nr_xcede_records; i++) {
u64 latency_tb = xcede_records[i].wakeup_latency_tb_ticks;
u64 latency_us = tb_to_ns(latency_tb) / NSEC_PER_USEC;
+ char name[CPUIDLE_NAME_LEN];
+ unsigned int latency_hint = xcede_records[i].latency_hint;
+ u64 residency_us;
+
+ if (!xcede_records[i].responsive_to_irqs) {
+ pr_info("cpuidle : Skipping XCEDE%d. Not responsive to IRQs\n",
+ latency_hint);
+ continue;
+ }

if (latency_us < min_latency_us)
min_latency_us = latency_us;
+ snprintf(name, CPUIDLE_NAME_LEN, "XCEDE%d", latency_hint);
+
+ /*
+ * As per the section 14.14.1 of PAPR version 2.8.1
+ * says that alling H_CEDE with the value of the cede
+ * latency specifier set greater than zero allows the
+ * processor timer facility to be disabled (so as not
+ * to cause gratuitous wake-ups - the use of H_PROD,
+ * or other external interrupt is required to wake the
+ * processor in this case).
+ *
+ * So, inform the cpuidle-subsystem that the timer
+ * will be stopped for these states.
+ *
+ * Also, bump up the latency by 10us, since cpuidle
+ * would use timer-offload framework which will need
+ * to send an IPI to wakeup a CPU whose timer has
+ * expired.
+ */
+ if (latency_hint > 0) {
+ dedicated_states[nr_states].flags = CPUIDLE_FLAG_TIMER_STOP;
+ latency_us += 10;
+ }
+
+ /*
+ * Thumb rule : Reside in the XCEDE state for at least
+ * 10x the time required to enter and exit that state.
+ */
+ residency_us = latency_us * 10;
+
+ strlcpy(dedicated_states[nr_states].name, (const char *)name,
+ CPUIDLE_NAME_LEN);
+ strlcpy(dedicated_states[nr_states].desc, (const char *)name,
+ CPUIDLE_NAME_LEN);
+ dedicated_states[nr_states].exit_latency = latency_us;
+ dedicated_states[nr_states].target_residency = residency_us;
+ dedicated_states[nr_states].enter = &dedicated_cede_loop;
+ cede_latency_hint[nr_states] = latency_hint;
+ pr_info("cpuidle : Added %s. latency-hint = %d\n",
+ name, latency_hint);
+ nr_states++;
}

/*
--
1.9.4


2020-07-20 06:34:07

by Vaidyanathan Srinivasan

[permalink] [raw]
Subject: Re: [PATCH 4/5] cpuidle-pseries : Include extended CEDE states in cpuidle framework

* Gautham R Shenoy <[email protected]> [2020-07-07 16:41:38]:

> From: "Gautham R. Shenoy" <[email protected]>
>
> This patch exposes those extended CEDE states to the cpuidle framework
> which are responsive to external interrupts and do not need an H_PROD.
>
> Since as per the PAPR, all the extended CEDE states are non-responsive
> to timers, we indicate this to the cpuidle subsystem via the
> CPUIDLE_FLAG_TIMER_STOP flag for all those extende CEDE states which
> can wake up on external interrupts.
>
> With the patch, we are able to see the extended CEDE state with
> latency hint = 1 exposed via the cpuidle framework.
>
> $ cpupower idle-info
> CPUidle driver: pseries_idle
> CPUidle governor: menu
> analyzing CPU 0:
>
> Number of idle states: 3
> Available idle states: snooze CEDE XCEDE1
> snooze:
> Flags/Description: snooze
> Latency: 0
> Usage: 33429446
> Duration: 27006062
> CEDE:
> Flags/Description: CEDE
> Latency: 1
> Usage: 10272
> Duration: 110786770
> XCEDE1:
> Flags/Description: XCEDE1
> Latency: 12
> Usage: 26445
> Duration: 1436433815
>
> Benchmark results:
> TLDR: Over all we do not see any additional benefit from having XCEDE1 over
> CEDE.
>
> ebizzy :
> 2 threads bound to a big-core. With this patch, we see a 3.39%
> regression compared to with only CEDE0 latency fixup.
> x With only CEDE0 latency fixup
> * With CEDE0 latency fixup + CEDE1
> N Min Max Median Avg Stddev
> x 10 2893813 5834474 5832448 5327281.3 1055941.4
> * 10 2907329 5834923 5831398 5146614.6 1193874.8
>
> context_switch2:
> With the context_switch2 there are no observable regressions in the
> results.
>
> context_switch2 CPU0 CPU1 (Same Big-core, different small-cores).
> No difference with and without patch.
> x without_patch
> * with_patch
> N Min Max Median Avg Stddev
> x 500 343644 348778 345444 345584.02 1035.1658
> * 500 344310 347646 345776 345877.22 802.19501
>
> context_switch2 CPU0 CPU8 (different big-cores). Minor 0.05% improvement
> with patch
> x without_patch
> * with_patch
> N Min Max Median Avg Stddev
> x 500 287562 288756 288162 288134.76 262.24328
> * 500 287874 288960 288306 288274.66 187.57034
>
> schbench:
> No regressions observed with schbench
>
> Without Patch:
> Latency percentiles (usec)
> 50.0th: 29
> 75.0th: 40
> 90.0th: 50
> 95.0th: 61
> *99.0th: 13648
> 99.5th: 14768
> 99.9th: 15664
> min=0, max=29812
>
> With Patch:
> Latency percentiles (usec)
> 50.0th: 30
> 75.0th: 40
> 90.0th: 51
> 95.0th: 59
> *99.0th: 13616
> 99.5th: 14512
> 99.9th: 15696
> min=0, max=15996
>
> Signed-off-by: Gautham R. Shenoy <[email protected]>

Reviewed-by: Vaidyanathan Srinivasan <[email protected]>

> ---
> drivers/cpuidle/cpuidle-pseries.c | 50 +++++++++++++++++++++++++++++++++++++++
> 1 file changed, 50 insertions(+)
>
> diff --git a/drivers/cpuidle/cpuidle-pseries.c b/drivers/cpuidle/cpuidle-pseries.c
> index 502f906..6f893cd 100644
> --- a/drivers/cpuidle/cpuidle-pseries.c
> +++ b/drivers/cpuidle/cpuidle-pseries.c
> @@ -362,9 +362,59 @@ static int add_pseries_idle_states(void)
> for (i = 0; i < nr_xcede_records; i++) {
> u64 latency_tb = xcede_records[i].wakeup_latency_tb_ticks;
> u64 latency_us = tb_to_ns(latency_tb) / NSEC_PER_USEC;
> + char name[CPUIDLE_NAME_LEN];
> + unsigned int latency_hint = xcede_records[i].latency_hint;
> + u64 residency_us;
> +
> + if (!xcede_records[i].responsive_to_irqs) {
> + pr_info("cpuidle : Skipping XCEDE%d. Not responsive to IRQs\n",
> + latency_hint);
> + continue;
> + }
>
> if (latency_us < min_latency_us)
> min_latency_us = latency_us;
> + snprintf(name, CPUIDLE_NAME_LEN, "XCEDE%d", latency_hint);
> +
> + /*
> + * As per the section 14.14.1 of PAPR version 2.8.1
> + * says that alling H_CEDE with the value of the cede
> + * latency specifier set greater than zero allows the
> + * processor timer facility to be disabled (so as not
> + * to cause gratuitous wake-ups - the use of H_PROD,
> + * or other external interrupt is required to wake the
> + * processor in this case).
> + *
> + * So, inform the cpuidle-subsystem that the timer
> + * will be stopped for these states.
> + *
> + * Also, bump up the latency by 10us, since cpuidle
> + * would use timer-offload framework which will need
> + * to send an IPI to wakeup a CPU whose timer has
> + * expired.
> + */
> + if (latency_hint > 0) {
> + dedicated_states[nr_states].flags = CPUIDLE_FLAG_TIMER_STOP;
> + latency_us += 10;
> + }
> +
> + /*
> + * Thumb rule : Reside in the XCEDE state for at least
> + * 10x the time required to enter and exit that state.
> + */
> + residency_us = latency_us * 10;
> +
> + strlcpy(dedicated_states[nr_states].name, (const char *)name,
> + CPUIDLE_NAME_LEN);
> + strlcpy(dedicated_states[nr_states].desc, (const char *)name,
> + CPUIDLE_NAME_LEN);
> + dedicated_states[nr_states].exit_latency = latency_us;
> + dedicated_states[nr_states].target_residency = residency_us;
> + dedicated_states[nr_states].enter = &dedicated_cede_loop;
> + cede_latency_hint[nr_states] = latency_hint;
> + pr_info("cpuidle : Added %s. latency-hint = %d\n",
> + name, latency_hint);
> + nr_states++;

This patch demonstrates the various use cases of the previous patches
in the series that helps interface with the platform firmware better.
On current platforms these benefits are very limited, but the
framework built by the previous patches helps Linux exploit new and
enhanced idle states that will be available on newer platform and
firmware.

--Vaidy