2020-07-17 09:19:46

by Pratik R. Sampat

[permalink] [raw]
Subject: [PATCH v2 0/2] Selftest for cpuidle latency measurement

v1: https://lkml.org/lkml/2020/7/7/1036
Changelog v1 --> v2
1. Based on Shuah Khan's comment, changed exit code to ksft_skip to
indicate the test is being skipped
2. Change the busy workload for baseline measurement from
"yes > /dev/null" to "cat /dev/random to /dev/null", based on
observed CPU utilization for "yes" consuming ~60% CPU while the
latter consumes 100% of CPUs, giving more accurate baseline numbers
---

The patch series introduces a mechanism to measure wakeup latency for
IPI and timer based interrupts
The motivation behind this series is to find significant deviations
behind advertised latency and resisdency values

To achieve this, we introduce a kernel module and expose its control
knobs through the debugfs interface that the selftests can engage with.

The kernel module provides the following interfaces within
/sys/kernel/debug/latency_test/ for,
1. IPI test:
ipi_cpu_dest # Destination CPU for the IPI
ipi_cpu_src # Origin of the IPI
ipi_latency_ns # Measured latency time in ns
2. Timeout test:
timeout_cpu_src # CPU on which the timer to be queued
timeout_expected_ns # Timer duration
timeout_diff_ns # Difference of actual duration vs expected timer
To include the module, check option and include as module
kernel hacking -> Cpuidle latency selftests

The selftest inserts the module, disables all the idle states and
enables them one by one testing the following:
1. Keeping source CPU constant, iterates through all the CPUS measuring
IPI latency for baseline (CPU is busy with
"cat /dev/random > /dev/null" workload) and the when the CPU is
allowed to be at rest
2. Iterating through all the CPUs, sending expected timer durations to
be equivalent to the residency of the the deepest idle state
enabled and extracting the difference in time between the time of
wakeup and the expected timer duration

Usage
-----
Can be used in conjuction to the rest of the selftests.
Default Output location in: tools/testing/cpuidle/cpuidle.log

To run this test specifically:
$ make -C tools/testing/selftests TARGETS="cpuidle" run_tests

There are a few optinal arguments too that the script can take
[-h <help>]
[-m <location of the module>]
[-o <location of the output>]

Sample output snippet
---------------------
--IPI Latency Test---
--Baseline IPI Latency measurement: CPU Busy--
SRC_CPU DEST_CPU IPI_Latency(ns)
...
0 8 1996
0 9 2125
0 10 1264
0 11 1788
0 12 2045
Baseline Average IPI latency(ns): 1843
---Enabling state: 5---
SRC_CPU DEST_CPU IPI_Latency(ns)
0 8 621719
0 9 624752
0 10 622218
0 11 623968
0 12 621303
Expected IPI latency(ns): 100000
Observed Average IPI latency(ns): 622792

--Timeout Latency Test--
--Baseline Timeout Latency measurement: CPU Busy--
Wakeup_src Baseline_delay(ns)
...
8 2249
9 2226
10 2211
11 2183
12 2263
Baseline Average timeout diff(ns): 2226
---Enabling state: 5---
8 10749
9 10911
10 10912
11 12100
12 73276
Expected timeout(ns): 10000200
Observed Average timeout diff(ns): 23589

Pratik Rajesh Sampat (2):
cpuidle: Trace IPI based and timer based wakeup latency from idle
states
selftest/cpuidle: Add support for cpuidle latency measurement

drivers/cpuidle/Makefile | 1 +
drivers/cpuidle/test-cpuidle_latency.c | 150 ++++++++++++
lib/Kconfig.debug | 10 +
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/cpuidle/Makefile | 6 +
tools/testing/selftests/cpuidle/cpuidle.sh | 257 +++++++++++++++++++++
tools/testing/selftests/cpuidle/settings | 1 +
7 files changed, 426 insertions(+)
create mode 100644 drivers/cpuidle/test-cpuidle_latency.c
create mode 100644 tools/testing/selftests/cpuidle/Makefile
create mode 100755 tools/testing/selftests/cpuidle/cpuidle.sh
create mode 100644 tools/testing/selftests/cpuidle/settings

--
2.25.4


2020-07-17 09:20:07

by Pratik R. Sampat

[permalink] [raw]
Subject: [PATCH v2 2/2] selftest/cpuidle: Add support for cpuidle latency measurement

This patch adds support to trace IPI based and timer based wakeup
latency from idle states

Latches onto the test-cpuidle_latency kernel module using the debugfs
interface to send IPIs or schedule a timer based event, which in-turn
populates the debugfs with the latency measurements.

Currently for the IPI and timer tests; first disable all idle states
and then test for latency measurements incrementally enabling each state

Signed-off-by: Pratik Rajesh Sampat <[email protected]>
---
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/cpuidle/Makefile | 6 +
tools/testing/selftests/cpuidle/cpuidle.sh | 257 +++++++++++++++++++++
tools/testing/selftests/cpuidle/settings | 1 +
4 files changed, 265 insertions(+)
create mode 100644 tools/testing/selftests/cpuidle/Makefile
create mode 100755 tools/testing/selftests/cpuidle/cpuidle.sh
create mode 100644 tools/testing/selftests/cpuidle/settings

diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index 1195bd85af38..ab6cf51f3518 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -7,6 +7,7 @@ TARGETS += capabilities
TARGETS += cgroup
TARGETS += clone3
TARGETS += cpufreq
+TARGETS += cpuidle
TARGETS += cpu-hotplug
TARGETS += drivers/dma-buf
TARGETS += efivarfs
diff --git a/tools/testing/selftests/cpuidle/Makefile b/tools/testing/selftests/cpuidle/Makefile
new file mode 100644
index 000000000000..72fd5d2e974d
--- /dev/null
+++ b/tools/testing/selftests/cpuidle/Makefile
@@ -0,0 +1,6 @@
+# SPDX-License-Identifier: GPL-2.0
+all:
+
+TEST_PROGS := cpuidle.sh
+
+include ../lib.mk
diff --git a/tools/testing/selftests/cpuidle/cpuidle.sh b/tools/testing/selftests/cpuidle/cpuidle.sh
new file mode 100755
index 000000000000..5c313a3abb0c
--- /dev/null
+++ b/tools/testing/selftests/cpuidle/cpuidle.sh
@@ -0,0 +1,257 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+LOG=cpuidle.log
+MODULE=/lib/modules/$(uname -r)/kernel/drivers/cpuidle/test-cpuidle_latency.ko
+
+# Kselftest framework requirement - SKIP code is 4.
+ksft_skip=4
+
+helpme()
+{
+ printf "Usage: $0 [-h] [-todg args]
+ [-h <help>]
+ [-m <location of the module>]
+ [-o <location of the output>]
+ \n"
+ exit 2
+}
+
+parse_arguments()
+{
+ while getopts ht:m:o: arg
+ do
+ case $arg in
+ h) # --help
+ helpme
+ ;;
+ m) # --mod-file
+ MODULE=$OPTARG
+ ;;
+ o) # output log files
+ LOG=$OPTARG
+ ;;
+ \?)
+ helpme
+ ;;
+ esac
+ done
+}
+
+ins_mod()
+{
+ if [ ! -f "$MODULE" ]; then
+ printf "$MODULE module does not exist. Exitting\n"
+ exit $ksft_skip
+ fi
+ printf "Inserting $MODULE module\n\n"
+ insmod $MODULE
+ if [ $? != 0 ]; then
+ printf "Insmod $MODULE failed\n"
+ exit $ksft_skip
+ fi
+}
+
+compute_average()
+{
+ arr=("$@")
+ sum=0
+ size=${#arr[@]}
+ for i in "${arr[@]}"
+ do
+ sum=$((sum + i))
+ done
+ avg=$((sum/size))
+}
+
+# Disable all stop states
+disable_idle()
+{
+ for ((cpu=0; cpu<NUM_CPUS; cpu++))
+ do
+ for ((state=0; state<NUM_STATES; state++))
+ do
+ echo 1 > /sys/devices/system/cpu/cpu$cpu/cpuidle/state$state/disable
+ done
+ done
+}
+
+# Perform operation on each CPU for the given state
+# $1 - Operation: enable (0) / disable (1)
+# $2 - State to enable
+op_state()
+{
+ for ((cpu=0; cpu<NUM_CPUS; cpu++))
+ do
+ echo $1 > /sys/devices/system/cpu/cpu$cpu/cpuidle/state$2/disable
+ done
+}
+
+# Extract latency in microseconds and convert to nanoseconds
+extract_latency()
+{
+ for ((state=0; state<NUM_STATES; state++))
+ do
+ latency=$(($(cat /sys/devices/system/cpu/cpu0/cpuidle/state$state/latency) * 1000))
+ latency_arr+=($latency)
+ done
+}
+
+# Run the IPI test
+# $1 run for baseline - busy cpu or regular environment
+# $2 destination cpu
+ipi_test_once()
+{
+ dest_cpu=$2
+ if [ "$1" = "baseline" ]; then
+ # Keep the CPU busy
+ taskset -c $dest_cpu cat /dev/random > /dev/null &
+ task_pid=$!
+ # Wait for the workload to achieve 100% CPU usage
+ sleep 1
+ fi
+ taskset 0x1 echo $dest_cpu > /sys/kernel/debug/latency_test/ipi_cpu_dest
+ ipi_latency=$(cat /sys/kernel/debug/latency_test/ipi_latency_ns)
+ src_cpu=$(cat /sys/kernel/debug/latency_test/ipi_cpu_src)
+ if [ "$1" = "baseline" ]; then
+ kill $task_pid
+ wait $task_pid 2>/dev/null
+ fi
+}
+
+# Incrementally Enable idle states one by one and compute the latency
+run_ipi_tests()
+{
+ extract_latency
+ disable_idle
+ declare -a avg_arr
+ echo -e "--IPI Latency Test---" >> $LOG
+
+ echo -e "--Baseline IPI Latency measurement: CPU Busy--" >> $LOG
+ printf "%s %10s %12s\n" "SRC_CPU" "DEST_CPU" "IPI_Latency(ns)" >> $LOG
+ for ((cpu=0; cpu<NUM_CPUS; cpu++))
+ do
+ ipi_test_once "baseline" $cpu
+ printf "%-3s %10s %12s\n" $src_cpu $cpu $ipi_latency >> $LOG
+ avg_arr+=($ipi_latency)
+ done
+ compute_average "${avg_arr[@]}"
+ echo -e "Baseline Average IPI latency(ns): $avg" >> $LOG
+
+ for ((state=0; state<NUM_STATES; state++))
+ do
+ unset avg_arr
+ echo -e "---Enabling state: $state---" >> $LOG
+ op_state 0 $state
+ printf "%s %10s %12s\n" "SRC_CPU" "DEST_CPU" "IPI_Latency(ns)" >> $LOG
+ for ((cpu=0; cpu<NUM_CPUS; cpu++))
+ do
+ # Running IPI test and logging results
+ sleep 1
+ ipi_test_once "test" $cpu
+ printf "%-3s %10s %12s\n" $src_cpu $cpu $ipi_latency >> $LOG
+ avg_arr+=($ipi_latency)
+ done
+ compute_average "${avg_arr[@]}"
+ echo -e "Expected IPI latency(ns): ${latency_arr[$state]}" >> $LOG
+ echo -e "Observed Average IPI latency(ns): $avg" >> $LOG
+ op_state 1 $state
+ done
+}
+
+# Extract the residency in microseconds and convert to nanoseconds.
+# Add 100 ns so that the timer stays for a little longer than the residency
+extract_residency()
+{
+ for ((state=0; state<NUM_STATES; state++))
+ do
+ residency=$(($(cat /sys/devices/system/cpu/cpu0/cpuidle/state$state/residency) * 1000 + 200))
+ residency_arr+=($residency)
+ done
+}
+
+# Run the Timeout test
+# $1 run for baseline - busy cpu or regular environment
+# $2 destination cpu
+# $3 timeout
+timeout_test_once()
+{
+ dest_cpu=$2
+ if [ "$1" = "baseline" ]; then
+ # Keep the CPU busy
+ taskset -c $dest_cpu cat /dev/random > /dev/null &
+ task_pid=$!
+ # Wait for the workload to achieve 100% CPU usage
+ sleep 1
+ fi
+ taskset -c $dest_cpu echo $3 > /sys/kernel/debug/latency_test/timeout_expected_ns
+ # Wait for the result to populate
+ sleep 0.1
+ timeout_diff=$(cat /sys/kernel/debug/latency_test/timeout_diff_ns)
+ src_cpu=$(cat /sys/kernel/debug/latency_test/timeout_cpu_src)
+ if [ "$1" = "baseline" ]; then
+ kill $task_pid
+ wait $task_pid 2>/dev/null
+ fi
+}
+
+run_timeout_tests()
+{
+ extract_residency
+ disable_idle
+ declare -a avg_arr
+ echo -e "\n--Timeout Latency Test--" >> $LOG
+
+ echo -e "--Baseline Timeout Latency measurement: CPU Busy--" >> $LOG
+ printf "%s %10s %10s\n" "Wakeup_src" "Baseline_delay(ns)">> $LOG
+ for ((cpu=0; cpu<NUM_CPUS; cpu++))
+ do
+ timeout_test_once "baseline" $cpu ${residency_arr[0]}
+ printf "%-3s %13s\n" $src_cpu $timeout_diff >> $LOG
+ avg_arr+=($timeout_diff)
+ done
+ compute_average "${avg_arr[@]}"
+ echo -e "Baseline Average timeout diff(ns): $avg" >> $LOG
+
+ for ((state=0; state<NUM_STATES; state++))
+ do
+ echo -e "---Enabling state: $state---" >> $LOG
+ op_state 0 $state
+ printf "%s %10s %10s\n" "Wakeup_src" "Baseline_delay(ns)" "Delay(ns)" >> $LOG
+ unset avg_arr
+ for ((cpu=0; cpu<NUM_CPUS; cpu++))
+ do
+ timeout_test_once "test" $cpu ${residency_arr[$state]}
+ printf "%-3s %13s %18s\n" $src_cpu $baseline_timeout_diff $timeout_diff >> $LOG
+ avg_arr+=($timeout_diff)
+ done
+ compute_average "${avg_arr[@]}"
+ echo -e "Expected timeout(ns): ${residency_arr[$state]}" >> $LOG
+ echo -e "Observed Average timeout diff(ns): $avg" >> $LOG
+ op_state 1 $state
+ done
+}
+
+declare -a residency_arr
+declare -a latency_arr
+
+# Parse arguments
+parse_arguments $@
+
+rm -f $LOG
+touch $LOG
+NUM_CPUS=$(nproc --all)
+NUM_STATES=$(ls -1 /sys/devices/system/cpu/cpu0/cpuidle/ | wc -l)
+
+# Insert the module
+ins_mod $MODULE
+
+printf "Started IPI latency tests\n"
+run_ipi_tests
+
+printf "Started Timer latency tests\n"
+run_timeout_tests
+
+printf "Removing $MODULE module\n"
+printf "Output logged at: $LOG\n"
+rmmod $MODULE
diff --git a/tools/testing/selftests/cpuidle/settings b/tools/testing/selftests/cpuidle/settings
new file mode 100644
index 000000000000..e7b9417537fb
--- /dev/null
+++ b/tools/testing/selftests/cpuidle/settings
@@ -0,0 +1 @@
+timeout=0
--
2.25.4

2020-07-17 09:20:19

by Pratik R. Sampat

[permalink] [raw]
Subject: [PATCH v2 1/2] cpuidle: Trace IPI based and timer based wakeup latency from idle states

Fire directed smp_call_function_single IPIs from a specified source
CPU to the specified target CPU to reduce the noise we have to wade
through in the trace log.
The module is based on the idea written by Srivatsa Bhat and maintained
by Vaidyanathan Srinivasan internally.

Queue HR timer and measure jitter. Wakeup latency measurement for idle
states using hrtimer. Echo a value in ns to timer_test_function and
watch trace. A HRtimer will be queued and when it fires the expected
wakeup vs actual wakeup is computes and delay printed in ns.

Implemented as a module which utilizes debugfs so that it can be
integrated with selftests.

To include the module, check option and include as module
kernel hacking -> Cpuidle latency selftests

[[email protected]: Initial implementation in
cpidle/sysfs]

[[email protected]: wakeup latency measurements using hrtimer
and fix some of the time calculation]

[[email protected]: Fix some whitespace and tab errors and
increase the resolution of IPI wakeup]

Signed-off-by: Pratik Rajesh Sampat <[email protected]>
---
drivers/cpuidle/Makefile | 1 +
drivers/cpuidle/test-cpuidle_latency.c | 150 +++++++++++++++++++++++++
lib/Kconfig.debug | 10 ++
3 files changed, 161 insertions(+)
create mode 100644 drivers/cpuidle/test-cpuidle_latency.c

diff --git a/drivers/cpuidle/Makefile b/drivers/cpuidle/Makefile
index f07800cbb43f..2ae05968078c 100644
--- a/drivers/cpuidle/Makefile
+++ b/drivers/cpuidle/Makefile
@@ -8,6 +8,7 @@ obj-$(CONFIG_ARCH_NEEDS_CPU_IDLE_COUPLED) += coupled.o
obj-$(CONFIG_DT_IDLE_STATES) += dt_idle_states.o
obj-$(CONFIG_ARCH_HAS_CPU_RELAX) += poll_state.o
obj-$(CONFIG_HALTPOLL_CPUIDLE) += cpuidle-haltpoll.o
+obj-$(CONFIG_IDLE_LATENCY_SELFTEST) += test-cpuidle_latency.o

##################################################################################
# ARM SoC drivers
diff --git a/drivers/cpuidle/test-cpuidle_latency.c b/drivers/cpuidle/test-cpuidle_latency.c
new file mode 100644
index 000000000000..61574665e972
--- /dev/null
+++ b/drivers/cpuidle/test-cpuidle_latency.c
@@ -0,0 +1,150 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Module-based API test facility for cpuidle latency using IPIs and timers
+ */
+
+#include <linux/debugfs.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+
+/* IPI based wakeup latencies */
+struct latency {
+ unsigned int src_cpu;
+ unsigned int dest_cpu;
+ ktime_t time_start;
+ ktime_t time_end;
+ u64 latency_ns;
+} ipi_wakeup;
+
+static void measure_latency(void *info)
+{
+ struct latency *v;
+ ktime_t time_diff;
+
+ v = (struct latency *)info;
+ v->time_end = ktime_get();
+ time_diff = ktime_sub(v->time_end, v->time_start);
+ v->latency_ns = ktime_to_ns(time_diff);
+}
+
+void run_smp_call_function_test(unsigned int cpu)
+{
+ ipi_wakeup.src_cpu = smp_processor_id();
+ ipi_wakeup.dest_cpu = cpu;
+ ipi_wakeup.time_start = ktime_get();
+ smp_call_function_single(cpu, measure_latency, &ipi_wakeup, 1);
+}
+
+/* Timer based wakeup latencies */
+struct timer_data {
+ unsigned int src_cpu;
+ u64 timeout;
+ ktime_t time_start;
+ ktime_t time_end;
+ struct hrtimer timer;
+ u64 timeout_diff_ns;
+} timer_wakeup;
+
+static enum hrtimer_restart timer_called(struct hrtimer *hrtimer)
+{
+ struct timer_data *w;
+ ktime_t time_diff;
+
+ w = container_of(hrtimer, struct timer_data, timer);
+ w->time_end = ktime_get();
+
+ time_diff = ktime_sub(w->time_end, w->time_start);
+ time_diff = ktime_sub(time_diff, ns_to_ktime(w->timeout));
+ w->timeout_diff_ns = ktime_to_ns(time_diff);
+ return HRTIMER_NORESTART;
+}
+
+static void run_timer_test(unsigned int ns)
+{
+ hrtimer_init(&timer_wakeup.timer, CLOCK_MONOTONIC,
+ HRTIMER_MODE_REL);
+ timer_wakeup.timer.function = timer_called;
+ timer_wakeup.time_start = ktime_get();
+ timer_wakeup.src_cpu = smp_processor_id();
+ timer_wakeup.timeout = ns;
+
+ hrtimer_start(&timer_wakeup.timer, ns_to_ktime(ns),
+ HRTIMER_MODE_REL_PINNED);
+}
+
+static struct dentry *dir;
+
+static int cpu_read_op(void *data, u64 *value)
+{
+ *value = ipi_wakeup.dest_cpu;
+ return 0;
+}
+
+static int cpu_write_op(void *data, u64 value)
+{
+ run_smp_call_function_test(value);
+ return 0;
+}
+DEFINE_SIMPLE_ATTRIBUTE(ipi_ops, cpu_read_op, cpu_write_op, "%llu\n");
+
+static int timeout_read_op(void *data, u64 *value)
+{
+ *value = timer_wakeup.timeout;
+ return 0;
+}
+
+static int timeout_write_op(void *data, u64 value)
+{
+ run_timer_test(value);
+ return 0;
+}
+DEFINE_SIMPLE_ATTRIBUTE(timeout_ops, timeout_read_op, timeout_write_op, "%llu\n");
+
+static int __init latency_init(void)
+{
+ struct dentry *temp;
+
+ dir = debugfs_create_dir("latency_test", 0);
+ if (!dir) {
+ pr_alert("latency_test: failed to create /sys/kernel/debug/latency_test\n");
+ return -1;
+ }
+ temp = debugfs_create_file("ipi_cpu_dest",
+ 0666,
+ dir,
+ NULL,
+ &ipi_ops);
+ if (!temp) {
+ pr_alert("latency_test: failed to create /sys/kernel/debug/ipi_cpu_dest\n");
+ return -1;
+ }
+ debugfs_create_u64("ipi_latency_ns", 0444, dir, &ipi_wakeup.latency_ns);
+ debugfs_create_u32("ipi_cpu_src", 0444, dir, &ipi_wakeup.src_cpu);
+
+ temp = debugfs_create_file("timeout_expected_ns",
+ 0666,
+ dir,
+ NULL,
+ &timeout_ops);
+ if (!temp) {
+ pr_alert("latency_test: failed to create /sys/kernel/debug/timeout_expected_ns\n");
+ return -1;
+ }
+ debugfs_create_u64("timeout_diff_ns", 0444, dir, &timer_wakeup.timeout_diff_ns);
+ debugfs_create_u32("timeout_cpu_src", 0444, dir, &timer_wakeup.src_cpu);
+ pr_info("Latency Test module loaded\n");
+ return 0;
+}
+
+static void __exit latency_cleanup(void)
+{
+ pr_info("Cleaning up Latency Test module.\n");
+ debugfs_remove_recursive(dir);
+}
+
+module_init(latency_init);
+module_exit(latency_cleanup);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("IBM Corporation");
+MODULE_DESCRIPTION("Measuring idle latency for IPIs and Timers");
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index d74ac0fd6b2d..e2283790245a 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1375,6 +1375,16 @@ config DEBUG_KOBJECT
If you say Y here, some extra kobject debugging messages will be sent
to the syslog.

+config IDLE_LATENCY_SELFTEST
+ tristate "Cpuidle latency selftests"
+ depends on CPU_IDLE
+ help
+ This option provides a kernel module that runs tests using the IPI and
+ timers to measure latency.
+
+ Say M if you want these self tests to build as a module.
+ Say N if you are unsure.
+
config DEBUG_KOBJECT_RELEASE
bool "kobject release debugging"
depends on DEBUG_OBJECTS_TIMERS
--
2.25.4

2020-07-20 05:01:54

by Gautham R Shenoy

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] cpuidle: Trace IPI based and timer based wakeup latency from idle states

On Fri, Jul 17, 2020 at 02:48:00PM +0530, Pratik Rajesh Sampat wrote:
> Fire directed smp_call_function_single IPIs from a specified source
> CPU to the specified target CPU to reduce the noise we have to wade
> through in the trace log.
> The module is based on the idea written by Srivatsa Bhat and maintained
> by Vaidyanathan Srinivasan internally.
>
> Queue HR timer and measure jitter. Wakeup latency measurement for idle
> states using hrtimer. Echo a value in ns to timer_test_function and
> watch trace. A HRtimer will be queued and when it fires the expected
> wakeup vs actual wakeup is computes and delay printed in ns.
>
> Implemented as a module which utilizes debugfs so that it can be
> integrated with selftests.
>
> To include the module, check option and include as module
> kernel hacking -> Cpuidle latency selftests
>
> [[email protected]: Initial implementation in
> cpidle/sysfs]
>
> [[email protected]: wakeup latency measurements using hrtimer
> and fix some of the time calculation]
>
> [[email protected]: Fix some whitespace and tab errors and
> increase the resolution of IPI wakeup]
>
> Signed-off-by: Pratik Rajesh Sampat <[email protected]>


The debugfs module looks good to me.

Reviewed-by: Gautham R. Shenoy <[email protected]>


> ---
> drivers/cpuidle/Makefile | 1 +
> drivers/cpuidle/test-cpuidle_latency.c | 150 +++++++++++++++++++++++++
> lib/Kconfig.debug | 10 ++
> 3 files changed, 161 insertions(+)
> create mode 100644 drivers/cpuidle/test-cpuidle_latency.c
>
> diff --git a/drivers/cpuidle/Makefile b/drivers/cpuidle/Makefile
> index f07800cbb43f..2ae05968078c 100644
> --- a/drivers/cpuidle/Makefile
> +++ b/drivers/cpuidle/Makefile
> @@ -8,6 +8,7 @@ obj-$(CONFIG_ARCH_NEEDS_CPU_IDLE_COUPLED) += coupled.o
> obj-$(CONFIG_DT_IDLE_STATES) += dt_idle_states.o
> obj-$(CONFIG_ARCH_HAS_CPU_RELAX) += poll_state.o
> obj-$(CONFIG_HALTPOLL_CPUIDLE) += cpuidle-haltpoll.o
> +obj-$(CONFIG_IDLE_LATENCY_SELFTEST) += test-cpuidle_latency.o
>
> ##################################################################################
> # ARM SoC drivers
> diff --git a/drivers/cpuidle/test-cpuidle_latency.c b/drivers/cpuidle/test-cpuidle_latency.c
> new file mode 100644
> index 000000000000..61574665e972
> --- /dev/null
> +++ b/drivers/cpuidle/test-cpuidle_latency.c
> @@ -0,0 +1,150 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later
> +/*
> + * Module-based API test facility for cpuidle latency using IPIs and timers
> + */
> +
> +#include <linux/debugfs.h>
> +#include <linux/kernel.h>
> +#include <linux/module.h>
> +
> +/* IPI based wakeup latencies */
> +struct latency {
> + unsigned int src_cpu;
> + unsigned int dest_cpu;
> + ktime_t time_start;
> + ktime_t time_end;
> + u64 latency_ns;
> +} ipi_wakeup;
> +
> +static void measure_latency(void *info)
> +{
> + struct latency *v;
> + ktime_t time_diff;
> +
> + v = (struct latency *)info;
> + v->time_end = ktime_get();
> + time_diff = ktime_sub(v->time_end, v->time_start);
> + v->latency_ns = ktime_to_ns(time_diff);
> +}
> +
> +void run_smp_call_function_test(unsigned int cpu)
> +{
> + ipi_wakeup.src_cpu = smp_processor_id();
> + ipi_wakeup.dest_cpu = cpu;
> + ipi_wakeup.time_start = ktime_get();
> + smp_call_function_single(cpu, measure_latency, &ipi_wakeup, 1);
> +}
> +
> +/* Timer based wakeup latencies */
> +struct timer_data {
> + unsigned int src_cpu;
> + u64 timeout;
> + ktime_t time_start;
> + ktime_t time_end;
> + struct hrtimer timer;
> + u64 timeout_diff_ns;
> +} timer_wakeup;
> +
> +static enum hrtimer_restart timer_called(struct hrtimer *hrtimer)
> +{
> + struct timer_data *w;
> + ktime_t time_diff;
> +
> + w = container_of(hrtimer, struct timer_data, timer);
> + w->time_end = ktime_get();
> +
> + time_diff = ktime_sub(w->time_end, w->time_start);
> + time_diff = ktime_sub(time_diff, ns_to_ktime(w->timeout));
> + w->timeout_diff_ns = ktime_to_ns(time_diff);
> + return HRTIMER_NORESTART;
> +}
> +
> +static void run_timer_test(unsigned int ns)
> +{
> + hrtimer_init(&timer_wakeup.timer, CLOCK_MONOTONIC,
> + HRTIMER_MODE_REL);
> + timer_wakeup.timer.function = timer_called;
> + timer_wakeup.time_start = ktime_get();
> + timer_wakeup.src_cpu = smp_processor_id();
> + timer_wakeup.timeout = ns;
> +
> + hrtimer_start(&timer_wakeup.timer, ns_to_ktime(ns),
> + HRTIMER_MODE_REL_PINNED);
> +}
> +
> +static struct dentry *dir;
> +
> +static int cpu_read_op(void *data, u64 *value)
> +{
> + *value = ipi_wakeup.dest_cpu;
> + return 0;
> +}
> +
> +static int cpu_write_op(void *data, u64 value)
> +{
> + run_smp_call_function_test(value);
> + return 0;
> +}
> +DEFINE_SIMPLE_ATTRIBUTE(ipi_ops, cpu_read_op, cpu_write_op, "%llu\n");
> +
> +static int timeout_read_op(void *data, u64 *value)
> +{
> + *value = timer_wakeup.timeout;
> + return 0;
> +}
> +
> +static int timeout_write_op(void *data, u64 value)
> +{
> + run_timer_test(value);
> + return 0;
> +}
> +DEFINE_SIMPLE_ATTRIBUTE(timeout_ops, timeout_read_op, timeout_write_op, "%llu\n");
> +
> +static int __init latency_init(void)
> +{
> + struct dentry *temp;
> +
> + dir = debugfs_create_dir("latency_test", 0);
> + if (!dir) {
> + pr_alert("latency_test: failed to create /sys/kernel/debug/latency_test\n");
> + return -1;
> + }
> + temp = debugfs_create_file("ipi_cpu_dest",
> + 0666,
> + dir,
> + NULL,
> + &ipi_ops);
> + if (!temp) {
> + pr_alert("latency_test: failed to create /sys/kernel/debug/ipi_cpu_dest\n");
> + return -1;
> + }
> + debugfs_create_u64("ipi_latency_ns", 0444, dir, &ipi_wakeup.latency_ns);
> + debugfs_create_u32("ipi_cpu_src", 0444, dir, &ipi_wakeup.src_cpu);
> +
> + temp = debugfs_create_file("timeout_expected_ns",
> + 0666,
> + dir,
> + NULL,
> + &timeout_ops);
> + if (!temp) {
> + pr_alert("latency_test: failed to create /sys/kernel/debug/timeout_expected_ns\n");
> + return -1;
> + }
> + debugfs_create_u64("timeout_diff_ns", 0444, dir, &timer_wakeup.timeout_diff_ns);
> + debugfs_create_u32("timeout_cpu_src", 0444, dir, &timer_wakeup.src_cpu);
> + pr_info("Latency Test module loaded\n");
> + return 0;
> +}
> +
> +static void __exit latency_cleanup(void)
> +{
> + pr_info("Cleaning up Latency Test module.\n");
> + debugfs_remove_recursive(dir);
> +}
> +
> +module_init(latency_init);
> +module_exit(latency_cleanup);
> +
> +MODULE_LICENSE("GPL");
> +MODULE_AUTHOR("IBM Corporation");
> +MODULE_DESCRIPTION("Measuring idle latency for IPIs and Timers");
> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> index d74ac0fd6b2d..e2283790245a 100644
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -1375,6 +1375,16 @@ config DEBUG_KOBJECT
> If you say Y here, some extra kobject debugging messages will be sent
> to the syslog.
>
> +config IDLE_LATENCY_SELFTEST
> + tristate "Cpuidle latency selftests"
> + depends on CPU_IDLE
> + help
> + This option provides a kernel module that runs tests using the IPI and
> + timers to measure latency.
> +
> + Say M if you want these self tests to build as a module.
> + Say N if you are unsure.
> +
> config DEBUG_KOBJECT_RELEASE
> bool "kobject release debugging"
> depends on DEBUG_OBJECTS_TIMERS
> --
> 2.25.4
>

2020-07-20 05:56:14

by Gautham R Shenoy

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] selftest/cpuidle: Add support for cpuidle latency measurement

Hi Pratik,


On Fri, Jul 17, 2020 at 02:48:01PM +0530, Pratik Rajesh Sampat wrote:
> This patch adds support to trace IPI based and timer based wakeup
> latency from idle states
>
> Latches onto the test-cpuidle_latency kernel module using the debugfs
> interface to send IPIs or schedule a timer based event, which in-turn
> populates the debugfs with the latency measurements.
>
> Currently for the IPI and timer tests; first disable all idle states
> and then test for latency measurements incrementally enabling each state
>
> Signed-off-by: Pratik Rajesh Sampat <[email protected]>

A few comments below.

> ---
> tools/testing/selftests/Makefile | 1 +
> tools/testing/selftests/cpuidle/Makefile | 6 +
> tools/testing/selftests/cpuidle/cpuidle.sh | 257 +++++++++++++++++++++
> tools/testing/selftests/cpuidle/settings | 1 +
> 4 files changed, 265 insertions(+)
> create mode 100644 tools/testing/selftests/cpuidle/Makefile
> create mode 100755 tools/testing/selftests/cpuidle/cpuidle.sh
> create mode 100644 tools/testing/selftests/cpuidle/settings
>

[..skip..]

> +
> +ins_mod()
> +{
> + if [ ! -f "$MODULE" ]; then
> + printf "$MODULE module does not exist. Exitting\n"

If the module has been compiled into the kernel (due to a
localyesconfig, for instance), then it is unlikely that we will find
it in /lib/modules. Perhaps you want to check if the debugfs
directories created by the module exist, and if so, print a message
saying that the modules is already loaded or some such?

> + exit $ksft_skip
> + fi
> + printf "Inserting $MODULE module\n\n"
> + insmod $MODULE
> + if [ $? != 0 ]; then
> + printf "Insmod $MODULE failed\n"
> + exit $ksft_skip
> + fi
> +}
> +
> +compute_average()
> +{
> + arr=("$@")
> + sum=0
> + size=${#arr[@]}
> + for i in "${arr[@]}"
> + do
> + sum=$((sum + i))
> + done
> + avg=$((sum/size))

It would be good to assert that "size" isn't 0 here.

> +}
> +
> +# Disable all stop states
> +disable_idle()
> +{
> + for ((cpu=0; cpu<NUM_CPUS; cpu++))
> + do
> + for ((state=0; state<NUM_STATES; state++))
> + do
> + echo 1 > /sys/devices/system/cpu/cpu$cpu/cpuidle/state$state/disable

So, on offlined CPUs, we won't see
/sys/devices/system/cpu/cpu$cpu/cpuidle/state$state directory. You
should probably perform this operation only on online CPUs.


> + done
> + done
> +}
> +
> +# Perform operation on each CPU for the given state
> +# $1 - Operation: enable (0) / disable (1)
> +# $2 - State to enable
> +op_state()
> +{
> + for ((cpu=0; cpu<NUM_CPUS; cpu++))
> + do
> + echo $1 > /sys/devices/system/cpu/cpu$cpu/cpuidle/state$2/disable


Ditto

> + done
> +}

This is a helper function. For better readability of the main code you
can define the following wrappers and use them.


cpuidle_enable_state()
{
state=$1
op_state 1 $state
}

cpuidle_disable_state()
{
state=$1
op_state 0 $state

}


> +

[..snip..]

> +run_ipi_tests()
> +{
> + extract_latency
> + disable_idle
> + declare -a avg_arr
> + echo -e "--IPI Latency Test---" >> $LOG
> +
> + echo -e "--Baseline IPI Latency measurement: CPU Busy--" >> $LOG
> + printf "%s %10s %12s\n" "SRC_CPU" "DEST_CPU" "IPI_Latency(ns)" >> $LOG
> + for ((cpu=0; cpu<NUM_CPUS; cpu++))
> + do
> + ipi_test_once "baseline" $cpu
> + printf "%-3s %10s %12s\n" $src_cpu $cpu $ipi_latency >> $LOG
> + avg_arr+=($ipi_latency)
> + done
> + compute_average "${avg_arr[@]}"
> + echo -e "Baseline Average IPI latency(ns): $avg" >> $LOG
> +
> + for ((state=0; state<NUM_STATES; state++))
> + do
> + unset avg_arr
> + echo -e "---Enabling state: $state---" >> $LOG
> + op_state 0 $state
> + printf "%s %10s %12s\n" "SRC_CPU" "DEST_CPU" "IPI_Latency(ns)" >> $LOG
> + for ((cpu=0; cpu<NUM_CPUS; cpu++))
> + do

If a CPU is offline, then we should skip it here.

> + # Running IPI test and logging results
> + sleep 1
> + ipi_test_once "test" $cpu
> + printf "%-3s %10s %12s\n" $src_cpu $cpu $ipi_latency >> $LOG
> + avg_arr+=($ipi_latency)
> + done
> + compute_average "${avg_arr[@]}"
> + echo -e "Expected IPI latency(ns): ${latency_arr[$state]}" >> $LOG
> + echo -e "Observed Average IPI latency(ns): $avg" >> $LOG
> + op_state 1 $state
> + done
> +}
> +
> +# Extract the residency in microseconds and convert to nanoseconds.
> +# Add 100 ns so that the timer stays for a little longer than the residency
> +extract_residency()
> +{
> + for ((state=0; state<NUM_STATES; state++))
> + do
> + residency=$(($(cat /sys/devices/system/cpu/cpu0/cpuidle/state$state/residency) * 1000 + 200))
> + residency_arr+=($residency)
> + done
> +}
> +
> +# Run the Timeout test
> +# $1 run for baseline - busy cpu or regular environment
> +# $2 destination cpu
> +# $3 timeout
> +timeout_test_once()
> +{
> + dest_cpu=$2
> + if [ "$1" = "baseline" ]; then
> + # Keep the CPU busy
> + taskset -c $dest_cpu cat /dev/random > /dev/null &
> + task_pid=$!
> + # Wait for the workload to achieve 100% CPU usage
> + sleep 1
> + fi
> + taskset -c $dest_cpu echo $3 > /sys/kernel/debug/latency_test/timeout_expected_ns
> + # Wait for the result to populate
> + sleep 0.1
> + timeout_diff=$(cat /sys/kernel/debug/latency_test/timeout_diff_ns)
> + src_cpu=$(cat /sys/kernel/debug/latency_test/timeout_cpu_src)
> + if [ "$1" = "baseline" ]; then
> + kill $task_pid
> + wait $task_pid 2>/dev/null
> + fi
> +}
> +
> +run_timeout_tests()
> +{
> + extract_residency
> + disable_idle
> + declare -a avg_arr
> + echo -e "\n--Timeout Latency Test--" >> $LOG
> +
> + echo -e "--Baseline Timeout Latency measurement: CPU Busy--" >> $LOG
> + printf "%s %10s %10s\n" "Wakeup_src" "Baseline_delay(ns)">> $LOG
> + for ((cpu=0; cpu<NUM_CPUS; cpu++))
> + do

Again only perform this on online CPUs.

> + timeout_test_once "baseline" $cpu ${residency_arr[0]}
> + printf "%-3s %13s\n" $src_cpu $timeout_diff >> $LOG
> + avg_arr+=($timeout_diff)
> + done
> + compute_average "${avg_arr[@]}"
> + echo -e "Baseline Average timeout diff(ns): $avg" >> $LOG
> +
> + for ((state=0; state<NUM_STATES; state++))
> + do

> + echo -e "---Enabling state: $state---" >> $LOG
> + op_state 0 $state
> + printf "%s %10s %10s\n" "Wakeup_src" "Baseline_delay(ns)" "Delay(ns)" >> $LOG
> + unset avg_arr
> + for ((cpu=0; cpu<NUM_CPUS; cpu++))
> + do

Ditto.

> + timeout_test_once "test" $cpu ${residency_arr[$state]}
> + printf "%-3s %13s %18s\n" $src_cpu $baseline_timeout_diff $timeout_diff >> $LOG
> + avg_arr+=($timeout_diff)
> + done
> + compute_average "${avg_arr[@]}"
> + echo -e "Expected timeout(ns): ${residency_arr[$state]}" >> $LOG
> + echo -e "Observed Average timeout diff(ns): $avg" >> $LOG
> + op_state 1 $state
> + done
> +}
> +

--
Thanks and Regards
gautham.

2020-07-21 11:58:02

by Pratik R. Sampat

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] selftest/cpuidle: Add support for cpuidle latency measurement

Hi Gautham, Thanks for the review.


On 20/07/20 11:22 am, Gautham R Shenoy wrote:
> Hi Pratik,
>
>
> On Fri, Jul 17, 2020 at 02:48:01PM +0530, Pratik Rajesh Sampat wrote:
>> This patch adds support to trace IPI based and timer based wakeup
>> latency from idle states
>>
>> Latches onto the test-cpuidle_latency kernel module using the debugfs
>> interface to send IPIs or schedule a timer based event, which in-turn
>> populates the debugfs with the latency measurements.
>>
>> Currently for the IPI and timer tests; first disable all idle states
>> and then test for latency measurements incrementally enabling each state
>>
>> Signed-off-by: Pratik Rajesh Sampat <[email protected]>
> A few comments below.
>
>> ---
>> tools/testing/selftests/Makefile | 1 +
>> tools/testing/selftests/cpuidle/Makefile | 6 +
>> tools/testing/selftests/cpuidle/cpuidle.sh | 257 +++++++++++++++++++++
>> tools/testing/selftests/cpuidle/settings | 1 +
>> 4 files changed, 265 insertions(+)
>> create mode 100644 tools/testing/selftests/cpuidle/Makefile
>> create mode 100755 tools/testing/selftests/cpuidle/cpuidle.sh
>> create mode 100644 tools/testing/selftests/cpuidle/settings
>>
> [..skip..]
>
>> +
>> +ins_mod()
>> +{
>> + if [ ! -f "$MODULE" ]; then
>> + printf "$MODULE module does not exist. Exitting\n"
> If the module has been compiled into the kernel (due to a
> localyesconfig, for instance), then it is unlikely that we will find
> it in /lib/modules. Perhaps you want to check if the debugfs
> directories created by the module exist, and if so, print a message
> saying that the modules is already loaded or some such?
>
That's a good idea. I can can grep for this module within /proc/modules
and not insert it, if it is already there

>> + exit $ksft_skip
>> + fi
>> + printf "Inserting $MODULE module\n\n"
>> + insmod $MODULE
>> + if [ $? != 0 ]; then
>> + printf "Insmod $MODULE failed\n"
>> + exit $ksft_skip
>> + fi
>> +}
>> +
>> +compute_average()
>> +{
>> + arr=("$@")
>> + sum=0
>> + size=${#arr[@]}
>> + for i in "${arr[@]}"
>> + do
>> + sum=$((sum + i))
>> + done
>> + avg=$((sum/size))
> It would be good to assert that "size" isn't 0 here.

Sure

>> +}
>> +
>> +# Disable all stop states
>> +disable_idle()
>> +{
>> + for ((cpu=0; cpu<NUM_CPUS; cpu++))
>> + do
>> + for ((state=0; state<NUM_STATES; state++))
>> + do
>> + echo 1 > /sys/devices/system/cpu/cpu$cpu/cpuidle/state$state/disable
> So, on offlined CPUs, we won't see
> /sys/devices/system/cpu/cpu$cpu/cpuidle/state$state directory. You
> should probably perform this operation only on online CPUs.

Right. I should make CPU operations only on online CPUs all over the script

[..snip..]

Thanks
Pratik