2020-07-21 12:46:11

by Pratik R. Sampat

[permalink] [raw]
Subject: [PATCH v3 0/2] Selftest for cpuidle latency measurement

v2: https://lkml.org/lkml/2020/7/17/369
Changelog v2-->v3
Based on comments from Gautham R. Shenoy adding the following in the
selftest,
1. Grepping modules to determine if already loaded
2. Wrapper to enable/disable states
3. Preventing any operation/test on offlined CPUs
---

The patch series introduces a mechanism to measure wakeup latency for
IPI and timer based interrupts
The motivation behind this series is to find significant deviations
behind advertised latency and resisdency values

To achieve this, we introduce a kernel module and expose its control
knobs through the debugfs interface that the selftests can engage with.

The kernel module provides the following interfaces within
/sys/kernel/debug/latency_test/ for,
1. IPI test:
ipi_cpu_dest # Destination CPU for the IPI
ipi_cpu_src # Origin of the IPI
ipi_latency_ns # Measured latency time in ns
2. Timeout test:
timeout_cpu_src # CPU on which the timer to be queued
timeout_expected_ns # Timer duration
timeout_diff_ns # Difference of actual duration vs expected timer
To include the module, check option and include as module
kernel hacking -> Cpuidle latency selftests

The selftest inserts the module, disables all the idle states and
enables them one by one testing the following:
1. Keeping source CPU constant, iterates through all the CPUS measuring
IPI latency for baseline (CPU is busy with
"cat /dev/random > /dev/null" workload) and the when the CPU is
allowed to be at rest
2. Iterating through all the CPUs, sending expected timer durations to
be equivalent to the residency of the the deepest idle state
enabled and extracting the difference in time between the time of
wakeup and the expected timer duration

Usage
-----
Can be used in conjuction to the rest of the selftests.
Default Output location in: tools/testing/cpuidle/cpuidle.log

To run this test specifically:
$ make -C tools/testing/selftests TARGETS="cpuidle" run_tests

There are a few optinal arguments too that the script can take
[-h <help>]
[-m <location of the module>]
[-o <location of the output>]

Sample output snippet
---------------------
--IPI Latency Test---
--Baseline IPI Latency measurement: CPU Busy--
SRC_CPU DEST_CPU IPI_Latency(ns)
...
0 8 1996
0 9 2125
0 10 1264
0 11 1788
0 12 2045
Baseline Average IPI latency(ns): 1843
---Enabling state: 5---
SRC_CPU DEST_CPU IPI_Latency(ns)
0 8 621719
0 9 624752
0 10 622218
0 11 623968
0 12 621303
Expected IPI latency(ns): 100000
Observed Average IPI latency(ns): 622792

--Timeout Latency Test--
--Baseline Timeout Latency measurement: CPU Busy--
Wakeup_src Baseline_delay(ns)
...
8 2249
9 2226
10 2211
11 2183
12 2263
Baseline Average timeout diff(ns): 2226
---Enabling state: 5---
8 10749
9 10911
10 10912
11 12100
12 73276
Expected timeout(ns): 10000200
Observed Average timeout diff(ns): 23589

Pratik Rajesh Sampat (2):
cpuidle: Trace IPI based and timer based wakeup latency from idle
states
selftest/cpuidle: Add support for cpuidle latency measurement

drivers/cpuidle/Makefile | 1 +
drivers/cpuidle/test-cpuidle_latency.c | 150 ++++++++++
lib/Kconfig.debug | 10 +
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/cpuidle/Makefile | 6 +
tools/testing/selftests/cpuidle/cpuidle.sh | 310 +++++++++++++++++++++
tools/testing/selftests/cpuidle/settings | 1 +
7 files changed, 479 insertions(+)
create mode 100644 drivers/cpuidle/test-cpuidle_latency.c
create mode 100644 tools/testing/selftests/cpuidle/Makefile
create mode 100755 tools/testing/selftests/cpuidle/cpuidle.sh
create mode 100644 tools/testing/selftests/cpuidle/settings

--
2.25.4


2020-07-21 12:47:19

by Pratik R. Sampat

[permalink] [raw]
Subject: [PATCH v3 2/2] selftest/cpuidle: Add support for cpuidle latency measurement

This patch adds support to trace IPI based and timer based wakeup
latency from idle states

Latches onto the test-cpuidle_latency kernel module using the debugfs
interface to send IPIs or schedule a timer based event, which in-turn
populates the debugfs with the latency measurements.

Currently for the IPI and timer tests; first disable all idle states
and then test for latency measurements incrementally enabling each state

Signed-off-by: Pratik Rajesh Sampat <[email protected]>
---
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/cpuidle/Makefile | 6 +
tools/testing/selftests/cpuidle/cpuidle.sh | 310 +++++++++++++++++++++
tools/testing/selftests/cpuidle/settings | 1 +
4 files changed, 318 insertions(+)
create mode 100644 tools/testing/selftests/cpuidle/Makefile
create mode 100755 tools/testing/selftests/cpuidle/cpuidle.sh
create mode 100644 tools/testing/selftests/cpuidle/settings

diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index 1195bd85af38..ab6cf51f3518 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -7,6 +7,7 @@ TARGETS += capabilities
TARGETS += cgroup
TARGETS += clone3
TARGETS += cpufreq
+TARGETS += cpuidle
TARGETS += cpu-hotplug
TARGETS += drivers/dma-buf
TARGETS += efivarfs
diff --git a/tools/testing/selftests/cpuidle/Makefile b/tools/testing/selftests/cpuidle/Makefile
new file mode 100644
index 000000000000..72fd5d2e974d
--- /dev/null
+++ b/tools/testing/selftests/cpuidle/Makefile
@@ -0,0 +1,6 @@
+# SPDX-License-Identifier: GPL-2.0
+all:
+
+TEST_PROGS := cpuidle.sh
+
+include ../lib.mk
diff --git a/tools/testing/selftests/cpuidle/cpuidle.sh b/tools/testing/selftests/cpuidle/cpuidle.sh
new file mode 100755
index 000000000000..19cc24ccd4af
--- /dev/null
+++ b/tools/testing/selftests/cpuidle/cpuidle.sh
@@ -0,0 +1,310 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+LOG=cpuidle.log
+MODULE=/lib/modules/$(uname -r)/kernel/drivers/cpuidle/test-cpuidle_latency.ko
+
+# Kselftest framework requirement - SKIP code is 4.
+ksft_skip=4
+
+helpme()
+{
+ printf "Usage: $0 [-h] [-todg args]
+ [-h <help>]
+ [-m <location of the module>]
+ [-o <location of the output>]
+ \n"
+ exit 2
+}
+
+parse_arguments()
+{
+ while getopts ht:m:o: arg
+ do
+ case $arg in
+ h) # --help
+ helpme
+ ;;
+ m) # --mod-file
+ MODULE=$OPTARG
+ ;;
+ o) # output log files
+ LOG=$OPTARG
+ ;;
+ \?)
+ helpme
+ ;;
+ esac
+ done
+}
+
+ins_mod()
+{
+ if [ ! -f "$MODULE" ]; then
+ printf "$MODULE module does not exist. Exitting\n"
+ exit $ksft_skip
+ fi
+ # Check if the module is already loaded
+ grep "test_cpuidle_latency" /proc/modules
+ if [ $? == 0 ]; then
+ printf "Module: $MODULE already loaded\n\n"
+ return 0
+ fi
+ printf "Inserting $MODULE module\n\n"
+ insmod $MODULE
+ if [ $? != 0 ]; then
+ printf "Insmod $MODULE failed\n"
+ exit $ksft_skip
+ fi
+}
+
+compute_average()
+{
+ arr=("$@")
+ sum=0
+ size=${#arr[@]}
+ if [ $size == 0 ]; then
+ avg=0
+ return 1
+ fi
+ for i in "${arr[@]}"
+ do
+ sum=$((sum + i))
+ done
+ avg=$((sum/size))
+}
+
+# Disable all stop states
+disable_idle()
+{
+ for ((cpu=0; cpu<NUM_CPUS; cpu++))
+ do
+ local cpu_status=$(cpu_is_online $cpu)
+ if [ $cpu_status == 0 ]; then
+ continue
+ fi
+ for ((state=0; state<NUM_STATES; state++))
+ do
+ echo 1 > /sys/devices/system/cpu/cpu$cpu/cpuidle/state$state/disable
+ done
+ done
+}
+
+# Perform operation on each CPU for the given state
+# $1 - Operation: enable (0) / disable (1)
+# $2 - State to enable
+op_state()
+{
+ for ((cpu=0; cpu<NUM_CPUS; cpu++))
+ do
+ local cpu_status=$(cpu_is_online $cpu)
+ if [ $cpu_status == 0 ]; then
+ continue
+ fi
+ echo $1 > /sys/devices/system/cpu/cpu$cpu/cpuidle/state$2/disable
+ done
+}
+
+cpuidle_enable_state()
+{
+ state=$1
+ op_state 0 $state
+}
+
+cpuidle_disable_state()
+{
+ state=$1
+ op_state 1 $state
+}
+
+cpu_is_online()
+{
+ cpu=$1
+ status=$(cat /sys/devices/system/cpu/cpu$cpu/online)
+ echo $status
+}
+
+# Extract latency in microseconds and convert to nanoseconds
+extract_latency()
+{
+ for ((state=0; state<NUM_STATES; state++))
+ do
+ latency=$(($(cat /sys/devices/system/cpu/cpu0/cpuidle/state$state/latency) * 1000))
+ latency_arr+=($latency)
+ done
+}
+
+# Run the IPI test
+# $1 run for baseline - busy cpu or regular environment
+# $2 destination cpu
+ipi_test_once()
+{
+ dest_cpu=$2
+ if [ "$1" = "baseline" ]; then
+ # Keep the CPU busy
+ taskset -c $dest_cpu cat /dev/random > /dev/null &
+ task_pid=$!
+ # Wait for the workload to achieve 100% CPU usage
+ sleep 1
+ fi
+ taskset 0x1 echo $dest_cpu > /sys/kernel/debug/latency_test/ipi_cpu_dest
+ ipi_latency=$(cat /sys/kernel/debug/latency_test/ipi_latency_ns)
+ src_cpu=$(cat /sys/kernel/debug/latency_test/ipi_cpu_src)
+ if [ "$1" = "baseline" ]; then
+ kill $task_pid
+ wait $task_pid 2>/dev/null
+ fi
+}
+
+# Incrementally Enable idle states one by one and compute the latency
+run_ipi_tests()
+{
+ extract_latency
+ disable_idle
+ declare -a avg_arr
+ echo -e "--IPI Latency Test---" >> $LOG
+
+ echo -e "--Baseline IPI Latency measurement: CPU Busy--" >> $LOG
+ printf "%s %10s %12s\n" "SRC_CPU" "DEST_CPU" "IPI_Latency(ns)" >> $LOG
+ for ((cpu=0; cpu<NUM_CPUS; cpu++))
+ do
+ local cpu_status=$(cpu_is_online $cpu)
+ if [ $cpu_status == 0 ]; then
+ continue
+ fi
+ ipi_test_once "baseline" $cpu
+ printf "%-3s %10s %12s\n" $src_cpu $cpu $ipi_latency >> $LOG
+ avg_arr+=($ipi_latency)
+ done
+ compute_average "${avg_arr[@]}"
+ echo -e "Baseline Average IPI latency(ns): $avg" >> $LOG
+
+ for ((state=0; state<NUM_STATES; state++))
+ do
+ unset avg_arr
+ echo -e "---Enabling state: $state---" >> $LOG
+ cpuidle_enable_state $state
+ printf "%s %10s %12s\n" "SRC_CPU" "DEST_CPU" "IPI_Latency(ns)" >> $LOG
+ for ((cpu=0; cpu<NUM_CPUS; cpu++))
+ do
+ local cpu_status=$(cpu_is_online $cpu)
+ if [ $cpu_status == 0 ]; then
+ continue
+ fi
+ # Running IPI test and logging results
+ sleep 1
+ ipi_test_once "test" $cpu
+ printf "%-3s %10s %12s\n" $src_cpu $cpu $ipi_latency >> $LOG
+ avg_arr+=($ipi_latency)
+ done
+ compute_average "${avg_arr[@]}"
+ echo -e "Expected IPI latency(ns): ${latency_arr[$state]}" >> $LOG
+ echo -e "Observed Average IPI latency(ns): $avg" >> $LOG
+ cpuidle_disable_state $state
+ done
+}
+
+# Extract the residency in microseconds and convert to nanoseconds.
+# Add 100 ns so that the timer stays for a little longer than the residency
+extract_residency()
+{
+ for ((state=0; state<NUM_STATES; state++))
+ do
+ residency=$(($(cat /sys/devices/system/cpu/cpu0/cpuidle/state$state/residency) * 1000 + 200))
+ residency_arr+=($residency)
+ done
+}
+
+# Run the Timeout test
+# $1 run for baseline - busy cpu or regular environment
+# $2 destination cpu
+# $3 timeout
+timeout_test_once()
+{
+ dest_cpu=$2
+ if [ "$1" = "baseline" ]; then
+ # Keep the CPU busy
+ taskset -c $dest_cpu cat /dev/random > /dev/null &
+ task_pid=$!
+ # Wait for the workload to achieve 100% CPU usage
+ sleep 1
+ fi
+ taskset -c $dest_cpu echo $3 > /sys/kernel/debug/latency_test/timeout_expected_ns
+ # Wait for the result to populate
+ sleep 0.1
+ timeout_diff=$(cat /sys/kernel/debug/latency_test/timeout_diff_ns)
+ src_cpu=$(cat /sys/kernel/debug/latency_test/timeout_cpu_src)
+ if [ "$1" = "baseline" ]; then
+ kill $task_pid
+ wait $task_pid 2>/dev/null
+ fi
+}
+
+run_timeout_tests()
+{
+ extract_residency
+ disable_idle
+ declare -a avg_arr
+ echo -e "\n--Timeout Latency Test--" >> $LOG
+
+ echo -e "--Baseline Timeout Latency measurement: CPU Busy--" >> $LOG
+ printf "%s %10s %10s\n" "Wakeup_src" "Baseline_delay(ns)">> $LOG
+ for ((cpu=0; cpu<NUM_CPUS; cpu++))
+ do
+ local cpu_status=$(cpu_is_online $cpu)
+ if [ $cpu_status == 0 ]; then
+ continue
+ fi
+ timeout_test_once "baseline" $cpu ${residency_arr[0]}
+ printf "%-3s %13s\n" $src_cpu $timeout_diff >> $LOG
+ avg_arr+=($timeout_diff)
+ done
+ compute_average "${avg_arr[@]}"
+ echo -e "Baseline Average timeout diff(ns): $avg" >> $LOG
+
+ for ((state=0; state<NUM_STATES; state++))
+ do
+ unset avg_arr
+ echo -e "---Enabling state: $state---" >> $LOG
+ cpuidle_enable_state $state
+ printf "%s %10s %10s\n" "Wakeup_src" "Baseline_delay(ns)" "Delay(ns)" >> $LOG
+ for ((cpu=0; cpu<NUM_CPUS; cpu++))
+ do
+ local cpu_status=$(cpu_is_online $cpu)
+ if [ $cpu_status == 0 ]; then
+ continue
+ fi
+ timeout_test_once "test" $cpu ${residency_arr[$state]}
+ printf "%-3s %13s %18s\n" $src_cpu $baseline_timeout_diff $timeout_diff >> $LOG
+ avg_arr+=($timeout_diff)
+ done
+ compute_average "${avg_arr[@]}"
+ echo -e "Expected timeout(ns): ${residency_arr[$state]}" >> $LOG
+ echo -e "Observed Average timeout diff(ns): $avg" >> $LOG
+ cpuidle_disable_state $state
+ done
+}
+
+declare -a residency_arr
+declare -a latency_arr
+
+# Parse arguments
+parse_arguments $@
+
+rm -f $LOG
+touch $LOG
+NUM_CPUS=$(nproc --all)
+NUM_STATES=$(ls -1 /sys/devices/system/cpu/cpu0/cpuidle/ | wc -l)
+
+# Insert the module
+ins_mod $MODULE
+
+printf "Started IPI latency tests\n"
+run_ipi_tests
+
+printf "Started Timer latency tests\n"
+run_timeout_tests
+
+printf "Removing $MODULE module\n"
+printf "Output logged at: $LOG\n"
+rmmod $MODULE
diff --git a/tools/testing/selftests/cpuidle/settings b/tools/testing/selftests/cpuidle/settings
new file mode 100644
index 000000000000..e7b9417537fb
--- /dev/null
+++ b/tools/testing/selftests/cpuidle/settings
@@ -0,0 +1 @@
+timeout=0
--
2.25.4

2020-07-21 15:00:34

by Daniel Lezcano

[permalink] [raw]
Subject: Re: [PATCH v3 0/2] Selftest for cpuidle latency measurement

On 21/07/2020 14:42, Pratik Rajesh Sampat wrote:
> v2: https://lkml.org/lkml/2020/7/17/369
> Changelog v2-->v3
> Based on comments from Gautham R. Shenoy adding the following in the
> selftest,
> 1. Grepping modules to determine if already loaded
> 2. Wrapper to enable/disable states
> 3. Preventing any operation/test on offlined CPUs
> ---
>
> The patch series introduces a mechanism to measure wakeup latency for
> IPI and timer based interrupts
> The motivation behind this series is to find significant deviations
> behind advertised latency and resisdency values

Why do you want to measure for the timer and the IPI ? Whatever the
source of the wakeup, the exit latency remains the same, no ?

Is all this kernel-ish code really needed ?

What about using a highres periodic timer and make it expires every eg.
50ms x 2400, so it is 120 secondes and measure the deviation. Repeat the
operation for each idle states.

And in order to make it as much accurate as possible, set the program
affinity on a CPU and isolate this one by preventing other processes to
be scheduled on and migrate the interrupts on the other CPUs.

That will be all userspace code, no?





--
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog

2020-07-22 15:38:03

by Pratik R. Sampat

[permalink] [raw]
Subject: Re: [PATCH v3 0/2] Selftest for cpuidle latency measurement

Hello Daniel,

On 21/07/20 8:27 pm, Daniel Lezcano wrote:
> On 21/07/2020 14:42, Pratik Rajesh Sampat wrote:
>> v2: https://lkml.org/lkml/2020/7/17/369
>> Changelog v2-->v3
>> Based on comments from Gautham R. Shenoy adding the following in the
>> selftest,
>> 1. Grepping modules to determine if already loaded
>> 2. Wrapper to enable/disable states
>> 3. Preventing any operation/test on offlined CPUs
>> ---
>>
>> The patch series introduces a mechanism to measure wakeup latency for
>> IPI and timer based interrupts
>> The motivation behind this series is to find significant deviations
>> behind advertised latency and resisdency values
> Why do you want to measure for the timer and the IPI ? Whatever the
> source of the wakeup, the exit latency remains the same, no ?
>
> Is all this kernel-ish code really needed ?
>
> What about using a highres periodic timer and make it expires every eg.
> 50ms x 2400, so it is 120 secondes and measure the deviation. Repeat the
> operation for each idle states.
>
> And in order to make it as much accurate as possible, set the program
> affinity on a CPU and isolate this one by preventing other processes to
> be scheduled on and migrate the interrupts on the other CPUs.
>
> That will be all userspace code, no?
>
>
The kernel module may not needed now that you mention it.
IPI latencies could be measured using pipes and threads using
pthread_attr_setaffinity_np to control the experiment, as you
suggested. This should internally fire a smp_call_function_single.

The original idea was to essentially measure it as closely as possible
in the kernel without involving the kernel-->userspace overhead.
However, the user-space approach may not be too much of a problem as
we are collecting a baseline and the delta of the latency is what we
would be concerned about anyways!

With respect to measuring both timers and IPI latencies: In principle
yes, the exit latency should remain the same but if there is a
deviation in reality we may want to measure it.

I'll implement this experiment in the userspace and get back with the
numbers to confirm.

Thanks for your comments!
Best,
Pratik

>
>