LinuxLists.cc - Re: [PATCH v2 0/3] Enable EAS for CPPC/ACPI based systems

2022-04-07 15:59:54

Subject: Re: [PATCH v2 0/3] Enable EAS for CPPC/ACPI based systems

Hello,

The following branch contains all the required patches:
https://gitlab.arm.com/linux-arm/linux-pg/-/tree/pg/eas_acpi_v2

Regards,
Pierre

On 4/7/22 11:36, Itaru Kitayama wrote:
> Do you happen to have your own dev git tree that has this series?
>
> Itaru.
>
> On Thu, Apr 7, 2022 at 17:54 Pierre Gondois <[email protected]> wrote:
>
> From: Pierre Gondois <[email protected]>
>
> v2:
> - Remove inline hint of cppc_cpufreq_search_cpu_data(). [Mark]
> - Use EXPORT_SYMBOL_GPL() instead of EXPORT_SYMBOL(). [Mark]
> - Use a bitmap to squeeze CPU efficiency class values. [Mark]
>
> 0. Overview
>
> The current Energy Model (EM) for CPUs requires knowledge about CPU
> performance states and their power consumption. Neither of these is
> available on ACPI-based systems.
>
> In ACPI, the power efficiency of CPUs can be described through the
> following Arm-specific field:
>
> ACPI 6.4, s5.2.12.14 "GIC CPU Interface (GICC) Structure",
> "Processor Power Efficiency Class field":
> Describes the relative power efficiency of the associated
> processor. Lower efficiency class numbers are more efficient than
> higher ones (e.g. efficiency class 0 should be treated as more
> efficient than efficiency class 1). However, absolute values
> of this number have no meaning: 2 isn't necessarily half as
> efficient as 1.
>
> Add an 'efficiency_class' field to describe the relative power
> efficiency of CPUs. CPUs relying on this field will have performance
> states (power and frequency values) artificially created. Such an EM
> is referred to as an artificial EM.
>
> The artificial EM is used for the CPPC driver.
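
As a rough illustration of the idea above (not the code from the patches): since only the ordering of efficiency classes is meaningful, the raw values can be mapped to dense ranks, and artificial costs can then be derived from the rank. The function names, the example class values and the cost formula below are all assumptions made for this sketch.

```python
def dense_ranks(efficiency_class_by_cpu):
    """Map each CPU's raw efficiency class to a dense rank (0 = most efficient)."""
    distinct = sorted(set(efficiency_class_by_cpu.values()))
    rank_of = {value: rank for rank, value in enumerate(distinct)}
    return {cpu: rank_of[value] for cpu, value in efficiency_class_by_cpu.items()}


def artificial_costs(rank, nr_states=4, gap=100):
    """Hypothetical cost per performance state.

    As long as gap > nr_states, every state of a less efficient class
    costs more than every state of a more efficient class.
    """
    return [rank * gap + state for state in range(1, nr_states + 1)]


# Hypothetical raw efficiency classes for a 6-CPU system (CPU id -> class).
raw = {0: 0, 1: 5, 2: 5, 3: 0, 4: 0, 5: 0}
ranks = dense_ranks(raw)
```

Here the raw classes 0 and 5 collapse to ranks 0 and 1, which is all the information the field actually carries.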
>
> 1. Dependencies
>
> This patch-set depends on:
> - [0/8] Introduce support for artificial Energy Model
>   https://lkml.org/lkml/2022/3/16/850
>   which introduces a new callback in the Energy Model (EM) and prevents
>   the registration of devices using power values from an EM when the EM
>   is artificial. Without that patch-set, the build breaks.
> - This patch-set is based on linux-next.
>
> 2. Testing
>
> This patch-set has been tested on a Juno-r2 and a Pixel4. Two types
> of tests were done: energy testing, and performance testing.
>
> The energy testing was done with 2 sets of tasks:
> - homogeneous tasks (#Tasks at 5% utilization and 16ms period)
> - heterogeneous tasks (#Tasks at 5|10|15% utilization and 16ms period).
> If a test has 3 tasks, then there is one with each utilization
> (1 at 5%, 1 at 10%, 1 at 15%).
> Tasks spawn on the biggest CPU(s) of the platform. If there are
> multiple big CPUs, tasks spawn alternately on the big CPUs.
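
A small sketch of the two task sets described above, assuming that "utilization" means the fraction of each 16ms period a task spends running. The helper names are hypothetical and the actual workload generator used for the tests is not stated in this mail.

```python
PERIOD_MS = 16  # period given in the test description


def homogeneous_task_set(nr_tasks, utilization_pct=5):
    """All tasks share the same utilization."""
    return [utilization_pct] * nr_tasks


def heterogeneous_task_set(nr_tasks, utilizations=(5, 10, 15)):
    """Equal numbers of tasks at each utilization (e.g. 3 tasks -> 5, 10, 15)."""
    if nr_tasks % len(utilizations) != 0:
        raise ValueError("nr_tasks must be a multiple of the number of utilizations")
    return [utilizations[i % len(utilizations)] for i in range(nr_tasks)]


def runtime_ms(utilization_pct, period_ms=PERIOD_MS):
    """Running time per period under the assumption stated above."""
    return period_ms * utilization_pct / 100
```

Under that assumption, a 5% task runs for 0.8ms out of every 16ms.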
>
> 2.1. Juno-r2 testing
>
> The Juno-r2 has 6 CPUs:
> - 4 little [0, 3-5], max_capa=383
> - 2 big [1-2], max_capa=1024
> Base kernel is v5.17-rc5.
>
> 2.1.1. Energy testing
>
> The tests were done on:
> - a system using a DT and the scmi cpufreq driver. Comparison
> is done between no-EAS and EAS.
> - a system using ACPI and the cppc cpufreq driver. Comparison
> is done between CPPC-no-EAS and CPPC-EAS. CPPC-EAS uses
> the artificial EM.
>
> Energy numbers come from the Juno energy counter, by summing the
> energy spent by the little and big clusters. Each test was run for
> 5 iterations. Lower energy spending is better.
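
For readers who want to reproduce the "Mean" and "ci(+/-)" columns from their own runs, here is a minimal sketch. The mail does not say how the confidence interval was computed; this assumes a 95% interval based on Student's t distribution with 5 samples (critical value 2.776 for 4 degrees of freedom). The function names are hypothetical.

```python
import math
import statistics

T_CRIT_95_DF4 = 2.776  # two-sided 95% Student t value for 5 samples (assumption)


def total_energy(little_joules, big_joules):
    """Per-iteration energy: sum of the little and big cluster counters."""
    return little_joules + big_joules


def mean_and_ci(samples, t_crit=T_CRIT_95_DF4):
    """Return (mean, half-width of the confidence interval)."""
    mean = statistics.mean(samples)
    half_width = t_crit * statistics.stdev(samples) / math.sqrt(len(samples))
    return mean, half_width
```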
>
> 2.1.1.1. Homogeneous tasks
>
> Energy results (Joules):
> +--------+-------------------+-----------------------------+
> |        |      no-EAS       |             EAS             |
> +--------+---------+---------+-------------------+---------+
> | #Tasks |  Mean   | ci(+/-) |       Mean        | ci(+/-) |
> +--------+---------+---------+-------------------+---------+
> |     10 |    7.89 |    0.26 |    6.99 (-11.36%) |    0.49 |
> |     20 |   13.42 |    0.32 |   13.42 ( -0.02%) |    0.08 |
> |     30 |   21.43 |    0.98 |   21.62 ( +0.87%) |    0.63 |
> |     40 |   30.03 |    0.82 |   30.31 ( +0.94%) |    0.37 |
> |     50 |   43.19 |    0.56 |   43.50 ( +0.72%) |    0.52 |
> +--------+---------+---------+-------------------+---------+
> +--------+-------------------+-----------------------------+
> |        |    CPPC-no-EAS    |          CPPC-EAS           |
> +--------+---------+---------+-------------------+---------+
> | #Tasks |  Mean   | ci(+/-) |       Mean        | ci(+/-) |
> +--------+---------+---------+-------------------+---------+
> |     10 |    7.86 |    0.37 |    5.64 (-28.23%) |    0.05 |
> |     20 |   13.36 |    0.20 |   10.92 (-18.31%) |    0.31 |
> |     30 |   19.28 |    0.34 |   18.30 ( -5.07%) |    0.64 |
> |     40 |   28.33 |    0.59 |   27.13 ( -4.23%) |    0.42 |
> |     50 |   40.78 |    0.58 |   40.77 ( -0.04%) |    0.45 |
> +--------+---------+---------+-------------------+---------+
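
The values in parentheses in the tables above appear to be the relative change against the baseline column, in percent. A minimal sketch of that computation follows. Recomputing from the rounded means printed in the table can differ from the printed percentage in the last digit, presumably because the original figures used unrounded means.

```python
def percent_change(baseline, value):
    """Relative change of value against baseline, in percent (negative = lower)."""
    return 100.0 * (value - baseline) / baseline


# 10 homogeneous tasks, CPPC-no-EAS vs CPPC-EAS, using the rounded table means.
delta_10_tasks = percent_change(7.86, 5.64)
```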
>
> Missed activations were measured while comparing CPPC-no-EAS/CPPC-EAS
> energy values. They were 0.00% for all tests and both
> configurations. Missed activations start to appear in significant
> numbers from ~70 tasks.
>
> 2.1.1.2. Heterogeneous tasks
>
> Energy results (Joules):
> +--------+-------------------+-----------------------------+
> |        |      no-EAS       |             EAS             |
> +--------+---------+---------+-------------------+---------+
> | #Tasks |  Mean   | ci(+/-) |       Mean        | ci(+/-) |
> +--------+---------+---------+-------------------+---------+
> |      3 |    5.25 |    0.50 |    4.58 (-12.82%) |    0.07 |
> |      9 |   12.30 |    0.28 |   11.45 ( -6.97%) |    0.34 |
> |     15 |   20.06 |    1.32 |   20.60 (  2.66%) |    1.00 |
> |     21 |   30.03 |    0.63 |   30.07 (  0.12%) |    0.41 |
> +--------+---------+---------+-------------------+---------+
> +--------+-------------------+-----------------------------+
> |        |    CPPC-no-EAS    |          CPPC-EAS           |
> +--------+---------+---------+-------------------+---------+
> | #Tasks |  Mean   | ci(+/-) |       Mean        | ci(+/-) |
> +--------+---------+---------+-------------------+---------+
> |      3 |    4.58 |    0.31 |    3.65 (-20.31%) |    0.05 |
> |      9 |   11.53 |    0.20 |    9.23 (-19.97%) |    0.22 |
> |     15 |   19.19 |    0.16 |   18.33 ( -4.49%) |    0.71 |
> |     21 |   29.07 |    0.29 |   29.06 ( -0.01%) |    0.08 |
> +--------+---------+---------+-------------------+---------+
>
> Missed activations were measured while comparing CPPC-no-EAS/CPPC-EAS
> energy values. They were 0.00% for all tests and both
> configurations. Missed activations start to appear in significant
> numbers from ~36 tasks.
>
> 2.1.1.3. Analysis:
>
> The artificial EM often shows better energy gains than the real EM,
> especially for small loads. Indeed, the artificial power values
> show a huge energy gain from placing tasks on little CPUs. The 6%
> margin is always reached, so tasks are easily placed on little
> CPUs. The margin is not always reached with real power values,
> leading to tasks staying on big CPUs.
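
To make the 6% margin mentioned above concrete, here is a minimal sketch of that kind of check. It assumes a task is only moved when the estimated saving exceeds 1/16 (about 6%) of the estimated energy of keeping it where it is. The function and parameter names are hypothetical, and the scheduler's exact computation may differ.

```python
def should_migrate(energy_prev_cpu, energy_best_cpu, margin_shift=4):
    """Return True if moving the task saves more than ~6% (1/16) of the energy.

    Both arguments are integer energy estimates in arbitrary units.
    """
    saving = energy_prev_cpu - energy_best_cpu
    return saving > (energy_prev_cpu >> margin_shift)
```

With artificial costs that put a large gap between efficiency classes, the estimated saving from moving to a little CPU is large, so a check like this one passes almost every time.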
>
> 2.1.2. Performance testing
>
> 10 iterations of HackBench with the "--pipe --thread" options and
> 1000 loops. Compared value is the testing time in seconds. A lower
> timing is better.
> +----------------+-------------------+---------------------------+
> |                |    CPPC-no-EAS    |         CPPC-EAS          |
> +--------+-------+---------+---------+-----------------+---------+
> | Groups | Tasks |  Mean   | ci(+/-) |      Mean       | ci(+/-) |
> +--------+-------+---------+---------+-----------------+---------+
> |      1 |    40 |    2.39 |    0.19 |   2.39 (-0.24%) |    0.07 |
> |      2 |    80 |    5.56 |    0.48 |   5.28 (-5.02%) |    0.42 |
> |      4 |   160 |   12.15 |    0.84 |  12.06 (-0.80%) |    0.48 |
> |      8 |   320 |   23.03 |    0.94 |  23.12 (+0.36%) |    0.70 |
> +--------+-------+---------+---------+-----------------+---------+
>
> The performance is overall slightly better, but stays within the margin
> of error.
>
>
> 2.2. Pixel4 testing
>
> Pixel4 has 8 CPUs:
> - 4 little [0-3], max_capa=261
> - 3 medium [4-6], max_capa=861
> - 1 big [7], max_capa=1024
>
> Base kernel is android-10.0.0_r0.81. The performance states advertised
> in the DT were replaced with the performance states that this
> patch-set would generate.
> The artificial EM was set such that little CPUs > medium CPUs > big CPU,
> meaning little CPUs are the most energy efficient.
> Comparing the power/capacity ratio, little CPUs' performance states are
> all more energy efficient than the medium CPUs' performance states.
> This does not hold when comparing medium and big CPUs.
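
The power/capacity comparison above can be sketched as follows. All power and capacity numbers in this example are invented purely for illustration (they are not Pixel4 data), and the helper names are hypothetical.

```python
def power_capacity_ratio(power, capacity):
    """Lower ratio means less power spent per unit of compute capacity."""
    return power / capacity


def all_states_more_efficient(states_a, states_b):
    """True if every state of A has a lower power/capacity ratio than every state of B.

    Each state is a (power, capacity) tuple.
    """
    worst_a = max(power_capacity_ratio(p, c) for p, c in states_a)
    best_b = min(power_capacity_ratio(p, c) for p, c in states_b)
    return worst_a < best_b


# Invented (power, capacity) pairs, for illustration only.
little = [(20, 100), (60, 261)]
medium = [(150, 400), (450, 861)]
big = [(300, 700), (900, 1024)]

little_beats_medium = all_states_more_efficient(little, medium)
medium_beats_big = all_states_more_efficient(medium, big)
```

With these invented numbers the little states are all more efficient than the medium ones, while the medium and big ranges overlap, which mirrors the situation described above.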
>
> 2.2.1. Energy testing
>
> The 2 sets of tests (heterogeneous/homogeneous) were run while
> recording battery voltage and current (power is obtained by
> multiplying them).
> Voltage is averaged over a rolling period of ~11s and current over a
> period of ~6s. The USB-C cable is plugged in but its power supply is
> cut. Pixel4 is in airplane mode. Each test lasts 120s; the first 50s
> and last 10s are trimmed because the power rises slowly before
> reaching a plateau.
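
A minimal sketch of the measurement post-processing described above: power is voltage times current, and samples from the first 50s and last 10s of the 120s run are dropped before averaging. The sample format (time in seconds, voltage in volts, current in microamps) and the function name are assumptions made for this example.

```python
def average_power_uw(samples, duration_s=120, trim_start_s=50, trim_end_s=10):
    """Average power in uW over the untrimmed part of a run.

    samples: iterable of (time_s, voltage_v, current_ua) tuples.
    """
    kept = [
        voltage_v * current_ua  # V * uA = uW
        for time_s, voltage_v, current_ua in samples
        if trim_start_s <= time_s <= duration_s - trim_end_s
    ]
    if not kept:
        raise ValueError("no samples left after trimming")
    return sum(kept) / len(kept)


# Synthetic trace: one sample per second, non-zero current only on the plateau.
trace = [(t, 4.0, 100_000 if 50 <= t <= 110 else 0) for t in range(121)]
plateau_power = average_power_uw(trace)
```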
> The following configurations are compared:
> - android with EAS (but NO_FIND_BEST_TARGET is set):
> echo ENERGY_AWARE > /sys/kernel/debug/sched_features
> echo NO_FIND_BEST_TARGET > /sys/kernel/debug/sched_features
> - android without EAS:
> echo NO_ENERGY_AWARE > /sys/kernel/debug/sched_features
> - android with the artificial energy model
> Lower energy spending is better.
>
> 2.2.1.1. Homogeneous tasks
>
> Energy results (in uW):
> +--------+-------------------+-----------------------------+
> |        |    Without EAS    |          With EAS           |
> +--------+---------+---------+-------------------+---------+
> | #Tasks |  Mean   | ci(+/-) |       Mean        | ci(+/-) |
> +--------+---------+---------+-------------------+---------+
> |     10 | 6.21+05 | 3.12+02 | 5.09+05 (-18.01%) | 2.18+03 |
> |     20 | 9.12+05 | 9.71+02 | 7.91+05 (-13.26%) | 9.92+02 |
> |     30 | 1.25+06 | 2.02+03 | 1.09+06 (-12.12%) | 2.00+03 |
> |     40 | 2.05+06 | 5.15+03 | 1.38+06 (-32.36%) | 1.21+03 |
> |     50 | 3.03+06 | 6.94+03 | 1.89+06 (-37.44%) | 3.21+03 |
> +--------+---------+---------+-------------------+---------+
> +--------+-------------------+-----------------------------+
> |        |    Without EAS    |         With patch          |
> +--------+---------+---------+-------------------+---------+
> | #Tasks |  Mean   | ci(+/-) |       Mean        | ci(+/-) |
> +--------+---------+---------+-------------------+---------+
> |     10 | 6.21+05 | 3.12+02 | 4.39+05 (-29.29%) | 5.63+02 |
> |     20 | 9.12+05 | 9.71+02 | 7.30+05 (-19.90%) | 1.98+03 |
> |     30 | 1.25+06 | 2.02+03 | 1.01+06 (-18.60%) | 1.72+03 |
> |     40 | 2.05+06 | 5.15+03 | 1.38+06 (-32.60%) | 3.93+03 |
> |     50 | 3.03+06 | 6.94+03 | 2.05+06 (-32.08%) | 1.25+04 |
> +--------+---------+---------+-------------------+---------+
>
> 2.2.1.2. Heterogeneous tasks
>
> Energy results (in uW):
> +--------+-------------------+-----------------------------+
> |        |    Without EAS    |          With EAS           |
> +--------+---------+---------+-------------------+---------+
> | #Tasks |  Mean   | ci(+/-) |       Mean        | ci(+/-) |
> +--------+---------+---------+-------------------+---------+
> |      3 | 5.14+05 | 1.06+03 | 3.76+05 (-26.82%) | 4.58+02 |
> |      9 | 8.52+05 | 1.18+03 | 7.25+05 (-14.96%) | 1.39+03 |
> |     15 | 1.42+06 | 3.14+03 | 1.20+06 (-15.41%) | 1.06+04 |
> |     21 | 2.73+06 | 3.49+03 | 1.49+06 (-45.47%) | 3.43+03 |
> |     27 | 3.17+06 | 6.92+03 | 2.42+06 (-23.77%) | 8.43+03 |
> +--------+---------+---------+-------------------+---------+
> +--------+-------------------+-----------------------------+
> |        |    Without EAS    |         With patch          |
> +--------+---------+---------+-------------------+---------+
> | #Tasks |  Mean   | ci(+/-) |       Mean        | ci(+/-) |
> +--------+---------+---------+-------------------+---------+
> |      3 | 5.14+05 | 1.06+03 | 3.82+05 (-25.70%) | 7.67+02 |
> |      9 | 8.52+05 | 1.18+03 | 7.05+05 (-17.30%) | 9.79+02 |
> |     15 | 1.42+06 | 3.14+03 | 1.05+06 (-26.00%) | 1.15+03 |
> |     21 | 2.73+06 | 3.49+03 | 1.53+06 (-43.68%) | 2.23+03 |
> |     27 | 3.17+06 | 6.92+03 | 2.86+06 ( -9.77%) | 4.26+03 |
> +--------+---------+---------+-------------------+---------+
>
> 2.2.1.3. Analysis
>
> Similarly to Juno, the artificial performance states show a huge
> gain from placing tasks on little CPUs, leading to better energy results.
>
> 2.2.2. Performance testing
>
> 10 iterations of PCMark. The compared value is the final score
> (PcmaWorkv3Score). A bigger score is better.
> +----------------+-------------------------+-------------------------+
> |  Without EAS   |        With EAS         |       With patch        |
> +------+---------+---------------+---------+---------------+---------+
> | Mean | ci(+/-) |     Mean      | ci(+/-) |     Mean      | ci(+/-) |
> +------+---------+---------------+---------+---------------+---------+
> | 8026 |      86 |          8003 |      74 | 7840 (-2.00%) |     104 |
> +------+---------+---------------+---------+---------------+---------+
>
> Performance is lower, but still in the margin of error.
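
One simple way to read "in the margin of error" is that the two confidence intervals overlap. The sketch below checks that for the table values; this interpretation and the function name are assumptions, since the mail does not define the criterion.

```python
def within_margin_of_error(mean_a, ci_a, mean_b, ci_b):
    """True if the intervals [mean - ci, mean + ci] of the two results overlap."""
    return abs(mean_a - mean_b) <= ci_a + ci_b


# PCMark: without EAS (8026 +/- 86) vs with patch (7840 +/- 104).
pcmark_overlap = within_margin_of_error(8026, 86, 7840, 104)
```

For the PCMark numbers the gap between the means is 186 and the two half-widths add up to 190, so the intervals overlap only narrowly.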
>
>
> 3. Summary
>
> The artificial performance states show overall better energy results
> and a small performance decrease. They lead to a more aggressive task
> placement on the most energy efficient CPUs, and this explains the
> results.
>
> Pierre Gondois (3):
> cpufreq: CPPC: Add cppc_cpufreq_search_cpu_data
> cpufreq: CPPC: Add per_cpu efficiency_class
> cpufreq: CPPC: Register EM based on efficiency class information
>
> arch/arm64/kernel/smp.c | 1 +
> drivers/cpufreq/cppc_cpufreq.c | 201 +++++++++++++++++++++++++++++++++
> 2 files changed, 202 insertions(+)
>
> --
> 2.25.1
>
>