2020-03-28 13:00:03

by Rafael J. Wysocki

Subject: [PATCH 0/2] cpufreq: intel_pstate: Run in the passive mode by default on systems without HWP

Hi All,

These two patches modify the intel_pstate driver to run in the passive mode by
default on systems without HWP (refer to the changelog of patch [2/2] for the
motivation part).

Internal testing of the system performance in 5.6-rc indicates that the
difference between the active mode with the powersave scaling algorithm and the
passive mode with the schedutil governor should be negligible for the majority
of users, so it should be safe to change the default behavior of the driver as
per the above.

Patch [1/2] makes changes to select the schedutil governor and set it as the
default one when intel_pstate is selected in Kconfig.

Patch [2/2] changes intel_pstate to start in the passive mode by default if HWP is
not supported (or if it is disabled via the kernel command line).

Please refer to the patch changelogs for more information.

Thanks!




2020-03-28 13:00:55

by Rafael J. Wysocki

Subject: [PATCH 2/2] cpufreq: intel_pstate: Use passive mode by default without HWP

From: "Rafael J. Wysocki" <[email protected]>

After recent changes allowing scale-invariant utilization to be
used on x86, the schedutil governor on top of intel_pstate in the
passive mode should be on par with (or better than) the active mode
"powersave" algorithm of intel_pstate on systems in which
hardware-managed P-states (HWP) are not used, so it should not be
necessary to use the internal scaling algorithm in those cases.

Accordingly, modify intel_pstate to start in the passive mode by
default if the processor at hand does not support HWP or if the driver
is requested to avoid using HWP through the kernel command line.

Among other things, that will allow utilization clamps and the
support for RT/DL tasks in the schedutil governor to be utilized on
systems in which intel_pstate is used.

Signed-off-by: Rafael J. Wysocki <[email protected]>
---
Documentation/admin-guide/pm/intel_pstate.rst | 32 ++++++++++++++++-----------
drivers/cpufreq/intel_pstate.c | 3 ++-
2 files changed, 21 insertions(+), 14 deletions(-)

diff --git a/Documentation/admin-guide/pm/intel_pstate.rst b/Documentation/admin-guide/pm/intel_pstate.rst
index ad392f3aee06..39d80bc29ccd 100644
--- a/Documentation/admin-guide/pm/intel_pstate.rst
+++ b/Documentation/admin-guide/pm/intel_pstate.rst
@@ -62,9 +62,10 @@ on the capabilities of the processor.
Active Mode
-----------

-This is the default operation mode of ``intel_pstate``. If it works in this
-mode, the ``scaling_driver`` policy attribute in ``sysfs`` for all ``CPUFreq``
-policies contains the string "intel_pstate".
+This is the default operation mode of ``intel_pstate`` for processors with
+hardware-managed P-states (HWP) support. If it works in this mode, the
+``scaling_driver`` policy attribute in ``sysfs`` for all ``CPUFreq`` policies
+contains the string "intel_pstate".

In this mode the driver bypasses the scaling governors layer of ``CPUFreq`` and
provides its own scaling algorithms for P-state selection. Those algorithms
@@ -138,12 +139,13 @@ internal P-state selection logic to be less performance-focused.
Active Mode Without HWP
~~~~~~~~~~~~~~~~~~~~~~~

-This is the default operation mode for processors that do not support the HWP
-feature. It also is used by default with the ``intel_pstate=no_hwp`` argument
-in the kernel command line. However, in this mode ``intel_pstate`` may refuse
-to work with the given processor if it does not recognize it. [Note that
-``intel_pstate`` will never refuse to work with any processor with the HWP
-feature enabled.]
+This operation mode is optional for processors that do not support the HWP
+feature or when the ``intel_pstate=no_hwp`` argument is passed to the kernel in
+the command line. The active mode is used in those cases if the
+``intel_pstate=active`` argument is passed to the kernel in the command line.
+In this mode ``intel_pstate`` may refuse to work with processors that are not
+recognized by it. [Note that ``intel_pstate`` will never refuse to work with
+any processor with the HWP feature enabled.]

In this mode ``intel_pstate`` registers utilization update callbacks with the
CPU scheduler in order to run a P-state selection algorithm, either
@@ -188,10 +190,14 @@ is not set.
Passive Mode
------------

-This mode is used if the ``intel_pstate=passive`` argument is passed to the
-kernel in the command line (it implies the ``intel_pstate=no_hwp`` setting too).
-Like in the active mode without HWP support, in this mode ``intel_pstate`` may
-refuse to work with the given processor if it does not recognize it.
+This is the default operation mode of ``intel_pstate`` for processors without
+hardware-managed P-states (HWP) support. It is always used if the
+``intel_pstate=passive`` argument is passed to the kernel in the command line
+regardless of whether or not the given processor supports HWP. [Note that the
+``intel_pstate=no_hwp`` setting implies ``intel_pstate=passive`` if it is used
+without ``intel_pstate=active``.] Like in the active mode without HWP support,
+in this mode ``intel_pstate`` may refuse to work with processors that are not
+recognized by it.

If the driver works in this mode, the ``scaling_driver`` policy attribute in
``sysfs`` for all ``CPUFreq`` policies contains the string "intel_cpufreq".
diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index d2297839374d..b24a5c5ec4f9 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -2769,6 +2769,8 @@ static int __init intel_pstate_init(void)
pr_info("Invalid MSRs\n");
return -ENODEV;
}
+ /* Without HWP start in the passive mode. */
+ default_driver = &intel_cpufreq;

hwp_cpu_matched:
/*
@@ -2814,7 +2816,6 @@ static int __init intel_pstate_setup(char *str)
if (!strcmp(str, "disable")) {
no_load = 1;
} else if (!strcmp(str, "passive")) {
- pr_info("Passive mode enabled\n");
default_driver = &intel_cpufreq;
no_hwp = 1;
}
--
2.16.4




2020-04-07 15:36:21

by Giovanni Gherdovich

Subject: Re: [PATCH 0/2] cpufreq: intel_pstate: Run in the passive mode by default on systems without HWP

On Sat, 2020-03-28 at 13:54 +0100, Rafael J. Wysocki wrote:
> Hi All,
>
> These two patches modify the intel_pstate driver to run in the passive mode by
> default on systems without HWP (refer to the changelog of patch [2/2] for the
> motivation part).
>
> Internal testing of the system performance in 5.6-rc indicates that the
> difference between the active mode with the powersave scaling algorithm and the
> passive mode with the schedutil governor should be negligible for the majority
> of users, so it should be safe to change the default behavior of the driver as
> per the above.
>
> Patch [1/2] makes changes to select the schedutil governor and set it as the
> default one when intel_pstate is selected in Kconfig.
>
> Patch [2/2] changes intel_pstate to start in the passive mode by default if HWP is
> not supported (or if it is disabled via the kernel command line).
>
> Please refer to the patch changelogs for more information.
>

Hello Rafael,

just to say that I'm very happy about this patch; I see it as a sensible
roll-out strategy for wide adoption of schedutil on x86, as it initially
applies to non-HWP only and not to the entire processor model range. As we get
more reports of its behavior in the field, we'll see when and how to move
forward from there.

I didn't reply last week as I was handling some bug reports for frequency
invariance on x86; one from LKML (Chris Wilson from Intel found that it
crashes when cpu0 is taken offline) and two more reported internally at
SUSE. Nothing major though, I am writing fixes for all those and will send
a bugfix series within the next few days.


Thanks!
Giovanni

2020-07-09 21:03:26

by Doug Smythies

Subject: RE: [PATCH 2/2] cpufreq: intel_pstate: Use passive mode by default without HWP

Hi Rafael,

As you may or may not recall, I am attempting to untangle
and separate multiple compounding issues around the
intel_pstate driver and HWP (or not).

Until everything is figured out, I am using the following rules:

. never use x86_energy_perf_policy.
. For HWP disabled: never change from active to passive or vice versa at run time, but rather do it via boot.
. after boot always check and reset the various power limit log bits that are set.
. never compile the kernel (well, until after any tests), which will set those bits again.
. never run prime95 high heat torture test, which will set those bits again.
. try to never do anything else that will set those bits again.

On 2020.03.28 05:58 Rafael J. Wysocki wrote:
>
> From: "Rafael J. Wysocki" <[email protected]>
>
> After recent changes allowing scale-invariant utilization to be
> used on x86, the schedutil governor on top of intel_pstate in the
> passive mode should be on par with (or better than) the active mode
> "powersave" algorithm of intel_pstate on systems in which
> hardware-managed P-states (HWP) are not used, so it should not be
> necessary to use the internal scaling algorithm in those cases.
>
> Accordingly, modify intel_pstate to start in the passive mode by
> default if the processor at hand does not support HWP or if the driver
> is requested to avoid using HWP through the kernel command line.
>
> Among other things, that will allow utilization clamps and the
> support for RT/DL tasks in the schedutil governor to be utilized on
> systems in which intel_pstate is used.
>
> Signed-off-by: Rafael J. Wysocki <[email protected]>
> ---
> Documentation/admin-guide/pm/intel_pstate.rst | 32 ++++++++++++++++-----------
> drivers/cpufreq/intel_pstate.c | 3 ++-
> 2 files changed, 21 insertions(+), 14 deletions(-)
>
> diff --git a/Documentation/admin-guide/pm/intel_pstate.rst b/Documentation/admin-guide/pm/intel_pstate.rst
> index ad392f3aee06..39d80bc29ccd 100644
> --- a/Documentation/admin-guide/pm/intel_pstate.rst
> +++ b/Documentation/admin-guide/pm/intel_pstate.rst
> @@ -62,9 +62,10 @@ on the capabilities of the processor.
> Active Mode
> -----------
>
> -This is the default operation mode of ``intel_pstate``. If it works in this
> -mode, the ``scaling_driver`` policy attribute in ``sysfs`` for all ``CPUFreq``
> -policies contains the string "intel_pstate".
> +This is the default operation mode of ``intel_pstate`` for processors with
> +hardware-managed P-states (HWP) support. If it works in this mode, the
> +``scaling_driver`` policy attribute in ``sysfs`` for all ``CPUFreq`` policies
> +contains the string "intel_pstate".
>
> In this mode the driver bypasses the scaling governors layer of ``CPUFreq`` and
> provides its own scaling algorithms for P-state selection. Those algorithms
> @@ -138,12 +139,13 @@ internal P-state selection logic to be less performance-focused.
> Active Mode Without HWP
> ~~~~~~~~~~~~~~~~~~~~~~~
>
> -This is the default operation mode for processors that do not support the HWP
> -feature. It also is used by default with the ``intel_pstate=no_hwp`` argument
> -in the kernel command line. However, in this mode ``intel_pstate`` may refuse
> -to work with the given processor if it does not recognize it. [Note that
> -``intel_pstate`` will never refuse to work with any processor with the HWP
> -feature enabled.]
> +This operation mode is optional for processors that do not support the HWP
> +feature or when the ``intel_pstate=no_hwp`` argument is passed to the kernel in
> +the command line. The active mode is used in those cases if the
> +``intel_pstate=active`` argument is passed to the kernel in the command line.

???
I cannot see anywhere in the code where the kernel command line argument
"intel_pstate=active" is dealt with.

> +In this mode ``intel_pstate`` may refuse to work with processors that are not
> +recognized by it. [Note that ``intel_pstate`` will never refuse to work with
> +any processor with the HWP feature enabled.]
>
> In this mode ``intel_pstate`` registers utilization update callbacks with the
> CPU scheduler in order to run a P-state selection algorithm, either
> @@ -188,10 +190,14 @@ is not set.
> Passive Mode
> ------------
>
> -This mode is used if the ``intel_pstate=passive`` argument is passed to the
> -kernel in the command line (it implies the ``intel_pstate=no_hwp`` setting too).
> -Like in the active mode without HWP support, in this mode ``intel_pstate`` may
> -refuse to work with the given processor if it does not recognize it.
> +This is the default operation mode of ``intel_pstate`` for processors without
> +hardware-managed P-states (HWP) support. It is always used if the
> +``intel_pstate=passive`` argument is passed to the kernel in the command line
> +regardless of whether or not the given processor supports HWP. [Note that the
> +``intel_pstate=no_hwp`` setting implies ``intel_pstate=passive`` if it is used
> +without ``intel_pstate=active``.]

??? As above. I cannot see where intel_pstate=active is dealt with in
the code.

> Like in the active mode without HWP support,
> +in this mode ``intel_pstate`` may refuse to work with processors that are not
> +recognized by it.
>
> If the driver works in this mode, the ``scaling_driver`` policy attribute in
> ``sysfs`` for all ``CPUFreq`` policies contains the string "intel_cpufreq".
> diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
> index d2297839374d..b24a5c5ec4f9 100644
> --- a/drivers/cpufreq/intel_pstate.c
> +++ b/drivers/cpufreq/intel_pstate.c
> @@ -2769,6 +2769,8 @@ static int __init intel_pstate_init(void)
> pr_info("Invalid MSRs\n");
> return -ENODEV;
> }
> + /* Without HWP start in the passive mode. */
> + default_driver = &intel_cpufreq;
>
> hwp_cpu_matched:
> /*
> @@ -2814,7 +2816,6 @@ static int __init intel_pstate_setup(char *str)
> if (!strcmp(str, "disable")) {
> no_load = 1;
> } else if (!strcmp(str, "passive")) {
> - pr_info("Passive mode enabled\n");
> default_driver = &intel_cpufreq;
> no_hwp = 1;
> }
> --
> 2.16.4

Example 1: i5-9600k (hwp capable) (kernel 5.8-rc4):

Grub:
GRUB_CMDLINE_LINUX_DEFAULT="ipv6.disable=1 consoleblank=450 intel_pstate=active intel_pstate=no_hwp cpuidle_sysfs_switch cpuidle.governor=teo"

/proc/cmdline:
BOOT_IMAGE=/boot/vmlinuz-5.8.0-rc4-stock root=UUID=0ac356c1-caa9-4c2e-8229-4408bd998dbd ro ipv6.disable=1 consoleblank=450 intel_pstate=active intel_pstate=no_hwp cpuidle_sysfs_switch cpuidle.governor=teo

Result:

doug@s18:~$ grep . /sys/devices/system/cpu/cpu*/cpufreq/scaling_driver
/sys/devices/system/cpu/cpu0/cpufreq/scaling_driver:intel_cpufreq
/sys/devices/system/cpu/cpu1/cpufreq/scaling_driver:intel_cpufreq
/sys/devices/system/cpu/cpu2/cpufreq/scaling_driver:intel_cpufreq
/sys/devices/system/cpu/cpu3/cpufreq/scaling_driver:intel_cpufreq
/sys/devices/system/cpu/cpu4/cpufreq/scaling_driver:intel_cpufreq
/sys/devices/system/cpu/cpu5/cpufreq/scaling_driver:intel_cpufreq

Example 2: i7-2600k (does not have hwp) (kernel 5.8-rc1)

Grub:
GRUB_CMDLINE_LINUX_DEFAULT="ipv6.disable=1 consoleblank=300 intel_pstate=active cpuidle_sysfs_switch cpuidle.governor=teo"

/proc/cmdline:
BOOT_IMAGE=/boot/vmlinuz-5.8.0-rc1-stock root=UUID=bcbc624b-892b-46ca-9e9e-102daf644170 ro ipv6.disable=1 consoleblank=300 intel_pstate=active cpuidle_sysfs_switch cpuidle.governor=teo

Result:

doug@s15:~$ grep . /sys/devices/system/cpu/cpu*/cpufreq/scaling_driver
/sys/devices/system/cpu/cpu0/cpufreq/scaling_driver:intel_cpufreq
/sys/devices/system/cpu/cpu1/cpufreq/scaling_driver:intel_cpufreq
/sys/devices/system/cpu/cpu2/cpufreq/scaling_driver:intel_cpufreq
/sys/devices/system/cpu/cpu3/cpufreq/scaling_driver:intel_cpufreq
/sys/devices/system/cpu/cpu4/cpufreq/scaling_driver:intel_cpufreq
/sys/devices/system/cpu/cpu5/cpufreq/scaling_driver:intel_cpufreq
/sys/devices/system/cpu/cpu6/cpufreq/scaling_driver:intel_cpufreq
/sys/devices/system/cpu/cpu7/cpufreq/scaling_driver:intel_cpufreq

... Doug
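
The check used in both examples relies on the documented mapping: ``scaling_driver`` reads "intel_pstate" in the active mode and "intel_cpufreq" in the passive mode. A minimal user-space sketch of that check in C follows. The helper names mode_from_scaling_driver() and read_scaling_driver() are made up for illustration and are not part of any kernel or library API; only the sysfs path and the two driver strings come from the thread.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: map a scaling_driver string to the intel_pstate
 * operation mode it indicates, per the documentation quoted in the patch.
 */
const char *mode_from_scaling_driver(const char *name)
{
	if (strcmp(name, "intel_pstate") == 0)
		return "active";
	if (strcmp(name, "intel_cpufreq") == 0)
		return "passive";
	return "other";	/* some other CPUFreq driver is in use */
}

/*
 * Hypothetical helper: read the scaling_driver attribute of one CPU.
 * Returns 0 on success and -1 if the attribute cannot be read.
 */
int read_scaling_driver(unsigned int cpu, char *buf, size_t len)
{
	char path[128];
	FILE *f;

	assert(buf != NULL && len > 0);

	snprintf(path, sizeof(path),
		 "/sys/devices/system/cpu/cpu%u/cpufreq/scaling_driver", cpu);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (!fgets(buf, (int)len, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	buf[strcspn(buf, "\n")] = '\0';	/* strip the trailing newline */
	return 0;
}
```

Calling read_scaling_driver(0, buf, sizeof(buf)) and passing the result to mode_from_scaling_driver() would report "passive" for the "intel_cpufreq" string seen on both systems above, even though ``intel_pstate=active`` was on the command line.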


2020-07-13 12:20:21

by Rafael J. Wysocki

Subject: Re: [PATCH 2/2] cpufreq: intel_pstate: Use passive mode by default without HWP

On Thu, Jul 9, 2020 at 11:01 PM Doug Smythies <[email protected]> wrote:
>
> Hi Rafael,
>
> As you may or may not recall, I am attempting to untangle
> and separate multiple compounding issues around the
> intel_pstate driver and HWP (or not).
>
> Until everything is figured out, I am using the following rules:
>
> . never use x86_energy_perf_policy.
> . For HWP disabled: never change from active to passive or vice versa at run time, but rather do it via boot.
> . after boot always check and reset the various power limit log bits that are set.
> . never compile the kernel (well, until after any tests), which will set those bits again.
> . never run prime95 high heat torture test, which will set those bits again.
> . try to never do anything else that will set those bits again.
>
> On 2020.03.28 05:58 Rafael J. Wysocki wrote:
> >
> > From: "Rafael J. Wysocki" <[email protected]>
> >
> > After recent changes allowing scale-invariant utilization to be
> > used on x86, the schedutil governor on top of intel_pstate in the
> > passive mode should be on par with (or better than) the active mode
> > "powersave" algorithm of intel_pstate on systems in which
> > hardware-managed P-states (HWP) are not used, so it should not be
> > necessary to use the internal scaling algorithm in those cases.
> >
> > Accordingly, modify intel_pstate to start in the passive mode by
> > default if the processor at hand does not support HWP or if the driver
> > is requested to avoid using HWP through the kernel command line.
> >
> > Among other things, that will allow utilization clamps and the
> > support for RT/DL tasks in the schedutil governor to be utilized on
> > systems in which intel_pstate is used.
> >
> > Signed-off-by: Rafael J. Wysocki <[email protected]>
> > ---
> > Documentation/admin-guide/pm/intel_pstate.rst | 32 ++++++++++++++++-----------
> > drivers/cpufreq/intel_pstate.c | 3 ++-
> > 2 files changed, 21 insertions(+), 14 deletions(-)
> >
> > diff --git a/Documentation/admin-guide/pm/intel_pstate.rst b/Documentation/admin-guide/pm/intel_pstate.rst
> > index ad392f3aee06..39d80bc29ccd 100644
> > --- a/Documentation/admin-guide/pm/intel_pstate.rst
> > +++ b/Documentation/admin-guide/pm/intel_pstate.rst
> > @@ -62,9 +62,10 @@ on the capabilities of the processor.
> > Active Mode
> > -----------
> >
> > -This is the default operation mode of ``intel_pstate``. If it works in this
> > -mode, the ``scaling_driver`` policy attribute in ``sysfs`` for all ``CPUFreq``
> > -policies contains the string "intel_pstate".
> > +This is the default operation mode of ``intel_pstate`` for processors with
> > +hardware-managed P-states (HWP) support. If it works in this mode, the
> > +``scaling_driver`` policy attribute in ``sysfs`` for all ``CPUFreq`` policies
> > +contains the string "intel_pstate".
> >
> > In this mode the driver bypasses the scaling governors layer of ``CPUFreq`` and
> > provides its own scaling algorithms for P-state selection. Those algorithms
> > @@ -138,12 +139,13 @@ internal P-state selection logic to be less performance-focused.
> > Active Mode Without HWP
> > ~~~~~~~~~~~~~~~~~~~~~~~
> >
> > -This is the default operation mode for processors that do not support the HWP
> > -feature. It also is used by default with the ``intel_pstate=no_hwp`` argument
> > -in the kernel command line. However, in this mode ``intel_pstate`` may refuse
> > -to work with the given processor if it does not recognize it. [Note that
> > -``intel_pstate`` will never refuse to work with any processor with the HWP
> > -feature enabled.]
> > +This operation mode is optional for processors that do not support the HWP
> > +feature or when the ``intel_pstate=no_hwp`` argument is passed to the kernel in
> > +the command line. The active mode is used in those cases if the
> > +``intel_pstate=active`` argument is passed to the kernel in the command line.
>
> ???
> I cannot see anywhere in the code where the kernel command line argument
> "intel_pstate=active" is dealt with.

My bad, sorry about this.

I'll send a patch to fix this issue shortly.

Thanks!
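
For reference, the mode selection that the updated documentation describes can be modeled in a few lines of stand-alone C. This is only a sketch of the documented behavior, not the driver code and not the follow-up fix: the type and function names (pstate_opts, pstate_parse_opt(), pstate_select_mode()) are invented for illustration, and letting "passive" win over "active" when both are given is an assumption based on the "It is always used" wording in the Passive Mode section.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

enum pstate_mode { MODE_DISABLED, MODE_ACTIVE, MODE_PASSIVE };

/* Which intel_pstate=<value> options were seen on the command line. */
struct pstate_opts {
	bool disable;
	bool active;
	bool passive;
	bool no_hwp;
};

/* Record one intel_pstate=<value> token; other values are ignored here. */
void pstate_parse_opt(struct pstate_opts *o, const char *val)
{
	assert(o != NULL && val != NULL);

	if (!strcmp(val, "disable"))
		o->disable = true;
	else if (!strcmp(val, "active"))
		o->active = true;
	else if (!strcmp(val, "passive"))
		o->passive = true;
	else if (!strcmp(val, "no_hwp"))
		o->no_hwp = true;
}

/* Pick the operation mode the documentation in patch [2/2] describes. */
enum pstate_mode pstate_select_mode(const struct pstate_opts *o,
				    bool cpu_has_hwp)
{
	if (o->disable)
		return MODE_DISABLED;
	if (o->passive)		/* assumption: "passive" takes precedence */
		return MODE_PASSIVE;
	if (o->active)		/* explicit request, with or without HWP */
		return MODE_ACTIVE;
	/* Default: active only when HWP is supported and not disabled. */
	return (cpu_has_hwp && !o->no_hwp) ? MODE_ACTIVE : MODE_PASSIVE;
}
```

Under this model, Doug's Example 1 (``intel_pstate=active intel_pstate=no_hwp`` on an HWP-capable processor) and Example 2 (``intel_pstate=active`` without HWP) would both select the active mode, which is what the documentation promises and what the reported "intel_cpufreq" results contradict.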