2020-01-30 14:48:56

by Rafael J. Wysocki

[permalink] [raw]
Subject: [PATCH 2/2] intel_idle: Introduce 'states_off' module parameter

From: Rafael J. Wysocki <[email protected]>

In certain system configurations it may not be desirable to use some
C-states assumed to be available by intel_idle and the driver needs
to be prevented from using them even before the cpuidle sysfs
interface becomes accessible to user space. Currently, the only way
to achieve that is by setting the 'max_cstate' module parameter to a
value lower than the index of the shallowest of the C-states in
question, but that may be overly intrusive, because it effectively
makes all of the idle states deeper than the 'max_cstate' one go
away (and the C-state to avoid may be in the middle of the range
normally regarded as available).

To allow that limitation to be overcome, introduce a new module
parameter called 'states_off' to represent a list of idle states to
be disabled by default in the form of a bitmask and update the
documentation to cover it.

Signed-off-by: Rafael J. Wysocki <[email protected]>
---
Documentation/admin-guide/pm/intel_idle.rst | 15 ++++++++++++++-
drivers/idle/intel_idle.c | 24 +++++++++++++++++++++---
2 files changed, 35 insertions(+), 4 deletions(-)

Index: linux-pm/drivers/idle/intel_idle.c
===================================================================
--- linux-pm.orig/drivers/idle/intel_idle.c
+++ linux-pm/drivers/idle/intel_idle.c
@@ -63,6 +63,7 @@ static struct cpuidle_driver intel_idle_
};
/* intel_idle.max_cstate=0 disables driver */
static int max_cstate = CPUIDLE_STATE_MAX - 1;
+static unsigned int disabled_states_mask;

static unsigned int mwait_substates;

@@ -1234,6 +1235,9 @@ static void __init intel_idle_init_cstat
if (cx->type > ACPI_STATE_C2)
state->flags |= CPUIDLE_FLAG_TLB_FLUSHED;

+ if (disabled_states_mask & BIT(cstate))
+ state->flags |= CPUIDLE_FLAG_OFF;
+
state->enter = intel_idle;
state->enter_s2idle = intel_idle_s2idle;
}
@@ -1466,9 +1470,10 @@ static void __init intel_idle_init_cstat
/* Structure copy. */
drv->states[drv->state_count] = cpuidle_state_table[cstate];

- if ((icpu->use_acpi || use_acpi) &&
- intel_idle_off_by_default(mwait_hint) &&
- !(cpuidle_state_table[cstate].flags & CPUIDLE_FLAG_ALWAYS_ENABLE))
+ if ((disabled_states_mask & BIT(drv->state_count)) ||
+ ((icpu->use_acpi || use_acpi) &&
+ intel_idle_off_by_default(mwait_hint) &&
+ !(cpuidle_state_table[cstate].flags & CPUIDLE_FLAG_ALWAYS_ENABLE)))
drv->states[drv->state_count].flags |= CPUIDLE_FLAG_OFF;

drv->state_count++;
@@ -1487,6 +1492,10 @@ static void __init intel_idle_init_cstat
static void __init intel_idle_cpuidle_driver_init(struct cpuidle_driver *drv)
{
cpuidle_poll_state_init(drv);
+
+ if (disabled_states_mask & BIT(0))
+ drv->states[0].flags |= CPUIDLE_FLAG_OFF;
+
drv->state_count = 1;

if (icpu)
@@ -1667,3 +1676,12 @@ device_initcall(intel_idle_init);
* is the easiest way (currently) to continue doing that.
*/
module_param(max_cstate, int, 0444);
+/*
+ * The positions of the bits that are set in the two's complement representation
+ * of this value are the indices of the idle states to be disabled by default
+ * (as reflected by the names of the corresponding idle state directories in
+ * sysfs, "state0", "state1" ... "state<i>" ..., where <i> is the index of the
+ * given state).
+ */
+module_param_named(states_off, disabled_states_mask, uint, 0444);
+MODULE_PARM_DESC(states_off, "Mask of disabled idle states");
Index: linux-pm/Documentation/admin-guide/pm/intel_idle.rst
===================================================================
--- linux-pm.orig/Documentation/admin-guide/pm/intel_idle.rst
+++ linux-pm/Documentation/admin-guide/pm/intel_idle.rst
@@ -168,7 +168,7 @@ and ``idle=nomwait``. If any of them is
``MWAIT`` instruction is not allowed to be used, so the initialization of
``intel_idle`` will fail.

-Apart from that there are three module parameters recognized by ``intel_idle``
+Apart from that there are four module parameters recognized by ``intel_idle``
itself that can be set via the kernel command line (they cannot be updated via
sysfs, so that is the only way to change their values).

@@ -195,6 +195,19 @@ driver ignore the system's ACPI tables e
recognized processor models, respectively (they both are unset by default and
``use_acpi`` has no effect if ``no_acpi`` is set).

+The value of the ``states_off`` module parameter (0 by default) represents a
+list of idle states to be disabled by default in the form of a bitmask. Namely,
+the positions of the bits that are set in the two's complement representation of
+that value are the indices of idle states to be disabled by default (as
+reflected by the names of the corresponding idle state directories in ``sysfs``,
+:file:`state0`, :file:`state1` ... :file:`state<i>` ..., where ``<i>`` is the
+index of the given idle state; see :ref:`idle-states-representation` in
+:doc:`cpuidle`). For example, if ``states_off`` is equal to 3, the driver will
+disable idle states 0 and 1 by default, and if it is equal to 8, idle state 3
+will be disabled by default and so on (bit positions beyond the maximum idle
+state index are ignored). The idle states disabled this way can be enabled (on
+a per-CPU basis) from user space via ``sysfs``.
+

.. _intel-idle-core-and-package-idle-states:





2020-01-31 11:08:57

by David Laight

[permalink] [raw]
Subject: RE: [PATCH 2/2] intel_idle: Introduce 'states_off' module parameter

From: Rafael J. Wysocki
> Sent: 30 January 2020 14:47
>
> In certain system configurations it may not be desirable to use some
> C-states assumed to be available by intel_idle and the driver needs
> to be prevented from using them even before the cpuidle sysfs
> interface becomes accessible to user space. Currently, the only way
> to achieve that is by setting the 'max_cstate' module parameter to a
> value lower than the index of the shallowest of the C-states in
> question, but that may be overly intrusive, because it effectively
> makes all of the idle states deeper than the 'max_cstate' one go
> away (and the C-state to avoid may be in the middle of the range
> normally regarded as available).
>
> To allow that limitation to be overcome, introduce a new module
> parameter called 'states_off' to represent a list of idle states to
> be disabled by default in the form of a bitmask and update the
> documentation to cover it.

The problem I see is that there are (at least) 3 different ways of
referring to the C-States:

1) The state names, C1, C1E, C3, C7 etc.
I'm not sure these are visible outside intel_idle.c.
2) The maximum allowed latency in us.
3) The index into the cpu-dependant tables in intel_idle.c.

Boot parameters that set 3 are completely hopeless for normal
users. The C-state names might be - but they aren't documented.

Unless you know exactly which cpu table is being used the
only constraint a user can request is the latency.

(I've had the misfortune to read intel_idle.c in the last week.
Almost impenetrable TLA ridden uncommented code.)

...
> + * The positions of the bits that are set in the two's complement representation
> + * of this value are the indices of the idle states to be disabled by default
> + * (as reflected by the names of the corresponding idle state directories in
> + * sysfs, "state0", "state1" ... "state<i>" ..., where <i> is the index of the
> + * given state).

What has 'two's complement' got to do with anything?

...
> +The value of the ``states_off`` module parameter (0 by default) represents a
> +list of idle states to be disabled by default in the form of a bitmask. Namely,
> +the positions of the bits that are set in the two's complement representation of
> +that value are the indices of idle states to be disabled by default (as
> +reflected by the names of the corresponding idle state directories in ``sysfs``,
> +:file:`state0`, :file:`state1` ... :file:`state<i>` ..., where ``<i>`` is the
> +index of the given idle state; see :ref:`idle-states-representation` in
> +:doc:`cpuidle`). For example, if ``states_off`` is equal to 3, the driver will
> +disable idle states 0 and 1 by default, and if it is equal to 8, idle state 3
> +will be disabled by default and so on (bit positions beyond the maximum idle
> +state index are ignored). The idle states disabled this way can be enabled (on
> +a per-CPU basis) from user space via ``sysfs``.

A few line breaks would make that easier to read.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

2020-01-31 11:24:47

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH 2/2] intel_idle: Introduce 'states_off' module parameter

On Fri, Jan 31, 2020 at 12:07 PM David Laight <[email protected]> wrote:
>
> From: Rafael J. Wysocki
> > Sent: 30 January 2020 14:47
> >
> > In certain system configurations it may not be desirable to use some
> > C-states assumed to be available by intel_idle and the driver needs
> > to be prevented from using them even before the cpuidle sysfs
> > interface becomes accessible to user space. Currently, the only way
> > to achieve that is by setting the 'max_cstate' module parameter to a
> > value lower than the index of the shallowest of the C-states in
> > question, but that may be overly intrusive, because it effectively
> > makes all of the idle states deeper than the 'max_cstate' one go
> > away (and the C-state to avoid may be in the middle of the range
> > normally regarded as available).
> >
> > To allow that limitation to be overcome, introduce a new module
> > parameter called 'states_off' to represent a list of idle states to
> > be disabled by default in the form of a bitmask and update the
> > documentation to cover it.
>
> The problem I see is that there are (at least) 3 different ways of
> referring to the C-States:

So the mask is not referring to the C-states in the first place.

> 1) The state names, C1, C1E, C3, C7 etc.
> I'm not sure these are visible outside intel_idle.c.

Yes, they are, in sysfs.

> 2) The maximum allowed latency in us.
> 3) The index into the cpu-dependant tables in intel_idle.c.
>
> Boot parameters that set 3 are completely hopeless for normal
> users. The C-state names might be - but they aren't documented.
>
> Unless you know exactly which cpu table is being used the
> only constraint a user can request is the latency.

So this mask refers to the idle states numbering in sysfs, as stated
in the documentation update. That covers state0 which is not a
C-state too.

> (I've had the misfortune to read intel_idle.c in the last week.
> Almost impenetrable TLA ridden uncommented code.)

I have some patches to improve that, will post them after this is settled.

> ...
> > + * The positions of the bits that are set in the two's complement representation
> > + * of this value are the indices of the idle states to be disabled by default
> > + * (as reflected by the names of the corresponding idle state directories in
> > + * sysfs, "state0", "state1" ... "state<i>" ..., where <i> is the index of the
> > + * given state).
>
> What has 'two's complement' got to do with anything?

Well, it is the representation in which bits are used. Kind of as
opposed to decimal or hex digits. But I can replace that phrase with
"bits that are set in this number" easily enough.

> ...
> > +The value of the ``states_off`` module parameter (0 by default) represents a
> > +list of idle states to be disabled by default in the form of a bitmask. Namely,
> > +the positions of the bits that are set in the two's complement representation of
> > +that value are the indices of idle states to be disabled by default (as
> > +reflected by the names of the corresponding idle state directories in ``sysfs``,
> > +:file:`state0`, :file:`state1` ... :file:`state<i>` ..., where ``<i>`` is the
> > +index of the given idle state; see :ref:`idle-states-representation` in
> > +:doc:`cpuidle`). For example, if ``states_off`` is equal to 3, the driver will
> > +disable idle states 0 and 1 by default, and if it is equal to 8, idle state 3
> > +will be disabled by default and so on (bit positions beyond the maximum idle
> > +state index are ignored). The idle states disabled this way can be enabled (on
> > +a per-CPU basis) from user space via ``sysfs``.
>
> A few line breaks would make that easier to read.

Fair enough.

Thanks!

2020-01-31 11:26:40

by Artem Bityutskiy

[permalink] [raw]
Subject: Re: [PATCH 2/2] intel_idle: Introduce 'states_off' module parameter

On Fri, 2020-01-31 at 11:07 +0000, David Laight wrote:
> Unless you know exactly which cpu table is being used the
> only constraint a user can request is the latency.

Hi David,

in all my use-cases I always know what is the CPU I am dealing with and
what are the C-states. Simply because in my view they are always CPU-
dependent in terms of what they do and how are they named.

What you say sounds to me like you would want to disable some C-states
without knowing anything (or much) about the CPU you are dealing with
and the C-state names.

If so, could you please share examples of such use-cases?

Thanks!

--
Best Regards,
Artem Bityutskiy

2020-01-31 11:56:55

by David Laight

[permalink] [raw]
Subject: RE: [PATCH 2/2] intel_idle: Introduce 'states_off' module parameter

From: Artem Bityutskiy >
> Sent: 31 January 2020 11:24
> On Fri, 2020-01-31 at 11:07 +0000, David Laight wrote:
> > Unless you know exactly which cpu table is being used the
> > only constraint a user can request is the latency.
>
> Hi David,
>
> in all my use-cases I always know what is the CPU I am dealing with and
> what are the C-states. Simply because in my view they are always CPU-
> dependent in terms of what they do and how are they named.
>
> What you say sounds to me like you would want to disable some C-states
> without knowing anything (or much) about the CPU you are dealing with
> and the C-state names.
>
> If so, could you please share examples of such use-cases?

Dunno, but clearly you want to disable (say) C3 while leaving C6
enabled.

I was trying to find why it was taking 600+us for a RT process
to get rescheduled when it had only been sleeping for a few us.

I found where it was sleeping, but that didn't help at all.
Someone pointed me at a 'random' pdf that referred to /dev/cpu_dma_latency.
Setting that to a small value (eg 20) helps no end.
But there are no references in the code or man pages to that.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

2020-01-31 12:05:05

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH 2/2] intel_idle: Introduce 'states_off' module parameter

On Fri, Jan 31, 2020 at 12:54 PM David Laight <[email protected]> wrote:
>
> From: Artem Bityutskiy >
> > Sent: 31 January 2020 11:24
> > On Fri, 2020-01-31 at 11:07 +0000, David Laight wrote:
> > > Unless you know exactly which cpu table is being used the
> > > only constraint a user can request is the latency.
> >
> > Hi David,
> >
> > in all my use-cases I always know what is the CPU I am dealing with and
> > what are the C-states. Simply because in my view they are always CPU-
> > dependent in terms of what they do and how are they named.
> >
> > What you say sounds to me like you would want to disable some C-states
> > without knowing anything (or much) about the CPU you are dealing with
> > and the C-state names.
> >
> > If so, could you please share examples of such use-cases?
>
> Dunno, but clearly you want to disable (say) C3 while leaving C6
> enabled.
>
> I was trying to find why it was taking 600+us for a RT process
> to get rescheduled when it had only been sleeping for a few us.
>
> I found where it was sleeping, but that didn't help at all.
> Someone pointed me at a 'random' pdf that referred to /dev/cpu_dma_latency.
> Setting that to a small value (eg 20) helps no end.
> But there are no references in the code or man pages to that.

There is a piece of kernel documentation regarding it, however:

https://www.kernel.org/doc/html/latest/admin-guide/pm/cpuidle.html#cpu-pm-qos