2014-01-15 05:39:45

by Len Brown

[permalink] [raw]
Subject: [PATCH REGRESSION FIX] x86 idle: restore mwait_idle()

From: Len Brown <[email protected]>

In Linux-3.9 we removed the mwait_idle() loop:
'x86 idle: remove mwait_idle() and "idle=mwait" cmdline param'
(69fb3676df3329a7142803bb3502fa59dc0db2e3)

The reasoning was that modern machines should be sufficiently
happy during the boot process using the default_idle() HALT loop,
until cpuidle loads and either acpi_idle or intel_idle
invoke the newer MWAIT-with-hints idle loop.

But two machines reported problems:
1. Certain Core2-era machines support MWAIT-C1 and HALT only.
MWAIT-C1 is preferred for optimal power and performance.
But if they support just C1, cpuidle never loads and
so they use the boot-time default idle loop forever.

2. Some laptops will boot-hang if HALT is used,
but will boot successfully if MWAIT is used.
This appears to be a hidden assumption in BIOS SMI,
that is presumably valid on the proprietary OS
where the BIOS was validated.

https://bugzilla.kernel.org/show_bug.cgi?id=60770

So here we effectively revert the patch above, restoring
the mwait_idle() loop. However, we don't bother restoring
the idle=mwait cmdline parameter, since it appears to add
no value.

Maintainer notes:
For 3.9, simply revert 69fb3676df
for 3.10, patch -F3 applies, fuzz needed due to __cpuinit use in context
For 3.11, 3.12, 3.13, this patch applies cleanly

Cc: Mike Galbraith <[email protected]>
Cc: Ian Malone <[email protected]>
Cc: Josh Boyer <[email protected]>
Cc: <[email protected]> # 3.9, 3.10, 3.11, 3.12, 3.13
Signed-off-by: Len Brown <[email protected]>
---
arch/x86/kernel/process.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 46 insertions(+)

diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 3fb8d95..db471a8 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -398,6 +398,49 @@ static void amd_e400_idle(void)
default_idle();
}

+/*
+ * Intel Core2 and older machines prefer MWAIT over HALT for C1.
+ * We can't rely on cpuidle installing MWAIT, because it will not load
+ * on systems that support only C1 -- so the boot default must be MWAIT.
+ *
+ * Some AMD machines are the opposite, they depend on using HALT.
+ *
+ * So for default C1, which is used during boot until cpuidle loads,
+ * use MWAIT-C1 on Intel HW that has it, else use HALT.
+ */
+static int prefer_mwait_c1_over_halt(const struct cpuinfo_x86 *c)
+{
+ if (c->x86_vendor != X86_VENDOR_INTEL)
+ return 0;
+
+ if (!cpu_has(c, X86_FEATURE_MWAIT))
+ return 0;
+
+ return 1;
+}
+
+/*
+ * MONITOR/MWAIT with no hints, used for default default C1 state.
+ * This invokes MWAIT with interrutps enabled and no flags,
+ * which is backwards compatible with the original MWAIT implementation.
+ */
+
+static void mwait_idle(void)
+{
+ if (!need_resched()) {
+ if (this_cpu_has(X86_FEATURE_CLFLUSH_MONITOR))
+ clflush((void *)&current_thread_info()->flags);
+
+ __monitor((void *)&current_thread_info()->flags, 0, 0);
+ smp_mb();
+ if (!need_resched())
+ __sti_mwait(0, 0);
+ else
+ local_irq_enable();
+ } else
+ local_irq_enable();
+}
+
void select_idle_routine(const struct cpuinfo_x86 *c)
{
#ifdef CONFIG_SMP
@@ -411,6 +454,9 @@ void select_idle_routine(const struct cpuinfo_x86 *c)
/* E400: APIC timer interrupt does not wake up CPU from C1e */
pr_info("using AMD E400 aware idle routine\n");
x86_idle = amd_e400_idle;
+ } else if (prefer_mwait_c1_over_halt(c)) {
+ pr_info("using mwait in idle threads\n");
+ x86_idle = mwait_idle;
} else
x86_idle = default_idle;
}
--
1.8.5.2.309.ga25014b


2014-01-15 09:29:15

by Mike Galbraith

[permalink] [raw]
Subject: Re: [PATCH REGRESSION FIX] x86 idle: restore mwait_idle()

On Wed, 2014-01-15 at 00:37 -0500, Len Brown wrote:
> From: Len Brown <[email protected]>
>
> In Linux-3.9 we removed the mwait_idle() loop:
> 'x86 idle: remove mwait_idle() and "idle=mwait" cmdline param'
> (69fb3676df3329a7142803bb3502fa59dc0db2e3)
>
> The reasoning was that modern machines should be sufficiently
> happy during the boot process using the default_idle() HALT loop,
> until cpuidle loads and either acpi_idle or intel_idle
> invoke the newer MWAIT-with-hints idle loop.
>
> But two machines reported problems:
> 1. Certain Core2-era machines support MWAIT-C1 and HALT only.
> MWAIT-C1 is preferred for optimal power and performance.
> But if they support just C1, cpuidle never loads and
> so they use the boot-time default idle loop forever.

Q6600 box (allegedly) has a slightly greenish tinge again.

Tested-by: Mike Galbraith <[email protected]>

> 2. Some laptops will boot-hang if HALT is used,
> but will boot successfully if MWAIT is used.
> This appears to be a hidden assumption in BIOS SMI,
> that is presumably valid on the proprietary OS
> where the BIOS was validated.
>
> https://bugzilla.kernel.org/show_bug.cgi?id=60770
>
> So here we effectively revert the patch above, restoring
> the mwait_idle() loop. However, we don't bother restoring
> the idle=mwait cmdline parameter, since it appears to add
> no value.
>
> Maintainer notes:
> For 3.9, simply revert 69fb3676df
> for 3.10, patch -F3 applies, fuzz needed due to __cpuinit use in context
> For 3.11, 3.12, 3.13, this patch applies cleanly
>
> Cc: Mike Galbraith <[email protected]>
> Cc: Ian Malone <[email protected]>
> Cc: Josh Boyer <[email protected]>
> Cc: <[email protected]> # 3.9, 3.10, 3.11, 3.12, 3.13
> Signed-off-by: Len Brown <[email protected]>
> ---
> arch/x86/kernel/process.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 46 insertions(+)
>
> diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
> index 3fb8d95..db471a8 100644
> --- a/arch/x86/kernel/process.c
> +++ b/arch/x86/kernel/process.c
> @@ -398,6 +398,49 @@ static void amd_e400_idle(void)
> default_idle();
> }
>
> +/*
> + * Intel Core2 and older machines prefer MWAIT over HALT for C1.
> + * We can't rely on cpuidle installing MWAIT, because it will not load
> + * on systems that support only C1 -- so the boot default must be MWAIT.
> + *
> + * Some AMD machines are the opposite, they depend on using HALT.
> + *
> + * So for default C1, which is used during boot until cpuidle loads,
> + * use MWAIT-C1 on Intel HW that has it, else use HALT.
> + */
> +static int prefer_mwait_c1_over_halt(const struct cpuinfo_x86 *c)
> +{
> + if (c->x86_vendor != X86_VENDOR_INTEL)
> + return 0;
> +
> + if (!cpu_has(c, X86_FEATURE_MWAIT))
> + return 0;
> +
> + return 1;
> +}
> +
> +/*
> + * MONITOR/MWAIT with no hints, used for default default C1 state.
> + * This invokes MWAIT with interrutps enabled and no flags,
> + * which is backwards compatible with the original MWAIT implementation.
> + */
> +
> +static void mwait_idle(void)
> +{
> + if (!need_resched()) {
> + if (this_cpu_has(X86_FEATURE_CLFLUSH_MONITOR))
> + clflush((void *)&current_thread_info()->flags);
> +
> + __monitor((void *)&current_thread_info()->flags, 0, 0);
> + smp_mb();
> + if (!need_resched())
> + __sti_mwait(0, 0);
> + else
> + local_irq_enable();
> + } else
> + local_irq_enable();
> +}
> +
> void select_idle_routine(const struct cpuinfo_x86 *c)
> {
> #ifdef CONFIG_SMP
> @@ -411,6 +454,9 @@ void select_idle_routine(const struct cpuinfo_x86 *c)
> /* E400: APIC timer interrupt does not wake up CPU from C1e */
> pr_info("using AMD E400 aware idle routine\n");
> x86_idle = amd_e400_idle;
> + } else if (prefer_mwait_c1_over_halt(c)) {
> + pr_info("using mwait in idle threads\n");
> + x86_idle = mwait_idle;
> } else
> x86_idle = default_idle;
> }

2014-01-16 22:00:32

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH REGRESSION FIX] x86 idle: restore mwait_idle()

On 01/14/2014 09:37 PM, Len Brown wrote:
> From: Len Brown <[email protected]>
>
> In Linux-3.9 we removed the mwait_idle() loop:
> 'x86 idle: remove mwait_idle() and "idle=mwait" cmdline param'
> (69fb3676df3329a7142803bb3502fa59dc0db2e3)
>
> The reasoning was that modern machines should be sufficiently
> happy during the boot process using the default_idle() HALT loop,
> until cpuidle loads and either acpi_idle or intel_idle
> invoke the newer MWAIT-with-hints idle loop.
>
> But two machines reported problems:
> 1. Certain Core2-era machines support MWAIT-C1 and HALT only.
> MWAIT-C1 is preferred for optimal power and performance.
> But if they support just C1, cpuidle never loads and
> so they use the boot-time default idle loop forever.
>
> 2. Some laptops will boot-hang if HALT is used,
> but will boot successfully if MWAIT is used.
> This appears to be a hidden assumption in BIOS SMI,
> that is presumably valid on the proprietary OS
> where the BIOS was validated.
>
> https://bugzilla.kernel.org/show_bug.cgi?id=60770
>
> So here we effectively revert the patch above, restoring
> the mwait_idle() loop. However, we don't bother restoring
> the idle=mwait cmdline parameter, since it appears to add
> no value.
>
> Maintainer notes:
> For 3.9, simply revert 69fb3676df
> for 3.10, patch -F3 applies, fuzz needed due to __cpuinit use in context
> For 3.11, 3.12, 3.13, this patch applies cleanly
>
> Cc: Mike Galbraith <[email protected]>
> Cc: Ian Malone <[email protected]>
> Cc: Josh Boyer <[email protected]>
> Cc: <[email protected]> # 3.9, 3.10, 3.11, 3.12, 3.13
> Signed-off-by: Len Brown <[email protected]>
> ---
> arch/x86/kernel/process.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 46 insertions(+)
>
> diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
> index 3fb8d95..db471a8 100644
> --- a/arch/x86/kernel/process.c
> +++ b/arch/x86/kernel/process.c
> @@ -398,6 +398,49 @@ static void amd_e400_idle(void)
> default_idle();
> }
>
> +/*
> + * Intel Core2 and older machines prefer MWAIT over HALT for C1.
> + * We can't rely on cpuidle installing MWAIT, because it will not load
> + * on systems that support only C1 -- so the boot default must be MWAIT.
> + *
> + * Some AMD machines are the opposite, they depend on using HALT.
> + *
> + * So for default C1, which is used during boot until cpuidle loads,
> + * use MWAIT-C1 on Intel HW that has it, else use HALT.
> + */
> +static int prefer_mwait_c1_over_halt(const struct cpuinfo_x86 *c)
> +{
> + if (c->x86_vendor != X86_VENDOR_INTEL)
> + return 0;
> +
> + if (!cpu_has(c, X86_FEATURE_MWAIT))
> + return 0;
> +
> + return 1;
> +}
> +
> +/*
> + * MONITOR/MWAIT with no hints, used for default default C1 state.
> + * This invokes MWAIT with interrutps enabled and no flags,
> + * which is backwards compatible with the original MWAIT implementation.
> + */
> +
> +static void mwait_idle(void)
> +{
> + if (!need_resched()) {
> + if (this_cpu_has(X86_FEATURE_CLFLUSH_MONITOR))
> + clflush((void *)&current_thread_info()->flags);
> +
> + __monitor((void *)&current_thread_info()->flags, 0, 0);
> + smp_mb();
> + if (!need_resched())
> + __sti_mwait(0, 0);
> + else
> + local_irq_enable();
> + } else
> + local_irq_enable();
> +}

Admittedly, there may be relatively few users left, but SMP users on
C1-only Core 2 machines can, in principle, benefit from the monitor
functionality of mwait to avoid rescheduling IPIs. This stuff changed
recently so it now works with the cpuidle drivers (it used to be
terminally broken). Should something be twiddling TS_POLLING
differently so that HLT gets the IPIs but mwait doesn't?

--Andy

> +
> void select_idle_routine(const struct cpuinfo_x86 *c)
> {
> #ifdef CONFIG_SMP
> @@ -411,6 +454,9 @@ void select_idle_routine(const struct cpuinfo_x86 *c)
> /* E400: APIC timer interrupt does not wake up CPU from C1e */
> pr_info("using AMD E400 aware idle routine\n");
> x86_idle = amd_e400_idle;
> + } else if (prefer_mwait_c1_over_halt(c)) {
> + pr_info("using mwait in idle threads\n");
> + x86_idle = mwait_idle;
> } else
> x86_idle = default_idle;
> }
>

2014-01-17 04:21:03

by Mike Galbraith

[permalink] [raw]
Subject: Re: [PATCH REGRESSION FIX] x86 idle: restore mwait_idle()

On Thu, 2014-01-16 at 14:00 -0800, Andy Lutomirski wrote:
> On 01/14/2014 09:37 PM, Len Brown wrote:
> > From: Len Brown <[email protected]>
> >
> > In Linux-3.9 we removed the mwait_idle() loop:
> > 'x86 idle: remove mwait_idle() and "idle=mwait" cmdline param'
> > (69fb3676df3329a7142803bb3502fa59dc0db2e3)
> >
> > The reasoning was that modern machines should be sufficiently
> > happy during the boot process using the default_idle() HALT loop,
> > until cpuidle loads and either acpi_idle or intel_idle
> > invoke the newer MWAIT-with-hints idle loop.
> >
> > But two machines reported problems:
> > 1. Certain Core2-era machines support MWAIT-C1 and HALT only.
> > MWAIT-C1 is preferred for optimal power and performance.
> > But if they support just C1, cpuidle never loads and
> > so they use the boot-time default idle loop forever.
> >
> > 2. Some laptops will boot-hang if HALT is used,
> > but will boot successfully if MWAIT is used.
> > This appears to be a hidden assumption in BIOS SMI,
> > that is presumably valid on the proprietary OS
> > where the BIOS was validated.
> >
> > https://bugzilla.kernel.org/show_bug.cgi?id=60770
> >
> > So here we effectively revert the patch above, restoring
> > the mwait_idle() loop. However, we don't bother restoring
> > the idle=mwait cmdline parameter, since it appears to add
> > no value.
> >
> > Maintainer notes:
> > For 3.9, simply revert 69fb3676df
> > for 3.10, patch -F3 applies, fuzz needed due to __cpuinit use in context
> > For 3.11, 3.12, 3.13, this patch applies cleanly
> >
> > Cc: Mike Galbraith <[email protected]>
> > Cc: Ian Malone <[email protected]>
> > Cc: Josh Boyer <[email protected]>
> > Cc: <[email protected]> # 3.9, 3.10, 3.11, 3.12, 3.13
> > Signed-off-by: Len Brown <[email protected]>
> > ---
> > arch/x86/kernel/process.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++
> > 1 file changed, 46 insertions(+)
> >
> > diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
> > index 3fb8d95..db471a8 100644
> > --- a/arch/x86/kernel/process.c
> > +++ b/arch/x86/kernel/process.c
> > @@ -398,6 +398,49 @@ static void amd_e400_idle(void)
> > default_idle();
> > }
> >
> > +/*
> > + * Intel Core2 and older machines prefer MWAIT over HALT for C1.
> > + * We can't rely on cpuidle installing MWAIT, because it will not load
> > + * on systems that support only C1 -- so the boot default must be MWAIT.
> > + *
> > + * Some AMD machines are the opposite, they depend on using HALT.
> > + *
> > + * So for default C1, which is used during boot until cpuidle loads,
> > + * use MWAIT-C1 on Intel HW that has it, else use HALT.
> > + */
> > +static int prefer_mwait_c1_over_halt(const struct cpuinfo_x86 *c)
> > +{
> > + if (c->x86_vendor != X86_VENDOR_INTEL)
> > + return 0;
> > +
> > + if (!cpu_has(c, X86_FEATURE_MWAIT))
> > + return 0;
> > +
> > + return 1;
> > +}
> > +
> > +/*
> > + * MONITOR/MWAIT with no hints, used for default default C1 state.
> > + * This invokes MWAIT with interrutps enabled and no flags,
> > + * which is backwards compatible with the original MWAIT implementation.
> > + */
> > +
> > +static void mwait_idle(void)
> > +{
> > + if (!need_resched()) {
> > + if (this_cpu_has(X86_FEATURE_CLFLUSH_MONITOR))
> > + clflush((void *)&current_thread_info()->flags);
> > +
> > + __monitor((void *)&current_thread_info()->flags, 0, 0);
> > + smp_mb();
> > + if (!need_resched())
> > + __sti_mwait(0, 0);
> > + else
> > + local_irq_enable();
> > + } else
> > + local_irq_enable();
> > +}
>
> Admittedly, there may be relatively few users left, but SMP users on
> C1-only Core 2 machines can, in principle, benefit from the monitor
^hugely
> functionality of mwait to avoid rescheduling IPIs. This stuff changed
> recently so it now works with the cpuidle drivers (it used to be
> terminally broken). Should something be twiddling TS_POLLING
> differently so that HLT gets the IPIs but mwait doesn't?

Urk, definitely. The IPI is the primary cause of the size large cross
core scheduling performance regression for my aging, but lovely Q6600.

taskset 0xc pipe-test 1

3.8.13 3.397977 usecs/loop -- avg 3.400336 588.2 KHz
master+ 4.798547 usecs/loop -- avg 4.791692 417.4 KHz

-Mike

2014-01-18 09:34:44

by Mike Galbraith

[permalink] [raw]
Subject: Re: [PATCH REGRESSION FIX] x86 idle: restore mwait_idle()

On Fri, 2014-01-17 at 05:20 +0100, Mike Galbraith wrote:

> taskset 0xc pipe-test 1
>
> 3.8.13 3.397977 usecs/loop -- avg 3.400336 588.2 KHz
> master+ 4.798547 usecs/loop -- avg 4.791692 417.4 KHz

Bah, those are apple/grape, these are apple/apple.

idle: kill unnecessary mwait_idle() resched IPIs

Set/clear polling instead.

Q6600, pipe-test scheduling cross core:
3.8.13 487.2 KHz 1.000
3.13.0-master 415.5 KHz .852
3.13.0-master+ 415.2 KHz .852 + restore mwait_idle
3.13.0-master++ 488.5 KHz 1.002 + restore mwait_idle + IPI fix

Signed-off-by: Mike Galbraith <[email protected]>
---
arch/x86/kernel/process.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -427,18 +427,18 @@ static int prefer_mwait_c1_over_halt(con

static void mwait_idle(void)
{
- if (!need_resched()) {
+ if (!current_set_polling_and_test()) {
if (this_cpu_has(X86_FEATURE_CLFLUSH_MONITOR))
clflush((void *)&current_thread_info()->flags);

__monitor((void *)&current_thread_info()->flags, 0, 0);
- smp_mb();
if (!need_resched())
__sti_mwait(0, 0);
else
local_irq_enable();
} else
local_irq_enable();
+ __current_clr_polling();
}

void select_idle_routine(const struct cpuinfo_x86 *c)

2014-01-18 16:15:21

by Mike Galbraith

[permalink] [raw]
Subject: Re: [PATCH REGRESSION FIX] x86 idle: restore mwait_idle()

On Sat, 2014-01-18 at 10:33 +0100, Mike Galbraith wrote:
> On Fri, 2014-01-17 at 05:20 +0100, Mike Galbraith wrote:
>
> > taskset 0xc pipe-test 1
> >
> > 3.8.13 3.397977 usecs/loop -- avg 3.400336 588.2 KHz
> > master+ 4.798547 usecs/loop -- avg 4.791692 417.4 KHz
>
> Bah, those are apple/grape, these are apple/apple.

Or, to make it more correct for 3.10..13, add the clflush barriers as
well for afflicted CPUs.

idle: kill unnecessary mwait_idle() resched IPIs

Set/clear polling instead.

Q6600, pipe-test scheduling cross core:
3.8.13 487.2 KHz 1.000
3.13.0-master 415.5 KHz .852
3.13.0-master+ 415.2 KHz .852 + restore mwait_idle
3.13.0-master++ 488.5 KHz 1.002 + restore mwait_idle + IPI fix

Signed-off-by: Mike Galbraith <[email protected]>
Cc: <[email protected]> # 3.10+
---
arch/x86/kernel/process.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)

--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -427,18 +427,21 @@ static int prefer_mwait_c1_over_halt(con

static void mwait_idle(void)
{
- if (!need_resched()) {
- if (this_cpu_has(X86_FEATURE_CLFLUSH_MONITOR))
+ if (!current_set_polling_and_test()) {
+ if (this_cpu_has(X86_FEATURE_CLFLUSH_MONITOR)) {
+ mb();
clflush((void *)&current_thread_info()->flags);
+ mb();
+ }

__monitor((void *)&current_thread_info()->flags, 0, 0);
- smp_mb();
if (!need_resched())
__sti_mwait(0, 0);
else
local_irq_enable();
} else
local_irq_enable();
+ __current_clr_polling();
}

void select_idle_routine(const struct cpuinfo_x86 *c)