For Intel 7400 series CPUs, the recommendation is to use a clflush on the
monitored address just before monitor and mwait pair [1]. This clflush makes
sure that there are no false wakeups from mwait when the monitored address
was recently written to.
[1] "MONITOR/MWAIT Recommendations for Intel Xeon Processor 7400 series"
section in specification update document of 7400 series
http://download.intel.com/design/xeon/specupdt/32033601.pdf
Signed-off-by: Venkatesh Pallipadi <[email protected]>
---
arch/x86/kernel/cpu/intel.c | 3 +++
arch/x86/kernel/process.c | 6 ++++++
include/asm-x86/cpufeature.h | 1 +
3 files changed, 10 insertions(+)
Index: tip/arch/x86/kernel/cpu/intel.c
===================================================================
--- tip.orig/arch/x86/kernel/cpu/intel.c 2008-10-07 13:04:01.000000000 -0700
+++ tip/arch/x86/kernel/cpu/intel.c 2008-10-07 13:55:08.000000000 -0700
@@ -262,6 +262,9 @@ static void __cpuinit init_intel(struct
ds_init_intel(c);
}
+ if (c->x86 == 6 && c->x86_model == 29 && cpu_has_clflush)
+ set_cpu_cap(c, X86_FEATURE_CLFLUSH_MONITOR);
+
#ifdef CONFIG_X86_64
if (c->x86 == 15)
c->x86_cache_alignment = c->x86_clflush_size * 2;
Index: tip/arch/x86/kernel/process.c
===================================================================
--- tip.orig/arch/x86/kernel/process.c 2008-10-07 13:04:01.000000000 -0700
+++ tip/arch/x86/kernel/process.c 2008-10-07 13:47:07.000000000 -0700
@@ -155,6 +155,9 @@ EXPORT_SYMBOL_GPL(cpu_idle_wait);
void mwait_idle_with_hints(unsigned long ax, unsigned long cx)
{
if (!need_resched()) {
+ if (boot_cpu_has(X86_FEATURE_CLFLUSH_MONITOR))
+ clflush((void *)¤t_thread_info()->flags);
+
__monitor((void *)¤t_thread_info()->flags, 0, 0);
smp_mb();
if (!need_resched())
@@ -166,6 +169,9 @@ void mwait_idle_with_hints(unsigned long
static void mwait_idle(void)
{
if (!need_resched()) {
+ if (boot_cpu_has(X86_FEATURE_CLFLUSH_MONITOR))
+ clflush((void *)¤t_thread_info()->flags);
+
__monitor((void *)¤t_thread_info()->flags, 0, 0);
smp_mb();
if (!need_resched())
Index: tip/include/asm-x86/cpufeature.h
===================================================================
--- tip.orig/include/asm-x86/cpufeature.h 2008-10-07 13:07:59.000000000 -0700
+++ tip/include/asm-x86/cpufeature.h 2008-10-07 13:39:41.000000000 -0700
@@ -91,6 +91,7 @@
#define X86_FEATURE_NOPL (3*32+20) /* The NOPL (0F 1F) instructions */
#define X86_FEATURE_AMDC1E (3*32+21) /* AMD C1E detected */
#define X86_FEATURE_XTOPOLOGY (3*32+22) /* cpu topology enum extensions */
+#define X86_FEATURE_CLFLUSH_MONITOR (3*32+23) /* clflush needed with monitor */
/* Intel-defined CPU features, CPUID level 0x00000001 (ecx), word 4 */
#define X86_FEATURE_XMM3 (4*32+ 0) /* "pni" SSE-3 */
Venki Pallipadi wrote:
> For Intel 7400 series CPUs, the recommendation is to use a clflush on the
> monitored address just before monitor and mwait pair [1]. This clflush makes
> sure that there are no false wakeups from mwait when the monitored address
> was recently written to.
>
> [1] "MONITOR/MWAIT Recommendations for Intel Xeon Processor 7400 series"
> section in specification update document of 7400 series
> http://download.intel.com/design/xeon/specupdt/32033601.pdf
>
> Signed-off-by: Venkatesh Pallipadi <[email protected]>
This seems very expensive. It really makes me wonder if it wouldn't
just be better to either declare monitor/mwait non-functional on this
chip, or make sure that mwaits can handle false wakeups.
-hpa
On Tue, Oct 07, 2008 at 02:09:12PM -0700, H. Peter Anvin wrote:
> Venki Pallipadi wrote:
> > For Intel 7400 series CPUs, the recommendation is to use a clflush on the
> > monitored address just before monitor and mwait pair [1]. This clflush makes
> > sure that there are no false wakeups from mwait when the monitored address
> > was recently written to.
> >
> > [1] "MONITOR/MWAIT Recommendations for Intel Xeon Processor 7400 series"
> > section in specification update document of 7400 series
> > http://download.intel.com/design/xeon/specupdt/32033601.pdf
> >
> > Signed-off-by: Venkatesh Pallipadi <[email protected]>
>
> This seems very expensive. It really makes me wonder if it wouldn't
> just be better to either declare monitor/mwait non-functional on this
> chip, or make sure that mwaits can handle false wakeups.
>
mwait can handle false wakeups. Today we wake backup all the way and find
out there is nothing to do and go back to idle. And second time around this
false wakeup does not happen as we do not write to monitored address in the
interim. The problem we saw was the places where we try to look at how
long each idle period was and take power management decision for the next idle.
Such algorithms get confused with false wakeups.
Yes. Other alternative is to disable mwaits altogether on these CPUs. I can
send a patch to do that. But, the patch will be somewhat more complicated
as kernel advertises the MWAIT capability to firmware with a ACPI _PDC method
and BIOS has to then give IO port based C2 for us to use in such case.
Avoiding mwait just for C1 is easy though.
Thanks,
Venki
Venki Pallipadi wrote:
> On Tue, Oct 07, 2008 at 02:09:12PM -0700, H. Peter Anvin wrote:
>> Venki Pallipadi wrote:
>>> For Intel 7400 series CPUs, the recommendation is to use a clflush on the
>>> monitored address just before monitor and mwait pair [1]. This clflush makes
>>> sure that there are no false wakeups from mwait when the monitored address
>>> was recently written to.
>>>
>>> [1] "MONITOR/MWAIT Recommendations for Intel Xeon Processor 7400 series"
>>> section in specification update document of 7400 series
>>> http://download.intel.com/design/xeon/specupdt/32033601.pdf
>>>
>>> Signed-off-by: Venkatesh Pallipadi <[email protected]>
>> This seems very expensive. It really makes me wonder if it wouldn't
>> just be better to either declare monitor/mwait non-functional on this
>> chip, or make sure that mwaits can handle false wakeups.
>
> mwait can handle false wakeups. Today we wake backup all the way and find
> out there is nothing to do and go back to idle. And second time around this
> false wakeup does not happen as we do not write to monitored address in the
> interim. The problem we saw was the places where we try to look at how
> long each idle period was and take power management decision for the next idle.
> Such algorithms get confused with false wakeups.
Do you have any concept of how often this happens? Every time? Small
fraction?
> Yes. Other alternative is to disable mwaits altogether on these CPUs. I can
> send a patch to do that. But, the patch will be somewhat more complicated
> as kernel advertises the MWAIT capability to firmware with a ACPI _PDC method
> and BIOS has to then give IO port based C2 for us to use in such case.
> Avoiding mwait just for C1 is easy though.
It would be my weak preference, although I'm willing to be convinced
that mwait is worth the power savings even with the workaround.
-hpa
>-----Original Message-----
>From: H. Peter Anvin [mailto:[email protected]]
>Sent: Tuesday, October 07, 2008 4:03 PM
>To: Pallipadi, Venkatesh
>Cc: Ingo Molnar; Thomas Gleixner; linux-kernel
>Subject: Re: [PATCH] x86: Add clflush before monitor for Intel
>7400 series
>
>Venki Pallipadi wrote:
>
>> Yes. Other alternative is to disable mwaits altogether on
>these CPUs. I can
>> send a patch to do that. But, the patch will be somewhat
>more complicated
>> as kernel advertises the MWAIT capability to firmware with a
>ACPI _PDC method
>> and BIOS has to then give IO port based C2 for us to use in
>such case.
>> Avoiding mwait just for C1 is easy though.
>
>It would be my weak preference, although I'm willing to be convinced
>that mwait is worth the power savings even with the workaround.
>
> -hpa
hpa,
Do you still have reservations about this being expensive.
Note that this is only done when a CPU is about to go idle and
The cost of clflush itself will be minimal compared to idle
entry + idle exit latency.
Thanks,
Venki
Pallipadi, Venkatesh wrote:
>
> hpa,
>
> Do you still have reservations about this being expensive.
> Note that this is only done when a CPU is about to go idle and
> The cost of clflush itself will be minimal compared to idle
> entry + idle exit latency.
>
I guess I'm a bit confused about the tradeoff of CLFLUSH versus simply
disabling MWAIT. This is a relatively recent processor and so
optimizing matters (if this was a P4 I would be more worried about what
has least impact on the kernel as a whole.)
Entry and exit latency do matter (specifically, exit latency matters for
longer waits and the combined entry+exit latency matters for short waits.)
On the other hand, perhaps what we need to do is to get the fix in and
worry about performance later.
-hpa
>-----Original Message-----
>From: H. Peter Anvin [mailto:[email protected]]
>Sent: Friday, October 17, 2008 1:13 PM
>To: Pallipadi, Venkatesh
>Cc: Ingo Molnar; Thomas Gleixner; linux-kernel
>Subject: Re: [PATCH] x86: Add clflush before monitor for Intel
>7400 series
>
>Pallipadi, Venkatesh wrote:
>>
>> hpa,
>>
>> Do you still have reservations about this being expensive.
>> Note that this is only done when a CPU is about to go idle and
>> The cost of clflush itself will be minimal compared to idle
>> entry + idle exit latency.
>>
>
>I guess I'm a bit confused about the tradeoff of CLFLUSH versus simply
>disabling MWAIT. This is a relatively recent processor and so
>optimizing matters (if this was a P4 I would be more worried about what
>has least impact on the kernel as a whole.)
As of now this is only for a limited models of CPUs (MP CPUs)
and not a norm. It is a errata on this CPU and not
something that is going to become architectural.
>Entry and exit latency do matter (specifically, exit latency
>matters for
>longer waits and the combined entry+exit latency matters for
>short waits.)
Agreed. In this case the overhead of clflush is very low (~30 cycles)
and C1 entry+exit latency will be of the order of few hundreds of cycles for
fastest wakeup.
>On the other hand, perhaps what we need to do is to get the fix in and
>worry about performance later.
>
> -hpa
Thanks,
Venki
Pallipadi, Venkatesh wrote:
>>>
>> I guess I'm a bit confused about the tradeoff of CLFLUSH versus simply
>> disabling MWAIT. This is a relatively recent processor and so
>> optimizing matters (if this was a P4 I would be more worried about what
>> has least impact on the kernel as a whole.)
>
> As of now this is only for a limited models of CPUs (MP CPUs)
> and not a norm. It is a errata on this CPU and not
> something that is going to become architectural.
>
Yes, I understand. However, there is still the option to disable MWAIT,
and the question is still open if it makes sense.
-hpa
"H. Peter Anvin" <[email protected]> writes:
> Pallipadi, Venkatesh wrote:
>>>>
>>> I guess I'm a bit confused about the tradeoff of CLFLUSH versus simply
>>> disabling MWAIT. This is a relatively recent processor and so
>>> optimizing matters (if this was a P4 I would be more worried about what
>>> has least impact on the kernel as a whole.)
>> As of now this is only for a limited models of CPUs (MP CPUs)
>> and not a norm. It is a errata on this CPU and not
>> something that is going to become architectural.
>>
>
> Yes, I understand. However, there is still the option to disable
> MWAIT, and the question is still open if it makes sense.
This would mean falling back to the IO port interface for deeper
C states. Even with the CLFLUSH MWAIT is a much better (and I believe
faster) interface.
-Andi
--
[email protected]