2014-04-09 00:17:21

by Brown, Len

[permalink] [raw]
Subject: RE: [tip:x86/urgent] x86 idle: Repair large-server 50-watt idle-power regression

Davidlohr,

Thanks for the note.

Ideally (on Linux in general, and on servers, in particular) we strive
for the performance impact of power saving features to be small enough
to be considered in "measurement noise".

Your report for 160 core Westmere AIM numbers being hit at 10-25%
shows 15% measurement noise? But even if true, this looks bad.

Any chance you can re-run, with the following two tweaks,
one at a time?

I'd be curious if you can wrap the invocation in turbostat -v
and capture that output to how what states we are seeing
during the benchmark run.

thanks,
-Len



#1: skip flush for C1

diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
index f80b700..6027d06 100644
--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -377,7 +377,7 @@ static int intel_idle(struct cpuidle_device *dev,

if (!current_set_polling_and_test()) {

- if (this_cpu_has(X86_FEATURE_CLFLUSH_MONITOR))
+ if ((eax > 0) && this_cpu_has(X86_FEATURE_CLFLUSH_MONITOR))
clflush((void *)&current_thread_info()->flags);

__monitor((void *)&current_thread_info()->flags, 0, 0);


#2: skip flush for C1 and C1E

diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
index f80b700..6027d06 100644
--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -377,7 +377,7 @@ static int intel_idle(struct cpuidle_device *dev,

if (!current_set_polling_and_test()) {

- if (this_cpu_has(X86_FEATURE_CLFLUSH_MONITOR))
+ if ((eax > 1) && this_cpu_has(X86_FEATURE_CLFLUSH_MONITOR))
clflush((void *)&current_thread_info()->flags);

__monitor((void *)&current_thread_info()->flags, 0, 0);


????{.n?+???????+%?????ݶ??w??{.n?+????{??G?????{ay?ʇڙ?,j??f???h?????????z_??(?階?ݢj"???m??????G????????????&???~???iO???z??v?^?m???? ????????I?


2014-04-09 08:18:37

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [tip:x86/urgent] x86 idle: Repair large-server 50-watt idle-power regression

On Tue, Apr 08, 2014 at 09:43:54PM +0000, Brown, Len wrote:
> #1: skip flush for C1
> #2: skip flush for C1 and C1E

One stray thought I had: IFF the erratum is due to the cache flushing of
the higher C states, these two patches should still be correct while
also (possibly) avoiding the performance penalty, since C1* doesn't
flush caches yet.

Of course, I've no clue if this is indeed the root cause of this
particular erratum and I'm not sure you guys are allowed to say even if
you would know :/

2014-04-15 03:27:30

by Davidlohr Bueso

[permalink] [raw]
Subject: Re: [tip:x86/urgent] x86 idle: Repair large-server 50-watt idle-power regression

On Tue, 2014-04-08 at 21:43 +0000, Brown, Len wrote:
> Davidlohr,
>
> Thanks for the note.
>
> Ideally (on Linux in general, and on servers, in particular) we strive
> for the performance impact of power saving features to be small enough
> to be considered in "measurement noise".
>
> Your report for 160 core Westmere AIM numbers being hit at 10-25%
> shows 15% measurement noise? But even if true, this looks bad.
>
> Any chance you can re-run, with the following two tweaks,
> one at a time?
>
> I'd be curious if you can wrap the invocation in turbostat -v
> and capture that output to how what states we are seeing
> during the benchmark run.

Hi Len, apologies for the delay, I've been having to use my machine for
other things and haven't gotten time to get around to this yet. I'll get
you the requested information sometime this week.