2015-05-06 20:36:42

by Martin Steigerwald

Subject: [BUG] ThinkPad T520 overheating with P-State driver

Hello Kristen, hello,

The laptop overheats with the Intel P-State driver, as follows:

[ 6743.833543] CPU0: Package temperature above threshold, cpu clock throttled (total events = 1)
[ 6743.833545] CPU1: Package temperature above threshold, cpu clock throttled (total events = 1)
[ 6743.833567] CPU2: Package temperature above threshold, cpu clock throttled (total events = 1)
[ 6743.833568] CPU3: Package temperature above threshold, cpu clock throttled (total events = 1)
[ 6743.834580] CPU0: Package temperature/speed normal
[ 6743.834581] CPU1: Package temperature/speed normal
[ 6743.834607] CPU3: Package temperature/speed normal
[ 6743.834608] CPU2: Package temperature/speed normal

This happens under high CPU load and is easily triggered by playing
PlaneShift or OpenMW.
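To compare how often the package throttling shown above hits under each driver, one could count the messages in the kernel log. A minimal Python sketch, assuming the message format shown above; since every logical CPU logs its own line for a single package event, lines within a short window (an arbitrary 1 s here) are merged into one episode. The function names are hypothetical:

```python
import re

# Matches kernel lines such as:
# [ 6743.833543] CPU0: Package temperature above threshold, cpu clock throttled (total events = 1)
THROTTLE_RE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+CPU(?P<cpu>\d+): "
    r"(?:Package|Core) temperature above threshold"
)


def throttle_episodes(log_text, window=1.0):
    """Return the start timestamp (seconds since boot) of each throttle episode.

    Each logical CPU logs its own line for one package event, so lines that
    fall within `window` seconds of the current episode's start are merged.
    """
    episodes = []
    for line in log_text.splitlines():
        match = THROTTLE_RE.match(line.strip())
        if match is None:
            continue
        ts = float(match.group("ts"))
        if not episodes or ts - episodes[-1] > window:
            episodes.append(ts)
    return episodes


def mean_interval(episodes):
    """Average time between episode starts, or None with fewer than two."""
    if len(episodes) < 2:
        return None
    return (episodes[-1] - episodes[0]) / (len(episodes) - 1)
```

Feeding it the `dmesg` output of a comparable gaming session under each driver would give throttle counts and intervals that can be compared directly.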

It also sometimes reports a machine check exception (MCE).


This may partly be due to aging hardware.


Yet, after I switched from the Intel P-State driver to the acpi-cpufreq driver,
the issue got *much* better. I switched after I found that the Intel P-State
driver doesn't respect the no_turbo setting at all on this ThinkPad:

[Bug 97261] New: Intel P-State driver does not honor no_turbo
https://bugzilla.kernel.org/show_bug.cgi?id=97261

(I wanted to limit maximum performance in order to prevent the overheating.)


I get frequencies like:

3080566
3068945
3009082
2999902

despite the no_turbo setting. So it basically switches the whole dual-core
hyperthreading CPU into turbo mode, even though not all cores are being used
by processes. PlaneShift basically runs single-threaded. There are some other
processes active in the background from time to time, but they usually do not
fully use the other core.
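Whether such readings mean turbo was engaged can be checked by comparing them with the nominal 2.50 GHz from the CPU model name. A minimal sketch, assuming the values are in kHz as sysfs reports them and that, as with intel_pstate, each is an average over the last sample interval, so any value above nominal means turbo was active for at least part of that interval. The constant and function name are hypothetical:

```python
# Nominal (non-turbo) maximum of the i5-2520M in kHz, taken from the
# "@ 2.50GHz" in its model name; adjust this for other CPUs.
NOMINAL_MAX_KHZ = 2_500_000


def readings_above_nominal(readings_khz, nominal_khz=NOMINAL_MAX_KHZ):
    """Return the scaling_cur_freq readings that exceed the nominal maximum.

    Assumes each reading is an average over the last sample interval (as with
    intel_pstate), so a value above nominal means turbo was active for at
    least part of that interval.
    """
    return [khz for khz in readings_khz if khz > nominal_khz]


if __name__ == "__main__":
    # The readings quoted in the report above, taken with no_turbo set.
    print(readings_above_nominal([3080566, 3068945, 3009082, 2999902]))
```

All four quoted readings are above 2500000 kHz, which is consistent with turbo being used despite no_turbo.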


Yet with acpi-cpufreq without limiting maximum performance at all, I get the
following with the *same* workload:

Every 5,0s: cat cpu?/cpufreq/scaling_cur_freq ; sensors
Wed May 6 22:12:41 2015

2501000
2501000
1000000
1000000
acpitz-virtual-0
Adapter: Virtual device
temp1: +95.0°C (crit = +98.0°C)

coretemp-isa-0000
Adapter: ISA adapter
Physical id 0: +96.0°C (high = +86.0°C, crit = +100.0°C)
Core 0: +88.0°C (high = +86.0°C, crit = +100.0°C)
Core 1: +90.0°C (high = +86.0°C, crit = +100.0°C)

thinkpad-isa-0000
Adapter: ISA adapter
fan1: 3578 RPM



Every 5,0s: cat cpu?/cpufreq/scaling_cur_freq ; sensors
Wed May 6 22:22:22 2015

2501000
1000000
800000
1600000
acpitz-virtual-0
Adapter: Virtual device
temp1: +96.0°C (crit = +98.0°C)

coretemp-isa-0000
Adapter: ISA adapter
Physical id 0: +97.0°C (high = +86.0°C, crit = +100.0°C)
Core 0: +89.0°C (high = +86.0°C, crit = +100.0°C)
Core 1: +90.0°C (high = +86.0°C, crit = +100.0°C)

thinkpad-isa-0000
Adapter: ISA adapter
fan1: 3600 RPM


I rarely see the core temperatures go above 92 degrees, while with P-State
they were consistently hitting 96-98 degrees.
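The temperatures that `sensors` prints come from the hwmon sysfs interface, so a comparison like the one above can also be scripted. A minimal sketch, assuming a coretemp hwmon directory with `tempN_input`, `tempN_max` and `tempN_label` files in millidegrees Celsius (depending on the kernel version these may live under `hwmonN/device/` instead), where `tempN_max` is what `sensors` shows as "high". The helper names are hypothetical:

```python
import os


def _read_attr(hwmon_dir, name):
    """Return the stripped contents of a sysfs attribute, or None if absent."""
    path = os.path.join(hwmon_dir, name)
    if not os.path.exists(path):
        return None
    with open(path) as f:
        return f.read().strip()


def read_temps(hwmon_dir):
    """Map each sensor label to (temperature_c, high_c) from tempN_* files.

    sysfs reports millidegrees Celsius; high_c is None if tempN_max is absent.
    """
    temps = {}
    for entry in sorted(os.listdir(hwmon_dir)):
        if not (entry.startswith("temp") and entry.endswith("_input")):
            continue
        prefix = entry[: -len("_input")]  # e.g. "temp2"
        label = _read_attr(hwmon_dir, prefix + "_label") or prefix
        temp_c = int(_read_attr(hwmon_dir, entry)) / 1000
        high = _read_attr(hwmon_dir, prefix + "_max")
        temps[label] = (temp_c, int(high) / 1000 if high is not None else None)
    return temps


def above_high(temps):
    """Labels whose current temperature exceeds their "high" threshold."""
    return sorted(
        label
        for label, (temp_c, high_c) in temps.items()
        if high_c is not None and temp_c > high_c
    )
```

The hwmon index is not fixed, so the right directory is the one under `/sys/class/hwmon/` whose `name` file reads `coretemp`.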

The performance is better and it overheats far less. It has hit throttling
just twice so far, despite a warmer room temperature today, whereas with Intel
P-State it hits throttling so often that PlaneShift or OpenMW become basically
unplayable.

It's not perfect, as I think it shouldn't overheat at all, but well,
maybe that's aging hardware.


It is often said that Intel P-State is technically better, but now I see that
acpi-cpufreq runs much better on my machine.


martin@merkaba:~> phoronix-test-suite system-info

Phoronix Test Suite v5.2.1
System Information

Hardware:
Processor: Intel Core i5-2520M @ 2.50GHz (4 Cores), Motherboard: LENOVO
42433WG, Chipset: Intel 2nd Generation Core Family DRAM, Memory: 16384MB,
Disk: 300GB INTEL SSDSA2CW30 + 480GB Crucial_CT480M50, Graphics: Intel HD
3000 (1300MHz), Audio: Intel 6 /C200, Monitor: P24T-7 LED, Network: Intel
82579LM Gigabit Connection + Intel Centrino Advanced-N 6205

Software:
OS: Debian unstable, Kernel: 4.0.1-tp520-btrfs-trim-norace+ (x86_64),
Desktop: KDE 4.14.2, Display Server: X Server 1.16.4, Display Driver: intel
2.21.15, OpenGL: 3.3 Mesa 10.4.2, Compiler: GCC 4.9.2, File-System: btrfs,
Screen Resolution: 3840x1080


martin@merkaba:~> LANG=C lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Thread(s) per core: 2
Core(s) per socket: 2
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 42
Model name: Intel(R) Core(TM) i5-2520M CPU @ 2.50GHz
Stepping: 7
CPU MHz: 1200.000
CPU max MHz: 2501.0000
CPU min MHz: 800.0000
BogoMIPS: 4983.83
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 3072K
NUMA node0 CPU(s): 0-3



There is also some other bug report about this:

Please change intel_pstate default to disable
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1188647

appears to be quite old, but still seems unresolved.

Ciao,
--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7


2015-05-07 02:59:39

by Doug Smythies

Subject: RE: [BUG] ThinkPad T520 overheating with P-State driver

On 2015.05.06 13:37 Martin Steigerwald wrote:

> I get frequencies like:
>
> 3080566
> 3068945
> 3009082
> 2999902

Please know that the intel_pstate driver reports the actual average CPU
frequency over the last sample interval. In terms of heat, these values don't
mean anything on their own; you would have to look at how much time the CPU
spent in C0 versus the other states.
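The distinction can be made concrete with the APERF, MPERF and TSC counters that the kernel's `turbostat` tool also reads. A minimal sketch with hypothetical counter deltas, assuming MPERF ticks at the TSC (nominal) rate only while the CPU is in C0 and APERF counts actual core cycles only while in C0; the function name is hypothetical:

```python
def cpu_activity(d_aperf, d_mperf, d_tsc, base_mhz):
    """Derive activity metrics from APERF/MPERF/TSC deltas over one interval.

    Assumes MPERF ticks at the TSC (nominal) rate only while in C0 and APERF
    counts actual core cycles only while in C0.
    Returns (busy_pct, busy_mhz, avg_mhz):
      busy_pct - share of the interval spent in C0
      busy_mhz - average frequency while in C0 (roughly what intel_pstate reports)
      avg_mhz  - average frequency over the whole interval, idle time included
    """
    busy_pct = 100.0 * d_mperf / d_tsc
    busy_mhz = base_mhz * d_aperf / d_mperf
    avg_mhz = base_mhz * d_aperf / d_tsc
    return busy_pct, busy_mhz, avg_mhz


# Hypothetical 1 s interval on a 2500 MHz part: busy 20 % of the time,
# running at 3000 MHz whenever busy.
busy_pct, busy_mhz, avg_mhz = cpu_activity(
    d_aperf=600_000_000, d_mperf=500_000_000, d_tsc=2_500_000_000, base_mhz=2500
)
print(f"{busy_pct:.0f}% busy, {busy_mhz:.0f} MHz while busy, {avg_mhz:.0f} MHz on average")
```

This prints `20% busy, 3000 MHz while busy, 600 MHz on average`: a reading of about 3 GHz is compatible with a CPU that is idle most of the time, which is why residency matters more for heat than the frequency figure alone.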

> Yet with acpi-cpufreq without limiting maximum performance at all, I get the
> following with the *same* workload:

> 2501000
> 2501000
> 1000000
> 1000000

Please know that the acpi-cpufreq driver reports what frequency (P-state)
it is asking for, not what might actually be happening.
If your CPU 2 and 3 were ever active during that sample time,
they would be at the same frequency as your turbo-state CPU 0 or 1,
whichever is higher.
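The point about CPUs 2 and 3 can be expressed as a simplified model: one frequency domain shared by all logical CPUs, as on this dual-core Sandy Bridge, where every busy CPU runs at the highest request among the busy CPUs. A minimal sketch under that assumption; it ignores transition latency, turbo limits and other hardware details, and the function name is hypothetical:

```python
def effective_khz(requested_khz, busy):
    """Simplified model of a single frequency domain shared by all CPUs.

    requested_khz[i] is what CPU i's governor asked for; busy[i] says whether
    CPU i was active (in C0). Every busy CPU runs at the highest request among
    the busy CPUs; idle CPUs are reported as None.
    """
    if len(requested_khz) != len(busy):
        raise ValueError("need one busy flag per CPU")
    busy_requests = [khz for khz, b in zip(requested_khz, busy) if b]
    if not busy_requests:
        return [None] * len(busy)
    domain_khz = max(busy_requests)
    return [domain_khz if b else None for b in busy]


# The acpi-cpufreq readings quoted above, assuming CPU 2 woke up during the interval.
print(effective_khz([2501000, 2501000, 1000000, 1000000], [True, True, True, False]))
```

Under this model CPU 2 runs at 2501000 kHz even though its own `scaling_cur_freq` shows 1000000.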

> There is also some other bug report about this:
> Please change intel_pstate default to disable
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1188647
> appears to be quite old, but still seems unresolved.

That bug report is very old, and is closed.
I made a late entry on that bug report on 2014.06.08,
and have submitted a patch set to deal with, among other things,
the long duration issue.

2015-05-07 08:20:29

by Martin Steigerwald

Subject: Re: [BUG] ThinkPad T520 overheating with P-State driver

On Wednesday, 6 May 2015, 19:51:19, Doug Smythies wrote:
> On 2015.05.06 13:37 Martin Steigerwald wrote:
> > I get frequencies like:
> >
> > 3080566
> > 3068945
> > 3009082
> > 2999902
>
> Please know that the intel_pstate driver reports the actual average CPU
> frequency over the last sample interval. In terms of heat, these values don't
> mean anything on their own; you would have to look at how much time the CPU
> spent in C0 versus the other states.

Okay, so that would be

merkaba:/sys/devices/system/cpu/cpu0/cpufreq> ls stats
time_in_state total_trans

for acpi-cpufreq, and for Intel P-State I suppose there is something
similar?

PowerTOP can show this as well, I think. Can these values be reset so I
could test just this workload?
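Rather than resetting the counters, one can snapshot `time_in_state` before and after the workload and subtract. A minimal sketch, assuming the cpufreq-stats format of one "<frequency in kHz> <time>" pair per line with time in 10 ms units; note that with acpi-cpufreq these are requested P-states, not C-state residency, and the file may not exist under intel_pstate. The function names are hypothetical:

```python
def parse_time_in_state(text):
    """Parse cpufreq-stats time_in_state into {freq_khz: time}.

    Each line is "<frequency in kHz> <time>", time in 10 ms units.
    """
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 2:
            stats[int(fields[0])] = int(fields[1])
    return stats


def shares_between(before, after):
    """Percent of time spent at each frequency between two snapshots."""
    delta = {khz: after[khz] - before.get(khz, 0) for khz in after}
    total = sum(delta.values())
    if total == 0:
        return {khz: 0.0 for khz in delta}
    return {khz: 100.0 * t / total for khz, t in delta.items()}
```

The snapshots would be the contents of `/sys/devices/system/cpu/cpu0/cpufreq/stats/time_in_state` read just before and just after a PlaneShift session; since only differences are used, the time unit cancels out.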

> > Yet with acpi-cpufreq without limiting maximum performance at all, I get
> > the following with the *same* workload:
> >
> > 2501000
> > 2501000
> > 1000000
> > 1000000
>
> Please know that the acpi-cpufreq driver reports what frequency (P-state)
> it is asking for, not what might actually be happening.
> If your CPU 2 and 3 were ever active during that sample time,
> they would be at the same frequency as your turbo-state CPU 0 or 1,
> whichever is higher.

Okay. So I can't conclude anything from it.

What I do experience, though, is that the workload of playing PlaneShift runs
*much* better with acpi-cpufreq on this machine.

I bet the machine may need a fan replacement or cleaning, but last time I
looked it appeared to be clean enough. Yet there is a huge difference between
hitting forced throttling every two minutes and just three or four times during
an evening.

Also, with P-State it stayed in the forced throttling state longer; with
acpi-cpufreq it left it quite quickly. And the room was even one degree warmer.

So at least under my conditions acpi-cpufreq appears to work much better,
because hitting forced throttling as often as it does with P-State actually
makes things slower, much slower. In my understanding, the driver should avoid
hitting this forced throttling.

I tried thermald then, but it injected idle states up to the point where the
machine became basically unusable.

I am quite busy, but once I have a bit more time I would like
to gather some debug data. Would state statistics be enough, or would
anything else be needed?

Of course, if you just say that Intel P-State does a better job at utilizing
maximum power and is not supposed to work on aged hardware with some cooling
deficiencies, I can save myself the time. Still, it would then be good if it
respected the no_turbo setting.

Also, it seems to somewhat respect max_perf_pct, but if there is work to do
it exceeds it as well. It would be good to have a way to tell it: "Hey, this
machine is a bit old; it doesn't cool as well as it did when it was new and
could run the CPU at 3.2 GHz (instead of 2.5 GHz) for half an hour without
ever overheating, so please be a bit more gentle with it until I eventually
replace the fan or do whatever is needed to restore full cooling
functionality".
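The 3.2 GHz versus 2.5 GHz comparison translates into a max_perf_pct value. A minimal sketch, assuming max_perf_pct is a percentage of the highest available P-state (the maximum turbo frequency while turbo is enabled) and taking 3.2 GHz as that maximum, as stated above; it shows only the arithmetic and does not address the driver exceeding the limit under load. The function name is hypothetical:

```python
import math


def max_perf_pct_for(cap_mhz, max_turbo_mhz):
    """Percentage for intel_pstate's max_perf_pct that caps at cap_mhz.

    Assumes max_perf_pct is relative to the highest available P-state, i.e.
    the maximum turbo frequency while turbo is enabled. Rounds down so the
    requested percentage never exceeds the exact ratio; the driver still
    picks the actual P-state itself.
    """
    if not 0 < cap_mhz <= max_turbo_mhz:
        raise ValueError("cap_mhz must be in (0, max_turbo_mhz]")
    return math.floor(100 * cap_mhz / max_turbo_mhz)


# Cap at the nominal 2.5 GHz on a CPU that turbos up to 3.2 GHz.
print(max_perf_pct_for(2500, 3200))
```

For 2.5 GHz this gives 78 (2500 / 3200 = 78.125 %), which would be written as root to `/sys/devices/system/cpu/intel_pstate/max_perf_pct`.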

> > There is also some other bug report about this:
> > Please change intel_pstate default to disable
> > https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1188647
> > appears to be quite old, but still seems unresolved.
>
> That bug report is very old, and is closed.
> I made a late entry on that bug report on 2014.06.08,
> and have submitted a patch set to deal with, among other things,
> the long duration issue.

Oh, I thought it was just closed by disabling Intel P-State; I didn't see
any actual fix for the issue in there. Ah, okay, your last comment mentioned
it was fixed, but from reading your comment I was not sure whether they really
fixed the issue.

Thanks,

2015-05-07 14:47:46

by Doug Smythies

Subject: RE: [BUG] ThinkPad T520 overheating with P-State driver

On 2015.05.07 01:20 Martin Steigerwald wrote:
> On Wednesday, 6 May 2015, 19:51:19, Doug Smythies wrote:
>> On 2015.05.06 13:37 Martin Steigerwald wrote:

Martin,

It would be best to continue with the specifics of your actual
potential bug in your bug report [1]. I'll reply
to some of your questions there.

>>> There is also some other bug report about this:
>>> Please change intel_pstate default to disable
>>> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1188647
>>> appears to be quite old, but still seems unresolved.
>>
>> That bug report is very old, and is closed.
>> I made a late entry on that bug report on 2014.06.08,
>> and have submitted a patch set to deal with, among other things,
>> the long duration issue.

> Oh, I thought it was just closed by disabling Intel P-State, I didn't see
> any actual fix to the issue in there. Ah, okay, your last comment mentioned
> fixed, I was not sure whether they really fixed the issue from reading your
> comment.

The follow up bug report is this one:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1333322
"enable intel_pstate as default with thermald for x86"

[1] https://bugzilla.kernel.org/show_bug.cgi?id=97261