2006-10-12 22:17:41

by Jeremy Fitzhardinge

Subject: Strange entries in /proc/acpi/thermal_zone for Thinkpad X60

I have a Thinkpad X60 with an Intel Core Duo T2400. In
/proc/acpi/thermal_zone, I'm getting two subdirectories, each with their
own set of files:

/proc/acpi/thermal_zone/THM0/cooling_mode:
<setting not supported>
cooling mode: critical

/proc/acpi/thermal_zone/THM0/polling_frequency:
<polling disabled>

/proc/acpi/thermal_zone/THM0/state:
state: ok

/proc/acpi/thermal_zone/THM0/temperature:
temperature: 53 C

/proc/acpi/thermal_zone/THM0/trip_points:
critical (S5): 127 C


/proc/acpi/thermal_zone/THM1/cooling_mode:
<setting not supported>
cooling mode: passive

/proc/acpi/thermal_zone/THM1/polling_frequency:
<polling disabled>

/proc/acpi/thermal_zone/THM1/state:
state: ok

/proc/acpi/thermal_zone/THM1/temperature:
temperature: 53 C

/proc/acpi/thermal_zone/THM1/trip_points:
critical (S5): 97 C
passive: 93 C: tc1=5 tc2=4 tsp=600 devices=0xf7eaa264 0xf7eaa244
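
For comparing the two zones programmatically, the listing above can be parsed with a few lines of Python. This is only a sketch: the function names are made up for illustration, and the regular expressions assume the files always look like the output shown here.

```python
import re


def parse_temperature(text):
    """Return the temperature in degrees C from a 'temperature:' line, or None."""
    m = re.search(r"temperature:\s*(-?\d+)\s*C", text)
    return int(m.group(1)) if m else None


def parse_trip_points(text):
    """Return a dict such as {'critical': 97, 'passive': 93}.

    Only the first word of each line is used as the key, so several
    lines starting with the same word would overwrite one another.
    """
    trips = {}
    for line in text.splitlines():
        m = re.match(r"\s*(\w+)[^:]*:\s*(-?\d+)\s*C", line)
        if m:
            trips[m.group(1)] = int(m.group(2))
    return trips
```

Feeding it the contents of THM0/temperature and THM1/temperature in a loop makes any disagreement between the two zones easy to spot.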


The interesting thing is that the two sets of files are not consistent -
sometimes they don't even show the same temperature.

The reason I'm interested in this is that I think it's behind some of my
cpufreq problems. Sometimes the kernel decides that I just can't raise
the max frequency above 1GHz, because it's been thermally limited (I've
put printks in to confirm that it's the ACPI thermal limit on the policy
notifier chain which is limiting the max speed). It seems to me that
having a thermal zone for each core is a BIOS bug, since they're really
the same chip, but the THM1 entries should be ignored. I don't believe
the CPU has ever approached 97 C, let alone 127 C; while I put it
under a fair amount of load, it is sitting on a desktop with no airflow
obstructions, so if it really is overheating it suggests a serious
design problem with the hardware.

But I'm just speculating; I'm not really sure what all this means. Any
clues?

Thanks,
J


2006-10-13 03:59:35

by Robert Hancock

Subject: Re: Strange entries in /proc/acpi/thermal_zone for Thinkpad X60

Jeremy Fitzhardinge wrote:
> I have a Thinkpad X60 with an Intel Core Duo T2400. In
> /proc/acpi/thermal_zone, I'm getting two subdirectories, each with their
> own set of files:
>

So your machine has two thermal zones..

> The interesting thing is that the two sets of files are not consistent -
> sometimes they don't even show the same temperature.

I would expect they wouldn't, otherwise there would be no reason for the
BIOS people to set up two thermal zones..

>
> The reason I'm interested in this is that I think it's behind some of my
> cpufreq problems. Sometimes the kernel decides that I just can't raise
> the max frequency above 1GHz, because it's been thermally limited (I've
> put printks in to confirm that it's the ACPI thermal limit on the policy
> notifier chain which is limiting the max speed). It seems to me that
> having a thermal zone for each core is a BIOS bug, since they're really
> the same chip, but the THM1 entries should be ignored. I don't believe

How do you know they are one for each core? ACPI thermal zones can be
anywhere in the machine that needs OS-controlled cooling. Could be the
CPU heatsink, voltage regulator, or someplace else.

> the CPU has ever approached 97 C, let alone 127 C; while I put it
> under a fair amount of load, it is sitting on a desktop with no airflow
> obstructions, so if it really is overheating it suggests a serious
> design problem with the hardware.
>
> But I'm just speculating; I'm not really sure what all this means. Any
> clues?

I think we need more information to decide what is going on here.. what
temperatures are registering in the thermal zones when the CPU clock is
being limited?
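
One way to collect that, as a rough sketch: sample both zone temperatures together with the cpufreq limit and log them side by side. The zone paths are the ones from the original post; the `scaling_max_freq` path is the usual cpufreq sysfs file (value in kHz) and is assumed to exist on this machine. The helper names are hypothetical.

```python
import re
import time

ZONE_FILES = {
    "THM0": "/proc/acpi/thermal_zone/THM0/temperature",
    "THM1": "/proc/acpi/thermal_zone/THM1/temperature",
}
# Assumption: cpufreq is enabled and exposes this file for cpu0.
MAX_FREQ_FILE = "/sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq"


def read_text(path):
    with open(path) as f:
        return f.read()


def sample(zone_files=ZONE_FILES, max_freq_file=MAX_FREQ_FILE):
    """Return one reading: zone temperatures in C and the cpufreq max in kHz."""
    reading = {}
    for zone, path in zone_files.items():
        m = re.search(r"(-?\d+)\s*C", read_text(path))
        reading[zone] = int(m.group(1)) if m else None
    reading["max_khz"] = int(read_text(max_freq_file).strip())
    return reading


def log_samples(count, interval_s=1.0, **kwargs):
    """Print `count` timestamped readings, one every `interval_s` seconds."""
    for _ in range(count):
        print(time.strftime("%H:%M:%S"), sample(**kwargs))
        time.sleep(interval_s)
```

Running `log_samples(600)` while reproducing the problem should show which zone, if either, is near a trip point at the moment `max_khz` drops.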

--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from [email protected]
Home Page: http://www.roberthancock.com/

2006-10-13 04:25:48

by Jeremy Fitzhardinge

Subject: Re: Strange entries in /proc/acpi/thermal_zone for Thinkpad X60

Robert Hancock wrote:
> I would expect they wouldn't, otherwise there would be no reason for
> the BIOS people to set up two thermal zones..

Ah, OK. I misunderstood what thermal zones are.

> How do you know they are one for each core? ACPI thermal zones can be
> anywhere in the machine that needs OS-controlled cooling. Could be the
> CPU heatsink, voltage regulator, or someplace else.

Right, bad assumption on my part. Is there any way to find out what
they might correspond to? /proc/acpi/ibm/thermal has a bunch of
temperature-like numbers in it; I guess there should be some
correlation between those and the thermal zones.

> I think we need more information to decide what is going on here..
> what temperatures are registering in the thermal zones when the CPU
> clock is being limited?

I'll gather a bit more info.

J

2006-10-13 04:51:01

by Brown, Len

Subject: Re: Strange entries in /proc/acpi/thermal_zone for Thinkpad X60

On Thursday 12 October 2006 18:19, Jeremy Fitzhardinge wrote:
> I have a Thinkpad X60 with an Intel Core Duo T2400. In
> /proc/acpi/thermal_zone, I'm getting two subdirectories, each with their
> own set of files:
>
> /proc/acpi/thermal_zone/THM0/cooling_mode:
> <setting not supported>
> cooling mode: critical
>
> /proc/acpi/thermal_zone/THM0/polling_frequency:
> <polling disabled>
>
> /proc/acpi/thermal_zone/THM0/state:
> state: ok
>
> /proc/acpi/thermal_zone/THM0/temperature:
> temperature: 53 C
>
> /proc/acpi/thermal_zone/THM0/trip_points:
> critical (S5): 127 C


This means that if THM0 reaches 127, your system will shut down.
You don't have much control over this one -- but you could probably
lower the temperature to do a critical shut-down earlier with something like this:

# echo -n "126:125:124:123:122" > trip_points

Note that the first value is the critical trip point; the others are
placeholders in the order hot:passive:active:active.
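
To avoid mixing up the field order, the string can be built from named values. A minimal sketch, assuming the five-field critical:hot:passive:active:active layout described above; `format_trip_points` is a hypothetical helper, not part of any kernel tool.

```python
def format_trip_points(critical, hot, passive, active0, active1):
    """Build the colon-separated string written to trip_points.

    Field order follows the description above:
    critical:hot:passive:active:active, all in degrees C.
    """
    values = [critical, hot, passive, active0, active1]
    for v in values:
        if not isinstance(v, int):
            raise TypeError("trip points must be whole degrees C")
    return ":".join(str(v) for v in values)
```

For instance, `format_trip_points(126, 125, 124, 123, 122)` gives the string used in the echo command.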

>
> /proc/acpi/thermal_zone/THM1/cooling_mode:
> <setting not supported>
> cooling mode: passive
>
> /proc/acpi/thermal_zone/THM1/polling_frequency:
> <polling disabled>
>
> /proc/acpi/thermal_zone/THM1/state:
> state: ok
>
> /proc/acpi/thermal_zone/THM1/temperature:
> temperature: 53 C
>
> /proc/acpi/thermal_zone/THM1/trip_points:
> critical (S5): 97 C
> passive: 93 C: tc1=5 tc2=4 tsp=600 devices=0xf7eaa264 0xf7eaa244

You are not given the opportunity to set the active trip points here.
Looks like you have just a passive trip point at 93 and we would
expect to throttle when we go above 93.
Presumably some other method should be kicking in the fans before
this passive point is reached.

The theory is that if the fans kick in earlier than you like, you should
be able to lower the passive trip point to below that temperature, making
the system throttle before the fans kick in.

But probably the root cause of your issue is that the fans are _not_ kicking in...

For grins, you can probably raise the passive point with something like this:

# echo -n "98:97:96:53:45" > trip_points

but it seems that you are doing passive cooling way before you
get anywhere near 93, so that is the mystery.
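
To put numbers on that: the ACPI specification defines the passive cooling adjustment as dP[%] = TC1 * (Tn - Tn-1) + TC2 * (Tn - Tt), where Tt is the passive trip point. The sketch below evaluates it with the tc1=5, tc2=4 values from THM1. It assumes the kernel follows the spec formula as written; `passive_trend` is a name chosen for illustration.

```python
def passive_trend(temp_c, last_temp_c, passive_trip_c, tc1, tc2):
    """ACPI passive cooling formula.

    dP[%] = tc1 * (Tn - Tn-1) + tc2 * (Tn - Tt)
    A positive result asks for more throttling, a negative one for less.
    """
    return tc1 * (temp_c - last_temp_c) + tc2 * (temp_c - passive_trip_c)


# Steady at 53 C with the passive trip point at 93 C: strongly negative.
idle = passive_trend(53, 53, 93, tc1=5, tc2=4)

# Rising from 94 C to 95 C, just above the trip point: positive.
hot = passive_trend(95, 94, 93, tc1=5, tc2=4)
```

With a zone sitting around 53 C the result is far below zero, so by this formula no throttling should be requested.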

-Len

>
>
> The interesting thing is that the two sets of files are not consistent -
> sometimes they don't even show the same temperature.
>
> The reason I'm interested in this is that I think it's behind some of my
> cpufreq problems. Sometimes the kernel decides that I just can't raise
> the max frequency above 1GHz, because it's been thermally limited (I've
> put printks in to confirm that it's the ACPI thermal limit on the policy
> notifier chain which is limiting the max speed). It seems to me that
> having a thermal zone for each core is a BIOS bug, since they're really
> the same chip, but the THM1 entries should be ignored. I don't believe
> the CPU has ever approached 97 C, let alone 127 C; while I put it
> under a fair amount of load, it is sitting on a desktop with no airflow
> obstructions, so if it really is overheating it suggests a serious
> design problem with the hardware.
>
> But I'm just speculating; I'm not really sure what all this means. Any
> clues?
>
> Thanks,
> J
>

2006-10-13 14:45:12

by Pavel Machek

Subject: Re: Strange entries in /proc/acpi/thermal_zone for Thinkpad X60

Hi!

> I have a Thinkpad X60 with an Intel Core Duo T2400. In
> /proc/acpi/thermal_zone, I'm getting two subdirectories,
> each with their own set of files:

Looks okay to me. One thermal zone is the CPU temperature, and the second
is the temperature of something else.

> The interesting thing is that the two sets of files are
> not consistent - sometimes they don't even show the same
> temperature.

You have two (actually you have more, see tp_smapi) physical
thermometers.

> The reason I'm interested in this is that I think it's
> behind some of my cpufreq problems. Sometimes the
> kernel decides that I just can't raise the max frequency
> above 1GHz, because it's been thermally limited (I've put
> printks in to confirm that it's the ACPI thermal limit on
> the policy notifier chain which is limiting the max
> speed). It seems to me that having a thermal zone for
> each core is a BIOS bug, since they're really the same
> chip, but the THM1 entries should be ignored. I don't

THM1 does not seem to be the CPU temperature.

Pavel
--
Thanks for all the (sleeping) penguins.

2006-10-13 18:10:50

by Tomasz Torcz

Subject: Re: Strange entries in /proc/acpi/thermal_zone for Thinkpad X60

On Thu, Oct 12, 2006 at 09:28:12PM -0700, Jeremy Fitzhardinge wrote:
> Robert Hancock wrote:
> >I would expect they wouldn't, otherwise there would be no reason for
> >the BIOS people to set up two thermal zones..
>
> Ah, OK. I misunderstood what thermal zones are.
>
> >How do you know they are one for each core? ACPI thermal zones can be
> >anywhere in the machine that needs OS-controlled cooling. Could be the
> >CPU heatsink, voltage regulator, or someplace else.
>
> Right, bad assumption on my part. Is there any way to find out what
> they might correspond to? /proc/acpi/ibm/thermal has a bunch of
> temperature-like numbers in it; I guess there should be some
> correlation between those and the thermal zones.

There are many temperature sensors in Thinkpads. There's even a map of
them somewhere on http://www.thinkwiki.org.

--
Tomasz Torcz "Funeral in the morning, IDE hacking
[email protected] in the afternoon and evening." - Alan Cox

