2013-05-15 04:19:03

by Sonny Rao

Subject: Fans at full speed after resume

Hi, I've seen a regression in kernels since 3.7 on x86 devices where
the kernel turns the system fans on at maximum speed after resuming
from suspend-to-RAM. Other people have noticed it as well; for example, see
https://bugzilla.redhat.com/show_bug.cgi?id=895276

For example on the Samsung 550 Chromebook, we have one thermal zone
and have 5 cooling_devices, 0-4, which correspond to 5 possible fan
speeds. Under typical idle, only cooling_device4 and maybe
cooling_device3 are active, depending on temperature:

cat /sys/class/thermal/cooling_device[01234]/cur_state \
    /sys/class/thermal/thermal_zone0/temp
0
0
0
0
1
57000

However, after a suspend/resume cycle, cooling_devices 0 and 1
become active:

cat /sys/class/thermal/cooling_device[01234]/cur_state \
    /sys/class/thermal/thermal_zone0/temp
1
1
0
0
1
54000

and it seems to stay that way, even though the temperature is low
enough that the fan shouldn't be running at that speed. If I manually
disable cooling_devices 0 and 1 then fan control works normally again.
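
The check and the manual workaround described above could be scripted roughly as follows. This is only a sketch: the helper names and the THERMAL_ROOT variable are made up for illustration, and it assumes that "disabling" a cooling device means writing 0 to its cur_state file, which the report does not spell out.

```shell
# Root of the thermal sysfs tree. Overridable (hypothetical variable) so the
# helpers can be tried against a scratch directory instead of live hardware.
THERMAL_ROOT=${THERMAL_ROOT:-/sys/class/thermal}

# Print "<device> <cur_state>" for every cooling device, followed by the
# temperature of thermal_zone0 in millidegrees Celsius.
thermal_snapshot() {
    for dev in "$THERMAL_ROOT"/cooling_device*; do
        [ -r "$dev/cur_state" ] || continue
        printf '%s %s\n' "${dev##*/}" "$(cat "$dev/cur_state")"
    done
    printf 'thermal_zone0 %s\n' "$(cat "$THERMAL_ROOT/thermal_zone0/temp")"
}

# Assumption: a cooling device is switched off by writing 0 to cur_state.
# On the real sysfs tree this needs root.
cooling_off() {
    for n in "$@"; do
        echo 0 > "$THERMAL_ROOT/cooling_device$n/cur_state"
    done
}
```

After a resume, running thermal_snapshot, then cooling_off 0 1, then thermal_snapshot again would show whether the two devices stay off.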

I started bisecting it and was able to do so up until this commit:
commit 29b19e250434c6193c8b8e4c34c9c6284dd4f101
Merge: 125c4c7 c072fed
Author: Len Brown <[email protected]>
AuthorDate: Tue Oct 9 01:35:52 2012 -0400
Commit: Len Brown <[email protected]>
CommitDate: Tue Oct 9 01:35:52 2012 -0400

Merge branch 'release' of
git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux into
thermal

Unfortunately, I can't successfully suspend/resume on the commits in
that merge, so I wasn't able to bisect down to the exact commit.

I did confirm that one parent of the merge is okay: commit
125c4c706b680c7831f0966ff873c1ad0354ec25 idr: rename MAX_LEVEL to
MAX_IDR_LEVEL

So I think the regression falls somewhere in this list of commits:
c072fed95c9855a920c114d7fa3351f0f54ea06e...e3f25e6e5836c4790fbe395ff42e241f372d859d

c072fed9 thermal: Exynos: Fix NULL pointer dereference in exynos_unregister_thermal()
a4b6fec9 Thermal: Fix bug on cpu_cooling, cooling device's id conflict problem.
79e093c3 thermal: exynos: Use devm_* functions
17be868e ARM: exynos: add thermal sensor driver platform data support
7e0b55e6 thermal: exynos: register the tmu sensor with the kernel thermal layer
f22d9c03c thermal: exynos5: add exynos5250 thermal sensor driver support
c48cbba6 hwmon: exynos4: move thermal sensor driver to driver/thermal directory
02361418 thermal: add generic cpufreq cooling implementation
a7a3b8c8 Fix a build error.
204dd1d3 thermal: Fix potential NULL pointer accesses
1e426ffdd thermal: add Renesas R-Car thermal sensor support
79a49168 thermal: fix potential out-of-bounds memory access
f4a821ce6 Thermal: Introduce locking for cdev.thermal_instances list.
908b9fb79 Thermal: Unify the code for both active and passive cooling
ce119f832 Thermal: Introduce simple arbitrator for setting device cooling state
b5e4ae62 Thermal: List thermal_instance in thermal_cooling_device.
cddf31b3b Thermal: Rename thermal_instance.node to thermal_instance.tz_node.
2d374139 Thermal: Rename thermal_zone_device.cooling_devices
b81b6ba3 Thermal: rename structure thermal_cooling_device_instance to thermal_instance
4ae46befb Thermal: Introduce thermal_zone_trip_update()
1b7ddb84 Thermal: Remove tc1/tc2 in generic thermal layer.
601f3d424 Thermal: Introduce .get_trend() callback.
9d99842f9 Thermal: set upper and lower limits
74051ba5 Thermal: Introduce cooling states range support

When I get time, I'll try to rebase those commits onto the IDR commit
and see if I can get a better bisect. Any insights into the problem
would be appreciated, thanks.


2013-05-15 04:26:45

by Zhang, Rui

Subject: Re: Fans at full speed after resume

On Tue, 2013-05-14 at 21:18 -0700, Sonny Rao wrote:
> Hi, I've seen a regression in kernels since 3.7 on x86 devices where
> the kernel turns the system fans on to max speed after resuming from
> ram. Other people have noticed it as well, for example see
> https://bugzilla.redhat.com/show_bug.cgi?id=895276
>
Please check whether this is a duplicate of bug
https://bugzilla.kernel.org/show_bug.cgi?id=56591

2013-05-15 04:29:59

by Zhang, Rui

Subject: Re: Fans at full speed after resume

On Wed, 2013-05-15 at 12:26 +0800, Zhang Rui wrote:
> please
>
> On Tue, 2013-05-14 at 21:18 -0700, Sonny Rao wrote:
> > Hi, I've seen a regression in kernels since 3.7 on x86 devices where
> > the kernel turns the system fans on to max speed after resuming from
> > ram. Other people have noticed it as well, for example see
> > https://bugzilla.redhat.com/show_bug.cgi?id=895276
> >
> please check if this is a duplicate of bug
> https://bugzilla.kernel.org/show_bug.cgi?id=56591
Or you can try 3.10-rc1 to see whether the problem still exists.

thanks,
rui
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pm" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2013-05-15 04:34:54

by Olof Johansson

Subject: Re: Fans at full speed after resume

2013/5/14 Zhang Rui <[email protected]>
>
> On Wed, 2013-05-15 at 12:26 +0800, Zhang Rui wrote:
> > please
> >
> > On Tue, 2013-05-14 at 21:18 -0700, Sonny Rao wrote:
> > > Hi, I've seen a regression in kernels since 3.7 on x86 devices where
> > > the kernel turns the system fans on to max speed after resuming from
> > > ram. Other people have noticed it as well, for example see
> > > https://bugzilla.redhat.com/show_bug.cgi?id=895276
> > >
> > please check if this is a duplicate of bug
> > https://bugzilla.kernel.org/show_bug.cgi?id=56591
> or you can try 3.10-rc1 to see if the problem still exists or not.
>

Or 3.8.12, from the looks of it (we're on 3.8.11 now, d'oh!).

-Olof

2013-05-15 04:35:21

by Sonny Rao

Subject: Re: Fans at full speed after resume

On Tue, May 14, 2013 at 9:29 PM, Zhang Rui <[email protected]> wrote:
> On Wed, 2013-05-15 at 12:26 +0800, Zhang Rui wrote:
>> please
>>
>> On Tue, 2013-05-14 at 21:18 -0700, Sonny Rao wrote:
>> > Hi, I've seen a regression in kernels since 3.7 on x86 devices where
>> > the kernel turns the system fans on to max speed after resuming from
>> > ram. Other people have noticed it as well, for example see
>> > https://bugzilla.redhat.com/show_bug.cgi?id=895276
>> >
>> please check if this is a duplicate of bug
>> https://bugzilla.kernel.org/show_bug.cgi?id=56591
> or you can try 3.10-rc1 to see if the problem still exists or not.

OK, I patched in the fix from that bugzilla entry (commit
928c5edbe6f7cb0d1c71bc2353d091bc5b114fe3), but I'm still seeing the
issue. I'll try 3.10-rc1 next.

2013-05-15 04:56:56

by Sonny Rao

Subject: Re: Fans at full speed after resume

On Tue, May 14, 2013 at 9:34 PM, Sonny Rao <[email protected]> wrote:
> On Tue, May 14, 2013 at 9:29 PM, Zhang Rui <[email protected]> wrote:
>> On Wed, 2013-05-15 at 12:26 +0800, Zhang Rui wrote:
>>> please
>>>
>>> On Tue, 2013-05-14 at 21:18 -0700, Sonny Rao wrote:
>>> > Hi, I've seen a regression in kernels since 3.7 on x86 devices where
>>> > the kernel turns the system fans on to max speed after resuming from
>>> > ram. Other people have noticed it as well, for example see
>>> > https://bugzilla.redhat.com/show_bug.cgi?id=895276
>>> >
>>> please check if this is a duplicate of bug
>>> https://bugzilla.kernel.org/show_bug.cgi?id=56591
>> or you can try 3.10-rc1 to see if the problem still exists or not.
>
> Ok, I patched in the fix from that bugzilla --
> 928c5edbe6f7cb0d1c71bc2353d091bc5b114fe3
> but I'm still seeing the issue, I'll try 3.10-rc1 next
>

3.10-rc1 seems good.
3.9.2 is okay, though the fans do seem to run more for a while after
resume; they eventually turn off.
3.8.13 still seems to be broken, with fans at maximum.

2013-05-15 09:47:12

by Sonny Rao

Subject: Re: Fans at full speed after resume

On Tue, May 14, 2013 at 9:56 PM, Sonny Rao <[email protected]> wrote:
> On Tue, May 14, 2013 at 9:34 PM, Sonny Rao <[email protected]> wrote:
>> On Tue, May 14, 2013 at 9:29 PM, Zhang Rui <[email protected]> wrote:
>>> On Wed, 2013-05-15 at 12:26 +0800, Zhang Rui wrote:
>>>> please
>>>>
>>>> On Tue, 2013-05-14 at 21:18 -0700, Sonny Rao wrote:
>>>> > Hi, I've seen a regression in kernels since 3.7 on x86 devices where
>>>> > the kernel turns the system fans on to max speed after resuming from
>>>> > ram. Other people have noticed it as well, for example see
>>>> > https://bugzilla.redhat.com/show_bug.cgi?id=895276
>>>> >
>>>> please check if this is a duplicate of bug
>>>> https://bugzilla.kernel.org/show_bug.cgi?id=56591
>>> or you can try 3.10-rc1 to see if the problem still exists or not.
>>
>> Ok, I patched in the fix from that bugzilla --
>> 928c5edbe6f7cb0d1c71bc2353d091bc5b114fe3
>> but I'm still seeing the issue, I'll try 3.10-rc1 next
>>
>
> 3.10-rc1 seems good
> 3.9.2 is okay, though fans do seem to be on more for a while after
> resume, it eventually turns off
> 3.8.13 seems to still be broken, with fans at maximum
>

So, I did a reverse bisect between 3.9 and 3.9.1 and found that the
commit you mentioned does indeed fix the problem on 3.9. I also
double-checked that the problem is not fixed in 3.8.13. I then made a
3.8.13 version of the debug patch from this bugzilla entry:
https://bugzilla.kernel.org/attachment.cgi?id=98671

With it, I never see thermal_cdev_update() being called for cdev 0 or
cdev 1, yet they are set to 1 after resume. Perhaps something else is
enabling them?

2013-05-22 18:46:12

by Michael Großhäuser

Subject: Re: Fans at full speed after resume

Zhang Rui wrote:

> On Wed, 2013-05-15 at 12:26 +0800, Zhang Rui wrote:
>> please
>>
>> On Tue, 2013-05-14 at 21:18 -0700, Sonny Rao wrote:
>> > Hi, I've seen a regression in kernels since 3.7 on x86 devices where
>> > the kernel turns the system fans on to max speed after resuming from
>> > ram. Other people have noticed it as well, for example see
>> > https://bugzilla.redhat.com/show_bug.cgi?id=895276
>> >
>> please check if this is a duplicate of bug
>> https://bugzilla.kernel.org/show_bug.cgi?id=56591
> or you can try 3.10-rc1 to see if the problem still exists or not.
>
> thanks,
> rui


Hi,

I can confirm the same problem on a HP/Compaq 2510p with 3.10-rc2.

Best Regards,
Michael

2013-05-23 02:25:13

by Zhang, Rui

Subject: Re: Fans at full speed after resume

On Wed, 2013-05-22 at 20:46 +0200, Michael Großhäuser wrote:
> Zhang Rui wrote:
>
> > On Wed, 2013-05-15 at 12:26 +0800, Zhang Rui wrote:
> >> please
> >>
> >> On Tue, 2013-05-14 at 21:18 -0700, Sonny Rao wrote:
> >> > Hi, I've seen a regression in kernels since 3.7 on x86 devices where
> >> > the kernel turns the system fans on to max speed after resuming from
> >> > ram. Other people have noticed it as well, for example see
> >> > https://bugzilla.redhat.com/show_bug.cgi?id=895276
> >> >
> >> please check if this is a duplicate of bug
> >> https://bugzilla.kernel.org/show_bug.cgi?id=56591
> > or you can try 3.10-rc1 to see if the problem still exists or not.
> >
> > thanks,
> > rui
>
>
> Hi,
>
> I can confirm the same problem on a HP/Compaq 2510p with 3.10-rc2.
>
Please check whether this is a duplicate of bug
https://bugzilla.kernel.org/show_bug.cgi?id=58301

Please try the patch in comment #7 and see whether it helps.

thanks,
rui

2013-05-23 20:21:06

by Michael Großhäuser

Subject: Re: Fans at full speed after resume

Zhang Rui wrote:

> On Wed, 2013-05-22 at 20:46 +0200, Michael Großhäuser wrote:
>> Zhang Rui wrote:
>>
>> > On Wed, 2013-05-15 at 12:26 +0800, Zhang Rui wrote:
>> >> please
>> >>
>> >> On Tue, 2013-05-14 at 21:18 -0700, Sonny Rao wrote:
>> >> > Hi, I've seen a regression in kernels since 3.7 on x86 devices where
>> >> > the kernel turns the system fans on to max speed after resuming from
>> >> > ram. Other people have noticed it as well, for example see
>> >> > https://bugzilla.redhat.com/show_bug.cgi?id=895276
>> >> >
>> >> please check if this is a duplicate of bug
>> >> https://bugzilla.kernel.org/show_bug.cgi?id=56591
>> > or you can try 3.10-rc1 to see if the problem still exists or not.
>> >
>> > thanks,
>> > rui
>>
>>
>> Hi,
>>
>> I can confirm the same problem on a HP/Compaq 2510p with 3.10-rc2.
>>
> please check if this is a duplicate of bug
> https://bugzilla.kernel.org/show_bug.cgi?id=58301
>
> please try the patch at comment #7 and see if it helps or not.
>
> thanks,
> rui

Hi,

The patch fixes the problem for me.

Thanks,
Michael



2013-05-27 05:04:56

by Zhang, Rui

Subject: Re: Fans at full speed after resume

On Thu, 2013-05-23 at 22:22 +0200, Michael Großhäuser wrote:
> Zhang Rui wrote:
>
> > On Wed, 2013-05-22 at 20:46 +0200, Michael Großhäuser wrote:
> >> Zhang Rui wrote:
> >>
> >> > On Wed, 2013-05-15 at 12:26 +0800, Zhang Rui wrote:
> >> >> please
> >> >>
> >> >> On Tue, 2013-05-14 at 21:18 -0700, Sonny Rao wrote:
> >> >> > Hi, I've seen a regression in kernels since 3.7 on x86 devices where
> >> >> > the kernel turns the system fans on to max speed after resuming from
> >> >> > ram. Other people have noticed it as well, for example see
> >> >> > https://bugzilla.redhat.com/show_bug.cgi?id=895276
> >> >> >
> >> >> please check if this is a duplicate of bug
> >> >> https://bugzilla.kernel.org/show_bug.cgi?id=56591
> >> > or you can try 3.10-rc1 to see if the problem still exists or not.
> >> >
> >> > thanks,
> >> > rui
> >>
> >>
> >> Hi,
> >>
> >> I can confirm the same problem on a HP/Compaq 2510p with 3.10-rc2.
> >>
> > please check if this is a duplicate of bug
> > https://bugzilla.kernel.org/show_bug.cgi?id=58301
> >
> > please try the patch at comment #7 and see if it helps or not.
> >
Good to know.
Then could you please try the four patches in comments #10, #11, #12,
and #13 of
https://bugzilla.kernel.org/show_bug.cgi?id=58301
without the previous patch, and check whether they help?

These four patches are the fixes that I would like to push upstream.

thanks,
rui
> > thanks,
> > rui
>
> Hi,
>
> the patch fixes the problem for me.
>
> Thanks,
> Michael