2006-03-15 01:47:37

by Luming Yu

Subject: RE: 2.6.16-rc5: known regressions [TP 600X S3, vanilla DSDT]

>
>[I've trimmed the non-relevant lists from the CC. Let me know if
>anyone else wants to be trimmed.]
>
>> Could you do bisection to find out which methods or which thermal
>> zone cause trouble? To do that, you have to hack thermal.c by
>> commenting out some calls of evaluating methods below. I hope it is
>> easy for you! :-)
>
>I eventually muddled my way there. The short story is that I can
>reproduce the hang -- on the FIRST S3 cycle -- when the _TMP method is
>called a few times, just for THM0.

Excellent!
Could you just comment out _TMP in the kernel or in the DSDT,
and do several S3 suspend/resume cycles without removing the thermal
module? I want to make sure we are at the right place to drill down.

Thanks for your testing reports. It's impressive. :-)

--Luming


2006-03-15 05:40:54

by Sanjoy Mahajan

Subject: Re: 2.6.16-rc5: known regressions [TP 600X S3, vanilla DSDT]

> Could you just comment out _TMP in the kernel or in the DSDT,

I think it needs both excisions: If I comment out just the kernel _TMP
calls, the DSDT might slip one in through the interpreter. If I
comment out just the DSDT _TMP calls, then the kernel can still call
_TMP. So instead I modified acpi_evaluate_integer() to return 27 C
(3000 dK) if it's ever asked for a temperature, without doing any
actual work:

--- utils.c.orig 2006-02-27 00:09:35.000000000 -0500
+++ utils.c 2006-03-14 23:36:59.000000000 -0500
@@ -270,7 +270,15 @@ acpi_evaluate_integer(acpi_handle handle
     memset(element, 0, sizeof(union acpi_object));
     buffer.length = sizeof(union acpi_object);
     buffer.pointer = element;
-    status = acpi_evaluate_object(handle, pathname, arguments, &buffer);
+    if (strcmp(pathname, "_TMP") != 0)
+        status = acpi_evaluate_object(handle, pathname, arguments, &buffer);
+    else {
+        printk(KERN_INFO PREFIX "acpi_evaluate_integer: Faking _TMP\n");
+        status = AE_OK;
+        element->type = ACPI_TYPE_INTEGER;
+        element->integer.value = 3000; /* 27 C, in deciKelvins */
+    }
+
     if (ACPI_FAILURE(status)) {
         acpi_util_eval_error(handle, pathname, status);
         return_ACPI_STATUS(status);

This diff is in addition to the previous debugging changes to
thermal.c.
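
For illustration only, here is a minimal userspace sketch of the same stubbing idea, together with the arithmetic that turns 3000 dK into the "27 C" seen in the logs. The names dk_to_celsius, mock_evaluate, evaluate_integer and real_calls are hypothetical and are not kernel symbols. The 2732 offset and round-to-nearest rule are an assumption about how the thermal driver converts the value, chosen because they reproduce the logged 27 C.

```c
#include <assert.h>
#include <string.h>

/* Convert tenths of a kelvin to whole degrees Celsius, rounding to
 * nearest. Assumption: 273.2 K is treated as 0 C. */
long dk_to_celsius(long dk)
{
    long tenths = dk - 2732;
    return tenths >= 0 ? (tenths + 5) / 10 : (tenths - 5) / 10;
}

/* Hypothetical stand-in for the real firmware evaluation. It counts how
 * often it is reached so a caller can confirm that _TMP never gets here. */
int real_calls = 0;

int mock_evaluate(const char *pathname, unsigned long *value)
{
    (void)pathname;
    real_calls++;
    *value = 0;
    return 0;
}

/* Same shape as the patch above: every path except "_TMP" is passed
 * through; "_TMP" gets a canned 3000 dK without touching the firmware. */
int evaluate_integer(const char *pathname, unsigned long *value)
{
    if (strcmp(pathname, "_TMP") != 0)
        return mock_evaluate(pathname, value);

    *value = 3000; /* 300.0 K, reported as 27 C */
    return 0;
}
```

Because the check sits in the shared helper rather than in one caller, any _TMP request that goes through that helper gets the canned value without the method being run. That is what makes the stub useful for ruling _TMP in or out.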

> and do several S3 suspend/resume cycles without removing the thermal
> module? I want to make sure we are at the right place to drill down.

I repeated yesterday's experiments:

echo 100 > /proc/acpi/thermal_zone/THM0/polling_frequency
modprobe -rv thermal
modprobe thermal zone_to_keep=0 bisect_get_info=1
sleep.sh

with the modified utils.c (being careful to install the new kernel and
reboot, not just reinstall modules, since utils.c is part of the acpi
builtins). And, unlike yesterday (when _TMP was unhacked), there was
no hang. Nor did it hang after five more sleep-wake cycles.
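
The thermal.c debugging changes behind zone_to_keep and bisect_get_info are not shown in this thread. Judging only from the log messages quoted in this message ("thermal_add: ignoring THM2", "got temperature, but bisect_get_info = 1 so exiting"), they may behave roughly like the sketch below. Everything in it is an assumption: the function names, the numbering of zones in the order thermal_add reports them, and the meaning given to negative or zero values.

```c
#include <assert.h>

/* Hypothetical: zones are numbered in the order thermal_add sees them
 * (THM0 = 0, THM2 = 1, ...). A negative zone_to_keep keeps every zone. */
int keep_zone(int zone_index, int zone_to_keep)
{
    return zone_to_keep < 0 || zone_index == zone_to_keep;
}

/* Hypothetical: get_info runs its steps in a fixed order and stops after
 * the first bisect_get_info of them. Zero or less means no cutoff. */
int steps_to_run(int total_steps, int bisect_get_info)
{
    if (bisect_get_info <= 0 || bisect_get_info > total_steps)
        return total_steps;
    return bisect_get_info;
}
```

Under these assumptions, zone_to_keep=0 keeps only THM0, and bisect_get_info=1 stops after the first step, the temperature read. Raising bisect_get_info one step at a time would then show which later step brings the hang back.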

Here are the dmesgs starting when 'thermal' is loaded at boot
(i.e. with the above patch but no special zone_to_keep etc. params),
and then with the commands above:

# during boot
ACPI: thermal_add: THM0
# next line is from the utils.c modification to return 27 C always
ACPI: acpi_evaluate_integer: Faking _TMP
Execute Method: [\_TZ_.THM0._PSV] (Node c157be48)
Execute Method: [\_TZ_.THM0._TC1] (Node c157bdc8)
Execute Method: [\_TZ_.THM0._TC2] (Node c157bd88)
Execute Method: [\_TZ_.THM0._TSP] (Node c157bd48)
Execute Method: [\_TZ_.THM0._AC0] (Node c157bf48)
Execute Method: [\_TZ_.THM0._SCP] (Node c157bec8)
ACPI: acpi_evaluate_integer: Faking _TMP
ACPI: Thermal Zone [THM0] (27 C)
ACPI: thermal_add: THM2
ACPI: acpi_evaluate_integer: Faking _TMP
Execute Method: [\_TZ_.THM2._AC0] (Node c157bb48)
Execute Method: [\_TZ_.THM2._SCP] (Node c157bac8)
ACPI: acpi_evaluate_integer: Faking _TMP
ACPI: Thermal Zone [THM2] (27 C)
ACPI: thermal_add: THM6
ACPI: acpi_evaluate_integer: Faking _TMP
Execute Method: [\_TZ_.THM6._AC0] (Node c157b908)
Execute Method: [\_TZ_.THM6._SCP] (Node c157b888)
ACPI: acpi_evaluate_integer: Faking _TMP
ACPI: Thermal Zone [THM6] (27 C)
ACPI: thermal_add: THM7
ACPI: acpi_evaluate_integer: Faking _TMP
Execute Method: [\_TZ_.THM7._AC0] (Node c157b6c8)
Execute Method: [\_TZ_.THM7._SCP] (Node c157b648)
ACPI: acpi_evaluate_integer: Faking _TMP
ACPI: Thermal Zone [THM7] (27 C)
ACPI: thermal_add: _TZ
ACPI: acpi_evaluate_integer: Faking _TMP

# booting is done. Now for
# echo 100 > /proc/acpi/thermal_zone/THM0/polling_frequency
ACPI: acpi_evaluate_integer: Faking _TMP
# now "modprobe -rv thermal; modprobe thermal zone_to_keep=0 bisect_get_info=1"
ACPI: CPU0 (power states: C1[C1] C2[C2] C3[C3])
ACPI: Processor [CPU0] (supports 8 throttling states)
ACPI: thermal_add: THM0
ACPI: acpi_evaluate_integer: Faking _TMP
ACPI: thermal_get_info: got temperature, but bisect_get_info = 1 so exiting
ACPI: acpi_evaluate_integer: Faking _TMP
ACPI: Thermal Zone [THM0] (27 C)
ACPI: thermal_add: THM2
ACPI: thermal_add: ignoring THM2
ACPI: thermal_add: THM6
ACPI: thermal_add: ignoring THM6
ACPI: thermal_add: THM7
ACPI: thermal_add: ignoring THM7
ACPI: thermal_add: _TZ
ACPI: thermal_add: ignoring _TZ
# now sleep.sh
eth0: removing device
Unloaded prism54 driver
PM: Preparing system for mem sleep
Stopping tasks: =======================================================|
Execute Method: [\_SB_.LID0._PSW] (Node c1564808)
Execute Method: [\_SB_.SLPB._PSW] (Node c1564708)
Execute Method: [\_S3_] (Node c157a988)
Execute Method: [\_PTS] (Node c157ab48)
Execute Method: [\_SI_._SST] (Node c157a8c8)
uhci_hcd 0000:00:07.2: suspend_rh
uhci_hcd 0000:00:07.2: uhci_suspend
uhci_hcd 0000:00:07.2: --> PCI D0/legacy
PM: Entering mem sleep
# hit "Fn" key to wake it up
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
Back to C!
PM: Finishing wakeup.
Execute Method: [\_GPE._L0B] (Node c157a848)
PCI: Found IRQ 11 for device 0000:00:02.0
PCI: Sharing IRQ 11 with 0000:00:06.0
PCI: Sharing IRQ 11 with 0000:01:00.0
PCI: Found IRQ 11 for device 0000:00:02.1
uhci_hcd 0000:00:07.2: PCI legacy resume
PCI: Found IRQ 11 for device 0000:00:07.2
uhci_hcd 0000:00:07.2: uhci_resume
uhci_hcd 0000:00:07.2: uhci_check_and_reset_hc: legsup = 0x2000
uhci_hcd 0000:00:07.2: Performing full reset
usb usb1: root hub lost power or was reset
uhci_hcd 0000:00:07.2: suspend_rh
usb usb1: finish resume
uhci_hcd 0000:00:07.2: wakeup_rh
Restarting tasks...<7>hub 1-0:1.0: state 7 ports 2 chg 0000 evt 0000
done
Execute Method: [\_SI_._SST] (Node c157a8c8)
Execute Method: [\_WAK] (Node c157aac8)
Execute Method: [\_TZ_.THM0._PSV] (Node c157be48)
Execute Method: [\_TZ_.THM0._TC1] (Node c157bdc8)
Execute Method: [\_TZ_.THM0._TC2] (Node c157bd88)
Execute Method: [\_TZ_.THM0._TSP] (Node c157bd48)
Execute Method: [\_TZ_.THM0._AC0] (Node c157bf48)
ACPI: acpi_evaluate_integer: Faking _TMP
Execute Method: [\_SI_._SST] (Node c157a8c8)
uhci_hcd 0000:00:07.2: suspend_rh (auto-stop)
Execute Method: [\_SB_.LID0._PSW] (Node c1564808)
Execute Method: [\_SB_.SLPB._PSW] (Node c1564708)
ds: ds_open(socket 0)
ds: ds_open(socket 1)
ds: ds_open(socket 2)
# from explicit 'cardctl eject' in sleep.sh's wake portion (to save battery)
pccard: card ejected from slot 1
PCMCIA: socket e36dac28: *** DANGER *** unable to remove socket power
ds: ds_release(socket 0)
ds: ds_release(socket 1)
ACPI: acpi_evaluate_integer: Faking _TMP

# and I can keep doing 'sleep.sh' with no problem

-Sanjoy

`Never underestimate the evil of which men of power are capable.'
--Bertrand Russell, _War Crimes in Vietnam_, chapter 1.

2006-03-15 05:58:05

by Sanjoy Mahajan

Subject: Re: 2.6.16-rc5: known regressions [TP 600X S3, vanilla DSDT]

One sad piece of data that I came across, perhaps worth investigating
further after this one is chased down:

As described in the last email, the combination of _TMP fakery (in
utils.c) plus the bisecting version of thermal.c (loading only the
zone THM0 and then only up to bisect_get_info=1) got rid of the hangs.

So I got bold and tried _TMP fakery but with the vanilla thermal.c.
The idea being that if _TMP is to blame for all the problems, then S3
sleep should work fine with this setup. But it hung in the usual way,
on the second sleep. Below are the dmesgs after the usual boot-time
ones.

This experiment produces a hang even with _TMP faked, whereas the
previous experiment didn't (also with _TMP faked but, after the boot,
loading only the THM0 zone and only doing the _TMP methods of it, even
on wake). So one of the non-_TMP methods below must be causing a
problem? My suspicion is that it's one of the methods called on wake
(THM0._PSV or ._TC1, etc., or maybe one of the other zones' methods),
which would explain why the first sleep goes fine but the second one
fails.

I don't think it's any of the calls made when 'thermal' is loading at
boot time, because the same calls happen in the previous experiment.
In that experiment, thermal loads normally (with _TMP faked), and only
after boot do I unload it and replace it with

modprobe thermal zone_to_keep=0 bisect_get_info=1

Anyway, here are the dmesgs for this experiment (hangs on 2nd sleep):

# loading 'thermal' on boot (with vanilla thermal.c, so it loads
# all the thermal zones):
ACPI: acpi_evaluate_integer: Faking _TMP
Execute Method: [\_TZ_.THM0._PSV] (Node c157be48)
Execute Method: [\_TZ_.THM0._TC1] (Node c157bdc8)
Execute Method: [\_TZ_.THM0._TC2] (Node c157bd88)
Execute Method: [\_TZ_.THM0._TSP] (Node c157bd48)
Execute Method: [\_TZ_.THM0._AC0] (Node c157bf48)
Execute Method: [\_TZ_.THM0._SCP] (Node c157bec8)
ACPI: acpi_evaluate_integer: Faking _TMP
ACPI: Thermal Zone [THM0] (27 C)
ACPI: acpi_evaluate_integer: Faking _TMP
Execute Method: [\_TZ_.THM2._AC0] (Node c157bb48)
Execute Method: [\_TZ_.THM2._SCP] (Node c157bac8)
ACPI: acpi_evaluate_integer: Faking _TMP
ACPI: Thermal Zone [THM2] (27 C)
ACPI: acpi_evaluate_integer: Faking _TMP
Execute Method: [\_TZ_.THM6._AC0] (Node c157b908)
Execute Method: [\_TZ_.THM6._SCP] (Node c157b888)
ACPI: acpi_evaluate_integer: Faking _TMP
ACPI: Thermal Zone [THM6] (27 C)
ACPI: acpi_evaluate_integer: Faking _TMP
Execute Method: [\_TZ_.THM7._AC0] (Node c157b6c8)
Execute Method: [\_TZ_.THM7._SCP] (Node c157b648)
ACPI: acpi_evaluate_integer: Faking _TMP
ACPI: Thermal Zone [THM7] (27 C)

# from "echo 100 > THM0/polling_frequency"
ACPI: acpi_evaluate_integer: Faking _TMP
# now doing the 'sleep.sh' script
# though for consistency maybe I should first do
# 'modprobe -r thermal ; modprobe thermal'
eth0: removing device
Unloaded prism54 driver
PM: Preparing system for mem sleep
Stopping tasks: ====================================================|
Execute Method: [\_SB_.LID0._PSW] (Node c1564808)
Execute Method: [\_SB_.SLPB._PSW] (Node c1564708)
Execute Method: [\_S3_] (Node c157a988)
Execute Method: [\_PTS] (Node c157ab48)
Execute Method: [\_SI_._SST] (Node c157a8c8)
uhci_hcd 0000:00:07.2: suspend_rh
uhci_hcd 0000:00:07.2: uhci_suspend
uhci_hcd 0000:00:07.2: --> PCI D0/legacy
PM: Entering mem sleep
# wake it up
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
Back to C!
PM: Finishing wakeup.
Execute Method: [\_GPE._L0B] (Node c157a848)
PCI: Found IRQ 11 for device 0000:00:02.0
PCI: Sharing IRQ 11 with 0000:00:06.0
PCI: Sharing IRQ 11 with 0000:01:00.0
PCI: Found IRQ 11 for device 0000:00:02.1
uhci_hcd 0000:00:07.2: PCI legacy resume
PCI: Found IRQ 11 for device 0000:00:07.2
uhci_hcd 0000:00:07.2: uhci_resume
uhci_hcd 0000:00:07.2: uhci_check_and_reset_hc: legsup = 0x2000
uhci_hcd 0000:00:07.2: Performing full reset
usb usb1: root hub lost power or was reset
uhci_hcd 0000:00:07.2: suspend_rh
usb usb1: finish resume
uhci_hcd 0000:00:07.2: wakeup_rh
Restarting tasks...<7>hub 1-0:1.0: state 7 ports 2 chg 0000 evt 0000
done
Execute Method: [\_SI_._SST] (Node c157a8c8)
Execute Method: [\_WAK] (Node c157aac8)
Execute Method: [\_TZ_.THM0._PSV] (Node c157be48)
Execute Method: [\_TZ_.THM0._TC1] (Node c157bdc8)
Execute Method: [\_TZ_.THM0._TC2] (Node c157bd88)
Execute Method: [\_TZ_.THM0._TSP] (Node c157bd48)
Execute Method: [\_TZ_.THM0._AC0] (Node c157bf48)
ACPI: acpi_evaluate_integer: Faking _TMP
Execute Method: [\_TZ_.THM2._AC0] (Node c157bb48)
Execute Method: [\_SI_._SST] (Node c157a8c8)
ACPI: acpi_evaluate_integer: Faking _TMP
Execute Method: [\_TZ_.THM6._AC0] (Node c157b908)
ACPI: acpi_evaluate_integer: Faking _TMP
Execute Method: [\_TZ_.THM7._AC0] (Node c157b6c8)
ACPI: acpi_evaluate_integer: Faking _TMP
uhci_hcd 0000:00:07.2: suspend_rh (auto-stop)
Execute Method: [\_SB_.LID0._PSW] (Node c1564808)
Execute Method: [\_SB_.SLPB._PSW] (Node c1564708)
ds: ds_open(socket 0)
ds: ds_open(socket 1)
ds: ds_open(socket 2)
pccard: card ejected from slot 1
PCMCIA: socket e3003c28: *** DANGER *** unable to remove socket power
ds: ds_release(socket 0)
ds: ds_release(socket 1)
PM: Preparing system for mem sleep

# and it hangs here.
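
One way to narrow the suspects is to compare the thermal-zone methods logged after \_WAK in this run with those logged in the no-hang run from the previous message. The sketch below hard-codes both lists as they appear in the quoted dmesgs and reports the entries present only in the hanging run. The names wake_no_hang, wake_hang, contains and only_in_second are hypothetical helpers. The result is a list of candidates, not a diagnosis, since the hanging run also differs in other ways, such as the faked _TMP reads for the other zones.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Thermal-zone methods logged after \_WAK in the run that did not hang. */
const char *const wake_no_hang[] = {
    "\\_TZ_.THM0._PSV", "\\_TZ_.THM0._TC1", "\\_TZ_.THM0._TC2",
    "\\_TZ_.THM0._TSP", "\\_TZ_.THM0._AC0",
};

/* Thermal-zone methods logged after \_WAK in the run that hung. */
const char *const wake_hang[] = {
    "\\_TZ_.THM0._PSV", "\\_TZ_.THM0._TC1", "\\_TZ_.THM0._TC2",
    "\\_TZ_.THM0._TSP", "\\_TZ_.THM0._AC0",
    "\\_TZ_.THM2._AC0", "\\_TZ_.THM6._AC0", "\\_TZ_.THM7._AC0",
};

#define COUNT(a) (sizeof(a) / sizeof((a)[0]))

/* Return 1 if name occurs in set, 0 otherwise. */
int contains(const char *const *set, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(set[i], name) == 0)
            return 1;
    return 0;
}

/* Copy into out every entry of second that is absent from first, in
 * order, and return how many were copied. out needs room for n_second. */
size_t only_in_second(const char *const *first, size_t n_first,
                      const char *const *second, size_t n_second,
                      const char **out)
{
    size_t n_out = 0;
    for (size_t i = 0; i < n_second; i++)
        if (!contains(first, n_first, second[i]))
            out[n_out++] = second[i];
    return n_out;
}
```

With the two lists above, the difference is the _AC0 method of THM2, THM6 and THM7. The THM0 methods appear in both runs, which makes them less likely suspects.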

-Sanjoy

`Never underestimate the evil of which men of power are capable.'
--Bertrand Russell, _War Crimes in Vietnam_, chapter 1.