>Bad news. It hangs when I do the usual stress test:
Hmm, we can continue to have fun with debugging. Right?
>
>echo 1 > THM0/polling_frequency
>sleep.sh
>sleep.sh
>
>The second sleep.sh hangs going to sleep. It is in an endless loop
>printing the following line, once per second (from the
>polling_frequency):
>
> Execute Method: [\_TZ_.THM0._TMP] (Node c157bf88)
This should be a different problem from the previously reported hang.
I recall that one hung in a loop in SMPI, waiting for the BIOS's response.
Please confirm. Also, please mute THM0 polling.
>
>> Please also make sure you have vanilla DSDT
>
>$ grep DSDT /boot/config-2.6.16-rc5.fake-thermal_active+passive
># CONFIG_ACPI_CUSTOM_DSDT is not set
>
>> vanilla Kernel, and just hacked acpi_thermal_active/passive.
>
>Only diff between pristine 2.6.16-rc5 tree and mine is:
>
>diff -rup /tmp/linux-2.6.16-rc5/drivers/acpi/thermal.c /usr/src/linux-2.6.16-rc5/drivers/acpi/thermal.c
>--- /tmp/linux-2.6.16-rc5/drivers/acpi/thermal.c 2006-02-27 00:09:35.000000000 -0500
>+++ /usr/src/linux-2.6.16-rc5/drivers/acpi/thermal.c 2006-03-16 09:45:30.000000000 -0500
>@@ -526,6 +526,8 @@ static void acpi_thermal_passive(struct
>
> ACPI_FUNCTION_TRACE("acpi_thermal_passive");
>
>+ return;
>+
> if (!tz || !tz->trips.passive.flags.valid)
> return;
>
>@@ -615,6 +617,8 @@ static void acpi_thermal_active(struct a
>
> ACPI_FUNCTION_TRACE("acpi_thermal_active");
>
>+ return;
>+
> if (!tz)
> return;
>
>
This looks ok for debugging.
> Hmm, we can continue to have fun with debugging. Right?
Definitely, I haven't given up.
>> The second sleep.sh hangs going to sleep. It is in an endless loop
>> printing the following line, once per second (from the
>> polling_frequency):
>>
>> Execute Method: [\_TZ_.THM0._TMP] (Node c157bf88)
I don't think these lines are a problem. They just reflect that
thermal polling is happening once per second. So even though the ACPI
system is hanging in the SMPI loop (as you say below), it is alive
enough to poll the temperature sensors.
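To read the SMPI loop output without the polling noise, the saved log can be filtered. A minimal sketch, assuming every polling line contains the method path exactly as quoted above; the function names and the log file name are hypothetical:

```shell
#!/bin/sh
# Hypothetical helpers for reading a saved dmesg log.
# Assumption: each thermal poll logs a line containing the literal
# method path \_TZ_.THM0._TMP, as in the line quoted above.

# Print the log without the polling lines, leaving the rest
# (for example the SMPI loop output) easier to follow.
strip_polling() {
    # -F: match a fixed string, so the backslash and dots are literal
    # -v: keep only the lines that do NOT match
    grep -vF '\_TZ_.THM0._TMP'
}

# Print how many polling lines the log contains (one per poll).
count_polling() {
    grep -cF '\_TZ_.THM0._TMP'
}
```

For example, `strip_polling < hang.log | less` shows what remains, and `count_polling < hang.log` gives a rough idea of how many seconds of polling the log covers when polling_frequency is 1.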
> Also please mute THM0 polling.
I retested the hacked kernel (with faked thermal_active/passive)
but with no thermal polling, just doing

  cat THM*/polling_frequency  (they were all 'polling disabled')
  sleep.sh                    (works)
  sleep.sh                    (hangs in the usual SMPI loop)

and it hangs as usual.
> This should be the different problem from the previous reported hang.
> I recall it was hanging at a loop in SMPI waiting for BIOS's response.
> Please confirm,
I just retested vanilla 2.6.16-rc5 (vanilla kernel, vanilla DSDT),
with polling_frequency=1 (second). My earlier tests with that kernel
had polling_frequency=100, and the easiest way to reproduce the hang
was:

  echo 100 > THM0/polling_frequency
  modprobe -r thermal ; modprobe thermal
  sleep.sh                           (this hangs)

With this method, the system would hang on the *first* sleep cycle.
The other method to produce the hang, with thermal polling muted, was:

  echo 0 > THM0/polling_frequency    (and the rest of them, to make sure)
  sleep.sh                           (it comes back)
  sleep.sh                           (this one hangs)

I tried the same method but with 1 second instead of 100 seconds:

  echo 1 > THM0/polling_frequency
  sleep.sh     (this one works, maybe because I didn't do the modprobing)
  sleep.sh     (this hangs)

The second sleep.sh hangs in the usual loop, which produces the
ex-region etc. messages, but interspersed in that dmesg output
is the output from the thermal polling. So I also see

  Execute Method: [\_TZ_.THM0._TMP] (Node c157bf88)

plus its associated function traces (ec_intr_write or something like
that -- I saved all the log files).
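The two-cycle procedure can be written down as a small script so that each retest runs exactly the same steps. This is only a sketch: the /proc/acpi/thermal_zone location, the RUN switch, and the function names are assumptions, and it uses the polling_frequency file name from the first message. By default it only prints the commands.

```shell
#!/bin/sh
# Sketch of the two-cycle reproduction. Dry run by default: it only
# prints the commands. Set RUN=1 to execute them (needs root, and the
# second sleep.sh is expected to hang the machine).
# Assumptions: thermal zones live under TZ_DIR, and sleep.sh is on PATH.
TZ_DIR=${TZ_DIR:-/proc/acpi/thermal_zone}
RUN=${RUN:-0}

# Print one command, and run it only when RUN=1.
step() {
    echo "+ $*"
    if [ "$RUN" = 1 ]; then
        sh -c "$*"
    fi
}

# repro SECONDS: set the THM0 polling value (0 mutes polling),
# then go through two sleep cycles.
repro() {
    step "echo $1 > $TZ_DIR/THM0/polling_frequency"
    step "sleep.sh"   # first cycle: comes back
    step "sleep.sh"   # second cycle: the one that hangs
}
```

Calling `repro 1` prints the three steps of the 1-second test; `repro 0` gives the muted-polling variant for THM0 only (the other zones would need the same echo).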
One other point is that we haven't yet used a piece of information:
that the system never hangs if I boot with ec_intr=0. Actually,
that's why I tried commenting out the \_SB.PCI0.ISA0.EC0.UPDT () line
in the _TMP method, and it did 'solve' the problem (at least, it did with
AC0 faked -- I haven't tried keeping AC0 but taking out just that
line).
-Sanjoy
`Never underestimate the evil of which men of power are capable.'
--Bertrand Russell, _War Crimes in Vietnam_, chapter 1.