2006-03-15 06:17:06

by Luming Yu

Subject: RE: 2.6.16-rc5: known regressions [TP 600X S3, vanilla DSDT]

>> Could you just comment out _TMP in kernel or in DSDT,
>
>I think it needs both excisions: If I comment out just the kernel _TMP
>calls, the DSDT might slip one in through the interpreter. If I
>comment out just the DSDT _TMP calls, then the kernel can still call
>_TMP. So instead I modified acpi_evaluate_integer() to return 27 C
>(3000 dK) if it's ever asked for a temperature, without doing any
>actual work:
>
>--- utils.c.orig 2006-02-27 00:09:35.000000000 -0500
>+++ utils.c 2006-03-14 23:36:59.000000000 -0500
>@@ -270,7 +270,15 @@ acpi_evaluate_integer(acpi_handle handle
> memset(element, 0, sizeof(union acpi_object));
> buffer.length = sizeof(union acpi_object);
> buffer.pointer = element;
>- status = acpi_evaluate_object(handle, pathname, arguments, &buffer);
>+ if (strcmp(pathname, "_TMP") != 0)
>+ status = acpi_evaluate_object(handle, pathname, arguments, &buffer);
>+ else {
>+ printk(KERN_INFO PREFIX "acpi_evaluate_integer: Faking _TMP\n");
>+ status = AE_OK;
>+ element->type = ACPI_TYPE_INTEGER;
>+ element->integer.value = 3000; /* 27 C, in deciKelvins */
>+ }
>+
> if (ACPI_FAILURE(status)) {
> acpi_util_eval_error(handle, pathname, status);
> return_ACPI_STATUS(status);
>
>This diff is in addition to the previous debugging changes to
>thermal.c.
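
For reference, the "27 C (3000 dK)" figure in the patch above can be checked with a small conversion. This is only a sketch: the helper name is hypothetical, and it assumes 0 C is taken as 2732 dK with rounding to the nearest whole degree.

```c
#include <assert.h>

/* _TMP reports tenths of a kelvin; 0 C is 273.15 K, taken here as 2732 dK. */
#define ZERO_CELSIUS_DK 2732L

/* Hypothetical helper: deciKelvins to whole degrees Celsius, rounded to nearest. */
long decikelvin_to_celsius(long dk)
{
    long tenths;

    assert(dk >= 0);                 /* absolute temperature is never negative */
    tenths = dk - ZERO_CELSIUS_DK;   /* tenths of a degree Celsius */
    return tenths >= 0 ? (tenths + 5) / 10 : (tenths - 5) / 10;
}
```

With the faked value, decikelvin_to_celsius(3000) gives 27, which matches the comment in the patch.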

If you do it this way, every thermal zone's _TMP will be faked.
If you remove the real THM0._TMP, and fake a dummy THM0._TMP
in DSDT, and don't change anything in kernel, then if S3 works
well, I will be convinced that THM0._TMP was causing trouble.
Yes, I'm asking you to override the DSDT for debugging. :-)
But please make sure you don't change anything else in the DSDT;
otherwise the result still can't be trusted. :-)

Anyway, I'm studying THM0._TMP and trying to figure out how it is
related to the EC.

Thanks,
Luming


2006-03-15 06:36:01

by Sanjoy Mahajan

Subject: Re: 2.6.16-rc5: known regressions [TP 600X S3, vanilla DSDT]

> If you do it this way, every thermal zone's _TMP will be faked.

Loading 'thermal' with zone_to_keep=0 meant that it skipped THM{2,6,7}
(the only other zones). Since only THM0 was loaded, any path that
included, say, THM2._TMP wouldn't get executed because of lines like:

    if (!tz)
        return_VALUE(-EINVAL);

Plus the dmesgs show all cases when _TMP was faked (each fakery
produces a printk). In the experiment with zone_to_keep=0, the only
cases were with THM0.

> If you remove the real THM0._TMP, and fake a dummy THM0._TMP in
> DSDT, and don't change anything in kernel, then if S3 works well, I
> will be convinced that THM0._TMP was causing trouble.

I'll try it, to test my theory above! But one clarification first: Do
you mean that I use a vanilla thermal.c, or should I keep using the
modified thermal.c with zone_to_keep=0 as the module parameter? I
don't think I should revert to the vanilla thermal.c. Suppose that there are
two bugs, which I think is likely (see previous email). Commenting
out only THM0._TMP but preserving everything else in the DSDT & kernel
might eliminate any bug caused by THM0._TMP. But if it still hangs --
and I'm pretty sure it will -- it means there's another bug
somewhere else.

Here's why I'm sure it will hang. When I commented out all
evaluations of _TMP (modifying utils.c), but used a vanilla thermal.c,
it still hung. And commenting out all _TMP's means I commented out
THM0._TMP. So vanilla thermal.c + no THM0._TMP should hang too.

> Ok, Let's change the way of hacking. Let's start bisection without
> touching kernel, instead with DSDT.

No problem, I think.

> Firstly, you need to find out which THM.

The zone_to_keep=0 tests show that THM0 causes a problem, don't they?
Other zones may also cause a problem, but THM0 can do it all alone.

> Then, which Methods.

The test that hung on the first S3 sleep, with zone_to_keep=0 and
bisect_get_info=1, shows that just THM0._TMP can cause a problem --
since no other methods got executed.

As with figuring out which zones cause problems, other methods may
also cause the problem. So I want to make sure I use a bisection
method that will work even if there is more than one bug, whether in
multiple zones or in multiple methods in the same zone.
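
One way to do that is to keep searching both halves whenever a range fails, instead of stopping at the first hit. The sketch below is only an illustration. It assumes the hang reproduces exactly when at least one enabled suspect (a zone or a method) is bad, and that suspects do not interact. The names find_culprits, fails_fn and simulated_fails are hypothetical; in practice each call to the test hook is one boot-and-S3 cycle.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical test hook: enable only suspects [lo, hi) and report
 * whether the failure (e.g. a hang on S3) reproduces.
 */
typedef bool (*fails_fn)(size_t lo, size_t hi, void *ctx);

/*
 * Mark every suspect in [lo, hi) that triggers the failure on its own.
 * Returns the number of culprits found.
 */
size_t find_culprits(size_t lo, size_t hi, fails_fn fails, void *ctx,
                     bool *culprit)
{
    size_t mid;

    assert(lo <= hi);
    if (lo == hi || !fails(lo, hi, ctx))
        return 0;               /* nothing in this range triggers it */
    if (hi - lo == 1) {
        culprit[lo] = true;     /* a single suspect that fails alone */
        return 1;
    }
    mid = lo + (hi - lo) / 2;
    /* Search both halves: a hit in one half does not clear the other. */
    return find_culprits(lo, mid, fails, ctx, culprit) +
           find_culprits(mid, hi, fails, ctx, culprit);
}

/* Stand-in for a real test run: ctx is an array marking the bad suspects. */
bool simulated_fails(size_t lo, size_t hi, void *ctx)
{
    const bool *bad = ctx;
    size_t i;

    for (i = lo; i < hi; i++)
        if (bad[i])
            return true;
    return false;
}
```

If two bugs only hang the machine in combination, the assumption above breaks and this search would miss them, so a clean result from it is not proof that nothing else is wrong.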

-Sanjoy

`Never underestimate the evil of which men of power are capable.'
--Bertrand Russell, _War Crimes in Vietnam_, chapter 1.