2004-01-28 02:19:43

by Alessandro Suardi

Subject: 2.6.2-rc2-bk1 oopses on boot (ACPI patch)

Already reported, but I'll do so once again, since it looks like
in a short while I won't be able to boot official kernels in my
current config...

Original report here:

http://www.ussg.iu.edu/hypermail/linux/kernel/0312.3/0442.html

Please advise whether I should give up cpufreq for now - I really
don't want to bang my head against a wall.


Thanks in advance,

--alessandro

"Two rivers run too deep
The seasons change and so do I"
(U2, "Indian Summer Sky")


2004-01-28 02:47:10

by Andrew Morton

Subject: Re: 2.6.2-rc2-bk1 oopses on boot (ACPI patch)

Alessandro Suardi wrote:
>
> Already reported, but I'll do so once again, since it looks like
> in a short while I won't be able to boot official kernels in my
> current config...
>
> Original report here:
>
> http://www.ussg.iu.edu/hypermail/linux/kernel/0312.3/0442.html

Divide by zero. Looks like ACPI is now passing bad values into the
frequency change notifier.

Does this make the oops go away?

diff -puN drivers/cpufreq/cpufreq.c~cpufreq-workaround drivers/cpufreq/cpufreq.c
--- 25/drivers/cpufreq/cpufreq.c~cpufreq-workaround 2004-01-27 18:36:05.000000000 -0800
+++ 25-akpm/drivers/cpufreq/cpufreq.c 2004-01-27 18:36:42.000000000 -0800
@@ -928,6 +928,11 @@ void cpufreq_notify_transition(struct cp
 		return;   /* Only valid if we're in the resume process where
 			   * everyone knows what CPU frequency we are at */
 
+	if (freqs->new == 0) {
+		printk("%s: avoiding div-by-zero\n", __FUNCTION__);
+		return;
+	}
+
 	down_read(&cpufreq_notifier_rwsem);
 	switch (state) {
 	case CPUFREQ_PRECHANGE:

_

2004-01-28 03:06:48

by Linus Torvalds

Subject: Re: 2.6.2-rc2-bk1 oopses on boot (ACPI patch)



On Wed, 28 Jan 2004, Alessandro Suardi wrote:
>
> Already reported, but I'll do so once again, since it looks like
> in a short while I won't be able to boot official kernels in my
> current config...
>
> http://www.ussg.iu.edu/hypermail/linux/kernel/0312.3/0442.html

Can you make adjust_jiffies() print out its arguments (it's in
drivers/cpufreq/cpufreq.c)?

It looks like cpufreq_scale() gets a divide-by-zero or an overflow on one
of

l_p_j_ref, l_p_j_ref_freq, ci->new

and just printing out those values would be interesting.

That said, the code is crap anyway. It does various divides without
actually testing for any sanity at all, and tries to "avoid overflow" by
totally bogus methods, instead of just using the 64-bit do_div64().

Dominik? Dave? Suggestions about nicer failure modes?

Linus
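
[Editorial sketch] For illustration only, here is a minimal userspace
version of the kind of scaling described above: multiply first in 64 bits,
divide once, and reject a zero divisor up front. The name
scale_u64_checked() and its return convention are assumptions of this
sketch, not the kernel's API.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: compute ref * mult / div using a 64-bit
 * intermediate, so the product cannot overflow for 32-bit inputs.
 * Returns false (leaving *out untouched) when div is zero.
 */
static bool scale_u64_checked(uint32_t ref, uint32_t div, uint32_t mult,
			      uint64_t *out)
{
	if (div == 0)
		return false;

	*out = ((uint64_t)ref * mult) / div;
	return true;
}
```

A caller that gets false back can keep its previous value and log the bad
input instead of taking a divide error.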

2004-01-28 03:12:32

by Linus Torvalds

Subject: Re: 2.6.2-rc2-bk1 oopses on boot (ACPI patch)



On Tue, 27 Jan 2004, Andrew Morton wrote:
>
> Divide by zero. Looks like ACPI is now passing bad values into the
> frequency change notifier.
>
> Does this make the oops go away?

Other values will still cause divide-by-zero (any divisor in 0..9 will do
it). Besides, we're dividing with _old_, not new, so that's the one we
should likely check.

Linus
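
[Editorial sketch] A small illustration of why a nonzero divisor can still
fault: if a helper shrinks its divisor by a constant before dividing (an
overflow-avoidance trick), every divisor smaller than that constant
truncates to zero. The constant 10 below is chosen only to match the 0..9
range mentioned above; the function names are hypothetical and this is not
the kernel's code.

```c
#include <stdbool.h>

#define PRESCALE 10u	/* assumed pre-scaling constant, for illustration */

/* The divisor that would actually be used after pre-scaling. */
static unsigned int effective_divisor(unsigned int div)
{
	return div / PRESCALE;
}

/* True when dividing by the pre-scaled divisor would trap. */
static bool prescaled_div_would_fault(unsigned int div)
{
	return effective_divisor(div) == 0;
}
```

Under that assumption, a test for `== 0` on the raw value misses 1 through
9; a guard has to look at the value after pre-scaling, or the pre-scaling
has to go.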

2004-01-28 03:20:14

by Alessandro Suardi

Subject: Re: 2.6.2-rc2-bk1 oopses on boot (ACPI patch)

Linus Torvalds wrote:
>
> On Tue, 27 Jan 2004, Andrew Morton wrote:
>
>>Divide by zero. Looks like ACPI is now passing bad values into the
>>frequency change notifier.
>>
>>Does this make the oops go away?
>
>
> Other values will still cause divide-by-zero (any divisor in 0..9 will do
> it). Besides, we're dividing with _old_, not new, so that's the one we
> should likely check.
>
> Linus

Indeed... I get two of the debug printks from the patch, but in the
end I still oops due to a div-by-zero with EIP in time_cpufreq_notifier.

I'll try and look into Linus' suggestion about printing out stuff from
adjust_jiffies() in cpufreq.c and will report later.


Thanks,

--alessandro

"Two rivers run too deep
The seasons change and so do I"
(U2, "Indian Summer Sky")

2004-01-28 03:47:32

by Alessandro Suardi

Subject: Re: 2.6.2-rc2-bk1 oopses on boot (ACPI patch)

Linus Torvalds wrote:
>
> On Wed, 28 Jan 2004, Alessandro Suardi wrote:
>
>>Already reported, but I'll do so once again, since it looks like
>> in a short while I won't be able to boot official kernels in my
>> current config...
>>
>> http://www.ussg.iu.edu/hypermail/linux/kernel/0312.3/0442.html
>
>
> Can you make adjust_jiffies() print out its arguments (it's in
> drivers/cpufreq/cpufreq.c).
>
> It looks like cpufreq_scale() gets a divide-by-zero or an overflow on one
> of
>
> l_p_j_ref, l_p_j_ref_freq, ci->new
>
> and just printing out those values would be interesting.

Assuming the late hour (hmm, early by now) hasn't crossed my
eyes entirely, the three entities above are %lu, %u, %u... so
this line

printk("CPUFREQ DEBUG: [%lu] [%u] [%u]\n", l_p_j_ref, l_p_j_ref_freq, ci->new);

placed as both the first and the last statement in adjust_jiffies()
prints the same values both times: 1773568, 1, 0.


Side-note, since master penguin is looking... after the oops
all SysRq stuff keeps working - except Alt-SysRq-B; the atkbd.c
code tells me the keyboard says "too many keys pressed". K, T,
P just do their job fine.
(yeah, okay, Alt-SysRq-O prints Power Off but obviously doesn't).


Thanks,

--alessandro

"Two rivers run too deep
The seasons change and so do I"
(U2, "Indian Summer Sky")

2004-01-28 04:38:06

by Dmitry Torokhov

Subject: Re: 2.6.2-rc2-bk1 oopses on boot (ACPI patch)

On Tuesday 27 January 2004 09:42 pm, Andrew Morton wrote:
> Alessandro Suardi wrote:
> > Already reported, but I'll do so once again, since it looks like
> > in a short while I won't be able to boot official kernels in my
> > current config...
> >
> > Original report here:
> >
> > http://www.ussg.iu.edu/hypermail/linux/kernel/0312.3/0442.html
>
> Divide by zero. Looks like ACPI is now passing bad values into the
> frequency change notifier.

It is a common problem with Dell's DSDT implementation which does not
follow ACPI spec and it's been going on for ages. From the original
report:

cpufreq: CPU0 - ACPI performance management activated
cpufreq: *P0: 1Mhz, 0 mW, 0 uS
cpufreq: P1: 0Mhz, 0 mW, 0 uS
divide error: 0000 [#1]

As you can see all data is bogus... Patching DSDT cures it for sure,
sometimes CONFIG_ACPI_RELAXED_AML helps as well.

I suppose the ACPI P-states driver could check frequencies/latencies and
refuse to activate if they are bogus.

--
Dmitry
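
[Editorial sketch] One way such a check could look, as a userspace sketch
rather than the actual ACPI driver code. The struct layout and the name
pstates_plausible() are assumptions made for illustration; the ordering
test assumes P0 is the fastest state.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified P-state entry; field names are assumptions. */
struct pstate {
	unsigned int freq_mhz;
	unsigned int power_mw;
	unsigned int latency_us;
};

/*
 * Return true only if the table is non-empty and every state has a
 * nonzero frequency that is no higher than the previous state's.
 */
static bool pstates_plausible(const struct pstate *tbl, size_t n)
{
	size_t i;

	if (tbl == NULL || n == 0)
		return false;

	for (i = 0; i < n; i++) {
		if (tbl[i].freq_mhz == 0)
			return false;
		if (i > 0 && tbl[i].freq_mhz > tbl[i - 1].freq_mhz)
			return false;
	}
	return true;
}
```

The table from the original report (1 MHz, then 0 MHz) is rejected because
of the zero entry. A lone 1 MHz state would still pass, so a real check
would probably also want a lower bound on plausible frequencies.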

2004-01-28 13:38:53

by Matt Domsch

Subject: Re: 2.6.2-rc2-bk1 oopses on boot (ACPI patch)

On Tue, Jan 27, 2004 at 11:37:55PM -0500, Dmitry Torokhov wrote:
> > Divide by zero. Looks like ACPI is now passing bad values into the
> > frequency change notifier.
>
> It is a common problem with Dell's DSDT implementation which does not
> follow ACPI spec and it's been going on for ages. From the original
> report:
>
> cpufreq: CPU0 - ACPI performance management activated
> cpufreq: *P0: 1Mhz, 0 mW, 0 uS
> cpufreq: P1: 0Mhz, 0 mW, 0 uS
> divide error: 0000 [#1]
>
> As you can see all data is bogus... Patching DSDT cures it for sure,
> sometimes CONFIG_ACPI_RELAXED_AML helps as well.

Please send me your DSDT and output of dmidecode, and ideally what a
proper DSDT should show in this case (I'm not familiar enough with
what all the various ACPI tables should contain), and I'll take it up
with the BIOS programmers for that platform.

Thanks,
Matt

--
Matt Domsch
Sr. Software Engineer, Lead Engineer
Dell Linux Solutions linux.dell.com & http://www.dell.com/linux
Linux on Dell mailing lists @ http://lists.us.dell.com

2004-01-28 16:20:26

by Dominik Brodowski

Subject: Re: 2.6.2-rc2-bk1 oopses on boot (ACPI patch)

On Wed, Jan 28, 2004 at 04:40:32AM +0100, Alessandro Suardi wrote:
> printk("CPUFREQ DEBUG: [%lu] [%u] [%u]\n", l_p_j_ref, l_p_j_ref_freq,
> ci->new);
>
> as both first and last instruction in adjust_jiffies() turns
> up the same values, which are 1773568, 1, 0.

The ACPI tables report totally bogus CPU frequencies -- 1 and 0 MHz. I'm
surprised this differs between 2.6.0 and 2.6.x-mm... Len, any idea?


On Tue, Jan 27, 2004 at 07:06:41PM -0800, Linus Torvalds wrote:
>
>
> On Wed, 28 Jan 2004, Alessandro Suardi wrote:
> >
> > Already reported, but I'll do so once again, since it looks like
> > in a short while I won't be able to boot official kernels in my
> > current config...
> >
> > http://www.ussg.iu.edu/hypermail/linux/kernel/0312.3/0442.html
>
> Can you make adjust_jiffies() print out its arguments (it's in
> drivers/cpufreq/cpufreq.c).
>
> It looks like cpufreq_scale() gets a divide-by-zero or an overflow on one
> of
>
> l_p_j_ref, l_p_j_ref_freq, ci->new
>
> and just printing out those values would be interesting.
>
> That said, the code is crap anyway.

> It does various divides without
> actually testing for any sanity at all,

CPUfreq and the CPUfreq timing code _need_ to rely on the CPU frequencies
being reported by the drivers. If they're wrong all timing will be wrong[1]...
Nonetheless, a fix for the acpi driver which aborts on such "zero" MHz
reports has already been sent to Len for review [2].

> and tries to "avoid overflow" by
> totally bogus methods, instead of just using the 64-bit do_div64().

Agreed, will fix it.

Dominik

[1] Especially as the pmtmr also uses tsc for the delay() routines...
[2] http://marc.theaimsgroup.com/?l=acpi4linux&m=107421039607335&w=2



2004-01-28 22:38:41

by Alessandro Suardi

Subject: Re: 2.6.2-rc2-bk1 oopses on boot (ACPI patch)

Matt Domsch wrote:
> On Tue, Jan 27, 2004 at 11:37:55PM -0500, Dmitry Torokhov wrote:
>
>>>Divide by zero. Looks like ACPI is now passing bad values into the
>>>frequency change notifier.
>>
>>It is a common problem with Dell's DSDT implementation which does not
>>follow ACPI spec and it's been going on for ages. From the original
>>report:
>>
>>cpufreq: CPU0 - ACPI performance management activated
>> cpufreq: *P0: 1Mhz, 0 mW, 0 uS
>> cpufreq: P1: 0Mhz, 0 mW, 0 uS
>> divide error: 0000 [#1]
>>
>>As you can see all data is bogus... Patching DSDT cures it for sure,
>>sometimes CONFIG_ACPI_RELAXED_AML helps as well.
>
>
> Please send me your DSDT and output of dmidecode, and ideally what a
> proper DSDT should show in this case (I'm not familiar enough with
> what all the various ACPI tables should contain), and I'll take it up
> with the BIOS programmers for that platform.

While I appreciate your offer, I'd like to point out that this worked
perfectly prior to the 20031203 ACPI patch. Indeed, this is what
2.6.1 vanilla says in that area:

cpufreq: CPU0 - ACPI performance management activated.
cpufreq: *P0: 1800 MHz, 0 mW, 250 uS
cpufreq: P1: 1200 MHz, 0 mW, 250 uS

Attaching the gzipped dmesg for my 2.6.1 boot - let me know if
you still want the dmidecode output and DSDT; for the latter I'll
have to ask for instructions (or is the output of a simple
'cat /proc/acpi/dsdt' enough?).

--alessandro

"Two rivers run too deep
The seasons change and so do I"
(U2, "Indian Summer Sky")


Attachments:
dmesg.out.gz (4.38 kB)

2004-01-29 23:31:54

by Brown, Len

Subject: Re: 2.6.2-rc2-bk1 oopses on boot (ACPI patch)

Alessandro,
Looks like you've identified a regression, probably in ACPI.

Please test the 1st patch attached to this bug report
http://bugzilla.kernel.org/show_bug.cgi?id=1766

If it doesn't address the problem, please file an additional bug report
per below.

thanks,
-Len

ps.
The divide-by-zero symptom should be addressed by Dominik's update, now
in the ACPI tree and thus in the next -mm patch.

pps.
How to file a bug against ACPI:

http://bugzilla.kernel.org/ Category: Power Management, Component: ACPI

Please attach dmesg -s40000 output (or serial console log if dmesg
unavailable)

Please attach the output from acpidmp, available in /usr/sbin/, or in
pmtools:
http://ftp.kernel.org/pub/linux/kernel/people/lenb/acpi/utils/

On Wed, 2004-01-28 at 17:32, Alessandro Suardi wrote:
> Matt Domsch wrote:
> > On Tue, Jan 27, 2004 at 11:37:55PM -0500, Dmitry Torokhov wrote:
> >
> >>>Divide by zero. Looks like ACPI is now passing bad values into the
> >>>frequency change notifier.
...
> , I'd like to remind that this works
> perfectly prior to the 20031203 ACPI patch. Indeed, this is what
> 2.6.1 vanilla says in that area:
>
> cpufreq: CPU0 - ACPI performance management activated.
> cpufreq: *P0: 1800 MHz, 0 mW, 250 uS
> cpufreq: P1: 1200 MHz, 0 mW, 250 uS
>
> Attaching the gzipped dmesg for my 2.6.1 boot - let me know if
> you want anyway dmidecode output and DSDT; for this latter I'll
> have to ask for instructions (or is the output of a simple
> 'cat /proc/acpi/dsdt' enough ?).
>
> --alessandro
>
> "Two rivers run too deep
> The seasons change and so do I"
> (U2, "Indian Summer Sky")
>
>

2004-01-30 00:38:34

by Alessandro Suardi

Subject: Re: 2.6.2-rc2-bk1 oopses on boot (ACPI patch)

Len Brown wrote:
> Alessandro,
> Looks like you've identified a regression, probably in ACPI.
>
> Please test the 1st patch attached to this bug report
> http://bugzilla.kernel.org/show_bug.cgi?id=1766

The patch you mention fixes my problem - tested over 2.6.2-rc2-bk3.

> If it doesn't address the problem, please file an additional bug report
> per below.

Thanks for the instructions, I really appreciate it.


Keep up the great work ! Ciao,

--alessandro

"Two rivers run too deep
The seasons change and so do I"
(U2, "Indian Summer Sky")