2007-12-12 09:12:54

by Ingo Molnar

Subject: [crash] kernel BUG at drivers/cpufreq/cpufreq.c:1060!


2.6.24-rc4-git5, got this cpufreq crash on x86 64-bit, during 'make
randconfig' random bootup testing:

powernow-k8: BIOS error - no PSB or ACPI _PSS objects
------------[ cut here ]------------
kernel BUG at drivers/cpufreq/cpufreq.c:1060!
invalid opcode: 0000 [1] SMP
[...]
RIP: 0010:[<ffffffff8056021a>] [<ffffffff8056021a>] cpufreq_remove_dev+0x160/0x2ad
[...]
Call Trace:
[<ffffffff80455c03>] sysdev_driver_unregister+0x53/0x8a
[<ffffffff8055f635>] cpufreq_register_driver+0x148/0x188
[<ffffffff808436cb>] kernel_init+0x14b/0x318
[<ffffffff8020ce78>] child_rip+0xa/0x12
[<ffffffff80843580>] kernel_init+0x0/0x318
[<ffffffff8020ce6e>] child_rip+0x0/0x12

kernel is 2.6.24-rc4-git5-ish + x86.git. (but no cpufreq changes to the
upstream code) crashlog and config attached. Will try with vanilla
-latest as well.

Ingo


Attachments:
(No filename) (833.00 B)
crash.log (147.69 kB)
config (43.84 kB)

2007-12-12 09:57:36

by Ingo Molnar

Subject: Re: [crash] kernel BUG at drivers/cpufreq/cpufreq.c:1060!


* Ingo Molnar <[email protected]> wrote:

> 2.6.24-rc4-git5, got this cpufreq crash on x86 64-bit, during 'make
> randconfig' random bootup testing:

hm, does not seem to be easily reproducible. I tried 10 bootups and 2 of
them failed.

Ingo

2007-12-12 10:21:54

by Alexey Dobriyan

Subject: Re: [crash] kernel BUG at drivers/cpufreq/cpufreq.c:1060!

On 12/12/07, Ingo Molnar <[email protected]> wrote:
>
> 2.6.24-rc4-git5, got this cpufreq crash on x86 64-bit, during 'make
> randconfig' random bootup testing:

Ingo, since you already scripted this, maybe you can add a
"modprobe everything/rmmod everything" test after successful bootup.
It will catch an amazing amount of stuff, I promise.

Ditto for modprobe/rmmod/modprobe and modprobe/rmmod/cat /proc,
cat /sys smoke tests.

2007-12-12 10:35:45

by Ingo Molnar

Subject: Re: [crash] kernel BUG at drivers/cpufreq/cpufreq.c:1060!


* Alexey Dobriyan <[email protected]> wrote:

> On 12/12/07, Ingo Molnar <[email protected]> wrote:
> >
> > 2.6.24-rc4-git5, got this cpufreq crash on x86 64-bit, during 'make
> > randconfig' random bootup testing:
>
> Ingo, since you already scripted this, maybe you can add a "modprobe
> everything/rmmod everything" test after successful bootup. It will
> catch an amazing amount of stuff, I promise.

something close to that is one of my standard tests: booting up an
allyesconfig kernel.

Ingo

2007-12-12 16:41:07

by Dave Jones

Subject: Re: [crash] kernel BUG at drivers/cpufreq/cpufreq.c:1060!

On Wed, Dec 12, 2007 at 10:11:44AM +0100, Ingo Molnar wrote:

> 2.6.24-rc4-git5, got this cpufreq crash on x86 64-bit, during 'make
> randconfig' random bootup testing:

You hit all the fun bugs.

Just before we initialise cpufreq's notifier list..

> Testing NMI watchdog ... <4>WARNING: CPU#0: NMI appears to be stuck (0->0)!

eek?

> powernow-k8: Found 1 AMD Athlon(tm) 64 X2 Dual Core Processor 3800+ processors (1 cpu cores) (version 2.20.00)
> powernow-k8: BIOS error - no PSB or ACPI _PSS objects
> ------------[ cut here ]------------
> kernel BUG at drivers/cpufreq/cpufreq.c:1060!

The actual BUG you hit is

	if (unlikely(lock_policy_rwsem_write(cpu)))
		BUG();

It _looks_ like we're leaking a refcount on that lock, but
I don't see where. It's a shame you can't reproduce this easily,
as cpufreq.debug=7 would give us more clues.
(And CONFIG_CPUFREQ_DEBUG=y)

I'll think about this some more.

Dave

--
http://www.codemonkey.org.uk

2007-12-12 18:19:31

by Dave Jones

Subject: Re: [crash] kernel BUG at drivers/cpufreq/cpufreq.c:1060!

On Wed, Dec 12, 2007 at 11:40:13AM -0500, Dave Jones wrote:

> > powernow-k8: Found 1 AMD Athlon(tm) 64 X2 Dual Core Processor 3800+ processors (1 cpu cores) (version 2.20.00)
> > powernow-k8: BIOS error - no PSB or ACPI _PSS objects
> > ------------[ cut here ]------------
> > kernel BUG at drivers/cpufreq/cpufreq.c:1060!
>
> The actual BUG you hit is
>
> 	if (unlikely(lock_policy_rwsem_write(cpu)))
> 		BUG();
>
> It _looks_ like we're leaking a refcount on that lock, but
> I don't see where. It's a shame you can't reproduce this easily,
> as cpufreq.debug=7 would give us more clues.
> (And CONFIG_CPUFREQ_DEBUG=y)

So we're missing some unlocks in some error paths.
It's feasible you hit one of those.
This patch should be the fix for that.

Dave

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 5e626b1..79581fa 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -841,19 +841,25 @@ static int cpufreq_add_dev (struct sys_device * sys_dev)
 	drv_attr = cpufreq_driver->attr;
 	while ((drv_attr) && (*drv_attr)) {
 		ret = sysfs_create_file(&policy->kobj, &((*drv_attr)->attr));
-		if (ret)
+		if (ret) {
+			unlock_policy_rwsem_write(cpu);
 			goto err_out_driver_exit;
+		}
 		drv_attr++;
 	}
 	if (cpufreq_driver->get){
 		ret = sysfs_create_file(&policy->kobj, &cpuinfo_cur_freq.attr);
-		if (ret)
+		if (ret) {
+			unlock_policy_rwsem_write(cpu);
 			goto err_out_driver_exit;
+		}
 	}
 	if (cpufreq_driver->target){
 		ret = sysfs_create_file(&policy->kobj, &scaling_cur_freq.attr);
-		if (ret)
+		if (ret) {
+			unlock_policy_rwsem_write(cpu);
 			goto err_out_driver_exit;
+		}
 	}

 	spin_lock_irqsave(&cpufreq_driver_lock, flags);
--
http://www.codemonkey.org.uk
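
The pattern the patch above addresses (an error exit taken while a write lock is still held) can be illustrated with a small user-space model. This is only a sketch and not kernel code: `policy_lock_held`, `lock_policy_write()`, `unlock_policy_write()`, `create_file()`, `add_dev_leaky()` and `add_dev_fixed()` are all hypothetical stand-ins invented for the illustration. In particular, the model lock returns nonzero when it is already held so that a leak is observable in a single thread; that is an assumption of the model, not a description of how the kernel's rwsem behaves.

```c
/* User-space model of a lock leaked on an error path; NOT kernel code. */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-CPU policy lock. */
static bool policy_lock_held;

/*
 * Model lock: returns 0 on success, -1 if the lock is already held.
 * (Assumption of this sketch, chosen so a leak can be detected
 * without threads.)
 */
static int lock_policy_write(void)
{
	if (policy_lock_held)
		return -1;
	policy_lock_held = true;
	return 0;
}

static void unlock_policy_write(void)
{
	policy_lock_held = false;
}

/* Hypothetical stand-in for sysfs_create_file(): fails on request. */
static int create_file(bool fail)
{
	return fail ? -ENOMEM : 0;
}

/* Pre-patch shape: the error exit is taken with the lock still held. */
static int add_dev_leaky(bool fail)
{
	int ret;

	if (lock_policy_write())
		return -EBUSY;
	ret = create_file(fail);
	if (ret)
		goto err_out;		/* lock is never released */
	unlock_policy_write();
	return 0;
err_out:
	return ret;
}

/* Post-patch shape: release the lock before jumping to the error exit. */
static int add_dev_fixed(bool fail)
{
	int ret;

	if (lock_policy_write())
		return -EBUSY;
	ret = create_file(fail);
	if (ret) {
		unlock_policy_write();
		goto err_out;
	}
	unlock_policy_write();
	return 0;
err_out:
	return ret;
}
```

In this model, a failing call to `add_dev_leaky(true)` leaves `policy_lock_held` set, so any later `lock_policy_write()` returns nonzero, which is the kind of condition the `BUG()` check quoted earlier in the thread would trip on. With `add_dev_fixed(true)` the lock is released on the failure path and later lock attempts succeed.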

2007-12-13 10:17:42

by Ingo Molnar

Subject: Re: [crash] kernel BUG at drivers/cpufreq/cpufreq.c:1060!


* Dave Jones <[email protected]> wrote:

> > It _looks_ like we're leaking a refcount on that lock, but I don't
> > see where. It's a shame you can't reproduce this easily, as
> > cpufreq.debug=7 would give us more clues. (And
> > CONFIG_CPUFREQ_DEBUG=y)
>
> So we're missing some unlocks in some error paths. It's feasible you
> hit one of those. This patch should be the fix for that.

since it's not really reproducible (i failed to get it since then), how
about you push your fix upstream (it's an obviously correct fix), we
consider this regression fixed and i'll re-notify you if there's still
any problem left. It's not like there's any escape from make randconfig
bootup test coverage in the long run ;-)

Ingo

2007-12-13 17:54:44

by Dave Jones

Subject: Re: [crash] kernel BUG at drivers/cpufreq/cpufreq.c:1060!

On Thu, Dec 13, 2007 at 11:17:11AM +0100, Ingo Molnar wrote:
>
> * Dave Jones <[email protected]> wrote:
>
> > > It _looks_ like we're leaking a refcount on that lock, but I don't
> > > see where. It's a shame you can't reproduce this easily, as
> > > cpufreq.debug=7 would give us more clues. (And
> > > CONFIG_CPUFREQ_DEBUG=y)
> >
> > So we're missing some unlocks in some error paths. It's feasible you
> > hit one of those. This patch should be the fix for that.
>
> since it's not really reproducible (i failed to get it since then), how
> about you push your fix upstream (it's an obviously correct fix), we
> consider this regression fixed and i'll re-notify you if there's still
> any problem left. It's not like there's any escape from make randconfig
> bootup test coverage in the long run ;-)

Yeah, will push it to Linus today.

Dave

--
http://www.codemonkey.org.uk