2.6.24-rc4-git5, got this cpufreq crash on x86 64-bit, during 'make
randconfig' random bootup testing:
powernow-k8: BIOS error - no PSB or ACPI _PSS objects
------------[ cut here ]------------
kernel BUG at drivers/cpufreq/cpufreq.c:1060!
invalid opcode: 0000 [1] SMP
[...]
RIP: 0010:[<ffffffff8056021a>] [<ffffffff8056021a>] cpufreq_remove_dev+0x160/0x2ad
[...]
Call Trace:
[<ffffffff80455c03>] sysdev_driver_unregister+0x53/0x8a
[<ffffffff8055f635>] cpufreq_register_driver+0x148/0x188
[<ffffffff808436cb>] kernel_init+0x14b/0x318
[<ffffffff8020ce78>] child_rip+0xa/0x12
[<ffffffff80843580>] kernel_init+0x0/0x318
[<ffffffff8020ce6e>] child_rip+0x0/0x12
The kernel is 2.6.24-rc4-git5-ish + x86.git (with no cpufreq changes
relative to the upstream code). Crashlog and config attached. Will try
with vanilla -latest as well.
Ingo
* Ingo Molnar wrote:
> 2.6.24-rc4-git5, got this cpufreq crash on x86 64-bit, during 'make
> randconfig' random bootup testing:
hm, does not seem to be easily reproducible. I tried 10 bootups and 2 of
them failed.
Ingo
On 12/12/07, Ingo Molnar wrote:
>
> 2.6.24-rc4-git5, got this cpufreq crash on x86 64-bit, during 'make
> randconfig' random bootup testing:
Ingo, since you already scripted this, maybe you can add a
"modprobe everything/rmmod everything" test after a successful bootup.
It will catch an amazing amount of stuff, I promise.
Ditto for modprobe/rmmod/modprobe and modprobe/rmmod/cat /proc,
cat /sys smoke tests.
* Alexey Dobriyan wrote:
> On 12/12/07, Ingo Molnar wrote:
> >
> > 2.6.24-rc4-git5, got this cpufreq crash on x86 64-bit, during 'make
> > randconfig' random bootup testing:
>
> Ingo, since you already scripted this, maybe you can add a "modprobe
> everything/rmmod everything" test after a successful bootup. It will
> catch an amazing amount of stuff, I promise.
something close to that is one of my standard tests: booting up an
allyesconfig kernel.
Ingo
On Wed, Dec 12, 2007 at 10:11:44AM +0100, Ingo Molnar wrote:
> 2.6.24-rc4-git5, got this cpufreq crash on x86 64-bit, during 'make
> randconfig' random bootup testing:
You hit all the fun bugs.
Just before we initialise cpufreq's notifier list..
> Testing NMI watchdog ... <4>WARNING: CPU#0: NMI appears to be stuck (0->0)!
eek?
> powernow-k8: Found 1 AMD Athlon(tm) 64 X2 Dual Core Processor 3800+ processors (1 cpu cores) (version 2.20.00)
> powernow-k8: BIOS error - no PSB or ACPI _PSS objects
> ------------[ cut here ]------------
> kernel BUG at drivers/cpufreq/cpufreq.c:1060!
The actual BUG you hit is
	if (unlikely(lock_policy_rwsem_write(cpu)))
		BUG();
It _looks_ like we're leaking a refcount on that lock, but
I don't see where. It's a shame you can't reproduce this easily,
as cpufreq.debug=7 would give us more clues.
(And CONFIG_CPUFREQ_DEBUG=y)
I'll think about this some more.
Dave
--
http://www.codemonkey.org.uk
On Wed, Dec 12, 2007 at 11:40:13AM -0500, Dave Jones wrote:
> > powernow-k8: Found 1 AMD Athlon(tm) 64 X2 Dual Core Processor 3800+ processors (1 cpu cores) (version 2.20.00)
> > powernow-k8: BIOS error - no PSB or ACPI _PSS objects
> > ------------[ cut here ]------------
> > kernel BUG at drivers/cpufreq/cpufreq.c:1060!
>
> The actual BUG you hit is
>
> 	if (unlikely(lock_policy_rwsem_write(cpu)))
> 		BUG();
>
> It _looks_ like we're leaking a refcount on that lock, but
> I don't see where. It's a shame you can't reproduce this easily,
> as cpufreq.debug=7 would give us more clues.
> (And CONFIG_CPUFREQ_DEBUG=y)
So we're missing some unlocks in some error paths.
It's feasible you hit one of those.
This patch should be the fix for that.
Dave
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 5e626b1..79581fa 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -841,19 +841,25 @@ static int cpufreq_add_dev (struct sys_device * sys_dev)
 	drv_attr = cpufreq_driver->attr;
 	while ((drv_attr) && (*drv_attr)) {
 		ret = sysfs_create_file(&policy->kobj, &((*drv_attr)->attr));
-		if (ret)
+		if (ret) {
+			unlock_policy_rwsem_write(cpu);
 			goto err_out_driver_exit;
+		}
 		drv_attr++;
 	}
 	if (cpufreq_driver->get){
 		ret = sysfs_create_file(&policy->kobj, &cpuinfo_cur_freq.attr);
-		if (ret)
+		if (ret) {
+			unlock_policy_rwsem_write(cpu);
 			goto err_out_driver_exit;
+		}
 	}
 	if (cpufreq_driver->target){
 		ret = sysfs_create_file(&policy->kobj, &scaling_cur_freq.attr);
-		if (ret)
+		if (ret) {
+			unlock_policy_rwsem_write(cpu);
 			goto err_out_driver_exit;
+		}
 	}
 	spin_lock_irqsave(&cpufreq_driver_lock, flags);
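
For illustration, here is a minimal user-space sketch of the same pattern
(not kernel code). It assumes a toy lock that only records whether it is
held, standing in for the per-CPU policy rwsem; toy_lock, create_file,
add_dev, fail_at and unlock_on_error are all hypothetical names. With
unlock_on_error false, an early failure returns with the lock still held,
which is the kind of leak the patch above closes:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy lock (assumption): only tracks whether it is currently held. */
struct toy_lock {
	bool held;
};

static void toy_lock_acquire(struct toy_lock *l)
{
	/* A real rwsem would block here; the sketch asserts instead. */
	assert(!l->held);
	l->held = true;
}

static void toy_lock_release(struct toy_lock *l)
{
	assert(l->held);
	l->held = false;
}

/* Hypothetical stand-in for sysfs_create_file(): fails at step fail_at. */
static int create_file(int step, int fail_at)
{
	return step == fail_at ? -1 : 0;
}

/*
 * Takes the lock, runs three fallible steps, and drops the lock.
 * unlock_on_error selects between the fixed behaviour (true) and the
 * leaky one (false) when a step fails.
 */
static int add_dev(struct toy_lock *l, int fail_at, bool unlock_on_error)
{
	int ret = 0;
	int step;

	toy_lock_acquire(l);
	for (step = 0; step < 3; step++) {
		ret = create_file(step, fail_at);
		if (ret) {
			if (unlock_on_error)
				toy_lock_release(l);
			goto err_out;
		}
	}
	toy_lock_release(l);
	return 0;

err_out:
	/* Cleanup that must run without the lock held would go here. */
	return ret;
}
```

Calling add_dev(&l, 1, false) leaves l.held set, so the next
toy_lock_acquire() on the same lock trips its assertion; with true the
lock is free again when the error is returned.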
* Dave Jones wrote:
> > It _looks_ like we're leaking a refcount on that lock, but I don't
> > see where. It's a shame you can't reproduce this easily, as
> > cpufreq.debug=7 would give us more clues. (And
> > CONFIG_CPUFREQ_DEBUG=y)
>
> So we're missing some unlocks in some error paths. It's feasible you
> hit one of those. This patch should be the fix for that.
since it's not really reproducible (i failed to get it since then), how
about you push your fix upstream (it's an obviously correct fix), we
consider this regression fixed and i'll re-notify you if there's still
any problem left. It's not like there's any escape from make randconfig
bootup test coverage in the long run ;-)
Ingo
On Thu, Dec 13, 2007 at 11:17:11AM +0100, Ingo Molnar wrote:
>
> * Dave Jones wrote:
>
> > > It _looks_ like we're leaking a refcount on that lock, but I don't
> > > see where. It's a shame you can't reproduce this easily, as
> > > cpufreq.debug=7 would give us more clues. (And
> > > CONFIG_CPUFREQ_DEBUG=y)
> >
> > So we're missing some unlocks in some error paths. It's feasible you
> > hit one of those. This patch should be the fix for that.
>
> since it's not really reproducible (i failed to get it since then), how
> about you push your fix upstream (it's an obviously correct fix), we
> consider this regression fixed and i'll re-notify you if there's still
> any problem left. It's not like there's any escape from make randconfig
> bootup test coverage in the long run ;-)
Yeah, will push it to Linus today.
Dave