-tip auto-testing started triggering this spinlock corruption message
yesterday:
[ 3.976213] calling acpi_rtc_init+0x0/0xd3
[ 3.980213] ACPI Exception (utmutex-0263): AE_BAD_PARAMETER, Thread F7C50000 could not acquire Mutex [3] [20080321]
[ 3.992213] BUG: spinlock bad magic on CPU#0, swapper/1
[ 3.992213] lock: c2508dc4, .magic: 00000000, .owner: swapper/1, .owner_cpu: 0
[ 3.992213] Pid: 1, comm: swapper Not tainted 2.6.26-rc6-00325-gc0793f7-dirty #8570
[ 3.992213] [<c0521040>] spin_bug+0x7c/0x87
[ 3.992213] [<c0521064>] _raw_spin_unlock+0x19/0x72
[ 3.992213] [<c13ee1f2>] _spin_unlock_irqrestore+0x25/0x55
[ 3.992213] [<c05ae056>] acpi_os_release_lock+0xd/0xf
[ 3.992213] [<c05c3a23>] acpi_set_register+0x214/0x26c
[ 3.992213] [<c05b66b5>] acpi_clear_event+0x5e/0x7f
[ 3.992213] [<c1f58ace>] acpi_rtc_init+0x42/0xd3
[ 3.992213] [<c1f2b645>] kernel_init+0x128/0x269
[ 3.992213] [<c1f58a8c>] ? acpi_rtc_init+0x0/0xd3
[ 3.992213] [<c0513664>] ? trace_hardirqs_on_thunk+0xc/0x10
[ 3.992213] [<c011abf0>] ? restore_nocheck_notrace+0x0/0xe
[ 3.992213] [<c1f2b51d>] ? kernel_init+0x0/0x269
[ 3.992213] [<c1f2b51d>] ? kernel_init+0x0/0x269
[ 3.992213] [<c011b813>] kernel_thread_helper+0x7/0x10
[ 3.992213] =======================
[ 3.996213] initcall acpi_rtc_init+0x0/0xd3 returned 0 after 15 msecs
[ 4.004213] calling acpi_event_init+0x0/0x74
[ 4.008213] initcall acpi_event_init+0x0/0x74 returned 0 after 0 msecs
i have found the AE_BAD_PARAMETER in older logs a well, but the spinlock
corruption is new and nothing in that area is changed by -tip so i
suspect it's a mainline problem as well.
it's not reproducible so it's not bisectable - it happened a few times
sporadically, but on a repeated reboot with the same bzImage it does not
reproduce. The config, full bootlog and bzImage can be found at:
http://redhat.com/~mingo/misc/config-Fri_Jun_20_11_19_25_CEST_2008.bad
http://redhat.com/~mingo/misc/log-Fri_Jun_20_11_19_25_CEST_2008.bad
http://redhat.com/~mingo/misc/bzImage-Fri_Jun_20_11_19_25_CEST_2008.bad
(more info about the system on request.)
NOTE, this is with the following boot parameters:
maxcpus=1 nosmp no_hz=off highres=0 nmi_watchdog=0 noapic nolapic_timer
hpet=disable idle=mwait idle=poll highmem=512m nopat notsc acpi=off
pci=nomsi
the bzImage i've uploaded adds most of those boot parameters
automatically. These boot parameters worked fine in the past.
given that it involves acpi_rtc_init:acpi_rtc_init(), one suspect would
be this change:
| commit e1094bfa26e5e94af2fea79e004614dbce42b008
| Author: Zhao Yakui <[email protected]>
| Date: Wed May 14 11:32:59 2008 +0800
|
| ACPI: Disable Fixed_RTC event when installing RTC handler
given that it's not bisectable, i can only "slow-test" any patch: i.e. i
can apply any patch and report whether the spinlock corruption has
occured after 2 days of continuous testing.
Ingo
Hi,
On Fri, Jun 20, 2008 at 11:52 AM, Ingo Molnar <[email protected]> wrote:
>
> -tip auto-testing started triggering this spinlock corruption message
> yesterday:
>
> [ 3.976213] calling acpi_rtc_init+0x0/0xd3
> [ 3.980213] ACPI Exception (utmutex-0263): AE_BAD_PARAMETER, Thread F7C50000 could not acquire Mutex [3] [20080321]
...
> i have found the AE_BAD_PARAMETER in older logs a well, but the spinlock
> corruption is new and nothing in that area is changed by -tip so i
> suspect it's a mainline problem as well.
It seems that some acpi calls are made before acpi is even
initialized, hence the AE_BAD_PARAMETER (ACPI is trying to use
uninitialized mutexes) -- I think that may be the source of the mutex
corruption as well.
This probably happens because acpi_early_init() (which happens before
all the initcalls; mutex initialization too) returns early:
void __init acpi_early_init(void)
{
acpi_status status = AE_OK;
if (acpi_disabled)
return;
...
I notice that you're booting with acpi=off, so it might be the same
problem. You could try this patch to find other callers that don't
check whether ACPI is available before using ACPI-defined mutexes:
diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
index 235a138..5b34328 100644
--- a/drivers/acpi/osl.c
+++ b/drivers/acpi/osl.c
@@ -818,8 +818,7 @@ acpi_status acpi_os_wait_semaphore(acpi_handle handle, u32 u
long jiffies;
int ret = 0;
- if (!sem || (units < 1))
- return AE_BAD_PARAMETER;
+ BUG_ON(!sem || (units < 1));
if (units > 1)
return AE_SUPPORT;
(This will dump the stack instead of printing AE_BAD_PARAMETER in your
dmesg, so this is guaranteed to halt your machine given that you have
at least three of these messages in your log already!)
Vegard
--
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
-- E. W. Dijkstra, EWD1036
Hi Ingo,
Can you see if this patch solves your problem? There might be other
functions that needs this guard as well, though. I wonder if maybe this test
should just be included at the top of every driver that uses ACPI in some
way. But I'm pretty sure that this lack of initialization is the root of your
problem in any case :-)
(By the way, I don't know why this problem popped up at this time, maybe it
was just bad timing/bad luck... How far back do your AE_BAD_PARAMETER in the
logs go?)
Vegard
From: Vegard Nossum <[email protected]>
Date: Fri, 20 Jun 2008 15:39:09 +0200
Subject: [PATCH] ACPI: don't walk tables if ACPI was disabled
Ingo Molnar wrote:
> -tip auto-testing started triggering this spinlock corruption message
> yesterday:
>
> [ 3.976213] calling acpi_rtc_init+0x0/0xd3
> [ 3.980213] ACPI Exception (utmutex-0263): AE_BAD_PARAMETER, Thread F7C50000 could not acquire Mutex [3] [20080321]
> [ 3.992213] BUG: spinlock bad magic on CPU#0, swapper/1
> [ 3.992213] lock: c2508dc4, .magic: 00000000, .owner: swapper/1, .owner_cpu: 0
This is apparently because some parts of ACPI, including mutexes, are not
initialized when acpi=off is passed to the kernel.
Reported-by: Ingo Molnar <[email protected]>
Signed-off-by: Vegard Nossum <[email protected]>
---
drivers/acpi/glue.c | 3 +++
drivers/acpi/namespace/nsxfeval.c | 3 +++
2 files changed, 6 insertions(+), 0 deletions(-)
diff --git a/drivers/acpi/glue.c b/drivers/acpi/glue.c
index 2808dc6..9b227d4 100644
--- a/drivers/acpi/glue.c
+++ b/drivers/acpi/glue.c
@@ -333,6 +333,9 @@ static int __init acpi_rtc_init(void)
{
struct device *dev = get_rtc_dev();
+ if (acpi_disabled)
+ return 0;
+
if (dev) {
rtc_wake_setup();
rtc_info.wake_on = rtc_wake_on;
diff --git a/drivers/acpi/namespace/nsxfeval.c b/drivers/acpi/namespace/nsxfeval.c
index a8d5491..c274d1d 100644
--- a/drivers/acpi/namespace/nsxfeval.c
+++ b/drivers/acpi/namespace/nsxfeval.c
@@ -391,6 +391,9 @@ acpi_walk_namespace(acpi_object_type type,
ACPI_FUNCTION_TRACE(acpi_walk_namespace);
+ if (acpi_disabled)
+ return_ACPI_STATUS(AE_NO_NAMESPACE);
+
/* Parameter validation */
if ((type > ACPI_TYPE_LOCAL_MAX) || (!max_depth) || (!user_function)) {
--
1.5.4.1
* Vegard Nossum <[email protected]> wrote:
> Hi Ingo,
>
> Can you see if this patch solves your problem? There might be other
> functions that needs this guard as well, though. I wonder if maybe
> this test should just be included at the top of every driver that uses
> ACPI in some way. But I'm pretty sure that this lack of initialization
> is the root of your problem in any case :-)
applied this to tip/out-of-tree for more testing, thanks Vegard.
> (By the way, I don't know why this problem popped up at this time,
> maybe it was just bad timing/bad luck... How far back do your
> AE_BAD_PARAMETER in the logs go?)
i have hit this warning for the first time in January 2008:
[ 0.000000] Linux version 2.6.24-rc8 (mingo@dione) (gcc version 4.2.2)
#452 SMP Sun Jan 20 23:36:28 CET 2008
and it says:
[ 0.000000] Calling initcall 0xc050758a: acpi_rtc_init+0x0/0xb8()
[ 0.000000] ACPI Exception (utmutex-0263): AE_BAD_PARAMETER,
Thread F7C22000 could not acquire Mutex [3] [20070126]
[ 0.000000] initcall 0xc050758a: acpi_rtc_init+0x0/0xb8() returned 0.
the logs of my auto-tests on this box start at more than a year ago:
Linux version 2.6.21-rc6 (mingo@dione) (gcc version 4.0.2)
#331 SMP Fri Apr 13 10:14:12 CEST 2007
the size of the logs is 16.2 GB, covering the bootup of 58605 uniquely
built kernels performing 67065 bootups - so it's a fairly exhaustive
history.
that's why WARN_ON()s are so important - there's no way my automated
tools (or even i, when taking a casual look at the logs) could have
picked up that new ACPI Exception - if each subsystem has different
warnings (which change frequently) then it's sheer impossible to
automate the answer to the "does that log show any anomaly" question.
( Even delta analysis would be of little use, due to timing related
noise, random data variances and the impact of randconfig booting. )
The only reason i noticed it because this problem escallated into a lock
corruption which triggered a WARN_ON().
Ingo
On Fri, 20 Jun 2008, Vegard Nossum wrote:
> Hi Ingo,
>
> Can you see if this patch solves your problem? There might be other
> functions that needs this guard as well, though. I wonder if maybe this test
> should just be included at the top of every driver that uses ACPI in some
> way. But I'm pretty sure that this lack of initialization is the root of your
> problem in any case :-)
>
> (By the way, I don't know why this problem popped up at this time, maybe it
> was just bad timing/bad luck... How far back do your AE_BAD_PARAMETER in the
> logs go?)
>
>
> Vegard
>
>
> From: Vegard Nossum <[email protected]>
> Date: Fri, 20 Jun 2008 15:39:09 +0200
> Subject: [PATCH] ACPI: don't walk tables if ACPI was disabled
>
> Ingo Molnar wrote:
> > -tip auto-testing started triggering this spinlock corruption message
> > yesterday:
> >
> > [ 3.976213] calling acpi_rtc_init+0x0/0xd3
> > [ 3.980213] ACPI Exception (utmutex-0263): AE_BAD_PARAMETER, Thread F7C50000 could not acquire Mutex [3] [20080321]
> > [ 3.992213] BUG: spinlock bad magic on CPU#0, swapper/1
> > [ 3.992213] lock: c2508dc4, .magic: 00000000, .owner: swapper/1, .owner_cpu: 0
>
> This is apparently because some parts of ACPI, including mutexes, are not
> initialized when acpi=off is passed to the kernel.
>
> Reported-by: Ingo Molnar <[email protected]>
> Signed-off-by: Vegard Nossum <[email protected]>
> ---
> drivers/acpi/glue.c | 3 +++
> drivers/acpi/namespace/nsxfeval.c | 3 +++
> 2 files changed, 6 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/acpi/glue.c b/drivers/acpi/glue.c
> index 2808dc6..9b227d4 100644
> --- a/drivers/acpi/glue.c
> +++ b/drivers/acpi/glue.c
> @@ -333,6 +333,9 @@ static int __init acpi_rtc_init(void)
> {
> struct device *dev = get_rtc_dev();
>
> + if (acpi_disabled)
> + return 0;
> +
hmm, i would expect dev to be 0 for acpi=off,
since pnp_match would fail, no?
> if (dev) {
> rtc_wake_setup();
> rtc_info.wake_on = rtc_wake_on;
> diff --git a/drivers/acpi/namespace/nsxfeval.c b/drivers/acpi/namespace/nsxfeval.c
> index a8d5491..c274d1d 100644
> --- a/drivers/acpi/namespace/nsxfeval.c
> +++ b/drivers/acpi/namespace/nsxfeval.c
> @@ -391,6 +391,9 @@ acpi_walk_namespace(acpi_object_type type,
>
> ACPI_FUNCTION_TRACE(acpi_walk_namespace);
>
> + if (acpi_disabled)
> + return_ACPI_STATUS(AE_NO_NAMESPACE);
> +
We should probably BUG_ON(acpi_disabled) here,
since it takes a programming bug to hit this case.
-Len
> /* Parameter validation */
>
> if ((type > ACPI_TYPE_LOCAL_MAX) || (!max_depth) || (!user_function)) {
> --
> 1.5.4.1
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
On Fri, Jun 20, 2008 at 9:00 PM, Len Brown <[email protected]> wrote:
>> @@ -333,6 +333,9 @@ static int __init acpi_rtc_init(void)
>> {
>> struct device *dev = get_rtc_dev();
>>
>> + if (acpi_disabled)
>> + return 0;
>> +
>
> hmm, i would expect dev to be 0 for acpi=off,
> since pnp_match would fail, no?
Obviously not. Because Ingo is booting with acpi=off and he still gets
a warning about some mutex operation that originates from this very
initcall:
[ 3.976213] calling acpi_rtc_init+0x0/0xd3
[ 3.980213] ACPI Exception (utmutex-0263): AE_BAD_PARAMETER, Thread
F7C50000 could not acquire Mutex [3] [20080321]
This function seems to do the discovery of the rtc device:
static int __init pnpacpi_init(void)
{
if (acpi_disabled || pnpacpi_disabled) {
pnp_info("PnP ACPI: disabled");
return 0;
}
...
subsys_initcall(pnpacpi_init);
So we have these functions:
1. acpi_early_init() - happens before any initcall. This would
initialize mutexes, but simply returns if acpi_disabled.
2. pnpacpi_init() - subsys_initcall. Does the initial discovery of
pnpacpi devices, but simply returns if acpi_disabled.
3. acpi_init() - subsys_initcall. Simply returns if acpi_disabled.
4. acpi_rtc_init() - fs_initcall (after subsys_initcall).
So I don't know. We also know that things like dock_init() are wrong
for sure (with attached patch):
------------[ cut here ]------------
WARNING: at drivers/acpi/osl.c:821 acpi_os_wait_semaphore+0x21/0xf0()
Modules linked in:
Pid: 1, comm: swapper Not tainted 2.6.26-rc6-00161-g952f4a0-dirty #20
...
[<c0222762>] acpi_os_wait_semaphore+0x21/0xf0
[<c023ce2a>] acpi_ut_acquire_mutex+0x5e/0xc4
[<c0233778>] acpi_walk_namespace+0x24/0x60
[<c05d005f>] dock_init+0x0/0x48
[<c05d008f>] dock_init+0x30/0x48
[<c0242eb3>] find_dock+0x0/0x2f0
[<c05bb422>] kernel_init+0x120/0x254
...
ACPI Exception (utmutex-0263): AE_BAD_PARAMETER, Thread C7820000 could
not acquire Mutex [1] [20080321]
I guess Ingo should revert my bogus changes to acpi_rtc_init() and use
your suggestion of a BUG() (or maybe just WARN()) to catch the
backtrace so we can figure out how it gets there when acpi_disabled.
The only thing I can think of now is that pnp_match() is doing
something wrong. It returns 1 when it shouldn't, or something. But I
can't really spot it :-(
Vegard
diff --git a/drivers/acpi/namespace/nsxfeval.c b/drivers/acpi/namespace/nsxfeval
index a8d5491..7c444de 100644
--- a/drivers/acpi/namespace/nsxfeval.c
+++ b/drivers/acpi/namespace/nsxfeval.c
@@ -556,6 +556,8 @@ acpi_get_devices(const char *HID,
ACPI_FUNCTION_TRACE(acpi_get_devices);
+ WARN_ON(acpi_disabled);
+
/* Parameter validation */
if (!user_function) {
diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
index 235a138..ca661e7 100644
--- a/drivers/acpi/osl.c
+++ b/drivers/acpi/osl.c
@@ -818,6 +818,8 @@ acpi_status acpi_os_wait_semaphore(acpi_handle handle, u32 u
long jiffies;
int ret = 0;
+ WARN_ON(acpi_disabled);
+
if (!sem || (units < 1))
return AE_BAD_PARAMETER;
On Fri, Jun 20, 2008 at 10:40 PM, Vegard Nossum <[email protected]> wrote:
> On Fri, Jun 20, 2008 at 9:00 PM, Len Brown <[email protected]> wrote:
>>> @@ -333,6 +333,9 @@ static int __init acpi_rtc_init(void)
>>> {
>>> struct device *dev = get_rtc_dev();
>>>
>>> + if (acpi_disabled)
>>> + return 0;
>>> +
>>
>> hmm, i would expect dev to be 0 for acpi=off,
>> since pnp_match would fail, no?
>
> Obviously not. Because Ingo is booting with acpi=off and he still gets
> a warning about some mutex operation that originates from this very
> initcall:
>
> [ 3.976213] calling acpi_rtc_init+0x0/0xd3
> [ 3.980213] ACPI Exception (utmutex-0263): AE_BAD_PARAMETER, Thread
> F7C50000 could not acquire Mutex [3] [20080321]
>
>
Aha. The device in question (PNP0b00) is discovered while doing
pnpbios_init(), NOT pnpacpi_init:
[ 1.712032] initcall pnpacpi_init+0x0/0x80 returned 0 after 3 msecs
[ 1.716032] calling pnpbios_init+0x0/0x322
[ 1.720032] PnPBIOS: Scanning system for PnP BIOS support...
[ 1.724032] PnPBIOS: Found PnP BIOS installation structure at 0xc00fc550
[ 1.728032] PnPBIOS: PnP BIOS version 1.0, entry 0xf0000:0xc580, dseg 0xf0000
...
[ 1.876032] pnp 00:03: parse allocated resources
[ 1.880032] pnp 00:03: add irq 8 flags 0x0
[ 1.884032] pnp 00:03: add io 0x70-0x71 flags 0x0
[ 1.888032] pnp 00:03: parse resource options
[ 1.892032] pnp 00:03: new independent option
[ 1.896032] device: '00:03': device_add
[ 1.900032] bus: 'pnp': add device 00:03
[ 1.904032] PM: Adding info for pnp:00:03
[ 1.908032] pnp 00:03: Plug and Play BIOS device, IDs PNP0b00 (active)
So I guess this function, pnpbios_init() needs the check as well. In
fact, it has this:
#ifdef CONFIG_PNPACPI
if (!acpi_disabled && !pnpacpi_disabled) {
pnpbios_disabled = 1;
printk(KERN_INFO "PnPBIOS: Disabled by ACPI PNP\n");
return -ENODEV;
}
#endif /* CONFIG_ACPI */
...I guess that should be changed to say if (acpi_disabled ||
pnpacpi_disabled)? Or... I don't understand the purpose of the
original test. But it seems to be there since the beginning of time
(or, well, v2.6.12-rc2).
Vegard
--
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
-- E. W. Dijkstra, EWD1036
On Fri, Jun 20, 2008 at 11:27 PM, Vegard Nossum <[email protected]> wrote:
> So I guess this function, pnpbios_init() needs the check as well. In
> fact, it has this:
>
> #ifdef CONFIG_PNPACPI
> if (!acpi_disabled && !pnpacpi_disabled) {
> pnpbios_disabled = 1;
> printk(KERN_INFO "PnPBIOS: Disabled by ACPI PNP\n");
> return -ENODEV;
> }
> #endif /* CONFIG_ACPI */
>
> ...I guess that should be changed to say if (acpi_disabled ||
> pnpacpi_disabled)? Or... I don't understand the purpose of the
> original test. But it seems to be there since the beginning of time
> (or, well, v2.6.12-rc2).
Nope. I found the introduction of the change in the historical git repository:
commit 4723ebe898a32262ed49fe677897ccea47e72ff4
Author: Adam Belay <[email protected]>
Date: Sun Oct 24 15:07:32 2004 -0400
[PNPBIOS] disable if ACPI is active
As further ACPI pnp functionaility is implemented it is no longer
safe to run ACPI and PNPBIOS concurrently.
We therefore take the following approach:
- attempt to enable ACPI support
- if ACPI fails (blacklist etc.) enable pnpbios support
- if ACPI support is not compiled in the kernel enable pnpbios support
Signed-off-by: Adam Belay <[email protected]>
and now I understand the purpose of the check; pnpbios does not depend
on ACPI; ACPI/pnpacpi is incompatible with pnpbios.
Yet it remains a fact that pnpbios will discover devices which then
ACPI code uses erroneously. Which means that my original fix for Ingo
probably is the right one after all. Should I submit another patch
which does the right thing for everything under drivers/acpi/, or can
you do it on your own? :-)
Vegard
--
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
-- E. W. Dijkstra, EWD1036
* Vegard Nossum <[email protected]> wrote:
> On Fri, Jun 20, 2008 at 11:27 PM, Vegard Nossum <[email protected]> wrote:
> > So I guess this function, pnpbios_init() needs the check as well. In
> > fact, it has this:
> >
> > #ifdef CONFIG_PNPACPI
> > if (!acpi_disabled && !pnpacpi_disabled) {
> > pnpbios_disabled = 1;
> > printk(KERN_INFO "PnPBIOS: Disabled by ACPI PNP\n");
> > return -ENODEV;
> > }
> > #endif /* CONFIG_ACPI */
> >
> > ...I guess that should be changed to say if (acpi_disabled ||
> > pnpacpi_disabled)? Or... I don't understand the purpose of the
> > original test. But it seems to be there since the beginning of time
> > (or, well, v2.6.12-rc2).
>
> Nope. I found the introduction of the change in the historical git repository:
>
> commit 4723ebe898a32262ed49fe677897ccea47e72ff4
> Author: Adam Belay <[email protected]>
> Date: Sun Oct 24 15:07:32 2004 -0400
>
> [PNPBIOS] disable if ACPI is active
>
> As further ACPI pnp functionaility is implemented it is no longer
> safe to run ACPI and PNPBIOS concurrently.
>
> We therefore take the following approach:
> - attempt to enable ACPI support
> - if ACPI fails (blacklist etc.) enable pnpbios support
> - if ACPI support is not compiled in the kernel enable pnpbios support
>
> Signed-off-by: Adam Belay <[email protected]>
>
> and now I understand the purpose of the check; pnpbios does not depend
> on ACPI; ACPI/pnpacpi is incompatible with pnpbios.
wow, rather old bug - i guess lockdep made it more visible.
> Yet it remains a fact that pnpbios will discover devices which then
> ACPI code uses erroneously. Which means that my original fix for Ingo
> probably is the right one after all. Should I submit another patch
> which does the right thing for everything under drivers/acpi/, or can
> you do it on your own? :-)
i havent seen the warning reappear with your fix after thousands of
bootups - so i guess we can consider it fixed.
Len, please consider the patch below. (it's in tip/out-of-tree)
Ingo
----------------->
commit acc85833791a5d8f84b8df601afc1cc44568dd18
Author: Vegard Nossum <[email protected]>
Date: Fri Jun 20 15:56:40 2008 +0200
ACPI: don't walk tables if ACPI was disabled
Ingo Molnar wrote:
> -tip auto-testing started triggering this spinlock corruption message
> yesterday:
>
> [ 3.976213] calling acpi_rtc_init+0x0/0xd3
> [ 3.980213] ACPI Exception (utmutex-0263): AE_BAD_PARAMETER, Thread F7C50000 could not acquire Mutex [3] [20080321]
> [ 3.992213] BUG: spinlock bad magic on CPU#0, swapper/1
> [ 3.992213] lock: c2508dc4, .magic: 00000000, .owner: swapper/1, .owner_cpu: 0
This is apparently because some parts of ACPI, including mutexes, are not
initialized when acpi=off is passed to the kernel.
Reported-by: Ingo Molnar <[email protected]>
Signed-off-by: Vegard Nossum <[email protected]>
Cc: Len Brown <[email protected]>
Cc: Zhao Yakui <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Alexey Starikovskiy <[email protected]>
Cc: Yinghai Lu <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
diff --git a/drivers/acpi/glue.c b/drivers/acpi/glue.c
index b4d4ce0..c3e1eeb 100644
--- a/drivers/acpi/glue.c
+++ b/drivers/acpi/glue.c
@@ -334,6 +334,9 @@ static int __init acpi_rtc_init(void)
{
struct device *dev = get_rtc_dev();
+ if (acpi_disabled)
+ return 0;
+
if (dev) {
rtc_wake_setup();
rtc_info.wake_on = rtc_wake_on;
diff --git a/drivers/acpi/namespace/nsxfeval.c b/drivers/acpi/namespace/nsxfeval.c
index a8d5491..c274d1d 100644
--- a/drivers/acpi/namespace/nsxfeval.c
+++ b/drivers/acpi/namespace/nsxfeval.c
@@ -391,6 +391,9 @@ acpi_walk_namespace(acpi_object_type type,
ACPI_FUNCTION_TRACE(acpi_walk_namespace);
+ if (acpi_disabled)
+ return_ACPI_STATUS(AE_NO_NAMESPACE);
+
/* Parameter validation */
if ((type > ACPI_TYPE_LOCAL_MAX) || (!max_depth) || (!user_function)) {
On 6/24/08, Ingo Molnar <[email protected]> wrote:
> i havent seen the warning reappear with your fix after thousands of
> bootups - so i guess we can consider it fixed.
>
> Len, please consider the patch below. (it's in tip/out-of-tree)
No, please don't :-)
It fixes your particular case (the acpi_rtc_init() hunk of the patch),
but the acpi_walk_namespace() part should be changed to a WARN(). But
that is likely to cause a lot of "spurious" reports, so the other acpi
drivers should be fixed as well.
I was waiting for Len to reply, but maybe I can submit the proper fix too.
Vegard
--
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
-- E. W. Dijkstra, EWD1036
On Tuesday 24 June 2008 5:52:55 am Vegard Nossum wrote:
> On 6/24/08, Ingo Molnar <[email protected]> wrote:
> > i havent seen the warning reappear with your fix after thousands of
> > bootups - so i guess we can consider it fixed.
> >
> > Len, please consider the patch below. (it's in tip/out-of-tree)
>
> No, please don't :-)
>
> It fixes your particular case (the acpi_rtc_init() hunk of the patch),
> but the acpi_walk_namespace() part should be changed to a WARN(). But
> that is likely to cause a lot of "spurious" reports, so the other acpi
> drivers should be fixed as well.
>
> I was waiting for Len to reply, but maybe I can submit the proper fix too.
I haven't looked into the acpi_walk_namespace() question.
But I did look at the acpi_rtc_init() stuff, and I have to say it
looks like a really round-about way to accomplish this. I'd rather
just fold in the FADT accesses directly into cmos_do_probe() and
dispense with the stuff in drivers/acpi/glue.c. I'm working on a
couple patches along this line, but it will take me a few days to
finish up.
Bjorn
On Tue, 2008-06-24 at 13:52 +0200, Vegard Nossum wrote:
> On 6/24/08, Ingo Molnar <[email protected]> wrote:
> > i havent seen the warning reappear with your fix after thousands of
> > bootups - so i guess we can consider it fixed.
> >
> > Len, please consider the patch below. (it's in tip/out-of-tree)
>
> No, please don't :-)
>
> It fixes your particular case (the acpi_rtc_init() hunk of the patch),
> but the acpi_walk_namespace() part should be changed to a WARN(). But
> that is likely to cause a lot of "spurious" reports, so the other acpi
> drivers should be fixed as well.
In fact this issue is related with the following factors:
a. when acpi is disabled, OS won't initialize the ACPI mutex, which
is accessed by many ACPI interface functions. For example:
acpi_walk_namespace, acpi_install_fixed_event_handler.
b. When acpi is disabled, some drivers will call the ACPI interface
functions. For example: The acpi_walk_namespace is called in
dock_init/bay_init. The acpi_install_fixed_event_handler is called in
the acpi_rtc_init.
Although the following warning message is harmless, it is still
confusing.
> [ 3.980213] ACPI Exception (utmutex-0263): AE_BAD_PARAMETER,
Thread F7C50000 could not acquire Mutex [3] [20080321]
Maybe it is reasonable that the flag of acpi_disabled should be used
in the drivers using ACPI interface functions. Of course it will be safe
to use the flag of acpi_disabled in the function of acpi_walk_namespace.
Thanks.
Yakui.
> I was waiting for Len to reply, but maybe I can submit the proper fix too.
>
>
> Vegard
>
From: Len Brown <[email protected]>
Signed-off-by: Len Brown <[email protected]>
---
> i have hit this warning for the first time in January 2008:
...
> that's why WARN_ON()s are so important - there's no way my automated
> tools (or even i, when taking a casual look at the logs) could have
> picked up that new ACPI Exception - if each subsystem has different
> warnings (which change frequently) then it's sheer impossible to
> automate the answer to the "does that log show any anomaly" question.
Maybe something like this would help?
(totally untested)
-Len
diff --git a/drivers/acpi/utilities/utmisc.c b/drivers/acpi/utilities/utmisc.c
index 1f057b7..d53e82c 100644
--- a/drivers/acpi/utilities/utmisc.c
+++ b/drivers/acpi/utilities/utmisc.c
@@ -1028,6 +1028,7 @@ acpi_ut_error(char *module_name, u32 line_number, char *format, ...)
{
va_list args;
+ warn_on_slowpath(module_name, line_number);
acpi_os_printf("ACPI Error (%s-%04d): ", module_name, line_number);
va_start(args, format);
@@ -1042,6 +1043,7 @@ acpi_ut_exception(char *module_name,
{
va_list args;
+ warn_on_slowpath(module_name, line_number);
acpi_os_printf("ACPI Exception (%s-%04d): %s, ", module_name,
line_number, acpi_format_exception(status));
@@ -1058,6 +1060,7 @@ acpi_ut_warning(char *module_name, u32 line_number, char *format, ...)
{
va_list args;
+ warn_on_slowpath(module_name, line_number);
acpi_os_printf("ACPI Warning (%s-%04d): ", module_name, line_number);
va_start(args, format);
On Tue, 24 Jun 2008, Ingo Molnar wrote:
> * Vegard Nossum <[email protected]> wrote:
>
> > On Fri, Jun 20, 2008 at 11:27 PM, Vegard Nossum <[email protected]> wrote:
> > > So I guess this function, pnpbios_init() needs the check as well. In
> > > fact, it has this:
> > >
> > > #ifdef CONFIG_PNPACPI
> > > if (!acpi_disabled && !pnpacpi_disabled) {
> > > pnpbios_disabled = 1;
> > > printk(KERN_INFO "PnPBIOS: Disabled by ACPI PNP\n");
> > > return -ENODEV;
> > > }
> > > #endif /* CONFIG_ACPI */
> > >
> > > ...I guess that should be changed to say if (acpi_disabled ||
> > > pnpacpi_disabled)? Or... I don't understand the purpose of the
> > > original test. But it seems to be there since the beginning of time
> > > (or, well, v2.6.12-rc2).
> >
> > Nope. I found the introduction of the change in the historical git repository:
> >
> > commit 4723ebe898a32262ed49fe677897ccea47e72ff4
> > Author: Adam Belay <[email protected]>
> > Date: Sun Oct 24 15:07:32 2004 -0400
> >
> > [PNPBIOS] disable if ACPI is active
> >
> > As further ACPI pnp functionaility is implemented it is no longer
> > safe to run ACPI and PNPBIOS concurrently.
> >
> > We therefore take the following approach:
> > - attempt to enable ACPI support
> > - if ACPI fails (blacklist etc.) enable pnpbios support
> > - if ACPI support is not compiled in the kernel enable pnpbios support
> >
> > Signed-off-by: Adam Belay <[email protected]>
> >
> > and now I understand the purpose of the check; pnpbios does not depend
> > on ACPI; ACPI/pnpacpi is incompatible with pnpbios.
>
> wow, rather old bug - i guess lockdep made it more visible.
No, that commit was not a bug, it was correct, and still is,
for pnpACPI and pnpBIOS must be mutually exclusive.
The thing that changed was the RTC specific code.
-Len
On Tue, 24 Jun 2008 22:34:25 -0400 (EDT)
Len Brown <[email protected]> wrote:
> From: Len Brown <[email protected]>
>
> Signed-off-by: Len Brown <[email protected]>
> ---
> > i have hit this warning for the first time in January 2008:
> ...
> > that's why WARN_ON()s are so important - there's no way my
> > automated tools (or even i, when taking a casual look at the logs)
> > could have picked up that new ACPI Exception - if each subsystem
> > has different warnings (which change frequently) then it's sheer
> > impossible to automate the answer to the "does that log show any
> > anomaly" question.
>
> Maybe something like this would help?
> (totally untested)
>
> -Len
>
> diff --git a/drivers/acpi/utilities/utmisc.c
> b/drivers/acpi/utilities/utmisc.c index 1f057b7..d53e82c 100644
> --- a/drivers/acpi/utilities/utmisc.c
> +++ b/drivers/acpi/utilities/utmisc.c
> @@ -1028,6 +1028,7 @@ acpi_ut_error(char *module_name, u32
> line_number, char *format, ...) {
> va_list args;
>
> + warn_on_slowpath(module_name, line_number);
> acpi_os_printf("ACPI Error (%s-%04d): ", module_name,
> line_number);
Please do not use warn_on_slowpath() directly!
Please use WARN_ON() or WARN(). The later has printk semantics so this
could turn into
WARN(1, "ACPI Error (%s-%0d): \n", module_name, line_number);
and kerneloops.org and various other tools will pick up the printk like
arguments as part of the warning message.
--
If you want to reach me at my work email, use [email protected]
For development, discussion and tips for power savings,
visit http://www.lesswatts.org
> Please do not use warn_on_slowpath() directly!
> Please use WARN_ON() or WARN(). The later has printk semantics so this
> could turn into
>
> WARN(1, "ACPI Error (%s-%0d): \n", module_name, line_number);
>
> and kerneloops.org and various other tools will pick up the printk like
> arguments as part of the warning message.
cool, will do.
thanks,
-Len
* Len Brown <[email protected]> wrote:
> > > and now I understand the purpose of the check; pnpbios does not
> > > depend on ACPI; ACPI/pnpacpi is incompatible with pnpbios.
> >
> > wow, rather old bug - i guess lockdep made it more visible.
>
> No, that commit was not a bug, it was correct, and still is, for
> pnpACPI and pnpBIOS must be mutually exclusive.
>
> The thing that changed was the RTC specific code.
ok. Vegard's patch/workaround got my test setup going, i'll drop it if
something better comes along. (tip/out-of-tree commits dont get
propagated into linux-next so it will not interfere with anything)
Ingo
On Tuesday 24 June 2008 07:37:37 pm Zhao Yakui wrote:
> On Tue, 2008-06-24 at 13:52 +0200, Vegard Nossum wrote:
> > On 6/24/08, Ingo Molnar <[email protected]> wrote:
> > > i havent seen the warning reappear with your fix after thousands of
> > > bootups - so i guess we can consider it fixed.
> > >
> > > Len, please consider the patch below. (it's in tip/out-of-tree)
> >
> > No, please don't :-)
> >
> > It fixes your particular case (the acpi_rtc_init() hunk of the patch),
> > but the acpi_walk_namespace() part should be changed to a WARN(). But
> > that is likely to cause a lot of "spurious" reports, so the other acpi
> > drivers should be fixed as well.
> In fact this issue is related with the following factors:
> a. when acpi is disabled, OS won't initialize the ACPI mutex, which
> is accessed by many ACPI interface functions. For example:
> acpi_walk_namespace, acpi_install_fixed_event_handler.
> b. When acpi is disabled, some drivers will call the ACPI interface
> functions. For example: The acpi_walk_namespace is called in
> dock_init/bay_init.
I think most current uses of acpi_walk_namespace() are indications
that the ACPI or PNP core is missing something.
In dock_init() and bay_init(), it's used to bind a driver to a
device. I think it would be better if we could figure out how to
use the usual acpi_bus_register_driver() interface. Actually, it
looks like this is already 90% done: acpi_dock_match() does the
same thing as is_dock(), so it looks like dock_init() could easily
be converted to register as a driver for ACPI_DOCK_HID.
bay_init() looks similar, with acpi_bay_match(), is_ejectable_bay(),
ACPI_BAY_HID, etc.
Other users of acpi_walk_namespace() are often to install notify
handlers to deal with add/remove events. I think these are telling
us that we need to implement the "TBD: Handle device insertion/removal"
pieces in acpi_bus_check_device().
> The acpi_install_fixed_event_handler is called in
> the acpi_rtc_init.
Yes (via rtc_wake_setup()). I think this should be moved into the
RTC driver itself. I have some ideas on how to do this; I'll post
a patch in a few days. But for 2.6.26, I think the minimal fix of
checking acpi_disabled in acpi_rtc_init() is better.
Bjorn
On Wed, 2008-06-25 at 09:08 -0600, Bjorn Helgaas wrote:
> On Tuesday 24 June 2008 07:37:37 pm Zhao Yakui wrote:
> > On Tue, 2008-06-24 at 13:52 +0200, Vegard Nossum wrote:
> > > On 6/24/08, Ingo Molnar <[email protected]> wrote:
> > > > i havent seen the warning reappear with your fix after thousands of
> > > > bootups - so i guess we can consider it fixed.
> > > >
> > > > Len, please consider the patch below. (it's in tip/out-of-tree)
> > >
> > > No, please don't :-)
> > >
> > > It fixes your particular case (the acpi_rtc_init() hunk of the patch),
> > > but the acpi_walk_namespace() part should be changed to a WARN(). But
> > > that is likely to cause a lot of "spurious" reports, so the other acpi
> > > drivers should be fixed as well.
> > In fact this issue is related with the following factors:
> > a. when acpi is disabled, OS won't initialize the ACPI mutex, which
> > is accessed by many ACPI interface functions. For example:
> > acpi_walk_namespace, acpi_install_fixed_event_handler.
> > b. When acpi is disabled, some drivers will call the ACPI interface
> > functions. For example: The acpi_walk_namespace is called in
> > dock_init/bay_init.
>
> I think most current uses of acpi_walk_namespace() are indications
> that the ACPI or PNP core is missing something.
I don't think so. The acpi_walk_namespace is used to enumerate the ACPI
tree and execute some specific operations. For example: Add the device
notification function for some type of device; call the INI method for
all the device.
> In dock_init() and bay_init(), it's used to bind a driver to a
> device. I think it would be better if we could figure out how to
> use the usual acpi_bus_register_driver() interface. Actually, it
> looks like this is already 90% done: acpi_dock_match() does the
> same thing as is_dock(), so it looks like dock_init() could easily
> be converted to register as a driver for ACPI_DOCK_HID.
Maybe what you said is reasonable if the dock/bay device exists and is
added to Linux ACPI device tree. But if the status of bay/dock device
doesn't exist , it won't be added into the Linux ACPI device tree. In
such case the dock/bay driver won't be loaded for it. So it will be
reasonable to enumerate the acpi tree to install the notification
function for the dock device so that OS can receive the notification
event when the dock device is hotplugged.
If acpi is disabled, it is unnecessary for OS to find the dock/bay
device again. In such case it will be reasonable to use the flag of
acpi_disabled in the function of dock_init/bay_init.
Best regards.
Yakui.
> bay_init() looks similar, with acpi_bay_match(), is_ejectable_bay(),
> ACPI_BAY_HID, etc.
>
> Other users of acpi_walk_namespace() are often to install notify
> handlers to deal with add/remove events. I think these are telling
> us that we need to implement the "TBD: Handle device insertion/removal"
> pieces in acpi_bus_check_device().
>
> > The acpi_install_fixed_event_handler is called in
> > the acpi_rtc_init.
>
> Yes (via rtc_wake_setup()). I think this should be moved into the
> RTC driver itself. I have some ideas on how to do this; I'll post
> a patch in a few days. But for 2.6.26, I think the minimal fix of
> checking acpi_disabled in acpi_rtc_init() is better.
>
> Bjorn
>
>
On Tue, 24 Jun 2008, Len Brown wrote:
>
>
> > Please do not use warn_on_slowpath() directly!
> > Please use WARN_ON() or WARN(). The later has printk semantics so this
> > could turn into
> >
> > WARN(1, "ACPI Error (%s-%0d): \n", module_name, line_number);
>
> >
> > and kerneloops.org and various other tools will pick up the printk like
> > arguments as part of the warning message.
>
> cool, will do.
that is, will do when WARN() actually gets upstream:-)
On Wednesday 25 June 2008 09:02:48 pm Zhao Yakui wrote:
> On Wed, 2008-06-25 at 09:08 -0600, Bjorn Helgaas wrote:
> > On Tuesday 24 June 2008 07:37:37 pm Zhao Yakui wrote:
> > > On Tue, 2008-06-24 at 13:52 +0200, Vegard Nossum wrote:
> > > > On 6/24/08, Ingo Molnar <[email protected]> wrote:
> > > > > i havent seen the warning reappear with your fix after thousands of
> > > > > bootups - so i guess we can consider it fixed.
> > > > >
> > > > > Len, please consider the patch below. (it's in tip/out-of-tree)
> > > >
> > > > No, please don't :-)
> > > >
> > > > It fixes your particular case (the acpi_rtc_init() hunk of the patch),
> > > > but the acpi_walk_namespace() part should be changed to a WARN(). But
> > > > that is likely to cause a lot of "spurious" reports, so the other acpi
> > > > drivers should be fixed as well.
> > > In fact this issue is related with the following factors:
> > > a. when acpi is disabled, OS won't initialize the ACPI mutex, which
> > > is accessed by many ACPI interface functions. For example:
> > > acpi_walk_namespace, acpi_install_fixed_event_handler.
> > > b. When acpi is disabled, some drivers will call the ACPI interface
> > > functions. For example: The acpi_walk_namespace is called in
> > > dock_init/bay_init.
> >
> > I think most current uses of acpi_walk_namespace() are indications
> > that the ACPI or PNP core is missing something.
> I don't think so. The acpi_walk_namespace is used to enumerate the ACPI
> tree and execute some specific operations. For example: Add the device
> notification function for some type of device; call the INI method for
> all the device.
There are exceptions, and obviously acpi_walk_namespace() will be
needed some places.
One example where I think acpi_walk_namespace() should not be used
is to register notification functions for device addition/removal.
I think the ACPI core should be handling those notify events and
turning them into add()/remove() calls to the driver.
> > In dock_init() and bay_init(), it's used to bind a driver to a
> > device. I think it would be better if we could figure out how to
> > use the usual acpi_bus_register_driver() interface. Actually, it
> > looks like this is already 90% done: acpi_dock_match() does the
> > same thing as is_dock(), so it looks like dock_init() could easily
> > be converted to register as a driver for ACPI_DOCK_HID.
> Maybe what you said is reasonable if the dock/bay device exists and is
> added to Linux ACPI device tree. But if the status of bay/dock device
> doesn't exist , it won't be added into the Linux ACPI device tree. In
> such case the dock/bay driver won't be loaded for it. So it will be
> reasonable to enumerate the acpi tree to install the notification
> function for the dock device so that OS can receive the notification
> event when the dock device is hotplugged.
If the bay/dock device doesn't exist, we shouldn't need a driver
for it. The normal scenario for non-ACPI drivers is that we load
a driver when a device appears. That doesn't work very well in
this case because the ACPI core is missing the "TBD: Handle device
insertion/removal" stuff I mentioned earlier.
I know it's not very useful for me to talk about this without
providing any patches, so I'll shut up now.
Bjorn