2013-04-20 18:53:42

by Borislav Petkov

[permalink] [raw]
Subject: irq 16: nobody cared

Hi Paul,

so I've been bisecting another issue and have been seeing the warning
in the attached pic. Reverting c0f4dfd4f90f1667d234d21f15153ea09a2eaa66
("rcu: Make RCU_FAST_NO_HZ take advantage of numbered callbacks") seems
to fix the issue. The warning appears as the last thing on the screen
before the machine goes down after being suspended to disk.

And IRQ 16 and azx_interrupt both belong to hda_intel.c

Any ideas?

Thanks.

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--


Attachments:
(No filename) (524.00 B)
CIMG2314.JPG (119.47 kB)
Download all attachments

2013-04-20 23:52:14

by Paul E. McKenney

[permalink] [raw]
Subject: Re: irq 16: nobody cared

On Sat, Apr 20, 2013 at 08:53:30PM +0200, Borislav Petkov wrote:
> Hi Paul,
>
> so I've been bisecting another issue and have been seeing the warning
> in the attached pic. Reverting c0f4dfd4f90f1667d234d21f15153ea09a2eaa66
> ("rcu: Make RCU_FAST_NO_HZ take advantage of numbered callbacks") seems
> to fix the issue. The warning appears as the last thing on the screen
> before the machine goes down after being suspended to disk.
>
> And IRQ 16 and azx_interrupt both belong to hda_intel.c
>
> Any ideas?

Hmmm... Does this problem occur only with CONFIG_RCU_FAST_NO_HZ=y,
or does it occur unconditionally? (My guess is the former, but figured
I should check.)

Thanx, Paul

2013-04-21 10:34:06

by Borislav Petkov

[permalink] [raw]
Subject: Re: irq 16: nobody cared

On Sat, Apr 20, 2013 at 04:52:07PM -0700, Paul E. McKenney wrote:
> Hmmm... Does this problem occur only with CONFIG_RCU_FAST_NO_HZ=y, or
> does it occur unconditionally? (My guess is the former, but figured I
> should check.)

Your guess is correct, sir.

>From quickly grepping in /boot/, I have CONFIG_RCU_FAST_NO_HZ=y in all
my 3.9 configs. So I went and turned it off and the issue doesn't appear
anymore.

Thanks.

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2013-04-21 16:30:23

by Paul E. McKenney

[permalink] [raw]
Subject: Re: irq 16: nobody cared

On Sun, Apr 21, 2013 at 12:34:03PM +0200, Borislav Petkov wrote:
> On Sat, Apr 20, 2013 at 04:52:07PM -0700, Paul E. McKenney wrote:
> > Hmmm... Does this problem occur only with CONFIG_RCU_FAST_NO_HZ=y, or
> > does it occur unconditionally? (My guess is the former, but figured I
> > should check.)
>
> Your guess is correct, sir.
>
> >From quickly grepping in /boot/, I have CONFIG_RCU_FAST_NO_HZ=y in all
> my 3.9 configs. So I went and turned it off and the issue doesn't appear
> anymore.

Thank you for the info! Now to figure out what the heck is causing this.

I am also guessing that your system does have hardware that could do an
irq 16. Of course, if removing or disabing this hardware is an option,
it would be interesting to see what happens. Especially given that at the
very bottom of the .jpg, there is a chopped-off line that appears to read
"Disabling IRQ #16". Which of course leads me to wonder whether we hit
a race between the last interrupt arriving and the device being disabled.

Hmmm... Do you have either CONFIG_PM_SLEEP, SUPPORT_VGA_SWITCHEROO,
or CONFIG_PM_RUNTIME set for these runs?

Adding Takashi Iwai on CC in case he has any insight.

Thanx, Paul

2013-04-21 16:57:02

by Borislav Petkov

[permalink] [raw]
Subject: Re: irq 16: nobody cared

On Sun, Apr 21, 2013 at 09:30:02AM -0700, Paul E. McKenney wrote:
> Thank you for the info! Now to figure out what the heck is causing this.
>
> I am also guessing that your system does have hardware that could do an
> irq 16. Of course, if removing or disabing this hardware is an option,
> it would be interesting to see what happens.

Onboard sound chip so only BIOS disable should help, if present. Let me
check. Oh, I can take out the laser and cut it out of the silicon. :-)

What about not enabling any sound support and thus not using this IRQ
handler? Would that tell us something?

> Especially given that at the very bottom of the .jpg, there is a
> chopped-off line that appears to read "Disabling IRQ #16". Which of
> course leads me to wonder whether we hit a race between the last
> interrupt arriving and the device being disabled.

Well, the "Disabling IRQ..." line comes from note_interrupt() down the
do_IRQ() path. It looks like

if (unlikely(desc->irqs_unhandled > 99900)) {

we hit some high number of unhandled interrupts, issue the message about
the bad IRQ and then disable it by dumping the handlers beforehand.

> Hmmm... Do you have either CONFIG_PM_SLEEP, SUPPORT_VGA_SWITCHEROO,
> or CONFIG_PM_RUNTIME set for these runs?

CONFIG_PM_SLEEP=y
# CONFIG_VGA_SWITCHEROO is not set
# CONFIG_PM_RUNTIME is not set

You meant CONFIG_VGA_SWITCHEROO and not SUPPORT_VGA_SWITCHEROO, right?

Ok, let me try to disable the soundcard in the BIOS.

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2013-04-21 18:10:42

by Borislav Petkov

[permalink] [raw]
Subject: Re: irq 16: nobody cared

On Sun, Apr 21, 2013 at 06:56:54PM +0200, Borislav Petkov wrote:
> Ok, let me try to disable the soundcard in the BIOS.

Ok, there's no warning message anymore but maybe a dozen of seconds
delay before the machine is actually powered off.

"...
ACPI: Preparing to enter system sleep state S5
[Firmware Bug]: ACPI: BIOS _OSI(Linux) query ignored
<a bunch of seconds of doing nothing>
...
<machine powers off>"

So could it be that some RCU cleanup stuff is going on during that time?

I mean, the issue is not that big of a deal - when the box resumes, the
sound card works just fine so...

Thanks.

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2013-04-21 18:47:56

by Paul E. McKenney

[permalink] [raw]
Subject: Re: irq 16: nobody cared

On Sun, Apr 21, 2013 at 06:56:54PM +0200, Borislav Petkov wrote:
> On Sun, Apr 21, 2013 at 09:30:02AM -0700, Paul E. McKenney wrote:
> > Thank you for the info! Now to figure out what the heck is causing this.
> >
> > I am also guessing that your system does have hardware that could do an
> > irq 16. Of course, if removing or disabing this hardware is an option,
> > it would be interesting to see what happens.
>
> Onboard sound chip so only BIOS disable should help, if present. Let me
> check. Oh, I can take out the laser and cut it out of the silicon. :-)
>
> What about not enabling any sound support and thus not using this IRQ
> handler? Would that tell us something?

Using a laser to cut out the silicon would be more entertaining, but
either BIOS disable or compiling out sound support should work. ;-)

> > Especially given that at the very bottom of the .jpg, there is a
> > chopped-off line that appears to read "Disabling IRQ #16". Which of
> > course leads me to wonder whether we hit a race between the last
> > interrupt arriving and the device being disabled.
>
> Well, the "Disabling IRQ..." line comes from note_interrupt() down the
> do_IRQ() path. It looks like
>
> if (unlikely(desc->irqs_unhandled > 99900)) {
>
> we hit some high number of unhandled interrupts, issue the message about
> the bad IRQ and then disable it by dumping the handlers beforehand.

It would also be interesting to print out desc->irqs_unhandled at some
point during suspend to get a feel for the magnitude of the difference.
After all, 99900 is an interesting choice of number!

> > Hmmm... Do you have either CONFIG_PM_SLEEP, SUPPORT_VGA_SWITCHEROO,
> > or CONFIG_PM_RUNTIME set for these runs?
>
> CONFIG_PM_SLEEP=y
> # CONFIG_VGA_SWITCHEROO is not set
> # CONFIG_PM_RUNTIME is not set
>
> You meant CONFIG_VGA_SWITCHEROO and not SUPPORT_VGA_SWITCHEROO, right?

You are quite correct.

Thanx, Paul

> Ok, let me try to disable the soundcard in the BIOS.
>
> --
> Regards/Gruss,
> Boris.
>
> Sent from a fat crate under my desk. Formatting is fine.
> --
>

2013-04-21 18:56:16

by Paul E. McKenney

[permalink] [raw]
Subject: Re: irq 16: nobody cared

On Sun, Apr 21, 2013 at 08:10:35PM +0200, Borislav Petkov wrote:
> On Sun, Apr 21, 2013 at 06:56:54PM +0200, Borislav Petkov wrote:
> > Ok, let me try to disable the soundcard in the BIOS.
>
> Ok, there's no warning message anymore but maybe a dozen of seconds
> delay before the machine is actually powered off.

That is not so good...

> "...
> ACPI: Preparing to enter system sleep state S5
> [Firmware Bug]: ACPI: BIOS _OSI(Linux) query ignored
> <a bunch of seconds of doing nothing>
> ...
> <machine powers off>"
>
> So could it be that some RCU cleanup stuff is going on during that time?

CONFIG_RCU_FAST_NO_HZ will definitely change the timing, for example,
increasing grace-period durations by up to a factor of four.

One way to figure out if this is the problem would be to either (1) apply
the following patch (assuming you have no more than a few tens of CPUs)
or (2) setting the sysfs rcutree.rcu_expedited variable to 1 just before
suspending the system.

Either approach will force RCU to always use the faster expedited
grace periods for synchronize_rcu() and friends. They will -not- help
if someone has open-coded synchronize_rcu() in terms of call_rcu(),
though.

> I mean, the issue is not that big of a deal - when the box resumes, the
> sound card works just fine so...

Fair point. perhaps we should wait for Takashi to weigh in.

Thanx, Paul

> Thanks.
>
> --
> Regards/Gruss,
> Boris.
>
> Sent from a fat crate under my desk. Formatting is fine.
> --


------------------------------------------------------------------------

rcu: Not for inclusion: Force expedited grace periods

Signed-off-by: Paul E. McKenney <[email protected]>

diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index a9610d1..55c5ef6 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -2420,7 +2420,7 @@ void synchronize_sched(void)
"Illegal synchronize_sched() in RCU-sched read-side critical section");
if (rcu_blocking_is_gp())
return;
- if (rcu_expedited)
+ if (1)
synchronize_sched_expedited();
else
wait_rcu_gp(call_rcu_sched);
@@ -2447,7 +2447,7 @@ void synchronize_rcu_bh(void)
"Illegal synchronize_rcu_bh() in RCU-bh read-side critical section");
if (rcu_blocking_is_gp())
return;
- if (rcu_expedited)
+ if (1)
synchronize_rcu_bh_expedited();
else
wait_rcu_gp(call_rcu_bh);
diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
index 46b93b0..190a199 100644
--- a/kernel/rcutree_plugin.h
+++ b/kernel/rcutree_plugin.h
@@ -711,7 +711,7 @@ void synchronize_rcu(void)
"Illegal synchronize_rcu() in RCU read-side critical section");
if (!rcu_scheduler_active)
return;
- if (rcu_expedited)
+ if (1)
synchronize_rcu_expedited();
else
wait_rcu_gp(call_rcu);

2013-04-21 19:07:03

by Borislav Petkov

[permalink] [raw]
Subject: Re: irq 16: nobody cared

On Sun, Apr 21, 2013 at 11:56:09AM -0700, Paul E. McKenney wrote:
> CONFIG_RCU_FAST_NO_HZ will definitely change the timing, for example,
> increasing grace-period durations by up to a factor of four.
>
> One way to figure out if this is the problem would be to either (1)
> apply the following patch (assuming you have no more than a few tens
> of CPUs)

Only 8 - I'm modest that way :)

> or (2) setting the sysfs rcutree.rcu_expedited variable to 1 just before
> suspending the system.
>
> Either approach will force RCU to always use the faster expedited
> grace periods for synchronize_rcu() and friends. They will -not- help
> if someone has open-coded synchronize_rcu() in terms of call_rcu(),
> though.

Right,

# echo 1 > /sys/kernel/rcu_expedited

helped.

No warning, no delay, 2 suspend/resume cycles back-to-back. So, a
probable fix could be to force-enable the expedited grace periods during
suspend...?

Hmmm.

Thanks.

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2013-04-21 20:35:07

by Paul E. McKenney

[permalink] [raw]
Subject: Re: irq 16: nobody cared

On Sun, Apr 21, 2013 at 09:06:55PM +0200, Borislav Petkov wrote:
> On Sun, Apr 21, 2013 at 11:56:09AM -0700, Paul E. McKenney wrote:
> > CONFIG_RCU_FAST_NO_HZ will definitely change the timing, for example,
> > increasing grace-period durations by up to a factor of four.
> >
> > One way to figure out if this is the problem would be to either (1)
> > apply the following patch (assuming you have no more than a few tens
> > of CPUs)
>
> Only 8 - I'm modest that way :)

;-) ;-) ;-)

> > or (2) setting the sysfs rcutree.rcu_expedited variable to 1 just before
> > suspending the system.
> >
> > Either approach will force RCU to always use the faster expedited
> > grace periods for synchronize_rcu() and friends. They will -not- help
> > if someone has open-coded synchronize_rcu() in terms of call_rcu(),
> > though.
>
> Right,
>
> # echo 1 > /sys/kernel/rcu_expedited
>
> helped.
>
> No warning, no delay, 2 suspend/resume cycles back-to-back. So, a
> probable fix could be to force-enable the expedited grace periods during
> suspend...?

Fix for the slowness, for sure!

For the irq warning, it is most likely simply hiding the problem.

Still, speeding up suspend sounds valuable. The connection between
suspending and expediting grace periods probably needs to be in
the kernel -- people will undoubtedly want to expedite them for short
periods of time at runtime, like when starting wireshark, and it would
be good to minimize unnecessary dependency on scripting.

It would not be hard to provide an in-kernel primitive for expediting.

Hmmm... When you resume, is the expediting still set? (Can't see why
it would not be.) If so, it would be good to have some way of disabling
it after boot completes. Which, as noted in another thread, requires
that RCU be informed of when boot completes. ;-)

Thanx, Paul

> Hmmm.
>
> Thanks.
>
> --
> Regards/Gruss,
> Boris.
>
> Sent from a fat crate under my desk. Formatting is fine.
> --
>

2013-04-21 20:51:56

by Borislav Petkov

[permalink] [raw]
Subject: Re: irq 16: nobody cared

On Sun, Apr 21, 2013 at 01:34:47PM -0700, Paul E. McKenney wrote:
> > No warning, no delay, 2 suspend/resume cycles back-to-back. So, a
> > probable fix could be to force-enable the expedited grace periods during
> > suspend...?
>
> Fix for the slowness, for sure!
>
> For the irq warning, it is most likely simply hiding the problem.

Hmm, they seem somehow related. The warning doesn't appear when
rcu_expedited is set which means, RCU grace periods are shorter, leading
up to less IRQ 16 interrupts piling up and not hitting the magic 99900
irq count.

At least this is how I primitively imagine it...

Btw this magic number is pretty old:

commit e40419dbdb3b5c912d35f8736c51687bf4e0deb0
Author: Ingo Molnar <[email protected]>
Date: Mon Oct 18 08:55:37 2004 -0700

[PATCH] generic irq subsystem: core

Probably Ingo and tglx know why it was chosen. :)

> Still, speeding up suspend sounds valuable. The connection between
> suspending and expediting grace periods probably needs to be in
> the kernel -- people will undoubtedly want to expedite them for short
> periods of time at runtime, like when starting wireshark, and it would
> be good to minimize unnecessary dependency on scripting.
>
> It would not be hard to provide an in-kernel primitive for expediting.

Well, this should be trivial, no? A one-liner setting rcu_expedited
to 1 somewhere at the beginning of suspend. When we go further down,
synchronize_sched() et al will pick up the change.

> Hmmm... When you resume, is the expediting still set? (Can't see why
> it would not be.)

Sure.

> If so, it would be good to have some way of disabling
> it after boot completes. Which, as noted in another thread, requires
> that RCU be informed of when boot completes. ;-)

A similar oneliner at the end of the resume path.

Maybe have rcu suspend/resume callbacks where you can do this stuff and
maybe more in the future.

Btw, I have no idea how the suspend and resume paths look like, will
have to dig a bit. Rafael would know though :-)

Thanks.

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2013-04-21 21:42:48

by Borislav Petkov

[permalink] [raw]
Subject: Re: irq 16: nobody cared

On Sun, Apr 21, 2013 at 10:51:39PM +0200, Borislav Petkov wrote:
> A similar oneliner at the end of the resume path.
>
> Maybe have rcu suspend/resume callbacks where you can do this stuff
> and maybe more in the future.

Ok, here's one - it is pretty straight-forward using that notifier
abomination :-). It is only in rcutree.c, it could probably be shared
with rcutiny.c. And it has a section mismatch but I'll fix that too.

It works though - two suspend/resume cycles looked ok.

--
diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index 986d1d3a34f5..cb1f61ede2f9 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -53,6 +53,7 @@
#include <linux/delay.h>
#include <linux/stop_machine.h>
#include <linux/random.h>
+#include <linux/suspend.h>

#include "rcutree.h"
#include <trace/events/rcu.h>
@@ -3097,6 +3098,22 @@ static int __cpuinit rcu_cpu_notify(struct notifier_block *self,
return NOTIFY_OK;
}

+static int __cpuinit rcu_pm_notify(struct notifier_block *self,
+ unsigned long action, void *hcpu)
+{
+ switch (action) {
+ case PM_HIBERNATION_PREPARE:
+ rcu_expedited = 1;
+ break;
+ case PM_POST_RESTORE:
+ rcu_expedited = 0;
+ break;
+ default:
+ break;
+ }
+ return NOTIFY_OK;
+}
+
/*
* Spawn the kthread that handles this RCU flavor's grace periods.
*/
@@ -3323,6 +3340,7 @@ void __init rcu_init(void)
* or the scheduler are operational.
*/
cpu_notifier(rcu_cpu_notify, 0);
+ pm_notifier(rcu_pm_notify, 0);
for_each_online_cpu(cpu)
rcu_cpu_notify(NULL, CPU_UP_PREPARE, (void *)(long)cpu);
}


--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2013-04-21 22:01:04

by Paul E. McKenney

[permalink] [raw]
Subject: Re: irq 16: nobody cared

On Sun, Apr 21, 2013 at 11:42:41PM +0200, Borislav Petkov wrote:
> On Sun, Apr 21, 2013 at 10:51:39PM +0200, Borislav Petkov wrote:
> > A similar oneliner at the end of the resume path.
> >
> > Maybe have rcu suspend/resume callbacks where you can do this stuff
> > and maybe more in the future.
>
> Ok, here's one - it is pretty straight-forward using that notifier
> abomination :-). It is only in rcutree.c, it could probably be shared
> with rcutiny.c. And it has a section mismatch but I'll fix that too.
>
> It works though - two suspend/resume cycles looked ok.

Cool!!! Note that there is no need for expediting TINY_RCU because
its grace periods are already maximally expedited. There is only one
CPU, so if you are following the rules, when you call synchronize_rcu(),
by definition there is a zero-length grace period. ;-)

So all you need is rcutree.c.

Looking forward to the section-mismatch-free version!

Thanx, Paul

> --
> diff --git a/kernel/rcutree.c b/kernel/rcutree.c
> index 986d1d3a34f5..cb1f61ede2f9 100644
> --- a/kernel/rcutree.c
> +++ b/kernel/rcutree.c
> @@ -53,6 +53,7 @@
> #include <linux/delay.h>
> #include <linux/stop_machine.h>
> #include <linux/random.h>
> +#include <linux/suspend.h>
>
> #include "rcutree.h"
> #include <trace/events/rcu.h>
> @@ -3097,6 +3098,22 @@ static int __cpuinit rcu_cpu_notify(struct notifier_block *self,
> return NOTIFY_OK;
> }
>
> +static int __cpuinit rcu_pm_notify(struct notifier_block *self,
> + unsigned long action, void *hcpu)
> +{
> + switch (action) {
> + case PM_HIBERNATION_PREPARE:
> + rcu_expedited = 1;
> + break;
> + case PM_POST_RESTORE:
> + rcu_expedited = 0;
> + break;
> + default:
> + break;
> + }
> + return NOTIFY_OK;
> +}
> +
> /*
> * Spawn the kthread that handles this RCU flavor's grace periods.
> */
> @@ -3323,6 +3340,7 @@ void __init rcu_init(void)
> * or the scheduler are operational.
> */
> cpu_notifier(rcu_cpu_notify, 0);
> + pm_notifier(rcu_pm_notify, 0);
> for_each_online_cpu(cpu)
> rcu_cpu_notify(NULL, CPU_UP_PREPARE, (void *)(long)cpu);
> }
>
>
> --
> Regards/Gruss,
> Boris.
>
> Sent from a fat crate under my desk. Formatting is fine.
> --
>

2013-04-21 22:12:42

by Borislav Petkov

[permalink] [raw]
Subject: Re: irq 16: nobody cared

On Sun, Apr 21, 2013 at 03:00:15PM -0700, Paul E. McKenney wrote:
> Cool!!! Note that there is no need for expediting TINY_RCU because
> its grace periods are already maximally expedited. There is only one
> CPU, so if you are following the rules, when you call synchronize_rcu(),
> by definition there is a zero-length grace period. ;-)
>
> So all you need is rcutree.c.

Ok, even better :-).

> Looking forward to the section-mismatch-free version!

Yeah, here it is. Please sanity-check the commit message whether I'm
making sense.

I've just pulled -rc8 and tip/master + this patch and a couple of other
fixes and will run it the coming week to make sure there are no other
issues.

Thanks!

--
>From 3c155e9a22036839c09d98a4acbc7e17a5a3783c Mon Sep 17 00:00:00 2001
From: Borislav Petkov <[email protected]>
Date: Sun, 21 Apr 2013 23:56:15 +0200
Subject: [PATCH] RCU: Expedite grace periods during suspend/resume

Paul says CONFIG_RCU_FAST_NO_HZ can increase grace-period durations by a
factor of four. This, in turn, can change timings during suspend and can
delay interrupts handling, leading to "IRQ XX: nobody cared" splats and
the machine disabling those interrupt lines for no sensible reason.

Thus, we want to temporarily switch to expedited grace periods when
suspending the box and return to normal settings when resuming. This
patch does exactly that.

Cc: "Paul E. McKenney" <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
---
kernel/rcutree.c | 18 ++++++++++++++++++
1 file changed, 18 insertions(+)

diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index 986d1d3a34f5..9e9aeb38b7a2 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -53,6 +53,7 @@
#include <linux/delay.h>
#include <linux/stop_machine.h>
#include <linux/random.h>
+#include <linux/suspend.h>

#include "rcutree.h"
#include <trace/events/rcu.h>
@@ -3097,6 +3098,22 @@ static int __cpuinit rcu_cpu_notify(struct notifier_block *self,
return NOTIFY_OK;
}

+static int rcu_pm_notify(struct notifier_block *self,
+ unsigned long action, void *hcpu)
+{
+ switch (action) {
+ case PM_HIBERNATION_PREPARE:
+ rcu_expedited = 1;
+ break;
+ case PM_POST_RESTORE:
+ rcu_expedited = 0;
+ break;
+ default:
+ break;
+ }
+ return NOTIFY_OK;
+}
+
/*
* Spawn the kthread that handles this RCU flavor's grace periods.
*/
@@ -3323,6 +3340,7 @@ void __init rcu_init(void)
* or the scheduler are operational.
*/
cpu_notifier(rcu_cpu_notify, 0);
+ pm_notifier(rcu_pm_notify, 0);
for_each_online_cpu(cpu)
rcu_cpu_notify(NULL, CPU_UP_PREPARE, (void *)(long)cpu);
}
--
1.8.2.135.g7b592fa


--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2013-04-22 08:01:46

by Ingo Molnar

[permalink] [raw]
Subject: Re: irq 16: nobody cared


* Borislav Petkov <[email protected]> wrote:

> From 3c155e9a22036839c09d98a4acbc7e17a5a3783c Mon Sep 17 00:00:00 2001
> From: Borislav Petkov <[email protected]>
> Date: Sun, 21 Apr 2013 23:56:15 +0200
> Subject: [PATCH] RCU: Expedite grace periods during suspend/resume
>
> Paul says CONFIG_RCU_FAST_NO_HZ can increase grace-period durations by a
> factor of four. This, in turn, can change timings during suspend and can
> delay interrupts handling, leading to "IRQ XX: nobody cared" splats and
> the machine disabling those interrupt lines for no sensible reason.
>
> Thus, we want to temporarily switch to expedited grace periods when
> suspending the box and return to normal settings when resuming. This
> patch does exactly that.

Hm, this really smells like a workaround: treating the symptom, not the
cause.

How can an increase in grace-periods break drivers and suspend? Do we
understand exactly what happens there? If we do, can we fix that instead
of tweaking the RCU timeouts?

Thanks,

Ingo

2013-04-22 08:32:23

by Takashi Iwai

[permalink] [raw]
Subject: Re: irq 16: nobody cared

At Sun, 21 Apr 2013 09:30:02 -0700,
Paul E. McKenney wrote:
>
> On Sun, Apr 21, 2013 at 12:34:03PM +0200, Borislav Petkov wrote:
> > On Sat, Apr 20, 2013 at 04:52:07PM -0700, Paul E. McKenney wrote:
> > > Hmmm... Does this problem occur only with CONFIG_RCU_FAST_NO_HZ=y, or
> > > does it occur unconditionally? (My guess is the former, but figured I
> > > should check.)
> >
> > Your guess is correct, sir.
> >
> > >From quickly grepping in /boot/, I have CONFIG_RCU_FAST_NO_HZ=y in all
> > my 3.9 configs. So I went and turned it off and the issue doesn't appear
> > anymore.
>
> Thank you for the info! Now to figure out what the heck is causing this.
>
> I am also guessing that your system does have hardware that could do an
> irq 16. Of course, if removing or disabing this hardware is an option,
> it would be interesting to see what happens. Especially given that at the
> very bottom of the .jpg, there is a chopped-off line that appears to read
> "Disabling IRQ #16". Which of course leads me to wonder whether we hit
> a race between the last interrupt arriving and the device being disabled.
>
> Hmmm... Do you have either CONFIG_PM_SLEEP, SUPPORT_VGA_SWITCHEROO,
> or CONFIG_PM_RUNTIME set for these runs?
>
> Adding Takashi Iwai on CC in case he has any insight.

Hm, if it's really due to a stray irq, just adding the missing
synchronize_irq() like below would help?


Takashi

---
diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c
index 7b213d5..262dbf1 100644
--- a/sound/pci/hda/hda_intel.c
+++ b/sound/pci/hda/hda_intel.c
@@ -2682,6 +2682,8 @@ static void azx_stop_chip(struct azx *chip)
/* disable interrupts */
azx_int_disable(chip);
azx_int_clear(chip);
+ if (chip->irq >= 0)
+ synchronize_irq(chip->irq);

/* disable CORB/RIRB */
azx_free_cmd_io(chip);

2013-04-22 09:13:14

by Borislav Petkov

[permalink] [raw]
Subject: Re: irq 16: nobody cared

On Mon, Apr 22, 2013 at 10:32:17AM +0200, Takashi Iwai wrote:
> Hm, if it's really due to a stray irq, just adding the missing
> synchronize_irq() like below would help?
>
>
> Takashi
>
> ---
> diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c
> index 7b213d5..262dbf1 100644
> --- a/sound/pci/hda/hda_intel.c
> +++ b/sound/pci/hda/hda_intel.c
> @@ -2682,6 +2682,8 @@ static void azx_stop_chip(struct azx *chip)
> /* disable interrupts */
> azx_int_disable(chip);
> azx_int_clear(chip);
> + if (chip->irq >= 0)
> + synchronize_irq(chip->irq);

Nope, it doesn't help - I see the same splat.

Thanks.

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2013-04-22 09:18:50

by Borislav Petkov

[permalink] [raw]
Subject: Re: irq 16: nobody cared

On Mon, Apr 22, 2013 at 10:01:36AM +0200, Ingo Molnar wrote:
> Hm, this really smells like a workaround: treating the symptom, not
> the cause.

Well, I just tested Takashi's add missing synchronize_irq() to the
suspend path of snd_hda_intel and it doesn't help.

So it could be an issue with this driver or only this hw/driver can
exacerbate RCU this much to trigger the spurious irq machinery.

> How can an increase in grace-periods break drivers and suspend? Do we
> understand exactly what happens there?

I'd rather look into Paul's direction here. :)

> If we do, can we fix that instead of tweaking the RCU timeouts?

Well, I was under the impression yesterday that we actually want
to switch to expedited grace periods for suspend/resume *anyway*,
regardless of drivers.

Paul?

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2013-04-22 09:19:10

by Takashi Iwai

[permalink] [raw]
Subject: Re: irq 16: nobody cared

At Mon, 22 Apr 2013 11:13:10 +0200,
Borislav Petkov wrote:
>
> On Mon, Apr 22, 2013 at 10:32:17AM +0200, Takashi Iwai wrote:
> > Hm, if it's really due to a stray irq, just adding the missing
> > synchronize_irq() like below would help?
> >
> >
> > Takashi
> >
> > ---
> > diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c
> > index 7b213d5..262dbf1 100644
> > --- a/sound/pci/hda/hda_intel.c
> > +++ b/sound/pci/hda/hda_intel.c
> > @@ -2682,6 +2682,8 @@ static void azx_stop_chip(struct azx *chip)
> > /* disable interrupts */
> > azx_int_disable(chip);
> > azx_int_clear(chip);
> > + if (chip->irq >= 0)
> > + synchronize_irq(chip->irq);
>
> Nope, it doesn't help - I see the same splat.

Thanks. Then I have no idea what's missing in the current
azx_suspend() and azx_resume() codes.

Is the PCI device a HD-audio controller for the built-in analog or
a HDMI audio controller coupled with a graphics chip?

I wonder why no MSI is used there. MSI should have been used for most
of controllers except for cases where blacklisted with PCI SSID or
Nvidia / Creative controllers.


Takashi

2013-04-22 10:07:04

by Borislav Petkov

[permalink] [raw]
Subject: Re: irq 16: nobody cared

On Mon, Apr 22, 2013 at 11:19:07AM +0200, Takashi Iwai wrote:
> Is the PCI device a HD-audio controller for the built-in analog or
> a HDMI audio controller coupled with a graphics chip?

Hmm:

[ 7.204020] snd_hda_intel 0000:01:00.1: irq 90 for MSI/MSI-X
[ 7.551912] #0: HDA ATI SB at 0xfeb00000 irq 16
[ 7.551965] #1: HDA ATI HDMI at 0xfea30000 irq 90
[ 70.870377] snd_hda_intel 0000:01:00.1: irq 90 for MSI/MSI-X

# lspci -s 01:00.1 -vvv
01:00.1 Audio device: Advanced Micro Devices [AMD] nee ATI RV635 HDMI Audio [Radeon HD 3600 Series]
Subsystem: ASUSTeK Computer Inc. Device aa20
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin B routed to IRQ 90
Region 0: Memory at fea30000 (64-bit, non-prefetchable) [size=16K]
Capabilities: [50] Power Management version 3
Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [58] Express (v2) Legacy Endpoint, MSI 00
DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <4us, L1 unlimited
ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop+
MaxPayload 128 bytes, MaxReadReq 128 bytes
DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
LnkCap: Port #0, Speed 5GT/s, Width x16, ASPM L0s L1, Latency L0 <64ns, L1 <1us
ClockPM- Surprise- LLActRep- BwNot-
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 5GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Not Supported, TimeoutDis-
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-
LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-, Selectable De-emphasis: -6dB
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
Address: 00000000feeff00c Data: 4164
Capabilities: [100 v1] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
Kernel driver in use: snd_hda_intel

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2013-04-22 11:33:15

by Takashi Iwai

[permalink] [raw]
Subject: Re: irq 16: nobody cared

At Mon, 22 Apr 2013 12:06:57 +0200,
Borislav Petkov wrote:
>
> On Mon, Apr 22, 2013 at 11:19:07AM +0200, Takashi Iwai wrote:
> > Is the PCI device a HD-audio controller for the built-in analog or
> > a HDMI audio controller coupled with a graphics chip?
>
> Hmm:
>
> [ 7.204020] snd_hda_intel 0000:01:00.1: irq 90 for MSI/MSI-X
> [ 7.551912] #0: HDA ATI SB at 0xfeb00000 irq 16

So, you have two PCI HD-audio controllers, and 0000:01:00.1 is for
HDMI while irq 16 is from another onboard audio. Does the onboard
audio PCI entry have no MSI support, right?

(This might be off-topic, but I just wonder...)


thanks,

Takashi

> [ 7.551965] #1: HDA ATI HDMI at 0xfea30000 irq 90
> [ 70.870377] snd_hda_intel 0000:01:00.1: irq 90 for MSI/MSI-X
>
> # lspci -s 01:00.1 -vvv
> 01:00.1 Audio device: Advanced Micro Devices [AMD] nee ATI RV635 HDMI Audio [Radeon HD 3600 Series]
> Subsystem: ASUSTeK Computer Inc. Device aa20
> Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
> Latency: 0, Cache Line Size: 64 bytes
> Interrupt: pin B routed to IRQ 90
> Region 0: Memory at fea30000 (64-bit, non-prefetchable) [size=16K]
> Capabilities: [50] Power Management version 3
> Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
> Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
> Capabilities: [58] Express (v2) Legacy Endpoint, MSI 00
> DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <4us, L1 unlimited
> ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
> DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
> RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop+
> MaxPayload 128 bytes, MaxReadReq 128 bytes
> DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
> LnkCap: Port #0, Speed 5GT/s, Width x16, ASPM L0s L1, Latency L0 <64ns, L1 <1us
> ClockPM- Surprise- LLActRep- BwNot-
> LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
> ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
> LnkSta: Speed 5GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
> DevCap2: Completion Timeout: Not Supported, TimeoutDis-
> DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-
> LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-, Selectable De-emphasis: -6dB
> Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
> Compliance De-emphasis: -6dB
> LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
> EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
> Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
> Address: 00000000feeff00c Data: 4164
> Capabilities: [100 v1] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
> Kernel driver in use: snd_hda_intel
>
> --
> Regards/Gruss,
> Boris.
>
> Sent from a fat crate under my desk. Formatting is fine.
> --
>

2013-04-22 12:56:59

by Thomas Gleixner

[permalink] [raw]
Subject: Re: irq 16: nobody cared

On Mon, 22 Apr 2013, Takashi Iwai wrote:
> At Mon, 22 Apr 2013 11:13:10 +0200,
> Borislav Petkov wrote:
> >
> > On Mon, Apr 22, 2013 at 10:32:17AM +0200, Takashi Iwai wrote:
> > > Hm, if it's really due to a stray irq, just adding the missing
> > > synchronize_irq() like below would help?
> > >
> > >
> > > Takashi
> > >
> > > ---
> > > diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c
> > > index 7b213d5..262dbf1 100644
> > > --- a/sound/pci/hda/hda_intel.c
> > > +++ b/sound/pci/hda/hda_intel.c
> > > @@ -2682,6 +2682,8 @@ static void azx_stop_chip(struct azx *chip)
> > > /* disable interrupts */
> > > azx_int_disable(chip);
> > > azx_int_clear(chip);
> > > + if (chip->irq >= 0)
> > > + synchronize_irq(chip->irq);
> >
> > Nope, it doesn't help - I see the same splat.
>
> Thanks. Then I have no idea what's missing in the current
> azx_suspend() and azx_resume() codes.

I asked Boris to add a disable_irq() on top of the azx_suspend()
function and that does not help either, but it tells us that the
interrupt storm arrives _BEFORE_ azx_suspend() is invoked.

So some other device shuts down and causes that. The obvious question
is whether this interrupt line is shared with something else.

Boris, can you please provide the irq16 line of /proc/interrupts
before you invoke suspend?

If it's shared we know which driver is shutdown before hda_intel and
perhaps leaves its device in a weird state.

If it's not shared, it still might be shared behind the scene. The
other device uses MSI(X) and when shutdown MSI is disabled so the
default IRQ route which points to IRQ16 would be active again.

I assume, that the RCU stuff just causes enough timing difference to
uncover the underlying issue.

Thanks,

tglx

2013-04-22 13:17:10

by Paul E. McKenney

[permalink] [raw]
Subject: Re: irq 16: nobody cared

On Mon, Apr 22, 2013 at 11:18:47AM +0200, Borislav Petkov wrote:
> On Mon, Apr 22, 2013 at 10:01:36AM +0200, Ingo Molnar wrote:
> > Hm, this really smells like a workaround: treating the symptom, not
> > the cause.
>
> Well, I just tested Takashi's add missing synchronize_irq() to the
> suspend path of snd_hda_intel and it doesn't help.
>
> So it could be an issue with this driver or only this hw/driver can
> exacerbate RCU this much to trigger the spurious irq machinery.
>
> > How can an increase in grace-periods break drivers and suspend? Do we
> > understand exactly what happens there?
>
> I'd rather look into Paul's direction here. :)

I believe that we need both -- a fix that prevents the stray irq for
normal CONFIG_RCU_FAST_NO_HZ grace periods and also expediting grace
periods during suspend/resume. I am greedy that way. ;-)

Thanx, Paul

> > If we do, can we fix that instead of tweaking the RCU timeouts?
>
> Well, I was under the impression yesterday that we actually want
> to switch to expedited grace periods for suspend/resume *anyway*,
> regardless of drivers.
>
> Paul?
>
> --
> Regards/Gruss,
> Boris.
>
> Sent from a fat crate under my desk. Formatting is fine.
> --
>

2013-04-22 13:56:32

by Borislav Petkov

[permalink] [raw]
Subject: Re: irq 16: nobody cared

On Mon, Apr 22, 2013 at 01:33:12PM +0200, Takashi Iwai wrote:
> At Mon, 22 Apr 2013 12:06:57 +0200,
> Borislav Petkov wrote:
> >
> > On Mon, Apr 22, 2013 at 11:19:07AM +0200, Takashi Iwai wrote:
> > > Is the PCI device a HD-audio controller for the built-in analog or
> > > a HDMI audio controller coupled with a graphics chip?
> >
> > Hmm:
> >
> > [ 7.204020] snd_hda_intel 0000:01:00.1: irq 90 for MSI/MSI-X
> > [ 7.551912] #0: HDA ATI SB at 0xfeb00000 irq 16
>
> So, you have two PCI HD-audio controllers, and 0000:01:00.1 is for
> HDMI while irq 16 is from another onboard audio. Does the onboard
> audio PCI entry have no MSI support, right?

Looks like it:

$ grep snd /proc/interrupts
16: 108 109 107 106 106 108 107 118 IO-APIC-fasteoi snd_hda_intel
90: 3 3 4 4 4 4 3 3 PCI-MSI-edge snd_hda_intel

Btw in alsamixer, I have a bunch of controls for the SB soundcard and
when I select the HDMI sound card, it shows only one "S/PDIF" volume
control which can only be muted.

I'd guess this is the audio output over the monitor when it is connected
over HDMI or so.

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2013-04-22 14:24:03

by Borislav Petkov

[permalink] [raw]
Subject: Re: irq 16: nobody cared

On Mon, Apr 22, 2013 at 02:56:08PM +0200, Thomas Gleixner wrote:
> Boris, can you please provide the irq16 line of /proc/interrupts
> before you invoke suspend?
>
> If it's shared we know which driver is shutdown before hda_intel and
> perhaps leaves its device in a weird state.
>
> If it's not shared, it still might be shared behind the scene. The
> other device uses MSI(X) and when shutdown MSI is disabled so the
> default IRQ route which points to IRQ16 would be active again.
>
> I assume, that the RCU stuff just causes enough timing difference to
> uncover the underlying issue.

Hmm, as Takashi reminded me, this box has two audio controllers. So
could it be that when we shutdown and disable MSI, the interrupts get
routed to to IRQ16 and they overflow there? Just a wild guess, I guess.

I very well can imagine that those two audio controllers to be one where
some f*cked up BIOS shim switches between the two but can't keep them
apart properly.

Something like the HPET fiascos...

* This is irq counts with my patch to expedite RCU grace periods:

- before suspend:
# grep snd /proc/interrupts
16: 1304 1340 1335 1329 1293 1348 1357 1363 IO-APIC-fasteoi snd_hda_intel
90: 3 3 4 4 4 4 3 3 PCI-MSI-edge snd_hda_intel

- after resume:
# grep snd /proc/interrupts
16: 1354 1389 1386 1380 1344 1400 1409 1495 IO-APIC-fasteoi snd_hda_intel
90: 0 0 0 0 0 0 0 5 PCI-MSI-edge snd_hda_intel


* "normal" RCU grace periods:

- before suspend:
# grep snd /proc/interrupts
16: 109 108 107 108 108 108 109 111 IO-APIC-fasteoi snd_hda_intel
90: 3 3 4 4 4 4 3 3 PCI-MSI-edge snd_hda_intel

- after resume:
# grep snd /proc/interrupts
16: 154 154 153 155 157 157 235 164 IO-APIC-fasteoi snd_hda_intel
90: 0 0 0 0 0 0 5 0 PCI-MSI-edge snd_hda_intel

More "hmm.."

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2013-04-22 14:44:45

by Paul E. McKenney

[permalink] [raw]
Subject: Re: irq 16: nobody cared

On Mon, Apr 22, 2013 at 04:23:56PM +0200, Borislav Petkov wrote:
> On Mon, Apr 22, 2013 at 02:56:08PM +0200, Thomas Gleixner wrote:
> > Boris, can you please provide the irq16 line of /proc/interrupts
> > before you invoke suspend?
> >
> > If it's shared we know which driver is shutdown before hda_intel and
> > perhaps leaves its device in a weird state.
> >
> > If it's not shared, it still might be shared behind the scene. The
> > other device uses MSI(X) and when shutdown MSI is disabled so the
> > default IRQ route which points to IRQ16 would be active again.
> >
> > I assume, that the RCU stuff just causes enough timing difference to
> > uncover the underlying issue.
>
> Hmm, as Takashi reminded me, this box has two audio controllers. So
> could it be that when we shutdown and disable MSI, the interrupts get
> routed to to IRQ16 and they overflow there? Just a wild guess, I guess.

Would it make sense to try shutting off the hardware, waiting (say)
100 milliseconds, then diabling the irq? (Hey, had to ask!)

Thanx, Paul

> I very well can imagine that those two audio controllers to be one where
> some f*cked up BIOS shim switches between the two but can't keep them
> apart properly.
>
> Something like the HPET fiascos...
>
> * This is irq counts with my patch to expedite RCU grace periods:
>
> - before suspend:
> # grep snd /proc/interrupts
> 16: 1304 1340 1335 1329 1293 1348 1357 1363 IO-APIC-fasteoi snd_hda_intel
> 90: 3 3 4 4 4 4 3 3 PCI-MSI-edge snd_hda_intel
>
> - after resume:
> # grep snd /proc/interrupts
> 16: 1354 1389 1386 1380 1344 1400 1409 1495 IO-APIC-fasteoi snd_hda_intel
> 90: 0 0 0 0 0 0 0 5 PCI-MSI-edge snd_hda_intel
>
>
> * "normal" RCU grace periods:
>
> - before suspend:
> # grep snd /proc/interrupts
> 16: 109 108 107 108 108 108 109 111 IO-APIC-fasteoi snd_hda_intel
> 90: 3 3 4 4 4 4 3 3 PCI-MSI-edge snd_hda_intel
>
> - after resume:
> # grep snd /proc/interrupts
> 16: 154 154 153 155 157 157 235 164 IO-APIC-fasteoi snd_hda_intel
> 90: 0 0 0 0 0 0 5 0 PCI-MSI-edge snd_hda_intel
>
> More "hmm.."
>
> --
> Regards/Gruss,
> Boris.
>
> Sent from a fat crate under my desk. Formatting is fine.
> --
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>

2013-04-22 21:33:09

by Borislav Petkov

[permalink] [raw]
Subject: Re: irq 16: nobody cared

On Mon, Apr 22, 2013 at 07:44:02AM -0700, Paul E. McKenney wrote:
> Would it make sense to try shutting off the hardware, waiting (say)
> 100 milliseconds, then diabling the irq? (Hey, had to ask!)

Ok, after a whole day of debugging, here's what tglx and I found out
(Thomas, please correct me if I've misinterpreted something):

When we suspend, we do the following things:

* suspend/freeze devices
* build hibernation image
* resume/thaw devices
* write out image to swap
* enter S5

Below's how it looks like in dmesg (the azx_resume is debugging output
to show us when we resume the audio crap).

Now, when we resume the devices to write out the image, the hda_intel
thing gets an interrupt on irq line 16.

What's puzzling is that

1) HDA's INTSTS status register says no IRQ occurred
2) HDA's PCI status register says no IRQ is active
3) however, (IO)APIC says an IRQ is pending

The azx_interrupt() IRQ handler looks at the INTSTS and since there's no
status bit set, it returns IRQ_NONE.

After 99900 retries, the spurious IRQ detection triggers and we disable
IRQ 16.

Now, the RCU expedited grace periods patch only brown-paper-bags this
issue because the IRQ is still pending but we don't wait a couple of
seconds before powering off the machine to actually trigger the spurious
IRQ detection.

However, paulmck wants to take this patch anyway and for an unrelated
reason or simply because he's greedy that way :-)

Further action is to try to find out why this IRQ is still there when
the device gets enabled.

That should be all for now.

[ 1442.248431] PM: Marking nosave pages: [mem 0xba9ba000-0xbca4dfff]
[ 1442.256534] PM: Marking nosave pages: [mem 0xbca4f000-0xbcc54fff]
[ 1442.256542] PM: Marking nosave pages: [mem 0xbd083000-0xbd7f3fff]
[ 1442.256571] PM: Marking nosave pages: [mem 0xbd800000-0x100000fff]
[ 1442.257147] PM: Basic memory bitmaps created
[ 1442.257147] PM: Syncing filesystems ... done.
[ 1442.294602] Freezing user space processes ... (elapsed 0.01 seconds) done.
[ 1442.315451] PM: Preallocating image memory... done (allocated 168444 pages)
[ 1442.826308] PM: Allocated 673776 kbytes in 0.50 seconds (1347.55 MB/s)
[ 1442.833410] Freezing remaining freezable tasks ... (elapsed 0.01 seconds) done.
[ 1443.579467] [drm] PCIE gen 2 link speeds already enabled
[ 1443.579583] serial 00:09: activated
[ 1443.582498] [drm] PCIE GART of 512M enabled (table at 0x0000000000040000).
[ 1443.582557] radeon 0000:01:00.0: WB enabled
[ 1443.582559] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000020000c00 and cpu addr 0xffff880428eefc00
[ 1443.582561] radeon 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000020000c0c and cpu addr 0xffff880428eefc0c
[ 1443.588070] azx_resume: irq: -1, after PCI_D0
162] azx_resume: irq: -1, after restore_state
[ 1443.588256] azx_resume: irq: -1, after set_master
[ 1443.588258] azx_resume: irq: -1, msi
[ 1443.588528] azx_resume: irq: 90, after acquire_irq
[ 1443.588530] azx_resume: irq: 90, after init_pci
[ 1443.588983] azx_resume: irq: -1, after PCI_D0
[ 1443.589059] azx_resume: irq: -1, after restore_state
[ 1443.589189] azx_resume: irq: -1, after set_master
[ 1443.589218] azx_resume: irq: 16, after acquire_irq
[ 1443.589227] azx_resume: irq: 16, after init_pci
[ 1443.590827] azx_resume: irq: 90, after init_chip
[ 1443.590982] azx_resume: irq: 90, after hda_resume
[ 1443.590986] azx_resume: irq: 90, exit
[ 1443.593278] azx_resume: irq: 16, after init_chip
[ 1443.613761] [drm] ring test on 0 succeeded in 0 usecs
[ 1443.613819] [drm] ring test on 3 succeeded in 1 usecs
[ 1443.613837] [drm] ib test on ring 0 succeeded in 0 usecs
[ 1443.613852] [drm] ib test on ring 3 succeeded in 1 usecs
[ 1443.763299] r8169 0000:02:00.0 eth0: link down
[ 1443.764763] azx_resume: irq: 16, after hda_resume
[ 1443.764765] azx_resume: irq: 16, exit
[ 1443.882924] ata7: SATA link down (SStatus 0 SControl 300)
[ 1443.888956] ata5: SATA link down (SStatus 0 SControl 300)
[ 1443.897831] ata6: SATA link down (SStatus 0 SControl 300)
[ 1444.037878] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 1444.046092] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 1444.054510] ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 1444.064691] ata4.00: configured for UDMA/133
[ 1444.071386] ata3.00: configured for UDMA/133
[ 1444.077579] ata2.00: configured for UDMA/133
[ 1444.430017] PM: thaw of devices complete after 852.888 msecs
[ 1444.439553] PM: writing image.
[ 1444.446508] PM: Using 3 thread(s) for compression.
[ 1444.446508] PM: Compressing and saving image data (171542 pages)...
[ 1444.461242] PM: Image saving progress: 0%
[ 1444.712573] PM: Image saving progress: 10%
[ 1444.911138] PM: Image saving progress: 20%
[ 1445.235778] PM: Image saving progress: 30%
[ 1445.268331] r8169 0000:02:00.0 eth0: link up
[ 1445.713293] PM: Image saving progress: 40%
[ 1446.067955] PM: Image saving progress: 50%
[ 1446.269045] PM: Image saving progress: 60%
[ 1447.010418] PM: Image saving progress: 90%
[ 1447.249890] PM: Image saving progress: 100%
[ 1447.275236] PM: Image saving done.
[ 1447.279229] PM: Wrote 686168 kbytes in 2.81 seconds (244.18 MB/s)
[ 1447.286362] PM: S|
[ 1447.640958] kvm: exiting hardware virtualization
[ 1447.648264] sd 3:0:0:0: [sdc] Synchronizing SCSI cache
[ 1447.654329] sd 3:0:0:0: [sdc] Stopping disk
[ 1448.184498] sd 2:0:0:0: [sdb] Synchronizing SCSI cache
[ 1448.191528] sd 2:0:0:0: [sdb] Stopping disk
[ 1449.065339] sd 1:0:0:0: [sda] Synchronizing SCSI cache
[ 1449.072924] sd 1:0:0:0: [sda] Stopping disk
[ 1449.112649] pcieport 0000:00:04.0: System wakeup enabled by ACPI
[ 1449.134864] ACPI: Preparing to enter system sleep state S5
[ 1449.141054] [Firmware Bug]: ACPI: BIOS _OSI(Linux) query ignored
[ 1451.616441] Call Trace:
[ 1451.620521] <IRQ> [<ffffffff810b729d>] __report_bad_irq+0x3d/0xe0
[ 1451.628446] [<ffffffff810b7736>] note_interrupt+0x1a6/0x1f0
[ 1451.635730] [<ffffffff8134ddc6>] ? add_interrupt_randomness+0x26/0x180
[ 1451.644003] [<ffffffff810b4f2b>] handle_irq_event_percpu+0xcb/0x3a0
[ 1451.652058] [<ffffffff810b5248>] handle_irq_event+0x48/0x70
[ 1451.659437] [<ffffffff8159ba7d>] ? _raw_spin_lock+0x6d/0x80
[ 1451.666764] [<ffffffff810b826e>] ? handle_fasteoi_irq+0x1e/0xf0
[ 1451.674457] [<ffffffff810b82aa>] handle_fasteoi_irq+0x5a/0xf0
[ 1451.681959] [<ffffffff810042a2>] handle_irq+0x22/0x40
[ 1451.688791] [<ffffffff8159f04a>] do_IRQ+0x5a/0xd0
[ 1451.695296] [<ffffffff8159cbaf>] common_interrupt+0x6f/0x6f
[ 1451.702672] <EOI> [<ffffffff8107a775>] ? sched_clock_cpu+0xc5/0x120
[ 1451.710934] [<ffffffff8144f703>] ? cpuidle_wrap_enter+0x53/0x90
[ 1451.718745] [<ffffffff8144f6ff>] ? cpuidle_wrap_enter+0x4f/0x90
[ 1451.726568] [<ffffffff8144ebc0>] cpuidle_enter_tk+0x10/0x20
[ 1451.734033] [<ffffffff8144f265>] cpuidle_idle_call+0xb5/0x390
[ 1451.741673] [<ffffffff8100c12e>] arch_cpu_idle+0xe/0x30
[ 1451.748793] [<ffffffff8108d3ee>] cpu_startup_entry+0x7e/0x3d0
[ 1451.756433] [<ffffffff8158d364>] start_secondary+0x1b7/0x1bb
[ 1451.763984] handlers:
[ 1451.768074] [<ffffffff814b6ce0>] azx_interrupt
[ 1451.774386] Disabling IRQ #16
[ 1464.581166] acpi_power_off called

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2013-04-22 22:07:56

by Paul E. McKenney

[permalink] [raw]
Subject: Re: irq 16: nobody cared

On Mon, Apr 22, 2013 at 11:33:03PM +0200, Borislav Petkov wrote:
> On Mon, Apr 22, 2013 at 07:44:02AM -0700, Paul E. McKenney wrote:
> > Would it make sense to try shutting off the hardware, waiting (say)
> > 100 milliseconds, then diabling the irq? (Hey, had to ask!)
>
> Ok, after a whole day of debugging, here's what tglx and I found out
> (Thomas, please correct me if I've misinterpreted something):
>
> When we suspend, we do the following things:
>
> * suspend/freeze devices
> * build hibernation image
> * resume/thaw devices
> * write out image to swap
> * enter S5
>
> Below's how it looks like in dmesg (the azx_resume is debugging output
> to show us when we resume the audio crap).
>
> Now, when we resume the devices to write out the image, the hda_intel
> thing gets an interrupt on irq line 16.
>
> What's puzzling is that
>
> 1) HDA's INTSTS status register says no IRQ occurred
> 2) HDA's PCI status register says no IRQ is active
> 3) however, (IO)APIC says an IRQ is pending
>
> The azx_interrupt() IRQ handler looks at the INTSTS and since there's no
> status bit set, it returns IRQ_NONE.
>
> After 99900 retries, the spurious IRQ detection triggers and we disable
> IRQ 16.

Yow!!!

> Now, the RCU expedited grace periods patch only brown-paper-bags this
> issue because the IRQ is still pending but we don't wait a couple of
> seconds before powering off the machine to actually trigger the spurious
> IRQ detection.
>
> However, paulmck wants to take this patch anyway and for an unrelated
> reason or simply because he's greedy that way :-)

And I did queue it, and followed it up with a patchlet that refrains
from expediting on systems with more than 256 CPUs. Faster suspends
and resumes and all that...

Thanx, Paul

> Further action is to try to find out why this IRQ is still there when
> the device gets enabled.
>
> That should be all for now.
>
> [ 1442.248431] PM: Marking nosave pages: [mem 0xba9ba000-0xbca4dfff]
> [ 1442.256534] PM: Marking nosave pages: [mem 0xbca4f000-0xbcc54fff]
> [ 1442.256542] PM: Marking nosave pages: [mem 0xbd083000-0xbd7f3fff]
> [ 1442.256571] PM: Marking nosave pages: [mem 0xbd800000-0x100000fff]
> [ 1442.257147] PM: Basic memory bitmaps created
> [ 1442.257147] PM: Syncing filesystems ... done.
> [ 1442.294602] Freezing user space processes ... (elapsed 0.01 seconds) done.
> [ 1442.315451] PM: Preallocating image memory... done (allocated 168444 pages)
> [ 1442.826308] PM: Allocated 673776 kbytes in 0.50 seconds (1347.55 MB/s)
> [ 1442.833410] Freezing remaining freezable tasks ... (elapsed 0.01 seconds) done.
> [ 1443.579467] [drm] PCIE gen 2 link speeds already enabled
> [ 1443.579583] serial 00:09: activated
> [ 1443.582498] [drm] PCIE GART of 512M enabled (table at 0x0000000000040000).
> [ 1443.582557] radeon 0000:01:00.0: WB enabled
> [ 1443.582559] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000020000c00 and cpu addr 0xffff880428eefc00
> [ 1443.582561] radeon 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000020000c0c and cpu addr 0xffff880428eefc0c
> [ 1443.588070] azx_resume: irq: -1, after PCI_D0
> 162] azx_resume: irq: -1, after restore_state
> [ 1443.588256] azx_resume: irq: -1, after set_master
> [ 1443.588258] azx_resume: irq: -1, msi
> [ 1443.588528] azx_resume: irq: 90, after acquire_irq
> [ 1443.588530] azx_resume: irq: 90, after init_pci
> [ 1443.588983] azx_resume: irq: -1, after PCI_D0
> [ 1443.589059] azx_resume: irq: -1, after restore_state
> [ 1443.589189] azx_resume: irq: -1, after set_master
> [ 1443.589218] azx_resume: irq: 16, after acquire_irq
> [ 1443.589227] azx_resume: irq: 16, after init_pci
> [ 1443.590827] azx_resume: irq: 90, after init_chip
> [ 1443.590982] azx_resume: irq: 90, after hda_resume
> [ 1443.590986] azx_resume: irq: 90, exit
> [ 1443.593278] azx_resume: irq: 16, after init_chip
> [ 1443.613761] [drm] ring test on 0 succeeded in 0 usecs
> [ 1443.613819] [drm] ring test on 3 succeeded in 1 usecs
> [ 1443.613837] [drm] ib test on ring 0 succeeded in 0 usecs
> [ 1443.613852] [drm] ib test on ring 3 succeeded in 1 usecs
> [ 1443.763299] r8169 0000:02:00.0 eth0: link down
> [ 1443.764763] azx_resume: irq: 16, after hda_resume
> [ 1443.764765] azx_resume: irq: 16, exit
> [ 1443.882924] ata7: SATA link down (SStatus 0 SControl 300)
> [ 1443.888956] ata5: SATA link down (SStatus 0 SControl 300)
> [ 1443.897831] ata6: SATA link down (SStatus 0 SControl 300)
> [ 1444.037878] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> [ 1444.046092] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
> [ 1444.054510] ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> [ 1444.064691] ata4.00: configured for UDMA/133
> [ 1444.071386] ata3.00: configured for UDMA/133
> [ 1444.077579] ata2.00: configured for UDMA/133
> [ 1444.430017] PM: thaw of devices complete after 852.888 msecs
> [ 1444.439553] PM: writing image.
> [ 1444.446508] PM: Using 3 thread(s) for compression.
> [ 1444.446508] PM: Compressing and saving image data (171542 pages)...
> [ 1444.461242] PM: Image saving progress: 0%
> [ 1444.712573] PM: Image saving progress: 10%
> [ 1444.911138] PM: Image saving progress: 20%
> [ 1445.235778] PM: Image saving progress: 30%
> [ 1445.268331] r8169 0000:02:00.0 eth0: link up
> [ 1445.713293] PM: Image saving progress: 40%
> [ 1446.067955] PM: Image saving progress: 50%
> [ 1446.269045] PM: Image saving progress: 60%
> [ 1447.010418] PM: Image saving progress: 90%
> [ 1447.249890] PM: Image saving progress: 100%
> [ 1447.275236] PM: Image saving done.
> [ 1447.279229] PM: Wrote 686168 kbytes in 2.81 seconds (244.18 MB/s)
> [ 1447.286362] PM: S|
> [ 1447.640958] kvm: exiting hardware virtualization
> [ 1447.648264] sd 3:0:0:0: [sdc] Synchronizing SCSI cache
> [ 1447.654329] sd 3:0:0:0: [sdc] Stopping disk
> [ 1448.184498] sd 2:0:0:0: [sdb] Synchronizing SCSI cache
> [ 1448.191528] sd 2:0:0:0: [sdb] Stopping disk
> [ 1449.065339] sd 1:0:0:0: [sda] Synchronizing SCSI cache
> [ 1449.072924] sd 1:0:0:0: [sda] Stopping disk
> [ 1449.112649] pcieport 0000:00:04.0: System wakeup enabled by ACPI
> [ 1449.134864] ACPI: Preparing to enter system sleep state S5
> [ 1449.141054] [Firmware Bug]: ACPI: BIOS _OSI(Linux) query ignored
> [ 1451.616441] Call Trace:
> [ 1451.620521] <IRQ> [<ffffffff810b729d>] __report_bad_irq+0x3d/0xe0
> [ 1451.628446] [<ffffffff810b7736>] note_interrupt+0x1a6/0x1f0
> [ 1451.635730] [<ffffffff8134ddc6>] ? add_interrupt_randomness+0x26/0x180
> [ 1451.644003] [<ffffffff810b4f2b>] handle_irq_event_percpu+0xcb/0x3a0
> [ 1451.652058] [<ffffffff810b5248>] handle_irq_event+0x48/0x70
> [ 1451.659437] [<ffffffff8159ba7d>] ? _raw_spin_lock+0x6d/0x80
> [ 1451.666764] [<ffffffff810b826e>] ? handle_fasteoi_irq+0x1e/0xf0
> [ 1451.674457] [<ffffffff810b82aa>] handle_fasteoi_irq+0x5a/0xf0
> [ 1451.681959] [<ffffffff810042a2>] handle_irq+0x22/0x40
> [ 1451.688791] [<ffffffff8159f04a>] do_IRQ+0x5a/0xd0
> [ 1451.695296] [<ffffffff8159cbaf>] common_interrupt+0x6f/0x6f
> [ 1451.702672] <EOI> [<ffffffff8107a775>] ? sched_clock_cpu+0xc5/0x120
> [ 1451.710934] [<ffffffff8144f703>] ? cpuidle_wrap_enter+0x53/0x90
> [ 1451.718745] [<ffffffff8144f6ff>] ? cpuidle_wrap_enter+0x4f/0x90
> [ 1451.726568] [<ffffffff8144ebc0>] cpuidle_enter_tk+0x10/0x20
> [ 1451.734033] [<ffffffff8144f265>] cpuidle_idle_call+0xb5/0x390
> [ 1451.741673] [<ffffffff8100c12e>] arch_cpu_idle+0xe/0x30
> [ 1451.748793] [<ffffffff8108d3ee>] cpu_startup_entry+0x7e/0x3d0
> [ 1451.756433] [<ffffffff8158d364>] start_secondary+0x1b7/0x1bb
> [ 1451.763984] handlers:
> [ 1451.768074] [<ffffffff814b6ce0>] azx_interrupt
> [ 1451.774386] Disabling IRQ #16
> [ 1464.581166] acpi_power_off called
>
> --
> Regards/Gruss,
> Boris.
>
> Sent from a fat crate under my desk. Formatting is fine.
> --
>

2013-04-23 14:11:06

by Thomas Gleixner

[permalink] [raw]
Subject: Re: irq 16: nobody cared

On Mon, 22 Apr 2013, Borislav Petkov wrote:
> [ 1449.134864] ACPI: Preparing to enter system sleep state S5
> [ 1449.141054] [Firmware Bug]: ACPI: BIOS _OSI(Linux) query ignored

and right after that the spurious interrupt happens. I asked Boris to
boot with acpi_osi=Linux on the command line and the issue went
away, though now it complains right before shutdown about:

[ 123.142377] do_IRQ: 2.179 No irq handler for vector (irq -1)

I wouldn't be surprised if that vector 179 was previously used for
irq16.

Without acpi_osi=Linux the BIOS manages to put the hardware into a
state which causes an interrupt to be sent from thin air.

We should have an annual "Brain Is Obviously Superfluous" award.
This one would be high up on the nomination list.

Yours grumpy

tglx

2013-04-23 14:35:06

by Borislav Petkov

[permalink] [raw]
Subject: Re: irq 16: nobody cared

On Tue, Apr 23, 2013 at 04:10:22PM +0200, Thomas Gleixner wrote:
> On Mon, 22 Apr 2013, Borislav Petkov wrote:
> > [ 1449.134864] ACPI: Preparing to enter system sleep state S5
> > [ 1449.141054] [Firmware Bug]: ACPI: BIOS _OSI(Linux) query ignored
>
> and right after that the spurious interrupt happens. I asked Boris to
> boot with acpi_osi=Linux on the command line and the issue went
> away, though now it complains right before shutdown about:
>
> [ 123.142377] do_IRQ: 2.179 No irq handler for vector (irq -1)
>
> I wouldn't be surprised if that vector 179 was previously used for
> irq16.
>
> Without acpi_osi=Linux the BIOS manages to put the hardware into a
> state which causes an interrupt to be sent from thin air.
>
> We should have an annual "Brain Is Obviously Superfluous" award.
> This one would be high up on the nomination list.

Yep, crack must've particularly cheap during that time so the monkeys
went and bought double the amount of bags and did hit new levels of
SNAFU with an excessive, couple-of-days binge of smoking and typing into
their keyboards.

They probably looked like that while at it:
http://devopsreactions.tumblr.com/post/47851533209/learning-linux

Oh boy, this is a new low in my BIOS-f*ckups collection.

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2013-04-23 15:02:50

by Paul E. McKenney

[permalink] [raw]
Subject: Re: irq 16: nobody cared

On Tue, Apr 23, 2013 at 04:10:22PM +0200, Thomas Gleixner wrote:
> On Mon, 22 Apr 2013, Borislav Petkov wrote:
> > [ 1449.134864] ACPI: Preparing to enter system sleep state S5
> > [ 1449.141054] [Firmware Bug]: ACPI: BIOS _OSI(Linux) query ignored
>
> and right after that the spurious interrupt happens. I asked Boris to
> boot with acpi_osi=Linux on the command line and the issue went
> away, though now it complains right before shutdown about:
>
> [ 123.142377] do_IRQ: 2.179 No irq handler for vector (irq -1)
>
> I wouldn't be surprised if that vector 179 was previously used for
> irq16.
>
> Without acpi_osi=Linux the BIOS manages to put the hardware into a
> state which causes an interrupt to be sent from thin air.
>
> We should have an annual "Brain Is Obviously Superfluous" award.
> This one would be high up on the nomination list.

Yow!!!

Many thanks to you and Borislav for tracking this down!!!

Thanx, Paul

> Yours grumpy
>
> tglx
>