2012-03-05 00:43:30

by Stefan Lippers-Hollmann

[permalink] [raw]
Subject: Re: [ 57/72] genirq: Unmask oneshot irqs when thread was not woken

Hi

On Monday 05 March 2012, Sven Joachim wrote:
> On 2012-02-28 02:05 +0100, Greg KH wrote:
>
> > 3.2-stable review patch. If anyone has any objections, please let me know.
>
> Alas, the commit below broke the WiFi on my laptop, where
> "iwlist wlan0 scan" now just reports: " wlan0 No scan results".
>
> This is not specific to 3.2, the same problem exists in 3.0.23 and
> 3.3-rc6 while previous releases worked.

I can confirm this regression on an old PIII based notebook with a
32 bit BCM4306/3 PCMCIA card:

02:00.0 Network controller [0280]: Broadcom Corporation BCM4306 802.11b/g Wireless LAN Controller [14e4:4320] (rev 03)
Subsystem: Linksys WPC54G v1 / WPC54GS v1 802.11g Wireless-G Notebook Adapter [1737:4320]
Flags: bus master, fast devsel, latency 64, IRQ 10
Memory at 20000000 (32-bit, non-prefetchable) [size=8K]
Capabilities: [40] Power Management version 2
Kernel driver in use: b43-pci-bridge

I can observe these differences in the dmesg output (3.2.9 + current
stable queue-3.2):

The kernel configuration is identical in both cases, the only
difference is "genirq: Unmask oneshot irqs when thread was not woken"
being applied or not.

--- working <-- 3.2.9 + current stable queue-3.2, with "genirq: Unmask oneshot irqs when thread was not woken" reverted
+++ broken <-- 3.2.9 + current stable queue-3.2
@@ -125,6 +125,7 @@ Initializing cgroup subsys freezer
Initializing cgroup subsys net_cls
Initializing cgroup subsys blkio
Initializing cgroup subsys perf_event
+CPU serial number disabled.
mce: CPU supports 5 MCE banks
SMP alternatives: switching to UP code
Freeing SMP alternatives: 8k freed
@@ -137,9 +138,16 @@ SMP disabled
Performance Events:
no APIC, boot with the "lapic" boot parameter to force-enable it.
no hardware sampling interrupt available.
-Broken PMU hardware detected, using software events only.
+p6 PMU driver.
+... version: 0
+... bit width: 32
+... generic registers: 2
+... value mask: 00000000ffffffff
+... max period: 000000007fffffff
+... fixed-purpose events: 0
+... event mask: 0000000000000003
Brought up 1 CPUs
Total of 1 processors activated (1989.75 BogoMIPS).
devtmpfs: initialized
PM: Registering ACPI NVS region at eeff000 (4096 bytes)
print_constraints: dummy:
@@ -174,12 +182,12 @@ pci 0000:00:00.0: reg 10: [mem 0xec00000
type 1 class 0x000604
pci 0000:00:01.0: supports D1
type 2 class 0x000607
-pci 0000:00:04.0: reg 10: [mem 0x10000000-0x10000fff]
+pci 0000:00:04.0: reg 10: [mem 0x00000000-0x00000fff]
pci 0000:00:04.0: supports D1 D2
pci 0000:00:04.0: PME# supported from D0 D1 D2 D3hot D3cold
pci 0000:00:04.0: PME# disabled
type 2 class 0x000607
-pci 0000:00:04.1: reg 10: [mem 0x14000000-0x14000fff]
+pci 0000:00:04.1: reg 10: [mem 0x00000000-0x00000fff]
pci 0000:00:04.1: supports D1 D2
pci 0000:00:04.1: PME# supported from D0 D1 D2 D3hot D3cold
pci 0000:00:04.1: PME# disabled
@@ -299,6 +307,10 @@ pnp: PnP ACPI: found 11 devices
ACPI: ACPI bus type pnp unregistered
Switching to clocksource acpi_pm
PCI: max bus depth: 1 pci_try_num: 2
+pci 0000:00:04.0: BAR 0: assigned [mem 0x10000000-0x10000fff]
+(PCI address [0x10000000-0x10000fff])
+pci 0000:00:04.1: BAR 0: assigned [mem 0x14000000-0x14000fff]
+(PCI address [0x14000000-0x14000fff])
pci 0000:00:11.0: BAR 6: assigned [mem 0x10020000-0x1003ffff pref]
pci 0000:00:04.1: BAR 16: assigned [mem 0x18000000-0x1bffffff]
pci 0000:00:04.1: BAR 15: assigned [mem 0x1c000000-0x1fffffff pref]
@@ -326,9 +338,11 @@ pci 0000:00:01.0: setting latency timer
enabled at IRQ 10
PCI: setting IRQ 10 as level-triggered
-> GSI 10 (level, low) -> IRQ 10
+pci 0000:00:04.0: setting latency timer to 64
enabled at IRQ 11
PCI: setting IRQ 11 as level-triggered
-> GSI 11 (level, low) -> IRQ 11
+pci 0000:00:04.1: setting latency timer to 64
pci_bus 0000:00: resource 0 [io 0x0000-0xffff]
pci_bus 0000:00: resource 1 [mem 0x00000000-0xffffffff]
pci_bus 0000:01: resource 1 [mem 0xe8100000-0xe81fffff]
@@ -562,20 +576,6 @@ b43-phy0: Hardware crypto acceleration n
b43-phy0: QoS not supported by firmware
ADDRCONF(NETDEV_UP): wlan0: link is not ready
NET: Registered protocol family 17
-wlan0: authenticate with [redacted MAC address] (try 1)
-wlan0: authenticated
-wlan0: associate with [redacted MAC address] (try 1)
-wlan0: RX AssocResp from [redacted MAC address] (capab=0x411 status=0 aid=1)
-wlan0: associated
-ADDRCONF(NETDEV_CHANGE): wlan0: link becomes ready
-cfg80211: Calling CRDA for country: DE
-Intel AES-NI instructions are not detected.
-cfg80211: Regulatory domain changed to country: DE
-cfg80211: (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp)
-cfg80211: (2400000 KHz - 2483500 KHz @ 40000 KHz), (N/A, 2000 mBm)
-cfg80211: (5150000 KHz - 5250000 KHz @ 40000 KHz), (N/A, 2000 mBm)
-cfg80211: (5250000 KHz - 5350000 KHz @ 40000 KHz), (N/A, 2000 mBm)
-cfg80211: (5470000 KHz - 5725000 KHz @ 40000 KHz), (N/A, 2698 mBm)
lp0: using parport0 (interrupt-driven).
lp0: console ready
ppdev: user-space parallel port driver
@@ -589,4 +589,3 @@ mtrr: base(0xf2000000) is not aligned on
agpgart-via 0000:00:00.0: AGP 2.0 bridge
agpgart-via 0000:00:00.0: putting AGP V2 device into 0x mode
pci 0000:01:00.0: putting AGP V2 device into 0x mode
-wlan0: no IPv6 routers present

While I don't get any revealing error messages, once "genirq: Unmask
oneshot irqs when thread was not woken" is applied the wlan card
stops working (no scan results, unable to auth).

I cannot reproduce this problem on a different amd64/ 64 bit system
with a BCM4318 PCI card, which continues to work fine.

Regards
Stefan Lippers-Hollmann


2012-03-06 19:31:51

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [ 57/72] genirq: Unmask oneshot irqs when thread was not woken

On Tue, 6 Mar 2012, Thomas Gleixner wrote:
> On Tue, 6 Mar 2012, Thomas Gleixner wrote:
>
> > On Mon, 5 Mar 2012, Linus Torvalds wrote:
> >
> > > Thomas, should we just plan on reverting that commit from mainline? It
> > > clearly causes regressions.
> >
> > Give me a day or two to figure out why it breaks stuff. I have no idea
> > why it should wreckage anything.
>
> Hmm. This is interesting. The b43 driver has a primary handler which
> can return IRQ_NONE. So up to that change the interrupt line was kept
> disabled when that happened. Possibly the driver relies on that
> behaviour. Digging for a machine with a b43.

Does not reproduce. Now I was looking at the driver again, it does not
use IRQ_ONESHOT anyway.

So for handle_fasteoi_irq() this patch is actually a NOOP. So the only
affected handler would be handle_level_irq(). Still can't see how it
changes the !IRQ_ONESHOT behaviour :(

Stephan, Sven: Can you please provide the output of /proc/interrupts ?

Thanks,

tglx




2012-03-06 23:38:48

by Stefan Lippers-Hollmann

[permalink] [raw]
Subject: Re: [ 57/72] genirq: Unmask oneshot irqs when thread was not woken

Hi

On Wednesday 07 March 2012, Linus Torvalds wrote:
> On Tue, Mar 6, 2012 at 2:18 PM, Thomas Gleixner <[email protected]> wrote:
> > Subject: genirq: Clear action->thread_mask if IRQ_ONESHOT is not set
>
> Ok, can we get this fixed version tested too by people who saw the problem?
>
> Just to make sure..
[…]

This variant is also working fine on PIII/ coppermine and bcm4306/3.

Tested-by: Stefan Lippers-Hollmann <[email protected]>

Regards
Stefan Lippers-Hollmann

2012-03-06 21:07:33

by Sven Joachim

[permalink] [raw]
Subject: Re: [ 57/72] genirq: Unmask oneshot irqs when thread was not woken

Am 06.03.2012 um 21:26 schrieb Thomas Gleixner:

> On Tue, 6 Mar 2012, Sven Joachim wrote:
>> Am 06.03.2012 um 20:31 schrieb Thomas Gleixner:
>>
>> > Stephan, Sven: Can you please provide the output of /proc/interrupts ?
>>
>> Here is mine, from a freshly booted 3.3-rc6 kernel.
>>
>> Cheers,
>> Sven
>>
>> CPU0
>> 0: 25050 XT-PIC-XT-PIC timer
>> 1: 101 XT-PIC-XT-PIC i8042
>> 2: 0 XT-PIC-XT-PIC cascade
>> 3: 1 XT-PIC-XT-PIC
>> 4: 1 XT-PIC-XT-PIC
>> 5: 0 XT-PIC-XT-PIC ehci_hcd:usb1, uhci_hcd:usb2
>> 6: 1 XT-PIC-XT-PIC i915, uhci_hcd:usb5, yenta
>> 7: 1 XT-PIC-XT-PIC
>> 8: 0 XT-PIC-XT-PIC rtc0
>> 9: 634 XT-PIC-XT-PIC acpi
>> 10: 341 XT-PIC-XT-PIC uhci_hcd:usb4, snd_hda_intel, b43
>
> Ah, XT-PIC uses handle_level_irq(). /me bangs head against desk.
>
> Does the patch below fix the problem for you ?

Yep, it does. Thanks!

> ----------------->
> Subject: genirq: Clear action->thread_mask if IRQ_ONESHOT is not set
>
> commit ac5637611(genirq: Unmask oneshot irqs when thread was not
> woken) fails to unmask when a !IRQ_ONESHOT threaded handler is handled
> by handle_level_irq. This happens because thread_mask is or'ed
> unconditionally in irq_wake_thread(), but for !IRQ_ONESHOT interrupts
> never cleared. So the check for !desc->thread_active fails and keeps
> the interrupt disabled.
>
> Keep the thread_mask zero for !IRQ_ONESHOT interrupts.
>
> Reported-by: Sven Joachim <[email protected]>
> Cc: [email protected]
> Signed-off-by: Thomas Gleixner <[email protected]>
> ---
>
> Index: linux-2.6/kernel/irq/manage.c
> ===================================================================
> --- linux-2.6.orig/kernel/irq/manage.c
> +++ linux-2.6/kernel/irq/manage.c
> @@ -996,11 +996,13 @@ __setup_irq(unsigned int irq, struct irq
> * Setup the thread mask for this irqaction. Unlikely to have
> * 32 resp 64 irqs sharing one line, but who knows.
> */
> - if (new->flags & IRQF_ONESHOT && thread_mask == ~0UL) {
> - ret = -EBUSY;
> - goto out_mask;
> + if (new->flags & IRQF_ONESHOT) {
> + if (thread_mask == ~0UL) {
> + ret = -EBUSY;
> + goto out_mask;
> + }
> + new->thread_mask = new->flags & IRQF_ONESHOT;
> }
> - new->thread_mask = 1 << ffz(thread_mask);
>
> if (!shared) {
> init_waitqueue_head(&desc->wait_for_threads);

2012-03-06 21:40:43

by Linus Torvalds

[permalink] [raw]
Subject: Re: [ 57/72] genirq: Unmask oneshot irqs when thread was not woken

On Tue, Mar 6, 2012 at 12:26 PM, Thomas Gleixner <[email protected]> wrote:
>
> Subject: genirq: Clear action->thread_mask if IRQ_ONESHOT is not set

Umm. Apparently this patch fixes the bug, but the patch itself is just insane.

> - ? ? ? if (new->flags & IRQF_ONESHOT && thread_mask == ~0UL) {
> - ? ? ? ? ? ? ? ret = -EBUSY;
> - ? ? ? ? ? ? ? goto out_mask;
> + ? ? ? if (new->flags & IRQF_ONESHOT) {
> + ? ? ? ? ? ? ? if (thread_mask == ~0UL) {
> + ? ? ? ? ? ? ? ? ? ? ? ret = -EBUSY;
> + ? ? ? ? ? ? ? ? ? ? ? goto out_mask;
> + ? ? ? ? ? ? ? }
> + ? ? ? ? ? ? ? new->thread_mask = new->flags & IRQF_ONESHOT;
> ? ? ? ?}
> - ? ? ? new->thread_mask = 1 << ffz(thread_mask);

WHAT?

You just checked that "new->flags & IRQF_ONESHOT" nonzero, and inside
that if-statement, you then do

new->thread_mask = new->flags & IRQF_ONESHOT;

which is just crazy. Why don't you just do

new->thread_mask = IRQF_ONESHOT;

if that is what you actually meant?

What is that code actually *supposed* to do?

Also, what was the meaning of that old insane line:

new->thread_mask = 1 << ffz(thread_mask);

which you removed? It was crap, I agree, but what was the thinking
behind it? And the reason it was crap is because that's a crazy
expression that could be written better ways (*), and it needs a
comment on what the heck the point of it was..

So stop with these "random code" snippets, and explain what the f*&^
the code is meant to do, AND THEN WRITE THE CODE IN A SANE MANNER
instead of posting these kinds of insane patches.

Because right now it really looks like the "random monkey" approach to
programming.

Linus

(*) "1 << ffz(a)" can be written as

a = ~a; /* Turn the zero bits into 1 bits */
a &= -a; /* .. and find the first one. */

without ever doing any insane bit scanning.

2012-03-06 09:52:23

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [ 57/72] genirq: Unmask oneshot irqs when thread was not woken

On Tue, 6 Mar 2012, Thomas Gleixner wrote:

> On Mon, 5 Mar 2012, Linus Torvalds wrote:
>
> > Thomas, should we just plan on reverting that commit from mainline? It
> > clearly causes regressions.
>
> Give me a day or two to figure out why it breaks stuff. I have no idea
> why it should wreckage anything.

Hmm. This is interesting. The b43 driver has a primary handler which
can return IRQ_NONE. So up to that change the interrupt line was kept
disabled when that happened. Possibly the driver relies on that
behaviour. Digging for a machine with a b43.


2012-03-06 20:54:57

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [ 57/72] genirq: Unmask oneshot irqs when thread was not woken

On Tue, 6 Mar 2012, Thomas Gleixner wrote:
> On Tue, 6 Mar 2012, Sven Joachim wrote:
> > Am 06.03.2012 um 20:31 schrieb Thomas Gleixner:
> >
> > > Stephan, Sven: Can you please provide the output of /proc/interrupts ?
> >
> > Here is mine, from a freshly booted 3.3-rc6 kernel.
> >
> > Cheers,
> > Sven
> >
> > CPU0
> > 0: 25050 XT-PIC-XT-PIC timer
> > 1: 101 XT-PIC-XT-PIC i8042
> > 2: 0 XT-PIC-XT-PIC cascade
> > 3: 1 XT-PIC-XT-PIC
> > 4: 1 XT-PIC-XT-PIC
> > 5: 0 XT-PIC-XT-PIC ehci_hcd:usb1, uhci_hcd:usb2
> > 6: 1 XT-PIC-XT-PIC i915, uhci_hcd:usb5, yenta
> > 7: 1 XT-PIC-XT-PIC
> > 8: 0 XT-PIC-XT-PIC rtc0
> > 9: 634 XT-PIC-XT-PIC acpi
> > 10: 341 XT-PIC-XT-PIC uhci_hcd:usb4, snd_hda_intel, b43
>
> Ah, XT-PIC uses handle_level_irq(). /me bangs head against desk.

Ha, nosmp on the command line lets me reproduce as well and the patch
fixes it. Duh, stupid me.

Thanks,

tglx

2012-03-06 21:09:00

by Stefan Lippers-Hollmann

[permalink] [raw]
Subject: Re: [ 57/72] genirq: Unmask oneshot irqs when thread was not woken

Hi

On Tuesday 06 March 2012, Thomas Gleixner wrote:
> On Tue, 6 Mar 2012, Sven Joachim wrote:
> > Am 06.03.2012 um 20:31 schrieb Thomas Gleixner:
> >
> > > Stephan, Sven: Can you please provide the output of /proc/interrupts ?
> >
> > Here is mine, from a freshly booted 3.3-rc6 kernel.
> >
> > Cheers,
> > Sven
> >
> > CPU0
> > 0: 25050 XT-PIC-XT-PIC timer
[…]
> > 10: 341 XT-PIC-XT-PIC uhci_hcd:usb4, snd_hda_intel, b43
>
> Ah, XT-PIC uses handle_level_irq(). /me bangs head against desk.
>
> Does the patch below fix the problem for you ?
>
> Thanks,
>
> tglx
>
> ----------------->
> Subject: genirq: Clear action->thread_mask if IRQ_ONESHOT is not set
>
> commit ac5637611(genirq: Unmask oneshot irqs when thread was not
> woken) fails to unmask when a !IRQ_ONESHOT threaded handler is handled
> by handle_level_irq. This happens because thread_mask is or'ed
> unconditionally in irq_wake_thread(), but for !IRQ_ONESHOT interrupts
> never cleared. So the check for !desc->thread_active fails and keeps
> the interrupt disabled.
>
> Keep the thread_mask zero for !IRQ_ONESHOT interrupts.
[…]

I can confirm that this patch, applied to 2.6.9, including "genirq:
Unmask oneshot irqs when thread was not woken", and + the current
stable queue-3.2 fixes b43 wlan operations. Feel free to add
Tested-by: Stefan Lippers-Hollmann <[email protected]>
if you like.

Thanks a lot
Stefan Lippers-Hollmann


Attachments:
signature.asc (198.00 B)
This is a digitally signed message part.

2012-03-06 19:45:25

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [ 57/72] genirq: Unmask oneshot irqs when thread was not woken

On Mon, 5 Mar 2012, Stefan Lippers-Hollmann wrote:
> On Monday 05 March 2012, Sven Joachim wrote:
> > On 2012-02-28 02:05 +0100, Greg KH wrote:
> >
> > > 3.2-stable review patch. If anyone has any objections, please let me know.
> >
> > Alas, the commit below broke the WiFi on my laptop, where
> > "iwlist wlan0 scan" now just reports: " wlan0 No scan results".
> >
> > This is not specific to 3.2, the same problem exists in 3.0.23 and
> > 3.3-rc6 while previous releases worked.
>
> I can confirm this regression on an old PIII based notebook with a
> 32 bit BCM4306/3 PCMCIA card:
>
> 02:00.0 Network controller [0280]: Broadcom Corporation BCM4306 802.11b/g Wireless LAN Controller [14e4:4320] (rev 03)
> Subsystem: Linksys WPC54G v1 / WPC54GS v1 802.11g Wireless-G Notebook Adapter [1737:4320]
> Flags: bus master, fast devsel, latency 64, IRQ 10
> Memory at 20000000 (32-bit, non-prefetchable) [size=8K]
> Capabilities: [40] Power Management version 2
> Kernel driver in use: b43-pci-bridge
>
> I can observe these differences in the dmesg output (3.2.9 + current
> stable queue-3.2):
>
> The kernel configuration is identical in both cases, the only
> difference is "genirq: Unmask oneshot irqs when thread was not woken"
> being applied or not.
>
> --- working <-- 3.2.9 + current stable queue-3.2, with "genirq: Unmask oneshot irqs when thread was not woken" reverted
> +++ broken <-- 3.2.9 + current stable queue-3.2
> @@ -125,6 +125,7 @@ Initializing cgroup subsys freezer
> Initializing cgroup subsys net_cls
> Initializing cgroup subsys blkio
> Initializing cgroup subsys perf_event
> +CPU serial number disabled.

I can't see why reverting that genirq patch would cause this.

> mce: CPU supports 5 MCE banks
> SMP alternatives: switching to UP code
> Freeing SMP alternatives: 8k freed
> @@ -137,9 +138,16 @@ SMP disabled
> Performance Events:
> no APIC, boot with the "lapic" boot parameter to force-enable it.
> no hardware sampling interrupt available.
> -Broken PMU hardware detected, using software events only.
> +p6 PMU driver.
> +... version: 0
> +... bit width: 32
> +... generic registers: 2
> +... value mask: 00000000ffffffff
> +... max period: 000000007fffffff
> +... fixed-purpose events: 0
> +... event mask: 0000000000000003

Ditto.

> Brought up 1 CPUs
> Total of 1 processors activated (1989.75 BogoMIPS).
> devtmpfs: initialized
> PM: Registering ACPI NVS region at eeff000 (4096 bytes)
> print_constraints: dummy:
> @@ -174,12 +182,12 @@ pci 0000:00:00.0: reg 10: [mem 0xec00000
> type 1 class 0x000604
> pci 0000:00:01.0: supports D1
> type 2 class 0x000607
> -pci 0000:00:04.0: reg 10: [mem 0x10000000-0x10000fff]
> +pci 0000:00:04.0: reg 10: [mem 0x00000000-0x00000fff]

Even more.

There is something extremly fishy.

Thanks,

tglx

2012-03-06 22:19:00

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [ 57/72] genirq: Unmask oneshot irqs when thread was not woken

On Tue, 6 Mar 2012, Linus Torvalds wrote:
> On Tue, Mar 6, 2012 at 12:26 PM, Thomas Gleixner <[email protected]> wrote:
> without ever doing any insane bit scanning.

Duh, yes. That's really insane. I first did

- new->thread_mask = 1 << ffz(thread_mask);
+ new->thread_mask = (new->flags & IRQF_ONESHOT) ? 1 << ffz(thread_mask) : 0;

and then decided to move it into the already existing new->flags &
IRQF_ONESHOT check section and deleted the wrong part ....

/me looks for a huge brown paperbag.

Thanks for catching it !

tglx

-------------->
Subject: genirq: Clear action->thread_mask if IRQ_ONESHOT is not set

commit ac5637611(genirq: Unmask oneshot irqs when thread was not
woken) fails to unmask when a !IRQ_ONESHOT threaded handler is handled
by handle_level_irq. This happens because thread_mask is or'ed
unconditionally in irq_wake_thread(), but for !IRQ_ONESHOT interrupts
never cleared. So the check for !desc->thread_active fails and keeps
the interrupt disabled.

Keep the thread_mask zero for !IRQ_ONESHOT interrupts.

Document the thread_mask magic while at it.

Reported-by: Sven Joachim <[email protected]>
Cc: [email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
---

Index: tip/kernel/irq/manage.c
===================================================================
--- tip.orig/kernel/irq/manage.c
+++ tip/kernel/irq/manage.c
@@ -985,6 +985,11 @@ __setup_irq(unsigned int irq, struct irq

/* add new interrupt at end of irq queue */
do {
+ /*
+ * Or all existing action->thread_mask bits,
+ * so we can find the next zero bit for this
+ * new action.
+ */
thread_mask |= old->thread_mask;
old_ptr = &old->next;
old = *old_ptr;
@@ -993,14 +998,41 @@ __setup_irq(unsigned int irq, struct irq
}

/*
- * Setup the thread mask for this irqaction. Unlikely to have
- * 32 resp 64 irqs sharing one line, but who knows.
+ * Setup the thread mask for this irqaction for ONESHOT. For
+ * !ONESHOT irqs the thread mask is 0 so we can avoid a
+ * conditional in irq_wake_thread().
*/
- if (new->flags & IRQF_ONESHOT && thread_mask == ~0UL) {
- ret = -EBUSY;
- goto out_mask;
+ if (new->flags & IRQF_ONESHOT) {
+ /*
+ * Unlikely to have 32 resp 64 irqs sharing one line,
+ * but who knows.
+ */
+ if (thread_mask == ~0UL) {
+ ret = -EBUSY;
+ goto out_mask;
+ }
+ /*
+ * The thread_mask for the action is or'ed to
+ * desc->thread_active to indicate that the
+ * IRQF_ONESHOT thread handler has been woken, but not
+ * yet finished. The bit is cleared when a thread
+ * completes. When all threads of a shared interrupt
+ * line have completed desc->threads_active becomes
+ * zero and the interrupt line is unmasked. See
+ * handle.c:irq_wake_thread() for further information.
+ *
+ * If no thread is woken by primary (hard irq context)
+ * interrupt handlers, then desc->threads_active is
+ * also checked for zero to unmask the irq line in the
+ * affected hard irq flow handlers
+ * (handle_[fasteoi|level]_irq).
+ *
+ * The new action gets the first zero bit of
+ * thread_mask assigned. See the loop above which or's
+ * all existing action->thread_mask bits.
+ */
+ new->thread_mask = 1 << ffz(thread_mask);
}
- new->thread_mask = 1 << ffz(thread_mask);

if (!shared) {
init_waitqueue_head(&desc->wait_for_threads);

2012-03-06 19:53:16

by Sven Joachim

[permalink] [raw]
Subject: Re: [ 57/72] genirq: Unmask oneshot irqs when thread was not woken

Am 06.03.2012 um 20:31 schrieb Thomas Gleixner:

> Stephan, Sven: Can you please provide the output of /proc/interrupts ?

Here is mine, from a freshly booted 3.3-rc6 kernel.

Cheers,
Sven

CPU0
0: 25050 XT-PIC-XT-PIC timer
1: 101 XT-PIC-XT-PIC i8042
2: 0 XT-PIC-XT-PIC cascade
3: 1 XT-PIC-XT-PIC
4: 1 XT-PIC-XT-PIC
5: 0 XT-PIC-XT-PIC ehci_hcd:usb1, uhci_hcd:usb2
6: 1 XT-PIC-XT-PIC i915, uhci_hcd:usb5, yenta
7: 1 XT-PIC-XT-PIC
8: 0 XT-PIC-XT-PIC rtc0
9: 634 XT-PIC-XT-PIC acpi
10: 341 XT-PIC-XT-PIC uhci_hcd:usb4, snd_hda_intel, b43
11: 0 XT-PIC-XT-PIC uhci_hcd:usb3
12: 144 XT-PIC-XT-PIC i8042
14: 4141 XT-PIC-XT-PIC ata_piix
15: 0 XT-PIC-XT-PIC ata_piix
NMI: 0 Non-maskable interrupts
MCE: 0 Machine check exceptions
MCP: 1 Machine check polls
ERR: 0


2012-03-06 20:26:11

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [ 57/72] genirq: Unmask oneshot irqs when thread was not woken

On Tue, 6 Mar 2012, Sven Joachim wrote:
> Am 06.03.2012 um 20:31 schrieb Thomas Gleixner:
>
> > Stephan, Sven: Can you please provide the output of /proc/interrupts ?
>
> Here is mine, from a freshly booted 3.3-rc6 kernel.
>
> Cheers,
> Sven
>
> CPU0
> 0: 25050 XT-PIC-XT-PIC timer
> 1: 101 XT-PIC-XT-PIC i8042
> 2: 0 XT-PIC-XT-PIC cascade
> 3: 1 XT-PIC-XT-PIC
> 4: 1 XT-PIC-XT-PIC
> 5: 0 XT-PIC-XT-PIC ehci_hcd:usb1, uhci_hcd:usb2
> 6: 1 XT-PIC-XT-PIC i915, uhci_hcd:usb5, yenta
> 7: 1 XT-PIC-XT-PIC
> 8: 0 XT-PIC-XT-PIC rtc0
> 9: 634 XT-PIC-XT-PIC acpi
> 10: 341 XT-PIC-XT-PIC uhci_hcd:usb4, snd_hda_intel, b43

Ah, XT-PIC uses handle_level_irq(). /me bangs head against desk.

Does the patch below fix the problem for you ?

Thanks,

tglx

----------------->
Subject: genirq: Clear action->thread_mask if IRQ_ONESHOT is not set

commit ac5637611(genirq: Unmask oneshot irqs when thread was not
woken) fails to unmask when a !IRQ_ONESHOT threaded handler is handled
by handle_level_irq. This happens because thread_mask is or'ed
unconditionally in irq_wake_thread(), but for !IRQ_ONESHOT interrupts
never cleared. So the check for !desc->thread_active fails and keeps
the interrupt disabled.

Keep the thread_mask zero for !IRQ_ONESHOT interrupts.

Reported-by: Sven Joachim <[email protected]>
Cc: [email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
---

Index: linux-2.6/kernel/irq/manage.c
===================================================================
--- linux-2.6.orig/kernel/irq/manage.c
+++ linux-2.6/kernel/irq/manage.c
@@ -996,11 +996,13 @@ __setup_irq(unsigned int irq, struct irq
* Setup the thread mask for this irqaction. Unlikely to have
* 32 resp 64 irqs sharing one line, but who knows.
*/
- if (new->flags & IRQF_ONESHOT && thread_mask == ~0UL) {
- ret = -EBUSY;
- goto out_mask;
+ if (new->flags & IRQF_ONESHOT) {
+ if (thread_mask == ~0UL) {
+ ret = -EBUSY;
+ goto out_mask;
+ }
+ new->thread_mask = new->flags & IRQF_ONESHOT;
}
- new->thread_mask = 1 << ffz(thread_mask);

if (!shared) {
init_waitqueue_head(&desc->wait_for_threads);

2012-03-06 21:47:30

by Linus Torvalds

[permalink] [raw]
Subject: Re: [ 57/72] genirq: Unmask oneshot irqs when thread was not woken

On Tue, Mar 6, 2012 at 1:40 PM, Linus Torvalds
<[email protected]> wrote:
>
> (*) "1 << ffz(a)" can be written as
>
> ? a = ~a; ? ? ? ?/* Turn the zero bits into 1 bits */
> ? a &= -a; ? ? ?/* .. and find the first one. */
>
> without ever doing any insane bit scanning.

Alternatively, wite it directly as "(a+1) &~a", which is the same
expression just written differently (due to "-a == ~a+1")

Yeah, I've been playing too much with the bitwise optimizations of the
dentry cache name comparisons lately.

Linus

2012-03-06 20:25:42

by Stefan Lippers-Hollmann

[permalink] [raw]
Subject: Re: [ 57/72] genirq: Unmask oneshot irqs when thread was not woken

Hi

On Tuesday 06 March 2012, Thomas Gleixner wrote:
> On Tue, 6 Mar 2012, Thomas Gleixner wrote:
> > On Tue, 6 Mar 2012, Thomas Gleixner wrote:
> >
> > > On Mon, 5 Mar 2012, Linus Torvalds wrote:
> > >
> > > > Thomas, should we just plan on reverting that commit from mainline? It
> > > > clearly causes regressions.
> > >
> > > Give me a day or two to figure out why it breaks stuff. I have no idea
> > > why it should wreckage anything.
> >
> > Hmm. This is interesting. The b43 driver has a primary handler which
> > can return IRQ_NONE. So up to that change the interrupt line was kept
> > disabled when that happened. Possibly the driver relies on that
> > behaviour. Digging for a machine with a b43.
>
> Does not reproduce. Now I was looking at the driver again, it does not
> use IRQ_ONESHOT anyway.
>
> So for handle_fasteoi_irq() this patch is actually a NOOP. So the only
> affected handler would be handle_level_irq(). Still can't see how it
> changes the !IRQ_ONESHOT behaviour :(
>
> Stephan, Sven: Can you please provide the output of /proc/interrupts ?

Full /proc/interrupts and gzipped dmesg attached for kernel 3.2.9 +
current stable queue-3.2, no difference in the interrupt output between
working and broken state. Both outputs taken immediately after a cold
reboot.

*.broken is with "genirq: Unmask oneshot irqs when thread was not
woken" applied, resulting in empty wlan scan results, no auth).

*.working with only this patch reverted (3 wlan cells visible,
immediate auth.

Regards
Stefan Lippers-Hollmann


Attachments:
(No filename) (1.51 kB)
dmesg.broken.gz (10.98 kB)
dmesg.working.gz (11.24 kB)
interrupts.broken (1.19 kB)
interrupts.working (1.19 kB)
signature.asc (198.00 B)
This is a digitally signed message part.
Download all attachments

2012-03-06 00:34:22

by Linus Torvalds

[permalink] [raw]
Subject: Re: [ 57/72] genirq: Unmask oneshot irqs when thread was not woken

Thomas, should we just plan on reverting that commit from mainline? It
clearly causes regressions.

Linus

On Sun, Mar 4, 2012 at 4:43 PM, Stefan Lippers-Hollmann <[email protected]> wrote:
> Hi
>
> On Monday 05 March 2012, Sven Joachim wrote:
>> On 2012-02-28 02:05 +0100, Greg KH wrote:
>>
>> > 3.2-stable review patch. ?If anyone has any objections, please let me know.
>>
>> Alas, the commit below broke the WiFi on my laptop, where
>> "iwlist wlan0 scan" now just reports: " wlan0 ? ? No scan results".
>>
>> This is not specific to 3.2, the same problem exists in 3.0.23 and
>> 3.3-rc6 while previous releases worked.
>
> I can confirm this regression on an old PIII based notebook with a
> 32 bit BCM4306/3 PCMCIA card:
>
> 02:00.0 Network controller [0280]: Broadcom Corporation BCM4306 802.11b/g Wireless LAN Controller [14e4:4320] (rev 03)
> ? ? ? ?Subsystem: Linksys WPC54G v1 / WPC54GS v1 802.11g Wireless-G Notebook Adapter [1737:4320]
> ? ? ? ?Flags: bus master, fast devsel, latency 64, IRQ 10
> ? ? ? ?Memory at 20000000 (32-bit, non-prefetchable) [size=8K]
> ? ? ? ?Capabilities: [40] Power Management version 2
> ? ? ? ?Kernel driver in use: b43-pci-bridge
>
> I can observe these differences in the dmesg output (3.2.9 + current
> stable queue-3.2):
>
> The kernel configuration is identical in both cases, the only
> difference is "genirq: Unmask oneshot irqs when thread was not woken"
> being applied or not.
>
> --- working ? ? <-- 3.2.9 + current stable queue-3.2, with "genirq: Unmask oneshot irqs when thread was not woken" reverted
> +++ broken ? ? ?<-- 3.2.9 + current stable queue-3.2
> @@ -125,6 +125,7 @@ Initializing cgroup subsys freezer
> ?Initializing cgroup subsys net_cls
> ?Initializing cgroup subsys blkio
> ?Initializing cgroup subsys perf_event
> +CPU serial number disabled.
> ?mce: CPU supports 5 MCE banks
> ?SMP alternatives: switching to UP code
> ?Freeing SMP alternatives: 8k freed
> @@ -137,9 +138,16 @@ SMP disabled
> ?Performance Events:
> ?no APIC, boot with the "lapic" boot parameter to force-enable it.
> ?no hardware sampling interrupt available.
> -Broken PMU hardware detected, using software events only.
> +p6 PMU driver.
> +... version: ? ? ? ? ? ? ? ?0
> +... bit width: ? ? ? ? ? ? ?32
> +... generic registers: ? ? ?2
> +... value mask: ? ? ? ? ? ? 00000000ffffffff
> +... max period: ? ? ? ? ? ? 000000007fffffff
> +... fixed-purpose events: ? 0
> +... event mask: ? ? ? ? ? ? 0000000000000003
> ?Brought up 1 CPUs
> ?Total of 1 processors activated (1989.75 BogoMIPS).
> ?devtmpfs: initialized
> ?PM: Registering ACPI NVS region at eeff000 (4096 bytes)
> ?print_constraints: dummy:
> @@ -174,12 +182,12 @@ pci 0000:00:00.0: reg 10: [mem 0xec00000
> ?type 1 class 0x000604
> ?pci 0000:00:01.0: supports D1
> ?type 2 class 0x000607
> -pci 0000:00:04.0: reg 10: [mem 0x10000000-0x10000fff]
> +pci 0000:00:04.0: reg 10: [mem 0x00000000-0x00000fff]
> ?pci 0000:00:04.0: supports D1 D2
> ?pci 0000:00:04.0: PME# supported from D0 D1 D2 D3hot D3cold
> ?pci 0000:00:04.0: PME# disabled
> ?type 2 class 0x000607
> -pci 0000:00:04.1: reg 10: [mem 0x14000000-0x14000fff]
> +pci 0000:00:04.1: reg 10: [mem 0x00000000-0x00000fff]
> ?pci 0000:00:04.1: supports D1 D2
> ?pci 0000:00:04.1: PME# supported from D0 D1 D2 D3hot D3cold
> ?pci 0000:00:04.1: PME# disabled
> @@ -299,6 +307,10 @@ pnp: PnP ACPI: found 11 devices
> ?ACPI: ACPI bus type pnp unregistered
> ?Switching to clocksource acpi_pm
> ?PCI: max bus depth: 1 pci_try_num: 2
> +pci 0000:00:04.0: BAR 0: assigned [mem 0x10000000-0x10000fff]
> +(PCI address [0x10000000-0x10000fff])
> +pci 0000:00:04.1: BAR 0: assigned [mem 0x14000000-0x14000fff]
> +(PCI address [0x14000000-0x14000fff])
> ?pci 0000:00:11.0: BAR 6: assigned [mem 0x10020000-0x1003ffff pref]
> ?pci 0000:00:04.1: BAR 16: assigned [mem 0x18000000-0x1bffffff]
> ?pci 0000:00:04.1: BAR 15: assigned [mem 0x1c000000-0x1fffffff pref]
> @@ -326,9 +338,11 @@ pci 0000:00:01.0: setting latency timer
> ?enabled at IRQ 10
> ?PCI: setting IRQ 10 as level-triggered
> ?-> GSI 10 (level, low) -> IRQ 10
> +pci 0000:00:04.0: setting latency timer to 64
> ?enabled at IRQ 11
> ?PCI: setting IRQ 11 as level-triggered
> ?-> GSI 11 (level, low) -> IRQ 11
> +pci 0000:00:04.1: setting latency timer to 64
> ?pci_bus 0000:00: resource 0 [io ?0x0000-0xffff]
> ?pci_bus 0000:00: resource 1 [mem 0x00000000-0xffffffff]
> ?pci_bus 0000:01: resource 1 [mem 0xe8100000-0xe81fffff]
> @@ -562,20 +576,6 @@ b43-phy0: Hardware crypto acceleration n
> ?b43-phy0: QoS not supported by firmware
> ?ADDRCONF(NETDEV_UP): wlan0: link is not ready
> ?NET: Registered protocol family 17
> -wlan0: authenticate with [redacted MAC address] (try 1)
> -wlan0: authenticated
> -wlan0: associate with [redacted MAC address] (try 1)
> -wlan0: RX AssocResp from [redacted MAC address] (capab=0x411 status=0 aid=1)
> -wlan0: associated
> -ADDRCONF(NETDEV_CHANGE): wlan0: link becomes ready
> -cfg80211: Calling CRDA for country: DE
> -Intel AES-NI instructions are not detected.
> -cfg80211: Regulatory domain changed to country: DE
> -cfg80211: ? ? (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp)
> -cfg80211: ? ? (2400000 KHz - 2483500 KHz @ 40000 KHz), (N/A, 2000 mBm)
> -cfg80211: ? ? (5150000 KHz - 5250000 KHz @ 40000 KHz), (N/A, 2000 mBm)
> -cfg80211: ? ? (5250000 KHz - 5350000 KHz @ 40000 KHz), (N/A, 2000 mBm)
> -cfg80211: ? ? (5470000 KHz - 5725000 KHz @ 40000 KHz), (N/A, 2698 mBm)
> ?lp0: using parport0 (interrupt-driven).
> ?lp0: console ready
> ?ppdev: user-space parallel port driver
> @@ -589,4 +589,3 @@ mtrr: base(0xf2000000) is not aligned on
> ?agpgart-via 0000:00:00.0: AGP 2.0 bridge
> ?agpgart-via 0000:00:00.0: putting AGP V2 device into 0x mode
> ?pci 0000:01:00.0: putting AGP V2 device into 0x mode
> -wlan0: no IPv6 routers present
>
> While I don't get any revealing error messages, once "genirq: Unmask
> oneshot irqs when thread was not woken" is applied the wlan card
> stops working (no scan results, unable to auth).
>
> I cannot reproduce this problem on a different amd64/ 64 bit system
> with a BCM4318 PCI card, which continues to work fine.
>
> Regards
> ? ? ? ?Stefan Lippers-Hollmann

2012-03-06 21:11:34

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [ 57/72] genirq: Unmask oneshot irqs when thread was not woken

Linus,

On Tue, 6 Mar 2012, Sven Joachim wrote:
> Am 06.03.2012 um 21:26 schrieb Thomas Gleixner:
>
> > On Tue, 6 Mar 2012, Sven Joachim wrote:
> >> Am 06.03.2012 um 20:31 schrieb Thomas Gleixner:
> >>
> >> > Stephan, Sven: Can you please provide the output of /proc/interrupts ?
> >>
> >> Here is mine, from a freshly booted 3.3-rc6 kernel.
> >>
> >> Cheers,
> >> Sven
> >>
> >> CPU0
> >> 0: 25050 XT-PIC-XT-PIC timer
> >> 1: 101 XT-PIC-XT-PIC i8042
> >> 2: 0 XT-PIC-XT-PIC cascade
> >> 3: 1 XT-PIC-XT-PIC
> >> 4: 1 XT-PIC-XT-PIC
> >> 5: 0 XT-PIC-XT-PIC ehci_hcd:usb1, uhci_hcd:usb2
> >> 6: 1 XT-PIC-XT-PIC i915, uhci_hcd:usb5, yenta
> >> 7: 1 XT-PIC-XT-PIC
> >> 8: 0 XT-PIC-XT-PIC rtc0
> >> 9: 634 XT-PIC-XT-PIC acpi
> >> 10: 341 XT-PIC-XT-PIC uhci_hcd:usb4, snd_hda_intel, b43
> >
> > Ah, XT-PIC uses handle_level_irq(). /me bangs head against desk.
> >
> > Does the patch below fix the problem for you ?
>
> Yep, it does. Thanks!

Can you pick it up directly or want me to send it via tip ?

Thanks,

tglx

2012-03-06 22:34:16

by Linus Torvalds

[permalink] [raw]
Subject: Re: [ 57/72] genirq: Unmask oneshot irqs when thread was not woken

On Tue, Mar 6, 2012 at 2:18 PM, Thomas Gleixner <[email protected]> wrote:
> Subject: genirq: Clear action->thread_mask if IRQ_ONESHOT is not set

Ok, can we get this fixed version tested too by people who saw the problem?

Just to make sure..

Thanks,

Linus

2012-03-07 05:36:31

by Sven Joachim

[permalink] [raw]
Subject: Re: [ 57/72] genirq: Unmask oneshot irqs when thread was not woken

On 2012-03-06 23:33 +0100, Linus Torvalds wrote:

> On Tue, Mar 6, 2012 at 2:18 PM, Thomas Gleixner <[email protected]> wrote:
>> Subject: genirq: Clear action->thread_mask if IRQ_ONESHOT is not set
>
> Ok, can we get this fixed version tested too by people who saw the problem?

It works for me.

Cheers,
Sven

2012-03-06 08:28:34

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [ 57/72] genirq: Unmask oneshot irqs when thread was not woken

On Mon, 5 Mar 2012, Linus Torvalds wrote:

> Thomas, should we just plan on reverting that commit from mainline? It
> clearly causes regressions.

Give me a day or two to figure out why it breaks stuff. I have no idea
why it should wreckage anything.

Thanks,

tglx

2012-03-06 20:10:58

by Sven Joachim

[permalink] [raw]
Subject: Re: [ 57/72] genirq: Unmask oneshot irqs when thread was not woken

Am 06.03.2012 um 20:45 schrieb Thomas Gleixner:

> On Mon, 5 Mar 2012, Stefan Lippers-Hollmann wrote:
>> On Monday 05 March 2012, Sven Joachim wrote:
>> > On 2012-02-28 02:05 +0100, Greg KH wrote:
>> >
>> > > 3.2-stable review patch. If anyone has any objections, please let me know.
>> >
>> > Alas, the commit below broke the WiFi on my laptop, where
>> > "iwlist wlan0 scan" now just reports: " wlan0 No scan results".
>> >
>> > This is not specific to 3.2, the same problem exists in 3.0.23 and
>> > 3.3-rc6 while previous releases worked.
>>
>> I can confirm this regression on an old PIII based notebook with a
>> 32 bit BCM4306/3 PCMCIA card:
>>
>> 02:00.0 Network controller [0280]: Broadcom Corporation BCM4306 802.11b/g Wireless LAN Controller [14e4:4320] (rev 03)
>> Subsystem: Linksys WPC54G v1 / WPC54GS v1 802.11g Wireless-G Notebook Adapter [1737:4320]
>> Flags: bus master, fast devsel, latency 64, IRQ 10
>> Memory at 20000000 (32-bit, non-prefetchable) [size=8K]
>> Capabilities: [40] Power Management version 2
>> Kernel driver in use: b43-pci-bridge
>>
>> I can observe these differences in the dmesg output (3.2.9 + current
>> stable queue-3.2):
>>
>> The kernel configuration is identical in both cases, the only
>> difference is "genirq: Unmask oneshot irqs when thread was not woken"
>> being applied or not.
>>
>> --- working <-- 3.2.9 + current stable queue-3.2, with "genirq: Unmask oneshot irqs when thread was not woken" reverted
>> +++ broken <-- 3.2.9 + current stable queue-3.2
>> @@ -125,6 +125,7 @@ Initializing cgroup subsys freezer
>> Initializing cgroup subsys net_cls
>> Initializing cgroup subsys blkio
>> Initializing cgroup subsys perf_event
>> +CPU serial number disabled.
>
> I can't see why reverting that genirq patch would cause this.
>
>> mce: CPU supports 5 MCE banks
>> SMP alternatives: switching to UP code
>> Freeing SMP alternatives: 8k freed
>> @@ -137,9 +138,16 @@ SMP disabled
>> Performance Events:
>> no APIC, boot with the "lapic" boot parameter to force-enable it.
>> no hardware sampling interrupt available.
>> -Broken PMU hardware detected, using software events only.
>> +p6 PMU driver.
>> +... version: 0
>> +... bit width: 32
>> +... generic registers: 2
>> +... value mask: 00000000ffffffff
>> +... max period: 000000007fffffff
>> +... fixed-purpose events: 0
>> +... event mask: 0000000000000003
>
> Ditto.
>
>> Brought up 1 CPUs
>> Total of 1 processors activated (1989.75 BogoMIPS).
>> devtmpfs: initialized
>> PM: Registering ACPI NVS region at eeff000 (4096 bytes)
>> print_constraints: dummy:
>> @@ -174,12 +182,12 @@ pci 0000:00:00.0: reg 10: [mem 0xec00000
>> type 1 class 0x000604
>> pci 0000:00:01.0: supports D1
>> type 2 class 0x000607
>> -pci 0000:00:04.0: reg 10: [mem 0x10000000-0x10000fff]
>> +pci 0000:00:04.0: reg 10: [mem 0x00000000-0x00000fff]
>
> Even more.

FWIW, I don't see any such differences on my system (with 3.3-rc6).

> There is something extremly fishy.

Cheers,
Sven

2012-03-06 21:41:19

by Linus Torvalds

[permalink] [raw]
Subject: Re: [ 57/72] genirq: Unmask oneshot irqs when thread was not woken

On Tue, Mar 6, 2012 at 1:11 PM, Thomas Gleixner <[email protected]> wrote:
>
> Can you pick it up directly or want me to send it via tip ?

I can certainly pick it up directly, but I want the sane version, not
the "random monkeys typing" version. Please.

Linus