Hi
On Monday 05 March 2012, Sven Joachim wrote:
> On 2012-02-28 02:05 +0100, Greg KH wrote:
>
> > 3.2-stable review patch. If anyone has any objections, please let me know.
>
> Alas, the commit below broke the WiFi on my laptop, where
> "iwlist wlan0 scan" now just reports: " wlan0 No scan results".
>
> This is not specific to 3.2, the same problem exists in 3.0.23 and
> 3.3-rc6 while previous releases worked.
I can confirm this regression on an old PIII based notebook with a
32 bit BCM4306/3 PCMCIA card:
02:00.0 Network controller [0280]: Broadcom Corporation BCM4306 802.11b/g Wireless LAN Controller [14e4:4320] (rev 03)
Subsystem: Linksys WPC54G v1 / WPC54GS v1 802.11g Wireless-G Notebook Adapter [1737:4320]
Flags: bus master, fast devsel, latency 64, IRQ 10
Memory at 20000000 (32-bit, non-prefetchable) [size=8K]
Capabilities: [40] Power Management version 2
Kernel driver in use: b43-pci-bridge
I can observe these differences in the dmesg output (3.2.9 + current
stable queue-3.2):
The kernel configuration is identical in both cases, the only
difference is "genirq: Unmask oneshot irqs when thread was not woken"
being applied or not.
--- working <-- 3.2.9 + current stable queue-3.2, with "genirq: Unmask oneshot irqs when thread was not woken" reverted
+++ broken <-- 3.2.9 + current stable queue-3.2
@@ -125,6 +125,7 @@ Initializing cgroup subsys freezer
Initializing cgroup subsys net_cls
Initializing cgroup subsys blkio
Initializing cgroup subsys perf_event
+CPU serial number disabled.
mce: CPU supports 5 MCE banks
SMP alternatives: switching to UP code
Freeing SMP alternatives: 8k freed
@@ -137,9 +138,16 @@ SMP disabled
Performance Events:
no APIC, boot with the "lapic" boot parameter to force-enable it.
no hardware sampling interrupt available.
-Broken PMU hardware detected, using software events only.
+p6 PMU driver.
+... version: 0
+... bit width: 32
+... generic registers: 2
+... value mask: 00000000ffffffff
+... max period: 000000007fffffff
+... fixed-purpose events: 0
+... event mask: 0000000000000003
Brought up 1 CPUs
Total of 1 processors activated (1989.75 BogoMIPS).
devtmpfs: initialized
PM: Registering ACPI NVS region at eeff000 (4096 bytes)
print_constraints: dummy:
@@ -174,12 +182,12 @@ pci 0000:00:00.0: reg 10: [mem 0xec00000
type 1 class 0x000604
pci 0000:00:01.0: supports D1
type 2 class 0x000607
-pci 0000:00:04.0: reg 10: [mem 0x10000000-0x10000fff]
+pci 0000:00:04.0: reg 10: [mem 0x00000000-0x00000fff]
pci 0000:00:04.0: supports D1 D2
pci 0000:00:04.0: PME# supported from D0 D1 D2 D3hot D3cold
pci 0000:00:04.0: PME# disabled
type 2 class 0x000607
-pci 0000:00:04.1: reg 10: [mem 0x14000000-0x14000fff]
+pci 0000:00:04.1: reg 10: [mem 0x00000000-0x00000fff]
pci 0000:00:04.1: supports D1 D2
pci 0000:00:04.1: PME# supported from D0 D1 D2 D3hot D3cold
pci 0000:00:04.1: PME# disabled
@@ -299,6 +307,10 @@ pnp: PnP ACPI: found 11 devices
ACPI: ACPI bus type pnp unregistered
Switching to clocksource acpi_pm
PCI: max bus depth: 1 pci_try_num: 2
+pci 0000:00:04.0: BAR 0: assigned [mem 0x10000000-0x10000fff]
+(PCI address [0x10000000-0x10000fff])
+pci 0000:00:04.1: BAR 0: assigned [mem 0x14000000-0x14000fff]
+(PCI address [0x14000000-0x14000fff])
pci 0000:00:11.0: BAR 6: assigned [mem 0x10020000-0x1003ffff pref]
pci 0000:00:04.1: BAR 16: assigned [mem 0x18000000-0x1bffffff]
pci 0000:00:04.1: BAR 15: assigned [mem 0x1c000000-0x1fffffff pref]
@@ -326,9 +338,11 @@ pci 0000:00:01.0: setting latency timer
enabled at IRQ 10
PCI: setting IRQ 10 as level-triggered
-> GSI 10 (level, low) -> IRQ 10
+pci 0000:00:04.0: setting latency timer to 64
enabled at IRQ 11
PCI: setting IRQ 11 as level-triggered
-> GSI 11 (level, low) -> IRQ 11
+pci 0000:00:04.1: setting latency timer to 64
pci_bus 0000:00: resource 0 [io 0x0000-0xffff]
pci_bus 0000:00: resource 1 [mem 0x00000000-0xffffffff]
pci_bus 0000:01: resource 1 [mem 0xe8100000-0xe81fffff]
@@ -562,20 +576,6 @@ b43-phy0: Hardware crypto acceleration n
b43-phy0: QoS not supported by firmware
ADDRCONF(NETDEV_UP): wlan0: link is not ready
NET: Registered protocol family 17
-wlan0: authenticate with [redacted MAC address] (try 1)
-wlan0: authenticated
-wlan0: associate with [redacted MAC address] (try 1)
-wlan0: RX AssocResp from [redacted MAC address] (capab=0x411 status=0 aid=1)
-wlan0: associated
-ADDRCONF(NETDEV_CHANGE): wlan0: link becomes ready
-cfg80211: Calling CRDA for country: DE
-Intel AES-NI instructions are not detected.
-cfg80211: Regulatory domain changed to country: DE
-cfg80211: (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp)
-cfg80211: (2400000 KHz - 2483500 KHz @ 40000 KHz), (N/A, 2000 mBm)
-cfg80211: (5150000 KHz - 5250000 KHz @ 40000 KHz), (N/A, 2000 mBm)
-cfg80211: (5250000 KHz - 5350000 KHz @ 40000 KHz), (N/A, 2000 mBm)
-cfg80211: (5470000 KHz - 5725000 KHz @ 40000 KHz), (N/A, 2698 mBm)
lp0: using parport0 (interrupt-driven).
lp0: console ready
ppdev: user-space parallel port driver
@@ -589,4 +589,3 @@ mtrr: base(0xf2000000) is not aligned on
agpgart-via 0000:00:00.0: AGP 2.0 bridge
agpgart-via 0000:00:00.0: putting AGP V2 device into 0x mode
pci 0000:01:00.0: putting AGP V2 device into 0x mode
-wlan0: no IPv6 routers present
While I don't get any revealing error messages, once "genirq: Unmask
oneshot irqs when thread was not woken" is applied the wlan card
stops working (no scan results, unable to auth).
I cannot reproduce this problem on a different amd64/ 64 bit system
with a BCM4318 PCI card, which continues to work fine.
Regards
Stefan Lippers-Hollmann
On Tue, 6 Mar 2012, Thomas Gleixner wrote:
> On Tue, 6 Mar 2012, Thomas Gleixner wrote:
>
> > On Mon, 5 Mar 2012, Linus Torvalds wrote:
> >
> > > Thomas, should we just plan on reverting that commit from mainline? It
> > > clearly causes regressions.
> >
> > Give me a day or two to figure out why it breaks stuff. I have no idea
> > why it should wreckage anything.
>
> Hmm. This is interesting. The b43 driver has a primary handler which
> can return IRQ_NONE. So up to that change the interrupt line was kept
> disabled when that happened. Possibly the driver relies on that
> behaviour. Digging for a machine with a b43.
Does not reproduce. Now I was looking at the driver again, it does not
use IRQ_ONESHOT anyway.
So for handle_fasteoi_irq() this patch is actually a NOOP. So the only
affected handler would be handle_level_irq(). Still can't see how it
changes the !IRQ_ONESHOT behaviour :(
Stephan, Sven: Can you please provide the output of /proc/interrupts ?
Thanks,
tglx
Hi
On Wednesday 07 March 2012, Linus Torvalds wrote:
> On Tue, Mar 6, 2012 at 2:18 PM, Thomas Gleixner <[email protected]> wrote:
> > Subject: genirq: Clear action->thread_mask if IRQ_ONESHOT is not set
>
> Ok, can we get this fixed version tested too by people who saw the problem?
>
> Just to make sure..
[…]
This variant is also working fine on PIII/ coppermine and bcm4306/3.
Tested-by: Stefan Lippers-Hollmann <[email protected]>
Regards
Stefan Lippers-Hollmann
Am 06.03.2012 um 21:26 schrieb Thomas Gleixner:
> On Tue, 6 Mar 2012, Sven Joachim wrote:
>> Am 06.03.2012 um 20:31 schrieb Thomas Gleixner:
>>
>> > Stephan, Sven: Can you please provide the output of /proc/interrupts ?
>>
>> Here is mine, from a freshly booted 3.3-rc6 kernel.
>>
>> Cheers,
>> Sven
>>
>> CPU0
>> 0: 25050 XT-PIC-XT-PIC timer
>> 1: 101 XT-PIC-XT-PIC i8042
>> 2: 0 XT-PIC-XT-PIC cascade
>> 3: 1 XT-PIC-XT-PIC
>> 4: 1 XT-PIC-XT-PIC
>> 5: 0 XT-PIC-XT-PIC ehci_hcd:usb1, uhci_hcd:usb2
>> 6: 1 XT-PIC-XT-PIC i915, uhci_hcd:usb5, yenta
>> 7: 1 XT-PIC-XT-PIC
>> 8: 0 XT-PIC-XT-PIC rtc0
>> 9: 634 XT-PIC-XT-PIC acpi
>> 10: 341 XT-PIC-XT-PIC uhci_hcd:usb4, snd_hda_intel, b43
>
> Ah, XT-PIC uses handle_level_irq(). /me bangs head against desk.
>
> Does the patch below fix the problem for you ?
Yep, it does. Thanks!
> ----------------->
> Subject: genirq: Clear action->thread_mask if IRQ_ONESHOT is not set
>
> commit ac5637611(genirq: Unmask oneshot irqs when thread was not
> woken) fails to unmask when a !IRQ_ONESHOT threaded handler is handled
> by handle_level_irq. This happens because thread_mask is or'ed
> unconditionally in irq_wake_thread(), but for !IRQ_ONESHOT interrupts
> never cleared. So the check for !desc->thread_active fails and keeps
> the interrupt disabled.
>
> Keep the thread_mask zero for !IRQ_ONESHOT interrupts.
>
> Reported-by: Sven Joachim <[email protected]>
> Cc: [email protected]
> Signed-off-by: Thomas Gleixner <[email protected]>
> ---
>
> Index: linux-2.6/kernel/irq/manage.c
> ===================================================================
> --- linux-2.6.orig/kernel/irq/manage.c
> +++ linux-2.6/kernel/irq/manage.c
> @@ -996,11 +996,13 @@ __setup_irq(unsigned int irq, struct irq
> * Setup the thread mask for this irqaction. Unlikely to have
> * 32 resp 64 irqs sharing one line, but who knows.
> */
> - if (new->flags & IRQF_ONESHOT && thread_mask == ~0UL) {
> - ret = -EBUSY;
> - goto out_mask;
> + if (new->flags & IRQF_ONESHOT) {
> + if (thread_mask == ~0UL) {
> + ret = -EBUSY;
> + goto out_mask;
> + }
> + new->thread_mask = new->flags & IRQF_ONESHOT;
> }
> - new->thread_mask = 1 << ffz(thread_mask);
>
> if (!shared) {
> init_waitqueue_head(&desc->wait_for_threads);
On Tue, Mar 6, 2012 at 12:26 PM, Thomas Gleixner <[email protected]> wrote:
>
> Subject: genirq: Clear action->thread_mask if IRQ_ONESHOT is not set
Umm. Apparently this patch fixes the bug, but the patch itself is just insane.
> - ? ? ? if (new->flags & IRQF_ONESHOT && thread_mask == ~0UL) {
> - ? ? ? ? ? ? ? ret = -EBUSY;
> - ? ? ? ? ? ? ? goto out_mask;
> + ? ? ? if (new->flags & IRQF_ONESHOT) {
> + ? ? ? ? ? ? ? if (thread_mask == ~0UL) {
> + ? ? ? ? ? ? ? ? ? ? ? ret = -EBUSY;
> + ? ? ? ? ? ? ? ? ? ? ? goto out_mask;
> + ? ? ? ? ? ? ? }
> + ? ? ? ? ? ? ? new->thread_mask = new->flags & IRQF_ONESHOT;
> ? ? ? ?}
> - ? ? ? new->thread_mask = 1 << ffz(thread_mask);
WHAT?
You just checked that "new->flags & IRQF_ONESHOT" nonzero, and inside
that if-statement, you then do
new->thread_mask = new->flags & IRQF_ONESHOT;
which is just crazy. Why don't you just do
new->thread_mask = IRQF_ONESHOT;
if that is what you actually meant?
What is that code actually *supposed* to do?
Also, what was the meaning of that old insane line:
new->thread_mask = 1 << ffz(thread_mask);
which you removed? It was crap, I agree, but what was the thinking
behind it? And the reason it was crap is because that's a crazy
expression that could be written better ways (*), and it needs a
comment on what the heck the point of it was..
So stop with these "random code" snippets, and explain what the f*&^
the code is meant to do, AND THEN WRITE THE CODE IN A SANE MANNER
instead of posting these kinds of insane patches.
Because right now it really looks like the "random monkey" approach to
programming.
Linus
(*) "1 << ffz(a)" can be written as
a = ~a; /* Turn the zero bits into 1 bits */
a &= -a; /* .. and find the first one. */
without ever doing any insane bit scanning.
On Tue, 6 Mar 2012, Thomas Gleixner wrote:
> On Mon, 5 Mar 2012, Linus Torvalds wrote:
>
> > Thomas, should we just plan on reverting that commit from mainline? It
> > clearly causes regressions.
>
> Give me a day or two to figure out why it breaks stuff. I have no idea
> why it should wreckage anything.
Hmm. This is interesting. The b43 driver has a primary handler which
can return IRQ_NONE. So up to that change the interrupt line was kept
disabled when that happened. Possibly the driver relies on that
behaviour. Digging for a machine with a b43.
On Tue, 6 Mar 2012, Thomas Gleixner wrote:
> On Tue, 6 Mar 2012, Sven Joachim wrote:
> > Am 06.03.2012 um 20:31 schrieb Thomas Gleixner:
> >
> > > Stephan, Sven: Can you please provide the output of /proc/interrupts ?
> >
> > Here is mine, from a freshly booted 3.3-rc6 kernel.
> >
> > Cheers,
> > Sven
> >
> > CPU0
> > 0: 25050 XT-PIC-XT-PIC timer
> > 1: 101 XT-PIC-XT-PIC i8042
> > 2: 0 XT-PIC-XT-PIC cascade
> > 3: 1 XT-PIC-XT-PIC
> > 4: 1 XT-PIC-XT-PIC
> > 5: 0 XT-PIC-XT-PIC ehci_hcd:usb1, uhci_hcd:usb2
> > 6: 1 XT-PIC-XT-PIC i915, uhci_hcd:usb5, yenta
> > 7: 1 XT-PIC-XT-PIC
> > 8: 0 XT-PIC-XT-PIC rtc0
> > 9: 634 XT-PIC-XT-PIC acpi
> > 10: 341 XT-PIC-XT-PIC uhci_hcd:usb4, snd_hda_intel, b43
>
> Ah, XT-PIC uses handle_level_irq(). /me bangs head against desk.
Ha, nosmp on the command line lets me reproduce as well and the patch
fixes it. Duh, stupid me.
Thanks,
tglx
Hi
On Tuesday 06 March 2012, Thomas Gleixner wrote:
> On Tue, 6 Mar 2012, Sven Joachim wrote:
> > Am 06.03.2012 um 20:31 schrieb Thomas Gleixner:
> >
> > > Stephan, Sven: Can you please provide the output of /proc/interrupts ?
> >
> > Here is mine, from a freshly booted 3.3-rc6 kernel.
> >
> > Cheers,
> > Sven
> >
> > CPU0
> > 0: 25050 XT-PIC-XT-PIC timer
[…]
> > 10: 341 XT-PIC-XT-PIC uhci_hcd:usb4, snd_hda_intel, b43
>
> Ah, XT-PIC uses handle_level_irq(). /me bangs head against desk.
>
> Does the patch below fix the problem for you ?
>
> Thanks,
>
> tglx
>
> ----------------->
> Subject: genirq: Clear action->thread_mask if IRQ_ONESHOT is not set
>
> commit ac5637611(genirq: Unmask oneshot irqs when thread was not
> woken) fails to unmask when a !IRQ_ONESHOT threaded handler is handled
> by handle_level_irq. This happens because thread_mask is or'ed
> unconditionally in irq_wake_thread(), but for !IRQ_ONESHOT interrupts
> never cleared. So the check for !desc->thread_active fails and keeps
> the interrupt disabled.
>
> Keep the thread_mask zero for !IRQ_ONESHOT interrupts.
[…]
I can confirm that this patch, applied to 2.6.9, including "genirq:
Unmask oneshot irqs when thread was not woken", and + the current
stable queue-3.2 fixes b43 wlan operations. Feel free to add
Tested-by: Stefan Lippers-Hollmann <[email protected]>
if you like.
Thanks a lot
Stefan Lippers-Hollmann
On Mon, 5 Mar 2012, Stefan Lippers-Hollmann wrote:
> On Monday 05 March 2012, Sven Joachim wrote:
> > On 2012-02-28 02:05 +0100, Greg KH wrote:
> >
> > > 3.2-stable review patch. If anyone has any objections, please let me know.
> >
> > Alas, the commit below broke the WiFi on my laptop, where
> > "iwlist wlan0 scan" now just reports: " wlan0 No scan results".
> >
> > This is not specific to 3.2, the same problem exists in 3.0.23 and
> > 3.3-rc6 while previous releases worked.
>
> I can confirm this regression on an old PIII based notebook with a
> 32 bit BCM4306/3 PCMCIA card:
>
> 02:00.0 Network controller [0280]: Broadcom Corporation BCM4306 802.11b/g Wireless LAN Controller [14e4:4320] (rev 03)
> Subsystem: Linksys WPC54G v1 / WPC54GS v1 802.11g Wireless-G Notebook Adapter [1737:4320]
> Flags: bus master, fast devsel, latency 64, IRQ 10
> Memory at 20000000 (32-bit, non-prefetchable) [size=8K]
> Capabilities: [40] Power Management version 2
> Kernel driver in use: b43-pci-bridge
>
> I can observe these differences in the dmesg output (3.2.9 + current
> stable queue-3.2):
>
> The kernel configuration is identical in both cases, the only
> difference is "genirq: Unmask oneshot irqs when thread was not woken"
> being applied or not.
>
> --- working <-- 3.2.9 + current stable queue-3.2, with "genirq: Unmask oneshot irqs when thread was not woken" reverted
> +++ broken <-- 3.2.9 + current stable queue-3.2
> @@ -125,6 +125,7 @@ Initializing cgroup subsys freezer
> Initializing cgroup subsys net_cls
> Initializing cgroup subsys blkio
> Initializing cgroup subsys perf_event
> +CPU serial number disabled.
I can't see why reverting that genirq patch would cause this.
> mce: CPU supports 5 MCE banks
> SMP alternatives: switching to UP code
> Freeing SMP alternatives: 8k freed
> @@ -137,9 +138,16 @@ SMP disabled
> Performance Events:
> no APIC, boot with the "lapic" boot parameter to force-enable it.
> no hardware sampling interrupt available.
> -Broken PMU hardware detected, using software events only.
> +p6 PMU driver.
> +... version: 0
> +... bit width: 32
> +... generic registers: 2
> +... value mask: 00000000ffffffff
> +... max period: 000000007fffffff
> +... fixed-purpose events: 0
> +... event mask: 0000000000000003
Ditto.
> Brought up 1 CPUs
> Total of 1 processors activated (1989.75 BogoMIPS).
> devtmpfs: initialized
> PM: Registering ACPI NVS region at eeff000 (4096 bytes)
> print_constraints: dummy:
> @@ -174,12 +182,12 @@ pci 0000:00:00.0: reg 10: [mem 0xec00000
> type 1 class 0x000604
> pci 0000:00:01.0: supports D1
> type 2 class 0x000607
> -pci 0000:00:04.0: reg 10: [mem 0x10000000-0x10000fff]
> +pci 0000:00:04.0: reg 10: [mem 0x00000000-0x00000fff]
Even more.
There is something extremly fishy.
Thanks,
tglx
On Tue, 6 Mar 2012, Linus Torvalds wrote:
> On Tue, Mar 6, 2012 at 12:26 PM, Thomas Gleixner <[email protected]> wrote:
> without ever doing any insane bit scanning.
Duh, yes. That's really insane. I first did
- new->thread_mask = 1 << ffz(thread_mask);
+ new->thread_mask = (new->flags & IRQF_ONESHOT) ? 1 << ffz(thread_mask) : 0;
and then decided to move it into the already existing new->flags &
IRQF_ONESHOT check section and deleted the wrong part ....
/me looks for a huge brown paperbag.
Thanks for catching it !
tglx
-------------->
Subject: genirq: Clear action->thread_mask if IRQ_ONESHOT is not set
commit ac5637611(genirq: Unmask oneshot irqs when thread was not
woken) fails to unmask when a !IRQ_ONESHOT threaded handler is handled
by handle_level_irq. This happens because thread_mask is or'ed
unconditionally in irq_wake_thread(), but for !IRQ_ONESHOT interrupts
never cleared. So the check for !desc->thread_active fails and keeps
the interrupt disabled.
Keep the thread_mask zero for !IRQ_ONESHOT interrupts.
Document the thread_mask magic while at it.
Reported-by: Sven Joachim <[email protected]>
Cc: [email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
---
Index: tip/kernel/irq/manage.c
===================================================================
--- tip.orig/kernel/irq/manage.c
+++ tip/kernel/irq/manage.c
@@ -985,6 +985,11 @@ __setup_irq(unsigned int irq, struct irq
/* add new interrupt at end of irq queue */
do {
+ /*
+ * Or all existing action->thread_mask bits,
+ * so we can find the next zero bit for this
+ * new action.
+ */
thread_mask |= old->thread_mask;
old_ptr = &old->next;
old = *old_ptr;
@@ -993,14 +998,41 @@ __setup_irq(unsigned int irq, struct irq
}
/*
- * Setup the thread mask for this irqaction. Unlikely to have
- * 32 resp 64 irqs sharing one line, but who knows.
+ * Setup the thread mask for this irqaction for ONESHOT. For
+ * !ONESHOT irqs the thread mask is 0 so we can avoid a
+ * conditional in irq_wake_thread().
*/
- if (new->flags & IRQF_ONESHOT && thread_mask == ~0UL) {
- ret = -EBUSY;
- goto out_mask;
+ if (new->flags & IRQF_ONESHOT) {
+ /*
+ * Unlikely to have 32 resp 64 irqs sharing one line,
+ * but who knows.
+ */
+ if (thread_mask == ~0UL) {
+ ret = -EBUSY;
+ goto out_mask;
+ }
+ /*
+ * The thread_mask for the action is or'ed to
+ * desc->thread_active to indicate that the
+ * IRQF_ONESHOT thread handler has been woken, but not
+ * yet finished. The bit is cleared when a thread
+ * completes. When all threads of a shared interrupt
+ * line have completed desc->threads_active becomes
+ * zero and the interrupt line is unmasked. See
+ * handle.c:irq_wake_thread() for further information.
+ *
+ * If no thread is woken by primary (hard irq context)
+ * interrupt handlers, then desc->threads_active is
+ * also checked for zero to unmask the irq line in the
+ * affected hard irq flow handlers
+ * (handle_[fasteoi|level]_irq).
+ *
+ * The new action gets the first zero bit of
+ * thread_mask assigned. See the loop above which or's
+ * all existing action->thread_mask bits.
+ */
+ new->thread_mask = 1 << ffz(thread_mask);
}
- new->thread_mask = 1 << ffz(thread_mask);
if (!shared) {
init_waitqueue_head(&desc->wait_for_threads);
Am 06.03.2012 um 20:31 schrieb Thomas Gleixner:
> Stephan, Sven: Can you please provide the output of /proc/interrupts ?
Here is mine, from a freshly booted 3.3-rc6 kernel.
Cheers,
Sven
CPU0
0: 25050 XT-PIC-XT-PIC timer
1: 101 XT-PIC-XT-PIC i8042
2: 0 XT-PIC-XT-PIC cascade
3: 1 XT-PIC-XT-PIC
4: 1 XT-PIC-XT-PIC
5: 0 XT-PIC-XT-PIC ehci_hcd:usb1, uhci_hcd:usb2
6: 1 XT-PIC-XT-PIC i915, uhci_hcd:usb5, yenta
7: 1 XT-PIC-XT-PIC
8: 0 XT-PIC-XT-PIC rtc0
9: 634 XT-PIC-XT-PIC acpi
10: 341 XT-PIC-XT-PIC uhci_hcd:usb4, snd_hda_intel, b43
11: 0 XT-PIC-XT-PIC uhci_hcd:usb3
12: 144 XT-PIC-XT-PIC i8042
14: 4141 XT-PIC-XT-PIC ata_piix
15: 0 XT-PIC-XT-PIC ata_piix
NMI: 0 Non-maskable interrupts
MCE: 0 Machine check exceptions
MCP: 1 Machine check polls
ERR: 0
On Tue, 6 Mar 2012, Sven Joachim wrote:
> Am 06.03.2012 um 20:31 schrieb Thomas Gleixner:
>
> > Stephan, Sven: Can you please provide the output of /proc/interrupts ?
>
> Here is mine, from a freshly booted 3.3-rc6 kernel.
>
> Cheers,
> Sven
>
> CPU0
> 0: 25050 XT-PIC-XT-PIC timer
> 1: 101 XT-PIC-XT-PIC i8042
> 2: 0 XT-PIC-XT-PIC cascade
> 3: 1 XT-PIC-XT-PIC
> 4: 1 XT-PIC-XT-PIC
> 5: 0 XT-PIC-XT-PIC ehci_hcd:usb1, uhci_hcd:usb2
> 6: 1 XT-PIC-XT-PIC i915, uhci_hcd:usb5, yenta
> 7: 1 XT-PIC-XT-PIC
> 8: 0 XT-PIC-XT-PIC rtc0
> 9: 634 XT-PIC-XT-PIC acpi
> 10: 341 XT-PIC-XT-PIC uhci_hcd:usb4, snd_hda_intel, b43
Ah, XT-PIC uses handle_level_irq(). /me bangs head against desk.
Does the patch below fix the problem for you ?
Thanks,
tglx
----------------->
Subject: genirq: Clear action->thread_mask if IRQ_ONESHOT is not set
commit ac5637611(genirq: Unmask oneshot irqs when thread was not
woken) fails to unmask when a !IRQ_ONESHOT threaded handler is handled
by handle_level_irq. This happens because thread_mask is or'ed
unconditionally in irq_wake_thread(), but for !IRQ_ONESHOT interrupts
never cleared. So the check for !desc->thread_active fails and keeps
the interrupt disabled.
Keep the thread_mask zero for !IRQ_ONESHOT interrupts.
Reported-by: Sven Joachim <[email protected]>
Cc: [email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
---
Index: linux-2.6/kernel/irq/manage.c
===================================================================
--- linux-2.6.orig/kernel/irq/manage.c
+++ linux-2.6/kernel/irq/manage.c
@@ -996,11 +996,13 @@ __setup_irq(unsigned int irq, struct irq
* Setup the thread mask for this irqaction. Unlikely to have
* 32 resp 64 irqs sharing one line, but who knows.
*/
- if (new->flags & IRQF_ONESHOT && thread_mask == ~0UL) {
- ret = -EBUSY;
- goto out_mask;
+ if (new->flags & IRQF_ONESHOT) {
+ if (thread_mask == ~0UL) {
+ ret = -EBUSY;
+ goto out_mask;
+ }
+ new->thread_mask = new->flags & IRQF_ONESHOT;
}
- new->thread_mask = 1 << ffz(thread_mask);
if (!shared) {
init_waitqueue_head(&desc->wait_for_threads);
On Tue, Mar 6, 2012 at 1:40 PM, Linus Torvalds
<[email protected]> wrote:
>
> (*) "1 << ffz(a)" can be written as
>
> ? a = ~a; ? ? ? ?/* Turn the zero bits into 1 bits */
> ? a &= -a; ? ? ?/* .. and find the first one. */
>
> without ever doing any insane bit scanning.
Alternatively, wite it directly as "(a+1) &~a", which is the same
expression just written differently (due to "-a == ~a+1")
Yeah, I've been playing too much with the bitwise optimizations of the
dentry cache name comparisons lately.
Linus
Hi
On Tuesday 06 March 2012, Thomas Gleixner wrote:
> On Tue, 6 Mar 2012, Thomas Gleixner wrote:
> > On Tue, 6 Mar 2012, Thomas Gleixner wrote:
> >
> > > On Mon, 5 Mar 2012, Linus Torvalds wrote:
> > >
> > > > Thomas, should we just plan on reverting that commit from mainline? It
> > > > clearly causes regressions.
> > >
> > > Give me a day or two to figure out why it breaks stuff. I have no idea
> > > why it should wreckage anything.
> >
> > Hmm. This is interesting. The b43 driver has a primary handler which
> > can return IRQ_NONE. So up to that change the interrupt line was kept
> > disabled when that happened. Possibly the driver relies on that
> > behaviour. Digging for a machine with a b43.
>
> Does not reproduce. Now I was looking at the driver again, it does not
> use IRQ_ONESHOT anyway.
>
> So for handle_fasteoi_irq() this patch is actually a NOOP. So the only
> affected handler would be handle_level_irq(). Still can't see how it
> changes the !IRQ_ONESHOT behaviour :(
>
> Stephan, Sven: Can you please provide the output of /proc/interrupts ?
Full /proc/interrupts and gzipped dmesg attached for kernel 3.2.9 +
current stable queue-3.2, no difference in the interrupt output between
working and broken state. Both outputs taken immediately after a cold
reboot.
*.broken is with "genirq: Unmask oneshot irqs when thread was not
woken" applied, resulting in empty wlan scan results, no auth).
*.working with only this patch reverted (3 wlan cells visible,
immediate auth.
Regards
Stefan Lippers-Hollmann
Thomas, should we just plan on reverting that commit from mainline? It
clearly causes regressions.
Linus
On Sun, Mar 4, 2012 at 4:43 PM, Stefan Lippers-Hollmann <[email protected]> wrote:
> Hi
>
> On Monday 05 March 2012, Sven Joachim wrote:
>> On 2012-02-28 02:05 +0100, Greg KH wrote:
>>
>> > 3.2-stable review patch. ?If anyone has any objections, please let me know.
>>
>> Alas, the commit below broke the WiFi on my laptop, where
>> "iwlist wlan0 scan" now just reports: " wlan0 ? ? No scan results".
>>
>> This is not specific to 3.2, the same problem exists in 3.0.23 and
>> 3.3-rc6 while previous releases worked.
>
> I can confirm this regression on an old PIII based notebook with a
> 32 bit BCM4306/3 PCMCIA card:
>
> 02:00.0 Network controller [0280]: Broadcom Corporation BCM4306 802.11b/g Wireless LAN Controller [14e4:4320] (rev 03)
> ? ? ? ?Subsystem: Linksys WPC54G v1 / WPC54GS v1 802.11g Wireless-G Notebook Adapter [1737:4320]
> ? ? ? ?Flags: bus master, fast devsel, latency 64, IRQ 10
> ? ? ? ?Memory at 20000000 (32-bit, non-prefetchable) [size=8K]
> ? ? ? ?Capabilities: [40] Power Management version 2
> ? ? ? ?Kernel driver in use: b43-pci-bridge
>
> I can observe these differences in the dmesg output (3.2.9 + current
> stable queue-3.2):
>
> The kernel configuration is identical in both cases, the only
> difference is "genirq: Unmask oneshot irqs when thread was not woken"
> being applied or not.
>
> --- working ? ? <-- 3.2.9 + current stable queue-3.2, with "genirq: Unmask oneshot irqs when thread was not woken" reverted
> +++ broken ? ? ?<-- 3.2.9 + current stable queue-3.2
> @@ -125,6 +125,7 @@ Initializing cgroup subsys freezer
> ?Initializing cgroup subsys net_cls
> ?Initializing cgroup subsys blkio
> ?Initializing cgroup subsys perf_event
> +CPU serial number disabled.
> ?mce: CPU supports 5 MCE banks
> ?SMP alternatives: switching to UP code
> ?Freeing SMP alternatives: 8k freed
> @@ -137,9 +138,16 @@ SMP disabled
> ?Performance Events:
> ?no APIC, boot with the "lapic" boot parameter to force-enable it.
> ?no hardware sampling interrupt available.
> -Broken PMU hardware detected, using software events only.
> +p6 PMU driver.
> +... version: ? ? ? ? ? ? ? ?0
> +... bit width: ? ? ? ? ? ? ?32
> +... generic registers: ? ? ?2
> +... value mask: ? ? ? ? ? ? 00000000ffffffff
> +... max period: ? ? ? ? ? ? 000000007fffffff
> +... fixed-purpose events: ? 0
> +... event mask: ? ? ? ? ? ? 0000000000000003
> ?Brought up 1 CPUs
> ?Total of 1 processors activated (1989.75 BogoMIPS).
> ?devtmpfs: initialized
> ?PM: Registering ACPI NVS region at eeff000 (4096 bytes)
> ?print_constraints: dummy:
> @@ -174,12 +182,12 @@ pci 0000:00:00.0: reg 10: [mem 0xec00000
> ?type 1 class 0x000604
> ?pci 0000:00:01.0: supports D1
> ?type 2 class 0x000607
> -pci 0000:00:04.0: reg 10: [mem 0x10000000-0x10000fff]
> +pci 0000:00:04.0: reg 10: [mem 0x00000000-0x00000fff]
> ?pci 0000:00:04.0: supports D1 D2
> ?pci 0000:00:04.0: PME# supported from D0 D1 D2 D3hot D3cold
> ?pci 0000:00:04.0: PME# disabled
> ?type 2 class 0x000607
> -pci 0000:00:04.1: reg 10: [mem 0x14000000-0x14000fff]
> +pci 0000:00:04.1: reg 10: [mem 0x00000000-0x00000fff]
> ?pci 0000:00:04.1: supports D1 D2
> ?pci 0000:00:04.1: PME# supported from D0 D1 D2 D3hot D3cold
> ?pci 0000:00:04.1: PME# disabled
> @@ -299,6 +307,10 @@ pnp: PnP ACPI: found 11 devices
> ?ACPI: ACPI bus type pnp unregistered
> ?Switching to clocksource acpi_pm
> ?PCI: max bus depth: 1 pci_try_num: 2
> +pci 0000:00:04.0: BAR 0: assigned [mem 0x10000000-0x10000fff]
> +(PCI address [0x10000000-0x10000fff])
> +pci 0000:00:04.1: BAR 0: assigned [mem 0x14000000-0x14000fff]
> +(PCI address [0x14000000-0x14000fff])
> ?pci 0000:00:11.0: BAR 6: assigned [mem 0x10020000-0x1003ffff pref]
> ?pci 0000:00:04.1: BAR 16: assigned [mem 0x18000000-0x1bffffff]
> ?pci 0000:00:04.1: BAR 15: assigned [mem 0x1c000000-0x1fffffff pref]
> @@ -326,9 +338,11 @@ pci 0000:00:01.0: setting latency timer
> ?enabled at IRQ 10
> ?PCI: setting IRQ 10 as level-triggered
> ?-> GSI 10 (level, low) -> IRQ 10
> +pci 0000:00:04.0: setting latency timer to 64
> ?enabled at IRQ 11
> ?PCI: setting IRQ 11 as level-triggered
> ?-> GSI 11 (level, low) -> IRQ 11
> +pci 0000:00:04.1: setting latency timer to 64
> ?pci_bus 0000:00: resource 0 [io ?0x0000-0xffff]
> ?pci_bus 0000:00: resource 1 [mem 0x00000000-0xffffffff]
> ?pci_bus 0000:01: resource 1 [mem 0xe8100000-0xe81fffff]
> @@ -562,20 +576,6 @@ b43-phy0: Hardware crypto acceleration n
> ?b43-phy0: QoS not supported by firmware
> ?ADDRCONF(NETDEV_UP): wlan0: link is not ready
> ?NET: Registered protocol family 17
> -wlan0: authenticate with [redacted MAC address] (try 1)
> -wlan0: authenticated
> -wlan0: associate with [redacted MAC address] (try 1)
> -wlan0: RX AssocResp from [redacted MAC address] (capab=0x411 status=0 aid=1)
> -wlan0: associated
> -ADDRCONF(NETDEV_CHANGE): wlan0: link becomes ready
> -cfg80211: Calling CRDA for country: DE
> -Intel AES-NI instructions are not detected.
> -cfg80211: Regulatory domain changed to country: DE
> -cfg80211: ? ? (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp)
> -cfg80211: ? ? (2400000 KHz - 2483500 KHz @ 40000 KHz), (N/A, 2000 mBm)
> -cfg80211: ? ? (5150000 KHz - 5250000 KHz @ 40000 KHz), (N/A, 2000 mBm)
> -cfg80211: ? ? (5250000 KHz - 5350000 KHz @ 40000 KHz), (N/A, 2000 mBm)
> -cfg80211: ? ? (5470000 KHz - 5725000 KHz @ 40000 KHz), (N/A, 2698 mBm)
> ?lp0: using parport0 (interrupt-driven).
> ?lp0: console ready
> ?ppdev: user-space parallel port driver
> @@ -589,4 +589,3 @@ mtrr: base(0xf2000000) is not aligned on
> ?agpgart-via 0000:00:00.0: AGP 2.0 bridge
> ?agpgart-via 0000:00:00.0: putting AGP V2 device into 0x mode
> ?pci 0000:01:00.0: putting AGP V2 device into 0x mode
> -wlan0: no IPv6 routers present
>
> While I don't get any revealing error messages, once "genirq: Unmask
> oneshot irqs when thread was not woken" is applied the wlan card
> stops working (no scan results, unable to auth).
>
> I cannot reproduce this problem on a different amd64/ 64 bit system
> with a BCM4318 PCI card, which continues to work fine.
>
> Regards
> ? ? ? ?Stefan Lippers-Hollmann
Linus,
On Tue, 6 Mar 2012, Sven Joachim wrote:
> Am 06.03.2012 um 21:26 schrieb Thomas Gleixner:
>
> > On Tue, 6 Mar 2012, Sven Joachim wrote:
> >> Am 06.03.2012 um 20:31 schrieb Thomas Gleixner:
> >>
> >> > Stephan, Sven: Can you please provide the output of /proc/interrupts ?
> >>
> >> Here is mine, from a freshly booted 3.3-rc6 kernel.
> >>
> >> Cheers,
> >> Sven
> >>
> >> CPU0
> >> 0: 25050 XT-PIC-XT-PIC timer
> >> 1: 101 XT-PIC-XT-PIC i8042
> >> 2: 0 XT-PIC-XT-PIC cascade
> >> 3: 1 XT-PIC-XT-PIC
> >> 4: 1 XT-PIC-XT-PIC
> >> 5: 0 XT-PIC-XT-PIC ehci_hcd:usb1, uhci_hcd:usb2
> >> 6: 1 XT-PIC-XT-PIC i915, uhci_hcd:usb5, yenta
> >> 7: 1 XT-PIC-XT-PIC
> >> 8: 0 XT-PIC-XT-PIC rtc0
> >> 9: 634 XT-PIC-XT-PIC acpi
> >> 10: 341 XT-PIC-XT-PIC uhci_hcd:usb4, snd_hda_intel, b43
> >
> > Ah, XT-PIC uses handle_level_irq(). /me bangs head against desk.
> >
> > Does the patch below fix the problem for you ?
>
> Yep, it does. Thanks!
Can you pick it up directly or want me to send it via tip ?
Thanks,
tglx
On Tue, Mar 6, 2012 at 2:18 PM, Thomas Gleixner <[email protected]> wrote:
> Subject: genirq: Clear action->thread_mask if IRQ_ONESHOT is not set
Ok, can we get this fixed version tested too by people who saw the problem?
Just to make sure..
Thanks,
Linus
On 2012-03-06 23:33 +0100, Linus Torvalds wrote:
> On Tue, Mar 6, 2012 at 2:18 PM, Thomas Gleixner <[email protected]> wrote:
>> Subject: genirq: Clear action->thread_mask if IRQ_ONESHOT is not set
>
> Ok, can we get this fixed version tested too by people who saw the problem?
It works for me.
Cheers,
Sven
On Mon, 5 Mar 2012, Linus Torvalds wrote:
> Thomas, should we just plan on reverting that commit from mainline? It
> clearly causes regressions.
Give me a day or two to figure out why it breaks stuff. I have no idea
why it should wreckage anything.
Thanks,
tglx
Am 06.03.2012 um 20:45 schrieb Thomas Gleixner:
> On Mon, 5 Mar 2012, Stefan Lippers-Hollmann wrote:
>> On Monday 05 March 2012, Sven Joachim wrote:
>> > On 2012-02-28 02:05 +0100, Greg KH wrote:
>> >
>> > > 3.2-stable review patch. If anyone has any objections, please let me know.
>> >
>> > Alas, the commit below broke the WiFi on my laptop, where
>> > "iwlist wlan0 scan" now just reports: " wlan0 No scan results".
>> >
>> > This is not specific to 3.2, the same problem exists in 3.0.23 and
>> > 3.3-rc6 while previous releases worked.
>>
>> I can confirm this regression on an old PIII based notebook with a
>> 32 bit BCM4306/3 PCMCIA card:
>>
>> 02:00.0 Network controller [0280]: Broadcom Corporation BCM4306 802.11b/g Wireless LAN Controller [14e4:4320] (rev 03)
>> Subsystem: Linksys WPC54G v1 / WPC54GS v1 802.11g Wireless-G Notebook Adapter [1737:4320]
>> Flags: bus master, fast devsel, latency 64, IRQ 10
>> Memory at 20000000 (32-bit, non-prefetchable) [size=8K]
>> Capabilities: [40] Power Management version 2
>> Kernel driver in use: b43-pci-bridge
>>
>> I can observe these differences in the dmesg output (3.2.9 + current
>> stable queue-3.2):
>>
>> The kernel configuration is identical in both cases, the only
>> difference is "genirq: Unmask oneshot irqs when thread was not woken"
>> being applied or not.
>>
>> --- working <-- 3.2.9 + current stable queue-3.2, with "genirq: Unmask oneshot irqs when thread was not woken" reverted
>> +++ broken <-- 3.2.9 + current stable queue-3.2
>> @@ -125,6 +125,7 @@ Initializing cgroup subsys freezer
>> Initializing cgroup subsys net_cls
>> Initializing cgroup subsys blkio
>> Initializing cgroup subsys perf_event
>> +CPU serial number disabled.
>
> I can't see why reverting that genirq patch would cause this.
>
>> mce: CPU supports 5 MCE banks
>> SMP alternatives: switching to UP code
>> Freeing SMP alternatives: 8k freed
>> @@ -137,9 +138,16 @@ SMP disabled
>> Performance Events:
>> no APIC, boot with the "lapic" boot parameter to force-enable it.
>> no hardware sampling interrupt available.
>> -Broken PMU hardware detected, using software events only.
>> +p6 PMU driver.
>> +... version: 0
>> +... bit width: 32
>> +... generic registers: 2
>> +... value mask: 00000000ffffffff
>> +... max period: 000000007fffffff
>> +... fixed-purpose events: 0
>> +... event mask: 0000000000000003
>
> Ditto.
>
>> Brought up 1 CPUs
>> Total of 1 processors activated (1989.75 BogoMIPS).
>> devtmpfs: initialized
>> PM: Registering ACPI NVS region at eeff000 (4096 bytes)
>> print_constraints: dummy:
>> @@ -174,12 +182,12 @@ pci 0000:00:00.0: reg 10: [mem 0xec00000
>> type 1 class 0x000604
>> pci 0000:00:01.0: supports D1
>> type 2 class 0x000607
>> -pci 0000:00:04.0: reg 10: [mem 0x10000000-0x10000fff]
>> +pci 0000:00:04.0: reg 10: [mem 0x00000000-0x00000fff]
>
> Even more.
FWIW, I don't see any such differences on my system (with 3.3-rc6).
> There is something extremly fishy.
Cheers,
Sven
On Tue, Mar 6, 2012 at 1:11 PM, Thomas Gleixner <[email protected]> wrote:
>
> Can you pick it up directly or want me to send it via tip ?
I can certainly pick it up directly, but I want the sane version, not
the "random monkeys typing" version. Please.
Linus