LinuxLists.cc - BCM4312 Fails when xdm is started

Subject: Re: BCM4312 Fails when xdm is started

2008-11-25 11:06:20

Subject: Re: BCM4312 Fails when xdm is started

On Tuesday 25 November 2008 06:43:22 Yuval Hager wrote:
> However, I have a few interesting findings.
> First, this is totally unrelated to b43; it is related to PCI. I get the bogus
> all-ones reads from lspci even without loading b43.
>
> I played around with different video drivers and the results are:
> * If using the 'via' driver, I lose the PCIe card immediately upon
> initialization
> * Using the 'openchrome' (trunk version), it works well in the beginning.
> After the first blanking, the register reads are all 1's; then, while the
> screen is blank, I get a different read (some registers are correct, some are
> wrong), and when the screen is unblanked, I get 0xff's again. Very consistent
> and predictable (same read every time).
> * Using the 'vesa' driver, I could not recreate the problem. I could not get
> the screen to blank for some reason, but closing the lid, going into
> standby/hibernate and restarting X made no difference to the PCI device, and
> the wireless card kept on working.

Ok, then you should report the stuff to the X guys. This is not a b43 problem
and I also don't think it's a kernel problem.

--
Greetings Michael.

2008-11-23 12:20:58

Subject: Re: BCM4312 Fails when xdm is started

On Sunday 23 November 2008 12:49:55 Yuval Hager wrote:
> [ 182.891400] ****** b43: B43_MMIO_MACCTL 0x840A0503
> [ 182.891409] ****** b43: SSB_TMSLOW 0x20150000
> [ 258.299027] irq 10: nobody cared (try booting with the "irqpoll" option)

Does the kernel disable the PCI device, if it ignores the IRQ?

> [ 258.299038] Pid: 0, comm: swapper Not tainted 2.6.28-rc5 #15
> [ 258.299043] Call Trace:
> [ 258.299062] [<c0148d9a>] __report_bad_irq+0x24/0x69
> [ 258.299071] [<c0148da1>] __report_bad_irq+0x2b/0x69
> [ 258.299080] [<c0148ec8>] note_interrupt+0xe9/0x12d
> [ 258.299090] [<c014976d>] handle_level_irq+0x87/0xba
> [ 258.299101] [<c010564e>] do_IRQ+0x89/0x9f
> [ 258.299109] [<c0103ea8>] common_interrupt+0x28/0x30
> [ 258.299119] [<c0125406>] do_softirq+0x37/0x4d
> [ 258.299127] [<c0125301>] __do_softirq+0x62/0x130
> [ 258.299135] [<c0125406>] do_softirq+0x37/0x4d
> [ 258.299142] [<c0105653>] do_IRQ+0x8e/0x9f
> [ 258.299150] [<c0103ea8>] common_interrupt+0x28/0x30
> [ 258.299161] [<c0108682>] default_idle+0x2f/0x4c
> [ 258.299168] [<c0101a20>] cpu_idle+0x63/0x77
> [ 258.299173] handlers:
> [ 258.299176] [<f7906455>] (b43_interrupt_handler+0x0/0x1b7 [b43])
> [ 258.299212] Disabling IRQ #10
> [ 258.315148] b43-phy0: Radio hardware status changed to DISABLED
> [ 258.315160] b43-phy0: ******** B43_B43_MMIO_RADIO_HWENABLED_HI 0xFFFFFFFF
> [ 258.342341] kobject: 'rfkill0' (f43b7d78): kobject_uevent_env
> [ 258.342367] kobject: 'rfkill0' (f43b7d78): fill_kobj_path: path = '/class/rfkill/rfkill0'
> [ 258.342418] kobject: 'ssb0:0' (f40dfcd8): fill_kobj_path: path = '/devices/pci0000:00/0000:00:02.0/0000:02:00.0/ssb0:0'
> [ 258.391951]
> [ 258.391956] =================================
> [ 258.391964] [ INFO: inconsistent lock state ]
> [ 258.391971] 2.6.28-rc5 #15
> [ 258.391975] ---------------------------------
> [ 258.391980] inconsistent {in-hardirq-W} -> {hardirq-on-W} usage.
> [ 258.391987] X/3965 [HC0[0]:SC1[1]:HE1:SE0] takes:
> [ 258.391993] (&irq_desc_lock_class){++..}, at: [<c0148c60>] try_one_irq+0x15/0xe8
> [ 258.392016] {in-hardirq-W} state was registered at:
> [ 258.392021] [<c013bc07>] __lock_acquire+0x490/0x6bc
> [ 258.392034] [<c013be8d>] lock_acquire+0x5a/0x74
> [ 258.392043] [<c01496f8>] handle_level_irq+0x12/0xba
> [ 258.392053] [<c03c4842>] _spin_lock+0x1c/0x45
> [ 258.392066] [<c01496f8>] handle_level_irq+0x12/0xba
> [ 258.392076] [<c01496f8>] handle_level_irq+0x12/0xba
> [ 258.392085] [<c010564e>] do_IRQ+0x89/0x9f
> [ 258.392096] [<c0103ea8>] common_interrupt+0x28/0x30
> [ 258.392105] [<c03c4cc2>] _spin_unlock_irqrestore+0x37/0x39
> [ 258.392115] [<c01487e6>] __setup_irq+0x17a/0x1f3
> [ 258.392124] [<c05ce79d>] start_kernel+0x285/0x2f1
> [ 258.392140] [<ffffffff>] 0xffffffff
> [ 258.392159] irq event stamp: 1844456
> [ 258.392164] hardirqs last enabled at (1844456): [<c03c4b6f>] _spin_unlock_irq+0x20/0x23
> [ 258.392175] hardirqs last disabled at (1844455): [<c03c4ac3>] _spin_lock_irq+0xa/0x4b
> [ 258.392186] softirqs last enabled at (1844310): [<c0125406>] do_softirq+0x37/0x4d
> [ 258.392198] softirqs last disabled at (1844447): [<c0125406>] do_softirq+0x37/0x4d

That's a bit weird. Looks like another bug in the IRQ layer.

> [ 258.392208]
> [ 258.392209] other info that might help us debug this:
> [ 258.392215] no locks held by X/3965.
> [ 258.392219]
> [ 258.392220] stack backtrace:
> [ 258.392226] Pid: 3965, comm: X Not tainted 2.6.28-rc5 #15
> [ 258.392231] Call Trace:
> [ 258.392241] [<c0139175>] print_usage_bug+0x13d/0x146
> [ 258.392249] [<c013a2ff>] mark_lock+0x4b1/0x7c7
> [ 258.392257] [<c013bc7e>] __lock_acquire+0x507/0x6bc
> [ 258.392266] [<c013be8d>] lock_acquire+0x5a/0x74
> [ 258.392275] [<c0148c60>] try_one_irq+0x15/0xe8
> [ 258.392283] [<c03c4842>] _spin_lock+0x1c/0x45
> [ 258.392291] [<c0148c60>] try_one_irq+0x15/0xe8
> [ 258.392300] [<c0148c60>] try_one_irq+0x15/0xe8
> [ 258.392308] [<c03c4b6f>] _spin_unlock_irq+0x20/0x23
> [ 258.392317] [<c0148d33>] poll_spurious_irqs+0x0/0x43
> [ 258.392326] [<c0148d55>] poll_spurious_irqs+0x22/0x43
> [ 258.392338] [<c012874a>] run_timer_softirq+0x101/0x156
> [ 258.392346] [<c0125321>] __do_softirq+0x82/0x130
> [ 258.392354] [<c0125406>] do_softirq+0x37/0x4d
> [ 258.392362] [<c0105653>] do_IRQ+0x8e/0x9f
> [ 258.392370] [<c0103ea8>] common_interrupt+0x28/0x30

--
Greetings Michael.

2008-11-23 17:43:17

Subject: Re: BCM4312 Fails when xdm is started

On Sunday 23 November 2008 16:42:28 Larry Finger wrote:
> Michael Buesch wrote:
> > On Sunday 23 November 2008 12:49:55 Yuval Hager wrote:
> >> [ 182.891400] ****** b43: B43_MMIO_MACCTL 0x840A0503
> >> [ 182.891409] ****** b43: SSB_TMSLOW 0x20150000
> >> [ 258.299027] irq 10: nobody cared (try booting with the "irqpoll" option)
> >
> >
> > Does the kernel disable the PCI device, if it ignores the IRQ?
>
> According to /proc/interrupts that Yuval posted earlier, IRQ 10 is not used.

Can you try booting with kernel parameters "noapic" and "noacpi"
and reproduce?
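Larry's observation comes straight from /proc/interrupts. Whether a line has a handler attached, and whether its count keeps climbing, can be checked mechanically. A minimal C sketch, assuming the single-CPU layout shown in this thread ("IRQ: count controller handlers"); parse_irq_line is a hypothetical helper, not part of any library:

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (sketch only). Parses one line of single-CPU
 * /proc/interrupts output, e.g. " 10:       1712    XT-PIC-XT        b43".
 * Stores the IRQ number, its count and the handler names (everything after
 * the controller column). Returns 1 on success, 0 for lines such as "NMI:"
 * that do not start with a numeric IRQ.
 */
int parse_irq_line(const char *line, int *irq, unsigned long *count,
                   char *names, size_t names_len)
{
    char chip[32];
    int consumed = 0;

    /* %n records how many characters were consumed so far; it does not
     * count towards sscanf's return value. */
    if (sscanf(line, " %d: %lu %31s %n", irq, count, chip, &consumed) != 3)
        return 0;
    if (consumed == 0)
        consumed = (int)strlen(line);     /* no handler names present */

    snprintf(names, names_len, "%s", line + consumed);
    names[strcspn(names, "\n")] = '\0';   /* drop a trailing newline */
    return 1;
}
```

Sampling the IRQ 10 line before and after a blanking cycle and comparing the two counts shows whether the line has started firing continuously.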

--
Greetings Michael.

2008-11-22 06:43:13

Subject: Re: BCM4312 Fails when xdm is started

On Friday 21 November 2008, Larry Finger wrote:
> Yuval,
>
> Michael Buesch wrote:
> > Can you dump PCI config space and SSB registers (TMSLOW, maybe others,
> > too). It looks like a random bus write disabled the device.
>
> Please incorporate the following patch and run your system. In addition,
> run the following command when the wireless is working and after it fails:
>
> sudo lspci -d 14e4:4312 -x
>

When the wireless is working:
$ lspci -d 14e4:4312 -x
02:00.0 Network controller: Broadcom Corporation BCM4312 802.11a/b/g (rev 02)
00: e4 14 12 43 06 01 10 00 02 00 80 02 08 00 00 00
10: 04 c0 ff fd 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 3c 10 71 13
30: 00 00 00 00 40 00 00 00 00 00 00 00 0a 01 00 00

After it fails:
$ lspci -d 14e4:4312 -x
02:00.0 Network controller: Broadcom Corporation BCM4312 802.11a/b/g (rev 02)
00: e4 14 12 43 00 00 10 00 02 00 80 02 00 00 00 00
10: 04 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 3c 10 71 13
30: 00 00 00 00 40 00 00 00 00 00 00 00 00 01 00 00

> Post the results of the above commands and any entries in /var/log/messages
> that dump registers. They should all be prefaced with ****
>

Sorry for taking so long to reply; it took a while to recreate the problem.
According to the logs, it happened exactly when I checked the machine in the
morning.
At the beginning, the register dumps look like this:

[ 57.279984] ****** b43: B43_MMIO_MACCTL 0x840A0503
[ 57.279992] ****** b43: SSB_TMSLOW 0x20150000
(these lines repeat identically; skipped)

[31723.961262] ****** b43: B43_MMIO_MACCTL 0x840A0503
[31723.961275] ****** b43: SSB_TMSLOW 0x20150000
[31732.959490] b43-phy0: Radio hardware status changed to DISABLED
[31732.959505] b43-phy0: ******** B43_B43_MMIO_RADIO_HWENABLED_HI 0xFFFFFFFF
[31738.130551] wlan0: No ProbeResp from current AP 00:22:3f:18:89:5e - assume out of range
[31783.855931] ------------[ cut here ]------------
[31783.855944] WARNING: at drivers/net/wireless/b43/phy_common.c:135 b43_radio_lock+0x29/0x7e [b43]()
[31783.855955] Modules linked in: via drm rfkill_input hci_usb b43 led_class input_polldev rtc snd_hda_intel snd_pcm snd_timer snd_page_alloc snd_hwdep snd soundcore ehci_hcd uhci_hcd usbcore sg ssb video output via_agp agpgart
[31783.856023] Pid: 1220, comm: b43 Not tainted 2.6.28-rc5 #13
[31783.856032] Call Trace:
[31783.856055] [<c011f4e9>] warn_on_slowpath+0x40/0x59
[31783.856102] [<f7d93da3>] b43_gphy_op_write+0x25/0x29 [b43]
[31783.856143] [<f7d90735>] b43_calc_nrssi_slope+0x103e/0x105a [b43]
[31783.856180] [<f7cba429>] ssb_pci_write32+0x15/0x3f [ssb]
[31783.856209] [<f7cba545>] ssb_pci_read16+0x31/0x3f [ssb]
[31783.856244] [<f7d8854c>] __b43_shm_read16+0x79/0x81 [b43]
[31783.856272] [<f7cba545>] ssb_pci_read16+0x31/0x3f [ssb]
[31783.856306] [<f7d8854c>] __b43_shm_read16+0x79/0x81 [b43]
[31783.856334] [<f7cba4eb>] ssb_pci_read32+0x12/0x3b [ssb]
[31783.856370] [<f7d8dc77>] b43_radio_lock+0x29/0x7e [b43]
[31783.856408] [<f7d91e92>] b43_gphy_op_adjust_txpower+0x111/0x138 [b43]
[31783.856446] [<f7d8da06>] b43_phy_txpower_adjust_work+0x0/0x39 [b43]
[31783.856483] [<f7d8da36>] b43_phy_txpower_adjust_work+0x30/0x39 [b43]
[31783.856500] [<c012b2a4>] run_workqueue+0x6a/0xdf
[31783.856515] [<c012b9ba>] worker_thread+0x0/0xbd
[31783.856527] [<c012ba6d>] worker_thread+0xb3/0xbd
[31783.856545] [<c012dc8c>] autoremove_wake_function+0x0/0x2d
[31783.856560] [<c012dbc9>] kthread+0x38/0x5f
[31783.856573] [<c012db91>] kthread+0x0/0x5f
[31783.856588] [<c01040a7>] kernel_thread_helper+0x7/0x10
[31783.856598] ---[ end trace 7548c7ede66fa0d3 ]---
[31783.856607] ****** b43: B43_MMIO_MACCTL 0xFFFFFFFF
[31783.856616] ****** b43: SSB_TMSLOW 0xFFFFFFFF

And from now on, all reads are ones. I have the full logs if you need.

Cheers,

--yuval

2008-11-22 15:32:13

Subject: Re: BCM4312 Fails when xdm is started

Michael Buesch wrote:

> Somebody disabled MMIO and busmastering.
> And somebody cleared the CACHE_LINE_SIZE register.

Are these all the read/write bits in the configuration area? Should I conclude
that someone zeroed this area?

In case the kernel memory diagnostics don't help, is there any way to trap
writes to the configuration registers?
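Nothing in the thread traps the write itself, but the damage can at least be pinned down by diffing config-header snapshots taken before and after a failure. This detects changes after the fact; it does not catch the offending write. A minimal sketch, where cfg_diff and cfg_report are hypothetical helpers:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Size of the standard PCI header that `lspci -x` prints (0x00-0x3f). */
#define CFG_HDR_LEN 64

/*
 * Hypothetical helper (sketch only): compare two snapshots of the config
 * header and record which byte offsets differ. Returns the total number of
 * differing bytes; at most max_out offsets are written to out[].
 */
size_t cfg_diff(const uint8_t before[CFG_HDR_LEN],
                const uint8_t after[CFG_HDR_LEN],
                size_t out[], size_t max_out)
{
    size_t n = 0;
    for (size_t off = 0; off < CFG_HDR_LEN; off++) {
        if (before[off] != after[off]) {
            if (n < max_out)
                out[n] = off;
            n++;
        }
    }
    return n;
}

/* Hypothetical helper: print one line per changed byte, e.g. "0x04: 06 -> 00". */
void cfg_report(const uint8_t before[CFG_HDR_LEN],
                const uint8_t after[CFG_HDR_LEN])
{
    size_t offs[CFG_HDR_LEN];
    size_t n = cfg_diff(before, after, offs, CFG_HDR_LEN);
    for (size_t i = 0; i < n; i++)
        printf("0x%02zx: %02x -> %02x\n", offs[i],
               before[offs[i]], after[offs[i]]);
}
```

Fed with Yuval's two dumps above, it reports seven changed bytes: 0x04 and 0x05 (Command), 0x0c (Cache Line Size), 0x11 to 0x13 (the upper bytes of BAR0) and 0x3c (Interrupt Line).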

Larry

2008-11-24 16:11:54

Subject: Re: BCM4312 Fails when xdm is started

Michael Buesch wrote:
> On Monday 24 November 2008 09:49:38 Yuval Hager wrote:
>> * Now check this out - the output of lspci -d 14e4:4312 -x
>> 02:00.0 Network controller: Broadcom Corporation BCM4312 802.11a/b/g (rev ff)
>> 00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>> 10: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>> 20: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>> 30: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>>
>> (I double checked this)
>>
>> huh?
>
> Hah, interesting. I think your hardware may be faulty, in fact.
> To me it really seems like the mainboard has power failures on the PCI bus.
>
> This is a laptop, so you can't pull random hardware? Can you run some
> hardware burn-in tests like mprime (http://mersenne.org/freesoft/) or memtest?
> If that doesn't help, can you try with another operating system?
>

I also think you are seeing a hardware failure. Another test to try is
http://freshmeat.net/projects/cpuburn/?topic_id=146, which will exercise the system.

Larry

2008-11-21 17:42:37

Subject: Re: BCM4312 Fails when xdm is started

On Friday 21 November 2008 17:25:22 Larry Finger wrote:
> A problem was recently posted to the bcm43xx mailing list that I am unable to
> solve. The machine in question is an HP Mini 2133 (HP product number FU346EA)
> with a BCM4312 PCIe wireless card. This card is known to work with the b43
> driver (I have one) and it does work on this machine, at least initially.
>
> A problem occurs when xdm/kde is started. Suddenly a read operation on device
> hardware returns all ones, as though the register does not exist or as if it
> were suddenly mismapped. If the OP doesn't try to run xdm, the same problem
> will eventually occur; it just takes longer.

Can you dump PCI config space and SSB registers (TMSLOW, maybe others, too).
It looks like a random bus write disabled the device.

> [ 0.000000] Zone PFN ranges:
> [ 0.000000] DMA 0x00000000 -> 0x00001000
> [ 0.000000] Normal 0x00001000 -> 0x000373fe
> [ 0.000000] HighMem 0x000373fe -> 0x0006feb0
>
> On my 64-bit HP machine, I see:
>
> Zone PFN ranges:
> DMA 0x00000000 -> 0x00001000
> DMA32 0x00001000 -> 0x00100000
> Normal 0x00100000 -> 0x00100000
>
> Is it "normal" for there not to be a DMA32 range with a 32-bit version of Linux?

Yeah, I think so.
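The zone boundaries above are page frame numbers (PFNs). Assuming 4 KiB pages, the usual x86 page size, a PFN converts to a physical address with a shift by 12. A small sketch (function names are illustrative only):

```c
#include <stdint.h>

/* Assumption: 4 KiB pages, i.e. a page shift of 12 as on x86. */
#define PAGE_SHIFT_ASSUMED 12

/* Illustrative helper: physical address at which a page frame starts. */
uint64_t pfn_to_bytes(uint64_t pfn)
{
    return pfn << PAGE_SHIFT_ASSUMED;
}

/* Illustrative helper: the same boundary in MiB, rounded down. */
uint64_t pfn_to_mib(uint64_t pfn)
{
    return pfn_to_bytes(pfn) >> 20;
}
```

On the 32-bit machine this gives DMA up to 16 MiB, Normal up to about 883 MiB and HighMem up to about 1790 MiB. On the 64-bit machine, DMA32 ends at PFN 0x100000, exactly 4 GiB, and the empty Normal range there means all of that machine's RAM lies below 4 GiB.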

--
Greetings Michael.

2008-11-22 15:13:33

Subject: Re: BCM4312 Fails when xdm is started

On Saturday 22 November 2008 07:39:24 Yuval Hager wrote:
> On Friday 21 November 2008, Larry Finger wrote:
> > Yuval,
> >
> > Michael Buesch wrote:
> > > Can you dump PCI config space and SSB registers (TMSLOW, maybe others,
> > > too). It looks like a random bus write disabled the device.
> >
> > Please incorporate the following patch and run your system. In addition,
> > run the following command when the wireless is working and after it fails:
> >
> > sudo lspci -d 14e4:4312 -x
> >
>
> When the wireless is working:
> $ lspci -d 14e4:4312 -x
> 02:00.0 Network controller: Broadcom Corporation BCM4312 802.11a/b/g (rev 02)
> 00: e4 14 12 43 06 01 10 00 02 00 80 02 08 00 00 00
> 10: 04 c0 ff fd 00 00 00 00 00 00 00 00 00 00 00 00
> 20: 00 00 00 00 00 00 00 00 00 00 00 00 3c 10 71 13
> 30: 00 00 00 00 40 00 00 00 00 00 00 00 0a 01 00 00
>
> After it fails:
> $ lspci -d 14e4:4312 -x
> 02:00.0 Network controller: Broadcom Corporation BCM4312 802.11a/b/g (rev 02)
> 00: e4 14 12 43 00 00 10 00 02 00 80 02 00 00 00 00
                  ^^                      ^^
Somebody disabled MMIO and busmastering.
And somebody cleared the CACHE_LINE_SIZE register.

> 10: 04 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 20: 00 00 00 00 00 00 00 00 00 00 00 00 3c 10 71 13
> 30: 00 00 00 00 40 00 00 00 00 00 00 00 00 01 00 00
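Michael's reading can be checked mechanically. Config space is little-endian, so bytes 0x04-0x05 form the Command register: 0x0106 before (bit 1 Memory Space Enable, bit 2 Bus Master Enable, bit 8 SERR# Enable) and 0x0000 after. Byte 0x0c is Cache Line Size: 0x08 before, 0x00 after. A minimal sketch that parses one row of lspci -x output and decodes those fields; all function names are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Standard PCI header offsets and Command register bits (PCI spec). */
#define PCI_COMMAND_OFF          0x04
#define PCI_CACHE_LINE_SIZE_OFF  0x0c
#define CMD_MEMORY_SPACE         0x0002  /* bit 1: Memory Space Enable (MMIO) */
#define CMD_BUS_MASTER           0x0004  /* bit 2: Bus Master Enable */

/*
 * Hypothetical helper (sketch only). Parses one row of `lspci -x` output,
 * with any "> " quote prefix already stripped, e.g.
 * "00: e4 14 12 43 06 01 10 00 02 00 80 02 08 00 00 00".
 * Stores the 16 bytes in out[] and returns the row offset, or -1 on error.
 */
int parse_lspci_row(const char *line, uint8_t out[16])
{
    char *end;
    long off = strtol(line, &end, 16);
    if (end == line || *end != ':')
        return -1;

    const char *p = end + 1;
    for (int i = 0; i < 16; i++) {
        unsigned long b = strtoul(p, &end, 16);
        if (end == p || b > 0xff)
            return -1;
        out[i] = (uint8_t)b;
        p = end;
    }
    return (int)off;
}

/* Little-endian: the 16-bit Command register is bytes 0x04 (low), 0x05 (high). */
uint16_t pci_command(const uint8_t row0[16])
{
    return (uint16_t)(row0[PCI_COMMAND_OFF] |
                      (row0[PCI_COMMAND_OFF + 1] << 8));
}

bool mmio_enabled(uint16_t cmd)       { return (cmd & CMD_MEMORY_SPACE) != 0; }
bool bus_master_enabled(uint16_t cmd) { return (cmd & CMD_BUS_MASTER) != 0; }
```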

--
Greetings Michael.

2008-11-21 18:28:11

Subject: Re: BCM4312 Fails when xdm is started

Yuval,

Michael Buesch wrote:
>
> Can you dump PCI config space and SSB registers (TMSLOW, maybe others, too).
> It looks like a random bus write disabled the device.

Please incorporate the following patch and run your system. In addition, run the
following command when the wireless is working and after it fails:

sudo lspci -d 14e4:4312 -x

Post the results of the above commands and any entries in /var/log/messages that
dump registers. They should all be prefaced with ****
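The bytes lspci prints can also be read straight from sysfs, which makes it easy to sample the header repeatedly, for example around a screen blank. The device location 0000:02:00.0 comes from the kobject log lines in this thread; read_pci_config and print_like_lspci are hypothetical helpers, and this is a sketch rather than a replacement for lspci:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper (sketch only). Reads up to `len` bytes of a device's
 * config space from a sysfs file such as
 * "/sys/bus/pci/devices/0000:02:00.0/config".
 * Returns the number of bytes read, or -1 if the file cannot be opened.
 */
long read_pci_config(const char *path, uint8_t *buf, size_t len)
{
    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;
    size_t n = fread(buf, 1, len, f);
    fclose(f);
    return (long)n;
}

/* Hypothetical helper: print bytes in the "00: xx xx ..." layout of lspci -x. */
void print_like_lspci(const uint8_t *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (i % 16 == 0)
            printf("%02zx:", i);
        printf(" %02x", buf[i]);
        if (i % 16 == 15 || i + 1 == len)
            printf("\n");
    }
}
```

Calling read_pci_config() on that path with a 64-byte buffer and passing the result to print_like_lspci() should produce the same four rows as lspci -x; once the device has dropped off the bus, expect 0xff bytes, as in the later dumps in this thread.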

Index: linux-2.6/drivers/net/wireless/b43/phy_common.c
===================================================================
--- linux-2.6.orig/drivers/net/wireless/b43/phy_common.c
+++ linux-2.6/drivers/net/wireless/b43/phy_common.c
@@ -133,6 +133,11 @@ void b43_radio_lock(struct b43_wldev *de
 
 	macctl = b43_read32(dev, B43_MMIO_MACCTL);
 	B43_WARN_ON(macctl & B43_MACCTL_RADIOLOCK);
+	if (macctl & B43_MACCTL_RADIOLOCK) {
+		printk(KERN_INFO "****** b43: B43_MMIO_MACCTL 0x%X\n", macctl);
+		printk(KERN_INFO "****** b43: SSB_TMSLOW 0x%X\n",
+		       ssb_read32(dev->dev, SSB_TMSLOW));
+	}
 	macctl |= B43_MACCTL_RADIOLOCK;
 	b43_write32(dev, B43_MMIO_MACCTL, macctl);
 	/* Commit the write and wait for the device
@@ -150,6 +155,11 @@ void b43_radio_unlock(struct b43_wldev *
 	/* unlock */
 	macctl = b43_read32(dev, B43_MMIO_MACCTL);
 	B43_WARN_ON(!(macctl & B43_MACCTL_RADIOLOCK));
+	if (macctl & B43_MACCTL_RADIOLOCK) {
+		printk(KERN_INFO "****** b43: B43_MMIO_MACCTL 0x%X\n", macctl);
+		printk(KERN_INFO "****** b43: SSB_TMSLOW 0x%X\n",
+		       ssb_read32(dev->dev, SSB_TMSLOW));
+	}
 	macctl &= ~B43_MACCTL_RADIOLOCK;
 	b43_write32(dev, B43_MMIO_MACCTL, macctl);
 }
Index: linux-2.6/drivers/net/wireless/b43/rfkill.c
===================================================================
--- linux-2.6.orig/drivers/net/wireless/b43/rfkill.c
+++ linux-2.6/drivers/net/wireless/b43/rfkill.c
@@ -63,6 +63,8 @@ static void b43_rfkill_poll(struct input
 		report_change = 1;
 		b43info(wl, "Radio hardware status changed to %s\n",
 			enabled ? "ENABLED" : "DISABLED");
+		b43info(wl, "******** B43_B43_MMIO_RADIO_HWENABLED_HI 0x%X\n",
+			b43_read32(dev, B43_MMIO_RADIO_HWENABLED_HI));
 	}
 	mutex_unlock(&wl->mutex);

2008-11-25 07:18:24

by Peter Stuge

Subject: Re: BCM4312 Fails when xdm is started

Yuval Hager wrote:
> I played around with different video drivers and the results are:
> * If using the 'via' driver, I lose the PCIe card immediately upon
> initialization
> * Using the 'openchrome' (trunk version), it works well in the
> beginning.
> After the first blanking, the register reads are all 1's; then,
> while the screen is blank, I get a different read (some registers
> are correct, some are wrong), and when the screen is unblanked, I
> get 0xff's again. Very consistent and predictable (same read every
> time).
> * Using the 'vesa' driver, I could not recreate the problem. I could
> not get the screen to blank for some reason, but closing the lid,
> going into standby/hibernate and restarting X made no difference
> to the PCI device, and the wireless card kept on working.

Good work! You have beyond any doubt established that the X graphics
driver can cause this problem.

Were you using a kernel framebuffer driver on the occasions when you
saw the problem without running X at all? If not, the cause in that
case is still unidentified.

//Peter

2008-11-23 20:53:14

by Peter Stuge

Subject: Re: BCM4312 Fails when xdm is started

Michael Buesch wrote:
> On Sunday 23 November 2008 12:49:55 Yuval Hager wrote:
> > [ 182.891400] ****** b43: B43_MMIO_MACCTL 0x840A0503
> > [ 182.891409] ****** b43: SSB_TMSLOW 0x20150000
> > [ 258.299027] irq 10: nobody cared (try booting with the "irqpoll" option)
>
> Does the kernel disable the PCI device, if it ignores the IRQ?

The kernel disables the IRQ at least internally, maybe also by
deconfiguring the interrupt register in any devices using it, which
would explain the change in config register 0x3c (but not the changes
in the other bytes; could those be a freak chain reaction inside
the hardware?), but I haven't heard of or seen the kernel disabling the
PCI device itself. I don't know if it can.

Why doesn't b43 care about this interrupt? Without the APIC, interrupt 10
is what both the device and the driver should be using (according to the
earlier lspci -x output).
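One possible answer, offered as an assumption rather than something established in the thread: handlers on a possibly shared IRQ line commonly treat an all-ones status read as "not my interrupt", because a device that has dropped off the bus returns 0xFFFFFFFF for every read. If the card keeps the line asserted while reading back as all ones, every invocation reports "not handled", which is what the "nobody cared" message describes. A userspace model of that decision; the names and the mask parameter are hypothetical and not taken from b43:

```c
#include <stdint.h>

/* Hypothetical userspace stand-in for an interrupt handler's verdict. */
enum irq_result { IRQ_NOT_MINE = 0, IRQ_WAS_MINE = 1 };

/*
 * Sketch only. `reason` stands for the device's interrupt status register
 * as read over the bus; `mask` for the interrupt sources the driver enabled.
 */
enum irq_result model_irq_handler(uint32_t reason, uint32_t mask)
{
    /* A device that has fallen off the bus reads back as all ones, so the
     * value carries no information; assume another device on the shared
     * line raised the interrupt. */
    if (reason == 0xFFFFFFFFu)
        return IRQ_NOT_MINE;

    /* No enabled interrupt source pending: not ours either. */
    if ((reason & mask) == 0)
        return IRQ_NOT_MINE;

    return IRQ_WAS_MINE;
}
```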

> > [ 258.299173] handlers:
> > [ 258.299176] [<f7906455>] (b43_interrupt_handler+0x0/0x1b7 [b43])
> > [ 258.299212] Disabling IRQ #10
> > [ 258.315148] b43-phy0: Radio hardware status changed to DISABLED
> > [ 258.315160] b43-phy0: ******** B43_B43_MMIO_RADIO_HWENABLED_HI 0xFFFFFFFF
> > [ 258.342341] kobject: 'rfkill0' (f43b7d78): kobject_uevent_env
> > [ 258.342367] kobject: 'rfkill0' (f43b7d78): fill_kobj_path: path = '/class/rfkill/rfkill0'
> > [ 258.342418] kobject: 'ssb0:0' (f40dfcd8): fill_kobj_path: path = '/devices/pci0000:00/0000:00:02.0/0000:02:00.0/ssb0:0'

Why does the radio hw status change here?
How is the change notified to the driver?

> > [ 258.391951]
> > [ 258.391956] =================================
> > [ 258.391964] [ INFO: inconsistent lock state ]
> > [ 258.391971] 2.6.28-rc5 #15
> > [ 258.391975] ---------------------------------
> > [ 258.391980] inconsistent {in-hardirq-W} -> {hardirq-on-W} usage.
> > [ 258.391987] X/3965 [HC0[0]:SC1[1]:HE1:SE0] takes:
> > [ 258.391993] (&irq_desc_lock_class){++..}, at: [<c0148c60>] try_one_irq+0x15/0xe8
> > [ 258.392016] {in-hardirq-W} state was registered at:
> > [ 258.392021] [<c013bc07>] __lock_acquire+0x490/0x6bc
> > [ 258.392034] [<c013be8d>] lock_acquire+0x5a/0x74
> > [ 258.392043] [<c01496f8>] handle_level_irq+0x12/0xba
> > [ 258.392053] [<c03c4842>] _spin_lock+0x1c/0x45
> > [ 258.392066] [<c01496f8>] handle_level_irq+0x12/0xba
> > [ 258.392076] [<c01496f8>] handle_level_irq+0x12/0xba
> > [ 258.392085] [<c010564e>] do_IRQ+0x89/0x9f
> > [ 258.392096] [<c0103ea8>] common_interrupt+0x28/0x30
> > [ 258.392105] [<c03c4cc2>] _spin_unlock_irqrestore+0x37/0x39
> > [ 258.392115] [<c01487e6>] __setup_irq+0x17a/0x1f3
> > [ 258.392124] [<c05ce79d>] start_kernel+0x285/0x2f1
> > [ 258.392140] [<ffffffff>] 0xffffffff
> > [ 258.392159] irq event stamp: 1844456
> > [ 258.392164] hardirqs last enabled at (1844456): [<c03c4b6f>] _spin_unlock_irq+0x20/0x23
> > [ 258.392175] hardirqs last disabled at (1844455): [<c03c4ac3>] _spin_lock_irq+0xa/0x4b
> > [ 258.392186] softirqs last enabled at (1844310): [<c0125406>] do_softirq+0x37/0x4d
> > [ 258.392198] softirqs last disabled at (1844447): [<c0125406>] do_softirq+0x37/0x4d
>
>
> That's a bit weird. Looks like another bug in the IRQ layer.

Something happens with the hardware that confuses the kernel. It's
triggered by software, but I don't know where. Like Michael, I'm
not too convinced that it is in b43. :\

//Peter

2008-11-23 21:09:42

Subject: Re: BCM4312 Fails when xdm is started

Peter Stuge wrote:
> Michael Buesch wrote:
>> On Sunday 23 November 2008 12:49:55 Yuval Hager wrote:
>>> [ 182.891400] ****** b43: B43_MMIO_MACCTL 0x840A0503
>>> [ 182.891409] ****** b43: SSB_TMSLOW 0x20150000
>>> [ 258.299027] irq 10: nobody cared (try booting with the "irqpoll" option)
>> Does the kernel disable the PCI device, if it ignores the IRQ?
>
> The kernel disables the IRQ at least internally, maybe also by
> deconfiguring the interrupt register in any devices using it, which
> would explain the change in config register 0x3c (but not the changes
> in the other bytes; could those be a freak chain reaction inside
> the hardware?), but I haven't heard of or seen the kernel disabling the
> PCI device itself. I don't know if it can.
>
> Why doesn't b43 care about this interrupt? Without the APIC, interrupt 10
> is what both the device and the driver should be using (according to the
> earlier lspci -x output).

I think by this point the BCM43xx hardware is disabled.

>>> [ 258.299173] handlers:
>>> [ 258.299176] [<f7906455>] (b43_interrupt_handler+0x0/0x1b7 [b43])
>>> [ 258.299212] Disabling IRQ #10
>>> [ 258.315148] b43-phy0: Radio hardware status changed to DISABLED
>>> [ 258.315160] b43-phy0: ******** B43_B43_MMIO_RADIO_HWENABLED_HI 0xFFFFFFFF
>>> [ 258.342341] kobject: 'rfkill0' (f43b7d78): kobject_uevent_env
>>> [ 258.342367] kobject: 'rfkill0' (f43b7d78): fill_kobj_path: path = '/class/rfkill/rfkill0'
>>> [ 258.342418] kobject: 'ssb0:0' (f40dfcd8): fill_kobj_path: path = '/devices/pci0000:00/0000:00:02.0/0000:02:00.0/ssb0:0'
>
> Why does the radio hw status change here?
> How is the change notified to the driver?

By setting a bit in the appropriate register; however, the device is disabled,
so all bits read as set. This is a false indication.
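Larry's point about the false indication can be sketched as follows. The bit position and its polarity are assumptions for illustration, chosen so that an all-ones read decodes as "disabled" as in the log; they are not taken from the b43 sources. The guarded variant shows one way to refuse to act on a read that is obviously bogus:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical bit, for illustration only: assumed to read as 1 when the
 * hardware switch has turned the radio off.
 */
#define HW_RADIO_OFF_BIT (1u << 16)

/* Naive decoding: trusts whatever the register read returned. */
bool radio_enabled_naive(uint32_t reg)
{
    return !(reg & HW_RADIO_OFF_BIT);
}

/*
 * Guarded decoding (sketch): an all-ones read means the device is not
 * responding, so the value is meaningless. Report that through *valid and
 * keep the previous state instead of flipping to "disabled".
 */
bool radio_enabled_guarded(uint32_t reg, bool previous, bool *valid)
{
    if (reg == 0xFFFFFFFFu) {
        *valid = false;
        return previous;
    }
    *valid = true;
    return !(reg & HW_RADIO_OFF_BIT);
}
```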

>>> [ 258.391951]
>>> [ 258.391956] =================================
>>> [ 258.391964] [ INFO: inconsistent lock state ]
>>> [ 258.391971] 2.6.28-rc5 #15
>>> [ 258.391975] ---------------------------------
>>> [ 258.391980] inconsistent {in-hardirq-W} -> {hardirq-on-W} usage.
>>> [ 258.391987] X/3965 [HC0[0]:SC1[1]:HE1:SE0] takes:
>>> [ 258.391993] (&irq_desc_lock_class){++..}, at: [<c0148c60>] try_one_irq+0x15/0xe8
>>> [ 258.392016] {in-hardirq-W} state was registered at:
>>> [ 258.392021] [<c013bc07>] __lock_acquire+0x490/0x6bc
>>> [ 258.392034] [<c013be8d>] lock_acquire+0x5a/0x74
>>> [ 258.392043] [<c01496f8>] handle_level_irq+0x12/0xba
>>> [ 258.392053] [<c03c4842>] _spin_lock+0x1c/0x45
>>> [ 258.392066] [<c01496f8>] handle_level_irq+0x12/0xba
>>> [ 258.392076] [<c01496f8>] handle_level_irq+0x12/0xba
>>> [ 258.392085] [<c010564e>] do_IRQ+0x89/0x9f
>>> [ 258.392096] [<c0103ea8>] common_interrupt+0x28/0x30
>>> [ 258.392105] [<c03c4cc2>] _spin_unlock_irqrestore+0x37/0x39
>>> [ 258.392115] [<c01487e6>] __setup_irq+0x17a/0x1f3
>>> [ 258.392124] [<c05ce79d>] start_kernel+0x285/0x2f1
>>> [ 258.392140] [<ffffffff>] 0xffffffff
>>> [ 258.392159] irq event stamp: 1844456
>>> [ 258.392164] hardirqs last enabled at (1844456): [<c03c4b6f>] _spin_unlock_irq+0x20/0x23
>>> [ 258.392175] hardirqs last disabled at (1844455): [<c03c4ac3>] _spin_lock_irq+0xa/0x4b
>>> [ 258.392186] softirqs last enabled at (1844310): [<c0125406>] do_softirq+0x37/0x4d
>>> [ 258.392198] softirqs last disabled at (1844447): [<c0125406>] do_softirq+0x37/0x4d
>>
>> That's a bit weird. Looks like another bug in the IRQ layer.
>
> Something happens with the hardware that confuses the kernel. It's
> triggered by software, but I don't know where. Like Michael, I'm
> not too convinced that it is in b43. :\

From a config file posted earlier, the OP is using SLAB. Is there any point in
trying SLUB?

Larry

2008-11-25 05:45:32

Subject: Re: BCM4312 Fails when xdm is started

On Monday 24 November 2008, Larry Finger wrote:
> Michael Buesch wrote:
> > On Monday 24 November 2008 09:49:38 Yuval Hager wrote:
> >> * Now check this out - the output of lspci -d 14e4:4312 -x
> >> 02:00.0 Network controller: Broadcom Corporation BCM4312 802.11a/b/g (rev ff)
> >> 00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> >> 10: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> >> 20: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> >> 30: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> >>
> >> (I double checked this)
> >>
> >> huh?
> >
> > Hah, interesting. I think your hardware may be faulty, in fact.
> > To me it really seems like the mainboard has power failures on the PCI
> > bus.
> >
> > This is a laptop, so you can't pull random hardware? Can you run some
> > hardware burn-in tests like mprime (http://mersenne.org/freesoft/) or
> > memtest? If that doesn't help, can you try with another operating system?
>
> I also think you are seeing a hardware failure. Another test to try is
> http://freshmeat.net/projects/cpuburn/?topic_id=146, which will exercise
> the system.
>
> Larry

I can't argue with what the bits mean, but I must say it doesn't "feel" like a
hardware problem. It is very consistent and deterministic.

I've been running mprime & burnBX & burnMMX for over 6 hours and it is all
fine (memtest has not been run yet).

However, I have a few interesting findings.
First, this is totally unrelated to b43; it is related to PCI. I get the bogus
all-ones reads from lspci even without loading b43.

I played around with different video drivers and the results are:
* If using the 'via' driver, I lose the PCIe card immediately upon
initialization
* Using the 'openchrome' (trunk version), it works well in the beginning.
After the first blanking, the register reads are all 1's; then, while the
screen is blank, I get a different read (some registers are correct, some are
wrong), and when the screen is unblanked, I get 0xff's again. Very consistent
and predictable (same read every time).
* Using the 'vesa' driver, I could not recreate the problem. I could not get
the screen to blank for some reason, but closing the lid, going into
standby/hibernate and restarting X made no difference to the PCI device, and
the wireless card kept on working.

--yuval

2008-11-24 10:56:42

Subject: Re: BCM4312 Fails when xdm is started

On Monday 24 November 2008 09:49:38 Yuval Hager wrote:
> On Sunday 23 November 2008, Larry Finger wrote:
> > From a config file posted earlier, the OP is using SLAB. Is there any point
> > in trying SLUB?
>
> Another try, not sure what it means:
>
> * Added CONFIG_SLUB and CONFIG_SLUB_DEBUG
>
> * boot parameters: root=/dev/sda3 debug memory_corruption_check=1 devres.log=1 debug_objects debugpat
> acpi.debug_layer=0x00410002 acpi.debug_level=0xffffffff acpi=off apic=debug nolapic irqpoll pci=noacpi slub_debug=FZPU
>
> * cat /proc/interrupts is
> CPU0
> 0: 16658 XT-PIC-XT timer
> 1: 289 XT-PIC-XT i8042
> 2: 0 XT-PIC-XT cascade
> 3: 60 XT-PIC-XT uhci_hcd:usb2, ehci_hcd:usb4
> 5: 9163 XT-PIC-XT sata_via, HDA Intel
> 7: 0 XT-PIC-XT uhci_hcd:usb3
> 8: 2 XT-PIC-XT rtc
> 10: 1712 XT-PIC-XT b43
> 11: 131 XT-PIC-XT uhci_hcd:usb1
> 12: 706 XT-PIC-XT i8042
> 14: 0 XT-PIC-XT ide0
> 15: 0 XT-PIC-XT ide1
> NMI: 0 Non-maskable interrupts
> LOC: 0 Local timer interrupts
> RES: 0 Rescheduling interrupts
> CAL: 0 Function call interrupts
> TLB: 0 TLB shootdowns
> TRM: 0 Thermal event interrupts
> SPU: 0 Spurious interrupts
> ERR: 0
> MIS: 0
>
> * lspci -d 14e4:4312 -x
> 02:00.0 Network controller: Broadcom Corporation BCM4312 802.11a/b/g (rev 02)
> 00: e4 14 12 43 06 01 10 00 02 00 80 02 08 00 00 00
> 10: 04 c0 ff fd 00 00 00 00 00 00 00 00 00 00 00 00
> 20: 00 00 00 00 00 00 00 00 00 00 00 00 3c 10 71 13
> 30: 00 00 00 00 40 00 00 00 00 00 00 00 0a 01 00 00
>
> * xset dpms force standby
> * wake up
> * dmesg is virtually the same as before, complaining about nobody handling irq 10 and disabling it.

Actually, b43 _does_ use IRQ10 now.
I guess the card dies such a horrible death that it also asserts the IRQ line forever.
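For reference, the "nobody cared" detector works roughly like this. The model below is a simplified sketch of the logic in the kernel's kernel/irq/spurious.c; the window of 100000 interrupts and the 99900 unhandled threshold are assumed from that code and may differ between kernel versions. A line that a dead card holds asserted, with no handler claiming it, trips the check at the end of the first window:

```c
#include <stdbool.h>
#include <stdint.h>

/* Thresholds assumed from the kernel's spurious-IRQ detector. */
#define SPURIOUS_WINDOW         100000u
#define SPURIOUS_MAX_UNHANDLED   99900u

/* Simplified, hypothetical per-line bookkeeping. */
struct irq_line {
    uint32_t count;      /* interrupts seen in the current window */
    uint32_t unhandled;  /* of those, how many no handler claimed */
    bool disabled;       /* set once the line has been shut off */
};

/* Record one interrupt; returns true if this call disabled the line. */
bool note_irq(struct irq_line *l, bool handled)
{
    if (l->disabled)
        return false;
    if (!handled)
        l->unhandled++;
    if (++l->count < SPURIOUS_WINDOW)
        return false;

    /* End of the window: judge it, then start a fresh one. */
    bool bad = l->unhandled > SPURIOUS_MAX_UNHANDLED;
    l->count = 0;
    l->unhandled = 0;
    if (bad)
        l->disabled = true;   /* corresponds to "Disabling IRQ #10" */
    return bad;
}
```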

> * /proc/interrupts now is
> 0: 80987 XT-PIC-XT timer
> 1: 1027 XT-PIC-XT i8042
> 2: 0 XT-PIC-XT cascade
> 3: 60 XT-PIC-XT uhci_hcd:usb2, ehci_hcd:usb4
> 5: 10400 XT-PIC-XT sata_via, HDA Intel
> 7: 0 XT-PIC-XT uhci_hcd:usb3
> 8: 2 XT-PIC-XT rtc
> 10: 200000 XT-PIC-XT b43
> 11: 131 XT-PIC-XT uhci_hcd:usb1
> 12: 3059 XT-PIC-XT i8042
> 14: 0 XT-PIC-XT ide0
> 15: 0 XT-PIC-XT ide1
> NMI: 0 Non-maskable interrupts
> LOC: 0 Local timer interrupts
> RES: 0 Rescheduling interrupts
> CAL: 0 Function call interrupts
> TLB: 0 TLB shootdowns
> TRM: 0 Thermal event interrupts
> SPU: 0 Spurious interrupts
> ERR: 0
> MIS: 0
>
> * Now check this out - the output of lspci -d 14e4:4312 -x
> 02:00.0 Network controller: Broadcom Corporation BCM4312 802.11a/b/g (rev ff)
> 00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> 10: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> 20: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> 30: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>
> (I double checked this)
>
> huh?

Hah, interesting. I think your hardware may be faulty, in fact.
To me it really seems like the mainboard has power failures on the PCI bus.

This is a laptop, so you can't pull random hardware? Can you run some
hardware burn-in tests like mprime (http://mersenne.org/freesoft/) or memtest?
If that doesn't help, can you try with another operating system?

--
Greetings Michael.

2008-11-23 11:53:30

Subject: Re: BCM4312 Fails when xdm is started

> It doesn't hurt to turn on all debugging options. Often you get some hint
> by doing so.

I booted with 'root=/dev/sda3 debug memory_corruption_check=1 devres.log=1 debug_objects debugpat
acpi.debug_layer=0x00410002 acpi.debug_level=0xffffffff acpi=off noapic nolapic irqpoll pci=noacpi' and issued 'xset dpms force
standby'. After touching the mouse, the system locked up for about a minute, and the wireless stopped working.
Here's the log portion, I hope it provides a hint of some sort:

[ 182.891400] ****** b43: B43_MMIO_MACCTL 0x840A0503
[ 182.891409] ****** b43: SSB_TMSLOW 0x20150000
[ 258.299027] irq 10: nobody cared (try booting with the "irqpoll" option)
[ 258.299038] Pid: 0, comm: swapper Not tainted 2.6.28-rc5 #15
[ 258.299043] Call Trace:
[ 258.299062] [<c0148d9a>] __report_bad_irq+0x24/0x69
[ 258.299071] [<c0148da1>] __report_bad_irq+0x2b/0x69
[ 258.299080] [<c0148ec8>] note_interrupt+0xe9/0x12d
[ 258.299090] [<c014976d>] handle_level_irq+0x87/0xba
[ 258.299101] [<c010564e>] do_IRQ+0x89/0x9f
[ 258.299109] [<c0103ea8>] common_interrupt+0x28/0x30
[ 258.299119] [<c0125406>] do_softirq+0x37/0x4d
[ 258.299127] [<c0125301>] __do_softirq+0x62/0x130
[ 258.299135] [<c0125406>] do_softirq+0x37/0x4d
[ 258.299142] [<c0105653>] do_IRQ+0x8e/0x9f
[ 258.299150] [<c0103ea8>] common_interrupt+0x28/0x30
[ 258.299161] [<c0108682>] default_idle+0x2f/0x4c
[ 258.299168] [<c0101a20>] cpu_idle+0x63/0x77
[ 258.299173] handlers:
[ 258.299176] [<f7906455>] (b43_interrupt_handler+0x0/0x1b7 [b43])
[ 258.299212] Disabling IRQ #10
[ 258.315148] b43-phy0: Radio hardware status changed to DISABLED
[ 258.315160] b43-phy0: ******** B43_B43_MMIO_RADIO_HWENABLED_HI 0xFFFFFFFF
[ 258.342341] kobject: 'rfkill0' (f43b7d78): kobject_uevent_env
[ 258.342367] kobject: 'rfkill0' (f43b7d78): fill_kobj_path: path = '/class/rfkill/rfkill0'
[ 258.342418] kobject: 'ssb0:0' (f40dfcd8): fill_kobj_path: path = '/devices/pci0000:00/0000:00:02.0/0000:02:00.0/ssb0:0'
[ 258.391951]
[ 258.391956] =================================
[ 258.391964] [ INFO: inconsistent lock state ]
[ 258.391971] 2.6.28-rc5 #15
[ 258.391975] ---------------------------------
[ 258.391980] inconsistent {in-hardirq-W} -> {hardirq-on-W} usage.
[ 258.391987] X/3965 [HC0[0]:SC1[1]:HE1:SE0] takes:
[ 258.391993] (&irq_desc_lock_class){++..}, at: [<c0148c60>] try_one_irq+0x15/0xe8
[ 258.392016] {in-hardirq-W} state was registered at:
[ 258.392021] [<c013bc07>] __lock_acquire+0x490/0x6bc
[ 258.392034] [<c013be8d>] lock_acquire+0x5a/0x74
[ 258.392043] [<c01496f8>] handle_level_irq+0x12/0xba
[ 258.392053] [<c03c4842>] _spin_lock+0x1c/0x45
[ 258.392066] [<c01496f8>] handle_level_irq+0x12/0xba
[ 258.392076] [<c01496f8>] handle_level_irq+0x12/0xba
[ 258.392085] [<c010564e>] do_IRQ+0x89/0x9f
[ 258.392096] [<c0103ea8>] common_interrupt+0x28/0x30
[ 258.392105] [<c03c4cc2>] _spin_unlock_irqrestore+0x37/0x39
[ 258.392115] [<c01487e6>] __setup_irq+0x17a/0x1f3
[ 258.392124] [<c05ce79d>] start_kernel+0x285/0x2f1
[ 258.392140] [<ffffffff>] 0xffffffff
[ 258.392159] irq event stamp: 1844456
[ 258.392164] hardirqs last enabled at (1844456): [<c03c4b6f>] _spin_unlock_irq+0x20/0x23
[ 258.392175] hardirqs last disabled at (1844455): [<c03c4ac3>] _spin_lock_irq+0xa/0x4b
[ 258.392186] softirqs last enabled at (1844310): [<c0125406>] do_softirq+0x37/0x4d
[ 258.392198] softirqs last disabled at (1844447): [<c0125406>] do_softirq+0x37/0x4d
[ 258.392208]
[ 258.392209] other info that might help us debug this:
[ 258.392215] no locks held by X/3965.
[ 258.392219]
[ 258.392220] stack backtrace:
[ 258.392226] Pid: 3965, comm: X Not tainted 2.6.28-rc5 #15
[ 258.392231] Call Trace:
[ 258.392241] [<c0139175>] print_usage_bug+0x13d/0x146
[ 258.392249] [<c013a2ff>] mark_lock+0x4b1/0x7c7
[ 258.392257] [<c013bc7e>] __lock_acquire+0x507/0x6bc
[ 258.392266] [<c013be8d>] lock_acquire+0x5a/0x74
[ 258.392275] [<c0148c60>] try_one_irq+0x15/0xe8
[ 258.392283] [<c03c4842>] _spin_lock+0x1c/0x45
[ 258.392291] [<c0148c60>] try_one_irq+0x15/0xe8
[ 258.392300] [<c0148c60>] try_one_irq+0x15/0xe8
[ 258.392308] [<c03c4b6f>] _spin_unlock_irq+0x20/0x23
[ 258.392317] [<c0148d33>] poll_spurious_irqs+0x0/0x43
[ 258.392326] [<c0148d55>] poll_spurious_irqs+0x22/0x43
[ 258.392338] [<c012874a>] run_timer_softirq+0x101/0x156
[ 258.392346] [<c0125321>] __do_softirq+0x82/0x130
[ 258.392354] [<c0125406>] do_softirq+0x37/0x4d
[ 258.392362] [<c0105653>] do_IRQ+0x8e/0x9f
[ 258.392370] [<c0103ea8>] common_interrupt+0x28/0x30
[ 260.311944] wlan0: No ProbeResp from current AP 00:22:3f:18:89:5e - assume out of range
[ 304.082520] ------------[ cut here ]------------
[ 304.082531] WARNING: at drivers/net/wireless/b43/phy_common.c:135 b43_radio_lock+0x29/0x7c [b43]()
[ 304.082538] Modules linked in: rfkill_input b43 ssb led_class input_polldev via drm rtc hci_usb snd_hda_intel snd_pcm snd_timer snd_page_alloc snd_hwdep snd soundcore ehci_hcd uhci_hcd usbcore sg via_agp agpgart
[ 304.082593] Pid: 5913, comm: b43 Not tainted 2.6.28-rc5 #15
[ 304.082599] Call Trace:
[ 304.082618] [<c012171f>] warn_on_slowpath+0x40/0x59
[ 304.082643] [<f7901c94>] b43_shm_read16+0x20/0x42 [b43]
[ 304.082666] [<f7b9e4dc>] ssb_pci_read32+0x12/0x3b [ssb]
[ 304.082686] [<f7906ded>] b43_radio_lock+0x29/0x7c [b43]
[ 304.082706] [<f790b030>] b43_gphy_op_adjust_txpower+0x112/0x139 [b43]
[ 304.082727] [<f7906ba0>] b43_phy_txpower_adjust_work+0x32/0x3b [b43]
[ 304.082740] [<c012de6d>] run_workqueue+0xbf/0x192
[ 304.082748] [<c012de2f>] run_workqueue+0x81/0x192
[ 304.082767] [<f7906b6e>] b43_phy_txpower_adjust_work+0x0/0x3b [b43]
[ 304.082778] [<c012e72a>] worker_thread+0x0/0xbd
[ 304.082786] [<c012e7dd>] worker_thread+0xb3/0xbd
[ 304.082796] [<c0130bcc>] autoremove_wake_function+0x0/0x2d
[ 304.082803] [<c0130b10>] kthread+0x38/0x60
[ 304.082810] [<c0130ad8>] kthread+0x0/0x60
[ 304.082820] [<c01041c7>] kernel_thread_helper+0x7/0x10
[ 304.082826] ---[ end trace 6ecefdf7cfe131b0 ]---
[ 304.082832] ****** b43: B43_MMIO_MACCTL 0xFFFFFFFF
[ 304.082837] ****** b43: SSB_TMSLOW 0xFFFFFFFF

--yuval
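The "irq 10: nobody cared" report followed by "Disabling IRQ #10" (with note_interrupt and __report_bad_irq in the trace) comes from the kernel's check for interrupt lines that keep firing while no registered handler claims them. What gets disabled is the interrupt line, not the PCI device. The Python below is a simplified model of that counting, not kernel code. The class name is hypothetical. The 100,000-interrupt window and the 99,900 threshold follow note_interrupt() in kernel/irq/spurious.c as I understand it for kernels of this era, and the model deliberately leaves out the time-based reset and the irqpoll/irqfixup handling.

```python
class SpuriousIrqModel:
    """Simplified model of the kernel's unhandled-interrupt check for one IRQ line.

    Assumption: mirrors only the counting in note_interrupt(); the time-based
    reset and the irqpoll/irqfixup handling are left out.
    """

    WINDOW = 100_000     # interrupts per evaluation window
    THRESHOLD = 99_900   # unhandled count that must be exceeded to disable

    def __init__(self) -> None:
        self.count = 0
        self.unhandled = 0
        self.disabled = False

    def note_interrupt(self, handled: bool) -> bool:
        """Record one interrupt; return True if this one gets the line disabled."""
        if self.disabled:
            return False
        if not handled:
            self.unhandled += 1
        self.count += 1
        if self.count < self.WINDOW:
            return False
        # Window complete: judge it, then start counting afresh.
        trip = self.unhandled > self.THRESHOLD
        self.count = 0
        self.unhandled = 0
        self.disabled = trip
        return trip


# A line that keeps firing while no handler claims the interrupts:
model = SpuriousIrqModel()
for n in range(1, 150_001):
    if model.note_interrupt(handled=False):
        print(f"Disabling IRQ after {n} interrupts")
```

A line whose handler claims even a modest share of its interrupts never trips this check. Only a near-total flood of unclaimed interrupts does.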

2008-11-24 08:52:58

Subject: Re: BCM4312 Fails when xdm is started

On Sunday 23 November 2008, Larry Finger wrote:
> From a config file posted earlier, the OP is using SLAB. Is there any point
> in trying SLUB?

Another try, not sure what it means:

* Added CONFIG_SLUB and CONFIG_SLUB_DEBUG

* boot parameters: root=/dev/sda3 debug memory_corruption_check=1 devres.log=1 debug_objects debugpat
acpi.debug_layer=0x00410002 acpi.debug_level=0xffffffff acpi=off apic=debug nolapic irqpoll pci=noacpi slub_debug=FZPU

* cat /proc/interrupts is
CPU0
0: 16658 XT-PIC-XT timer
1: 289 XT-PIC-XT i8042
2: 0 XT-PIC-XT cascade
3: 60 XT-PIC-XT uhci_hcd:usb2, ehci_hcd:usb4
5: 9163 XT-PIC-XT sata_via, HDA Intel
7: 0 XT-PIC-XT uhci_hcd:usb3
8: 2 XT-PIC-XT rtc
10: 1712 XT-PIC-XT b43
11: 131 XT-PIC-XT uhci_hcd:usb1
12: 706 XT-PIC-XT i8042
14: 0 XT-PIC-XT ide0
15: 0 XT-PIC-XT ide1
NMI: 0 Non-maskable interrupts
LOC: 0 Local timer interrupts
RES: 0 Rescheduling interrupts
CAL: 0 Function call interrupts
TLB: 0 TLB shootdowns
TRM: 0 Thermal event interrupts
SPU: 0 Spurious interrupts
ERR: 0
MIS: 0

* lspci -d 14e4:4312 -x
02:00.0 Network controller: Broadcom Corporation BCM4312 802.11a/b/g (rev 02)
00: e4 14 12 43 06 01 10 00 02 00 80 02 08 00 00 00
10: 04 c0 ff fd 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 3c 10 71 13
30: 00 00 00 00 40 00 00 00 00 00 00 00 0a 01 00 00

* xset dpms force standby
* wake up
* dmesg is virtually the same as before, complaining about nobody handling irq 10 and disabling it.
* /proc/interrupts now is
0: 80987 XT-PIC-XT timer
1: 1027 XT-PIC-XT i8042
2: 0 XT-PIC-XT cascade
3: 60 XT-PIC-XT uhci_hcd:usb2, ehci_hcd:usb4
5: 10400 XT-PIC-XT sata_via, HDA Intel
7: 0 XT-PIC-XT uhci_hcd:usb3
8: 2 XT-PIC-XT rtc
10: 200000 XT-PIC-XT b43
11: 131 XT-PIC-XT uhci_hcd:usb1
12: 3059 XT-PIC-XT i8042
14: 0 XT-PIC-XT ide0
15: 0 XT-PIC-XT ide1
NMI: 0 Non-maskable interrupts
LOC: 0 Local timer interrupts
RES: 0 Rescheduling interrupts
CAL: 0 Function call interrupts
TLB: 0 TLB shootdowns
TRM: 0 Thermal event interrupts
SPU: 0 Spurious interrupts
ERR: 0
MIS: 0

* Now check this out - the output of lspci -d 14e4:4312 -x
02:00.0 Network controller: Broadcom Corporation BCM4312 802.11a/b/g (rev ff)
00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
10: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
20: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
30: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff

(I double checked this)

huh?
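The two lspci dumps in this message can be read field by field. Here is a minimal sketch, assuming the standard 64-byte type-0 header layout. `parse_lspci_x`, `decode_header` and `reads_all_ones` are hypothetical helpers and are not part of pciutils (`lspci -vv` prints the same fields already decoded). The good dump decodes to vendor 0x14e4, device 0x4312, a 64-bit memory BAR0 at 0xfdffc000 and interrupt line 10 (pin INTA), which matches the `b43` entry on IRQ 10 in /proc/interrupts. The bad dump is all 0xff. That is typically what a config read returns when no device answers, so after the wakeup the card behaves as if it had dropped off the bus. The b43 debug lines reading MACCTL and TMSLOW as 0xFFFFFFFF show the same pattern on the MMIO side.

```python
import re

# Rows look like "00: e4 14 12 43 ..."; the "02:00.0 Network controller: ..."
# title line does not match because no space follows its first colon.
ROW = re.compile(r"^([0-9a-f]{2,3}):((?: [0-9a-f]{2})+)$", re.IGNORECASE)


def parse_lspci_x(text: str) -> bytes:
    """Collect config-space bytes from `lspci -x` output."""
    data = bytearray()
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m:
            data.extend(int(b, 16) for b in m.group(2).split())
    return bytes(data)


def reads_all_ones(cfg: bytes) -> bool:
    """True when every byte reads 0xff, as in the second dump."""
    return len(cfg) > 0 and all(b == 0xFF for b in cfg)


def u16(cfg: bytes, off: int) -> int:
    return int.from_bytes(cfg[off:off + 2], "little")  # config space is little-endian


def u32(cfg: bytes, off: int) -> int:
    return int.from_bytes(cfg[off:off + 4], "little")


def decode_header(cfg: bytes) -> dict:
    """Decode a few fields of a type-0 (endpoint) configuration header."""
    bar0 = u32(cfg, 0x10)
    is_io = bool(bar0 & 0x1)
    is_64 = not is_io and ((bar0 >> 1) & 0x3) == 0x2
    base = bar0 & (~0x3 if is_io else ~0xF)
    if is_64:
        base |= u32(cfg, 0x14) << 32  # upper half of a 64-bit BAR0 sits in BAR1
    return {
        "vendor": u16(cfg, 0x00),
        "device": u16(cfg, 0x02),
        "revision": cfg[0x08],
        "class": (cfg[0x0B], cfg[0x0A], cfg[0x09]),  # base class, subclass, prog-if
        "bar0_is_io": is_io,
        "bar0_64bit": is_64,
        "bar0_base": base,
        "subsys_vendor": u16(cfg, 0x2C),
        "subsys_device": u16(cfg, 0x2E),
        "irq_line": cfg[0x3C],
        "irq_pin": cfg[0x3D],  # 1 means INTA#
    }


GOOD = """\
02:00.0 Network controller: Broadcom Corporation BCM4312 802.11a/b/g (rev 02)
00: e4 14 12 43 06 01 10 00 02 00 80 02 08 00 00 00
10: 04 c0 ff fd 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 3c 10 71 13
30: 00 00 00 00 40 00 00 00 00 00 00 00 0a 01 00 00
"""
BAD = "\n".join(f"{off:02x}:" + " ff" * 16 for off in (0x00, 0x10, 0x20, 0x30))

for name, dump in (("good", GOOD), ("bad", BAD)):
    cfg = parse_lspci_x(dump)
    if reads_all_ones(cfg):
        print(f"{name}: all 0xff, device not responding")
    else:
        h = decode_header(cfg)
        print(f"{name}: {h['vendor']:04x}:{h['device']:04x} "
              f"BAR0=0x{h['bar0_base']:x} IRQ {h['irq_line']}")
```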

--yuval
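The two /proc/interrupts snapshots in this message are easier to compare as deltas. Here is a small sketch, assuming a single-CPU layout like the one posted, so only the CPU0 column is read. `parse_interrupts` and `irq_deltas` are hypothetical names, and the sample strings are excerpts of the posted output.

```python
from __future__ import annotations


def parse_interrupts(text: str) -> dict[str, int]:
    """Map IRQ label -> CPU0 count (assumes a single-CPU dump, as in this thread)."""
    counts: dict[str, int] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 2 or not parts[0].endswith(":"):
            continue  # skips the "CPU0" header and blank lines
        try:
            counts[parts[0].rstrip(":")] = int(parts[1])
        except ValueError:
            continue
    return counts


def irq_deltas(before: dict[str, int], after: dict[str, int]) -> dict[str, int]:
    """Interrupts received on each line between the two snapshots."""
    return {irq: n - before.get(irq, 0) for irq, n in after.items()}


# Excerpts of the snapshots taken before and after the DPMS standby/wake.
BEFORE = """\
 0: 16658 XT-PIC-XT timer
 1: 289 XT-PIC-XT i8042
 5: 9163 XT-PIC-XT sata_via, HDA Intel
10: 1712 XT-PIC-XT b43
12: 706 XT-PIC-XT i8042
"""
AFTER = """\
 0: 80987 XT-PIC-XT timer
 1: 1027 XT-PIC-XT i8042
 5: 10400 XT-PIC-XT sata_via, HDA Intel
10: 200000 XT-PIC-XT b43
12: 3059 XT-PIC-XT i8042
"""

d = irq_deltas(parse_interrupts(BEFORE), parse_interrupts(AFTER))
for irq, n in sorted(d.items(), key=lambda kv: -kv[1]):
    print(f"IRQ {irq:>2}: +{n}")
```

Between the two snapshots the timer line advanced by 64,329, while IRQ 10 (b43) gained 198,288. That is roughly three interrupts on the wireless line for every timer interrupt, which fits the "nobody handling irq 10" complaint in dmesg.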

2008-11-23 07:29:54

Subject: Re: BCM4312 Fails when xdm is started

On Saturday 22 November 2008, Michael Buesch wrote:
> On Saturday 22 November 2008 16:32:08 Larry Finger wrote:
> > Michael Buesch wrote:
> > > Somebody disabled MMIO and busmastering.
> > > And somebody cleared the CACHE_LINE_SIZE register.
> >
> > Are these all the read/write bits in the configuration area? Should I
> > conclude that someone zeroed this area?
>
> Yeah well. I'm not sure. It _looks_ like someone completely cut the
> physical power line to the card and it reset its complete PCI config.
> Well, X does poke at the PCI devices. But since, as you said, it also happens
> when X isn't running, I'd rule that out.
> But I would not rule out a fucked BIOS, yet.
> Does the BIOS have any powersave options and/or spread-spectrum options for
> the PCI-bus? Can you try to turn them all off?
> I have a machine that has PCI-slot autodetect and turns off the PCI clock,
> if it doesn't detect a card on that slot. Also turn that off, if you have
> it, too.
>
> > In case the kernel memory diagnostics don't help, is there any way to
> > trap writes to the configuration registers?
>
> Well, if we have random memory corruption, that can hit memory and MMIO.
> It doesn't hurt to turn on all debugging options. Often you get some hint
> by doing so.

I've enabled all the CONFIG_*DEBUG options I could find that seemed relevant, and ran the system with:
'debug memory_corruption_check=1 devres.log=1 debug_objects debugpat
acpi.debug_layer=0x00410002 acpi.debug_level=0xffffffff'
but no hint appears in the logs during the failure.

I did find that certain events recreate the problem immediately. If I run 'xset
dpms force standby', it happens on wakeup. 'xset -dpms' causes it
immediately as well. If I start X without DPMS support, it still happens after
the monitor is woken up from (hardware?) blackness.

--yuval
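Because a single DPMS cycle is enough to trigger the failure, the check is easy to script. This helps when trying BIOS settings or X drivers one at a time. Here is a minimal sketch under several assumptions: it runs inside the X session with `xset` on PATH, and the card sits at 0000:02:00.0 (the address seen in this thread). It also wakes the display with `xset dpms force on` instead of touching the mouse, assuming this exercises the same path. The function names are hypothetical.

```python
import subprocess
import time
from pathlib import Path

BDF = "0000:02:00.0"      # assumption: the card's address as seen in this thread
BROADCOM_VENDOR = 0x14E4


def card_alive(cfg: bytes) -> bool:
    """The vendor ID reads back as 0xffff once the card stops answering."""
    return len(cfg) >= 2 and int.from_bytes(cfg[0:2], "little") == BROADCOM_VENDOR


def read_config(bdf: str = BDF) -> bytes:
    """Read the standard config header through sysfs."""
    return (Path("/sys/bus/pci/devices") / bdf / "config").read_bytes()[:64]


def blank_and_wake(standby_s: float = 5.0) -> None:
    # Same trigger as reported: force DPMS standby, then wake the display.
    subprocess.run(["xset", "dpms", "force", "standby"], check=True)
    time.sleep(standby_s)
    subprocess.run(["xset", "dpms", "force", "on"], check=True)


def reproduce(cycles: int = 3) -> bool:
    """Return False as soon as the card disappears from config space."""
    for i in range(1, cycles + 1):
        blank_and_wake()
        time.sleep(1.0)  # give the display a moment to come back
        alive = card_alive(read_config())
        print(f"cycle {i}: {'ok' if alive else 'card lost (vendor ID no longer 0x14e4)'}")
        if not alive:
            return False
    return True
```

Calling `reproduce()` from a terminal inside the X session prints one line per blank/wake cycle and stops at the first failure.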

2008-11-22 15:54:37

Subject: Re: BCM4312 Fails when xdm is started

On Saturday 22 November 2008 16:32:08 Larry Finger wrote:
> Michael Buesch wrote:
>
> > Somebody disabled MMIO and busmastering.
> > And somebody cleared the CACHE_LINE_SIZE register.
>
> Are these all the read/write bits in the configuration area? Should I conclude
> that someone zeroed this area?

Yeah well. I'm not sure. It _looks_ like someone completely cut the physical
power line to the card and it reset its complete PCI config.
Well, X does poke at the PCI devices. But since, as you said, it also happens when
X isn't running, I'd rule that out.
But I would not rule out a fucked BIOS, yet.
Does the BIOS have any powersave options and/or spread-spectrum options for
the PCI-bus? Can you try to turn them all off?
I have a machine that has PCI-slot autodetect and turns off the PCI clock, if
it doesn't detect a card on that slot. Also turn that off, if you have it, too.
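In register terms, "disabled MMIO and busmastering" means the Memory Space and Bus Master enable bits of the command register (offset 0x04) are clear. A cleared CACHE_LINE_SIZE means the byte at offset 0x0C reads zero. Here is a minimal sketch that decodes those fields. The constants follow the names and values in Linux's include/linux/pci_regs.h, while `command_flags` is hypothetical. The "reset" row is constructed for illustration and is not taken from a posted dump. Applied to the first row of the good dump posted on 2008-11-24, it reports command 0x0106 (memory decoding, bus mastering and SERR# enabled) and a 32-byte cache line.

```python
# Offsets and bits as named in Linux's include/linux/pci_regs.h.
PCI_COMMAND = 0x04
PCI_COMMAND_IO = 0x1        # respond to I/O space accesses
PCI_COMMAND_MEMORY = 0x2    # respond to memory space (MMIO) accesses
PCI_COMMAND_MASTER = 0x4    # allow the device to master the bus (DMA)
PCI_COMMAND_SERR = 0x100    # SERR# reporting enable
PCI_CACHE_LINE_SIZE = 0x0C  # value is in 32-bit words


def command_flags(cfg: bytes) -> dict:
    """Decode the command register and cache line size of a config header."""
    cmd = int.from_bytes(cfg[PCI_COMMAND:PCI_COMMAND + 2], "little")
    return {
        "command": cmd,
        "io": bool(cmd & PCI_COMMAND_IO),
        "mmio": bool(cmd & PCI_COMMAND_MEMORY),
        "bus_master": bool(cmd & PCI_COMMAND_MASTER),
        "serr": bool(cmd & PCI_COMMAND_SERR),
        "cache_line_bytes": cfg[PCI_CACHE_LINE_SIZE] * 4,
    }


# First 16 bytes of the good dump (vendor, device, command, ..., cache line size).
healthy = bytes.fromhex("e4141243060110000200800208000000")

# Hypothetical: the same row with command and cache line size cleared,
# i.e. the state described above. Not taken from a posted dump.
reset = bytearray(healthy)
reset[PCI_COMMAND:PCI_COMMAND + 2] = b"\x00\x00"
reset[PCI_CACHE_LINE_SIZE] = 0

for name, cfg in (("healthy", healthy), ("reset", bytes(reset))):
    print(name, command_flags(cfg))
```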

> In case the kernel memory diagnostics don't help, is there any way to trap
> writes to the configuration registers?

Well, if we have random memory corruption, that can hit memory and MMIO.
It doesn't hurt to turn on all debugging options. Often you get some hint
by doing so.
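Polling cannot trap the write itself or reveal who made it. It can, however, timestamp the moment the header changes, which can then be lined up against X and kernel log events. Here is a minimal sketch, assuming Linux sysfs exposes the header at /sys/bus/pci/devices/<domain:bus:dev.fn>/config and the card sits at 0000:02:00.0 as in this thread. `watch` and `changed_offsets` are hypothetical names.

```python
from __future__ import annotations

import time
from pathlib import Path


def changed_offsets(before: bytes, after: bytes) -> list[tuple[int, int, int]]:
    """(offset, old, new) for every byte that differs between two header reads."""
    return [(off, old, new)
            for off, (old, new) in enumerate(zip(before, after))
            if old != new]


def watch(bdf: str = "0000:02:00.0", interval: float = 0.5) -> None:
    """Poll the device's config header and log every change until Ctrl-C."""
    path = Path("/sys/bus/pci/devices") / bdf / "config"
    # Without CAP_SYS_ADMIN the kernel only returns the first 64 bytes anyway.
    prev = path.read_bytes()[:64]
    while True:
        time.sleep(interval)
        cur = path.read_bytes()[:64]
        if cur == prev:
            continue
        stamp = time.strftime("%H:%M:%S")
        if all(b == 0xFF for b in cur):
            print(f"{stamp} {bdf}: header now reads as all 0xff")
        for off, old, new in changed_offsets(prev, cur):
            print(f"{stamp}   0x{off:02x}: {old:02x} -> {new:02x}")
        prev = cur
```

Running `watch()` in one terminal while blanking the screen from another shows which header fields flip and when.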

--
Greetings Michael.

2008-12-07 16:15:41

Subject: Re: BCM4312 Fails when xdm is started

Yuval Hager wrote:
>
> This issue has been tracked down to the openchrome driver. It appears
> that somehow it corrupted the PCI bus, and damaged the device right after the
> video card - the wireless card.
>
> The current workaround for this is here -
> http://wiki.openchrome.org/pipermail/openchrome-devel/2008-November/000139.html -
> but this is just a quick hack, not a fix, although it works great for me. The
> openchrome team is working on a patch based on this.
>
> Thanks for all the help - I am a very happy HP2133 user, and a very happy
> community member. This was an amazing opensource support experience, which
> I'll remember for a long time. Thank you all!

Thanks for the feedback. The community may struggle a bit on some problems, and
the approach to a solution may look like a drunken sailor's walk, but we usually
get there.

Larry

2008-12-07 09:30:11