2009-11-20 12:43:01

by Larry Finger

Subject: Fatal DMA error problem with netbook and BCM4312

One last check. I would appreciate receiving answers to the following questions.
These questions apply to anyone else with this problem.

Does the pm_qos patch help your "fatal DMA error" problem, particularly when
booted from power-off?

If you warm-boot after loading the wl driver, does the patch make any difference?

Larry


2009-11-23 10:18:41

by Johannes Berg

Subject: Re: Fatal DMA error problem with netbook and BCM4312

On Sun, 2009-11-22 at 19:52 -0600, Larry Finger wrote:

> We know that the wl driver does something to the interface that persists across
> a warm boot - we just do not know what. It does not appear to be done in any of
> the MMIO traffic - at least I have not seen it in the mmio-trace output. If
> anyone has a KVM setup using PCI passthrough, is it possible to trace PCI
> configuration traffic?

I'm pretty sure even the binary driver has to go through
drivers/pci/access.c, maybe you can just insert logging into that code?

johannes



2009-11-23 01:52:43

by Larry Finger

Subject: Re: Fatal DMA error problem with netbook and BCM4312

On 11/22/2009 01:03 PM, Chris Vine wrote:
> On Sat, 21 Nov 2009 00:15:12 +0000
> Chris Vine <[email protected]> wrote:
>> WARM BOOT FROM KERNEL WITH WL MODULE INSTALLED
>>
>> The patched kernel makes no change on a warm boot in the sense that
>> if I warm boot after initialising the wireless device with the wl
>> module then the b43 module appears to work correctly, both with and
>> without the patch applied.
>>
>> On the same stress test as mentioned above, I have not been able to
>> induce the DMA errors or kernel warnings. It resolutely refuses to
>> do anything except work correctly.
>
> This is just to say that I have carried out further stress tests today
> after warm booting to an unpatched linux-2.6.32-rc8 kernel with the b43
> driver (on the assumption that unpatched is the least favourable case
> for the driver). This is a warm reboot from a 2.6.31.6 kernel which had
> the wl driver installed.
>
> I have created an extended period of high speed traffic on my wireless
> lan and I cannot induce any errors at all with the b43 driver on a warm
> reboot.
>
> This makes me wonder whether the patch is just (partially) masking the
> problem rather than actually dealing with it.

We know that the wl driver does something to the interface that persists across
a warm boot - we just do not know what. It does not appear to be done in any of
the MMIO traffic - at least I have not seen it in the mmio-trace output. If
anyone has a KVM setup using PCI passthrough, is it possible to trace PCI
configuration traffic?
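
One user-space way to look for such persistent state, short of tracing, would
be to save the output of `lspci -xxx -s <device>` (run as root) after a cold
boot and again after a warm boot from the wl kernel, then compare the two dumps
byte by byte. The sketch below is only an illustration: the function names and
the sample bytes are made up, and it assumes the difference is visible in the
standard PCI config space at all, which has not been established.

```python
import re

# One hex-dump row of `lspci -x` style output, e.g. "10: 04 00 00 f0 ..."
ROW = re.compile(r"^([0-9a-fA-F]{2,3}):((?: [0-9a-fA-F]{2})+)\s*$")

def parse_config_dump(text):
    """Return {offset: byte_value} from the hex rows of an lspci dump."""
    regs = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if not m:
            continue  # skip the device description line and blank lines
        base = int(m.group(1), 16)
        for i, byte in enumerate(m.group(2).split()):
            regs[base + i] = int(byte, 16)
    return regs

def diff_config(cold, warm):
    """List (offset, cold_value, warm_value) for every byte that differs."""
    a, b = parse_config_dump(cold), parse_config_dump(warm)
    return [(off, a.get(off), b.get(off))
            for off in sorted(set(a) | set(b))
            if a.get(off) != b.get(off)]

if __name__ == "__main__":
    # Made-up sample rows for illustration, not real BCM4312 data.
    cold = "00: e4 14 15 43 06 00 10 00\n40: 00 00 00 00"
    warm = "00: e4 14 15 43 06 00 10 00\n40: 00 01 00 00"
    for off, c, w in diff_config(cold, warm):
        print(f"offset 0x{off:02x}: cold={c:#04x} warm={w:#04x}")
```

An empty result would not prove much, since the state could live somewhere
config space does not expose, but any differing offsets would be a cheap
starting point.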

Have you tried running your system with the patch entitled "[PATCH] b43: Rewrite
DMA Tx status handling sanity checks"? It cleared up some of the problems that I
was seeing with the open-source firmware.

Larry

2009-11-20 20:06:19

by Chris Vine

Subject: Re: Fatal DMA error problem with netbook and BCM4312

On Fri, 20 Nov 2009 06:43:05 -0600
Larry Finger <[email protected]> wrote:
> One last check. I would appreciate receiving answers to the following
> questions. These questions apply to anyone else with this problem.
>
> Does the pm_qos patch help your "fatal DMA error" problem,
> particularly when booted from power-off?
>
> If you warm-boot after loading the wl driver, does the patch make any
> difference?

What's the date/time of the last patch you posted for this and what
kernel does it apply to?

Chris



2009-11-22 19:03:47

by Chris Vine

Subject: Re: Fatal DMA error problem with netbook and BCM4312

On Sat, 21 Nov 2009 00:15:12 +0000
Chris Vine <[email protected]> wrote:
> WARM BOOT FROM KERNEL WITH WL MODULE INSTALLED
>
> The patched kernel makes no change on a warm boot in the sense that
> if I warm boot after initialising the wireless device with the wl
> module then the b43 module appears to work correctly, both with and
> without the patch applied.
>
> On the same stress test as mentioned above, I have not been able to
> induce the DMA errors or kernel warnings. It resolutely refuses to
> do anything except work correctly.

This is just to say that I have carried out further stress tests today
after warm booting to an unpatched linux-2.6.32-rc8 kernel with the b43
driver (on the assumption that unpatched is the least favourable case
for the driver). This is a warm reboot from a 2.6.31.6 kernel which had
the wl driver installed.

I have created an extended period of high speed traffic on my wireless
lan and I cannot induce any errors at all with the b43 driver on a warm
reboot.

This makes me wonder whether the patch is just (partially) masking the
problem rather than actually dealing with it.

Chris

PS The wl driver was compiled from
hybrid-portsrc-x86_32-v5.10.91.9.3.tar.gz available at
http://www.broadcom.com/support/802.11/linux_sta.php . (Note for
anyone wanting to try this with a warm boot from 2.6.32: this driver
will not compile with 2.6.32 without patching the headers of one of the
blob glue files as one of them fails to include linux/sched.h.)



2009-11-21 00:14:19

by Chris Vine

Subject: Re: Fatal DMA error problem with netbook and BCM4312

On Fri, 20 Nov 2009 14:05:25 -0600
Larry Finger <[email protected]> wrote:
> On 11/20/2009 01:58 PM, Chris Vine wrote:
> > On Fri, 20 Nov 2009 06:43:05 -0600
> > Larry Finger <[email protected]> wrote:
> >> One last check. I would appreciate receiving answers to the
> >> following questions. These questions apply to anyone else with
> >> this problem.
> >>
> >> Does the pm_qos patch help your "fatal DMA error" problem,
> >> particularly when booted from power-off?
> >>
> >> If you warm-boot after loading the wl driver, does the patch make
> >> any difference?
> >
> > What's the date/time of the last patch you posted for this and what
> > kernel does it apply to?
>
> Friday 11/13 at 22:38 CET. The link is at
> https://lists.berlios.de/pipermail/bcm43xx-dev/2009-November/006338.html.
>
> The patch is written for wireless-testing, but it should apply on
> mainline kernels as well.

OK. I have applied the patch to mainline kernel 2.6.32-rc8.

COLD BOOT

On a cold boot, the patch gives rise to a very substantial improvement.
Without it, with the processor acpi module loaded, often just bringing
up the interface would cause the DMA errors, and if that didn't do it
then running 'iwlist scan' a few times always would. At that point the
wireless device required a cold boot to get it working again.

With the patch applied and with the processor acpi module loaded it
works well for a period - I can bring up the interface, associate with
my wireless router and do some web browsing. However if I gave it a
stress test by rsyncing a large directory (actually all my
mozilla/firefox caches) then after a while, say 1 minute of full speed
throughput, I got the kernel warning at the end of this post. Shortly
after this the DMA errors arose, all throughput ceased and shortly
afterwards I got a complete kernel lock-up. (I couldn't ssh in, nor
did anything appear in the logs.) I was not able to capture the text of
the DMA errors but they followed the usual pattern.

This is pretty well the same effect as not applying the patch and
blacklisting the processor module: I reported a few weeks ago that
blacklisting that module solves the DMA problem for me, but further
testing showed that to be wrong - I got kernel warnings (sorry I didn't
save them) followed by the usual DMA errors, followed sometimes by a
kernel lock-up.

WARM BOOT FROM KERNEL WITH WL MODULE INSTALLED

The patched kernel makes no change on a warm boot in the sense that if I
warm boot after initialising the wireless device with the wl module
then the b43 module appears to work correctly, both with and without
the patch applied.

On the same stress test as mentioned above, I have not been able to
induce the DMA errors or kernel warnings. It resolutely refuses to do
anything except work correctly.

KERNEL WARNING

The kernel warning I induced on a cold boot was as follows:

WARNING: at drivers/net/wireless/b43/dma.c:1151
b43_dma_handle_txstatus+0x55/0x420 [b43]()
Hardware name: 20021,2959
Modules linked in: arc4 ecb b43 ssb mmc_core pcmcia mac80211 cfg80211
led_class pcmcia_core i915 drm_kms_helper drm i2c_algo_bit snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss sco bnep rfcomm l2cap crc16 nfsd exportfs nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables xt_helper xt_conntrack xt_state x_tables nf_conntrack_irc nf_conntrack_ftp nf_conntrack parport_pc parport fuse snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_pcm uvcvideo rtc_cmos btusb bluetooth rtc_core intel_agp rfkill snd_timer battery ac videodev rtc_lib video processor thermal usbhid tg3 psmouse v4l1_compat snd wmi thermal_sys agpgart button output hwmon i2c_i801 evdev libphy soundcore sg serio_raw snd_page_alloc
Pid: 4412, comm: irq/17-b43 Not tainted 2.6.32-rc8 #1
Call Trace:
[<f8dd4c95>] ? b43_dma_handle_txstatus+0x55/0x420 [b43]
[<c10328fc>] warn_slowpath_common+0x7c/0xa0
[<f8dd4c95>] ? b43_dma_handle_txstatus+0x55/0x420 [b43]
[<c1032935>] warn_slowpath_null+0x15/0x20
[<f8dd4c95>] b43_dma_handle_txstatus+0x55/0x420 [b43]
[<c10265e1>] ? __dequeue_entity+0x21/0x40
[<c1028ee9>] ? finish_task_switch+0x39/0x80
[<f8dceb7e>] b43_handle_txstatus+0x6e/0x80 [b43]
[<f8dbcdbb>] b43_do_interrupt_thread+0x21b/0x910 [b43]
[<c102a34f>] ? try_to_wake_up+0x8f/0x210
[<f8dbd518>] b43_interrupt_thread_handler+0x18/0x30 [b43]
[<c1068580>] irq_thread+0xc0/0x1a0
[<c10684c0>] ? irq_thread+0x0/0x1a0
[<c104ac24>] kthread+0x74/0x80
[<c104abb0>] ? kthread+0x0/0x80
[<c100384f>] kernel_thread_helper+0x7/0x38
---[ end trace 778fc6df7aca6d14 ]---
------------[ cut here ]------------
WARNING: at drivers/net/wireless/b43/dma.c:1154
b43_dma_handle_txstatus+0x6e/0x420 [b43]()
Hardware name: 20021,2959
Modules linked in: arc4 ecb b43 ssb mmc_core pcmcia mac80211 cfg80211
led_class pcmcia_core i915 drm_kms_helper drm i2c_algo_bit snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss sco bnep rfcomm l2cap crc16 nfsd exportfs nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables xt_helper xt_conntrack xt_state x_tables nf_conntrack_irc nf_conntrack_ftp nf_conntrack parport_pc parport fuse snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_pcm uvcvideo rtc_cmos btusb bluetooth rtc_core intel_agp rfkill snd_timer battery ac videodev rtc_lib video processor thermal usbhid tg3 psmouse v4l1_compat snd wmi thermal_sys agpgart button output hwmon i2c_i801 evdev libphy soundcore sg serio_raw snd_page_alloc
Pid: 4412, comm: irq/17-b43 Tainted: G W 2.6.32-rc8 #1
Call Trace:
[<f8dd4cae>] ? b43_dma_handle_txstatus+0x6e/0x420 [b43]
[<c10328fc>] warn_slowpath_common+0x7c/0xa0
[<f8dd4cae>] ? b43_dma_handle_txstatus+0x6e/0x420 [b43]
[<c1032935>] warn_slowpath_null+0x15/0x20
[<f8dd4cae>] b43_dma_handle_txstatus+0x6e/0x420 [b43]
[<c10265e1>] ? __dequeue_entity+0x21/0x40
[<c1028ee9>] ? finish_task_switch+0x39/0x80
[<f8dceb7e>] b43_handle_txstatus+0x6e/0x80 [b43]
[<f8dbcdbb>] b43_do_interrupt_thread+0x21b/0x910 [b43]
[<c102a34f>] ? try_to_wake_up+0x8f/0x210
[<f8dbd518>] b43_interrupt_thread_handler+0x18/0x30 [b43]
[<c1068580>] irq_thread+0xc0/0x1a0
[<c10684c0>] ? irq_thread+0x0/0x1a0
[<c104ac24>] kthread+0x74/0x80
[<c104abb0>] ? kthread+0x0/0x80
[<c100384f>] kernel_thread_helper+0x7/0x38
---[ end trace 778fc6df7aca6d15 ]---
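
For longer stress runs it may help to tally which WARN sites fire rather than
reading the log by eye. The following is a minimal sketch that pulls the source
file and line out of each WARNING header in saved dmesg text. The function name
is hypothetical, and the pattern is based only on the two headers shown above.

```python
import re
from collections import Counter

# Matches headers such as "WARNING: at drivers/net/wireless/b43/dma.c:1151"
WARN = re.compile(r"WARNING: at (\S+):(\d+)")

def count_warnings(log_text):
    """Count occurrences of each (source file, line) WARN site in a log."""
    return Counter((m.group(1), int(m.group(2)))
                   for m in WARN.finditer(log_text))

if __name__ == "__main__":
    # The two warning headers from this report, wrapped as they were posted.
    sample = (
        "WARNING: at drivers/net/wireless/b43/dma.c:1151\n"
        "b43_dma_handle_txstatus+0x55/0x420 [b43]()\n"
        "WARNING: at drivers/net/wireless/b43/dma.c:1154\n"
        "b43_dma_handle_txstatus+0x6e/0x420 [b43]()\n"
    )
    for (path, line), n in sorted(count_warnings(sample).items()):
        print(f"{path}:{line} x{n}")
```

Feeding it the full log from a cold-boot run and from a warm-boot run would
give a quick side-by-side of which checks in dma.c trip in each case.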

Chris




2009-11-20 20:05:24

by Larry Finger

Subject: Re: Fatal DMA error problem with netbook and BCM4312

On 11/20/2009 01:58 PM, Chris Vine wrote:
> On Fri, 20 Nov 2009 06:43:05 -0600
> Larry Finger <[email protected]> wrote:
>> One last check. I would appreciate receiving answers to the following
>> questions. These questions apply to anyone else with this problem.
>>
>> Does the pm_qos patch help your "fatal DMA error" problem,
>> particularly when booted from power-off?
>>
>> If you warm-boot after loading the wl driver, does the patch make any
>> difference?
>
> What's the date/time of the last patch you posted for this and what
> kernel does it apply to?

Friday 11/13 at 22:38 CET. The link is at
https://lists.berlios.de/pipermail/bcm43xx-dev/2009-November/006338.html.

The patch is written for wireless-testing, but it should apply on mainline
kernels as well.

Larry