2008-12-15 14:53:13

by Larry Finger

[permalink] [raw]
Subject: Memory errors from p54usb

Christian,

While stressing p54usb with a 'ping -f -i 0.2 AP' to my router as well as
running normal operations, I encountered the following errors:

kernel: thunderbird-bin: page allocation failure. order:1, mode:0x4020
kernel: Pid: 3341, comm: thunderbird-bin Not tainted 2.6.28-rc8-wl #52
kernel: Call Trace:
kernel: <IRQ> [<ffffffff8028074e>] __alloc_pages_internal+0x41c/0x43e
kernel: [<ffffffff8029b595>] alloc_pages_current+0xbe/0xc6
kernel: [<ffffffff802a2d5a>] new_slab+0xd5/0x28d
kernel: [<ffffffff802a1766>] ? unfreeze_slab+0x4c/0xbb
kernel: [<ffffffff802a31ad>] __slab_alloc+0x215/0x43e
kernel: [<ffffffff803c5be7>] ? dev_alloc_skb+0x16/0x2c
kernel: [<ffffffff802a3347>] ? __slab_alloc+0x3af/0x43e
kernel: [<ffffffff802a40c0>] __kmalloc_node_track_caller+0x90/0xe4
kernel: [<ffffffff803c5be7>] ? dev_alloc_skb+0x16/0x2c
kernel: [<ffffffff803c5246>] __alloc_skb+0x6f/0x135
kernel: [<ffffffff803c5be7>] dev_alloc_skb+0x16/0x2c
kernel: [<ffffffffa06df7f5>] p54u_rx_cb+0xad/0x1c8 [p54usb]
kernel: [<ffffffffa00fc062>] usb_hcd_giveback_urb+0x7e/0xb1 [usbcore]
kernel: [<ffffffffa0125883>] ehci_urb_done+0xca/0xdf [ehci_hcd]
kernel: [<ffffffffa0126895>] qh_completions+0xaa/0x346 [ehci_hcd]
kernel: [<ffffffffa0126bff>] ehci_work+0xce/0x80a [ehci_hcd]
kernel: [<ffffffffa012ae7d>] ehci_irq+0x1a6/0x1d9 [ehci_hcd]
kernel: [<ffffffffa00fbb66>] usb_hcd_irq+0x38/0x94 [usbcore]
kernel: [<ffffffff80273ffd>] handle_IRQ_event+0x20/0x55
kernel: [<ffffffff80275501>] handle_fasteoi_irq+0x91/0xd1
kernel: [<ffffffff8020ed39>] do_IRQ+0xfc/0x173
kernel: [<ffffffff8020c54b>] ret_from_intr+0x0/0xf
kernel: <EOI> <6>Mem-Info:
kernel: Node 0 DMA per-cpu:
kernel: CPU 0: hi: 0, btch: 1 usd: 0
kernel: CPU 1: hi: 0, btch: 1 usd: 0
kernel: Node 0 DMA32 per-cpu:
kernel: CPU 0: hi: 186, btch: 31 usd: 66
kernel: CPU 1: hi: 186, btch: 31 usd: 126
kernel: Active_anon:72288 active_file:171922 inactive_anon:19872
kernel: inactive_file:186277 unevictable:8 dirty:6 writeback:0 unstable:0
kernel: free:26292 slab:261910 mapped:21357 pagetables:2878 bounce:0
kernel: Node 0 DMA free:3556kB min:4kB low:4kB high:4kB active_anon:0kB
inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB
present:2368kB pages_scanned:0 all_unreclaimable? yes
kernel: lowmem_reserve[]: 0 2927 2927 2927
kernel: Node 0 DMA32 free:101612kB min:6916kB low:8644kB high:10372kB
active_anon:289152kB inactive_anon:79488kB active_file:687688kB
inactive_file:745108kB unevictable:32kB present:2997292kB
pages_scanned:0 all_unreclaimable? no
kernel: lowmem_reserve[]: 0 0 0 0
kernel: Node 0 DMA: 3*4kB 1*8kB 3*16kB 3*32kB 3*64kB 3*128kB 3*256kB 0*512kB
2*1024kB 0*2048kB 0*4096kB = 3556kB
kernel: Node 0 DMA32: 25211*4kB 0*8kB 0*16kB 2*32kB 0*64kB 0*128kB 1*256kB
1*512kB 0*1024kB 0*2048kB 0*4096kB = 101676kB
kernel: 358944 total pagecache pages
kernel: 582 pages in swap cache
kernel: Swap cache stats: add 1457, delete 875, find 41242/41313
kernel: Free swap = 2100852kB
kernel: Total swap = 2104444kB
kernel: 769872 pages RAM
kernel: 20217 pages reserved
kernel: 260581 pages shared
kernel: 518529 pages non-shared

5 minutes later:

kernel: swapper: page allocation failure. order:1, mode:0x4020
kernel: Pid: 0, comm: swapper Not tainted 2.6.28-rc8-wl #52
kernel: Call Trace:
kernel: <IRQ> [<ffffffff8028074e>] __alloc_pages_internal+0x41c/0x43e
kernel: [<ffffffff8029b595>] alloc_pages_current+0xbe/0xc6
kernel: [<ffffffff802a2d5a>] new_slab+0xd5/0x28d
kernel: [<ffffffff802a1766>] ? unfreeze_slab+0x4c/0xbb
kernel: [<ffffffff802a31ad>] __slab_alloc+0x215/0x43e
kernel: [<ffffffff803c5be7>] ? dev_alloc_skb+0x16/0x2c
kernel: [<ffffffff802a3347>] ? __slab_alloc+0x3af/0x43e
kernel: [<ffffffff802a40c0>] __kmalloc_node_track_caller+0x90/0xe4
kernel: [<ffffffff803c5be7>] ? dev_alloc_skb+0x16/0x2c
kernel: [<ffffffff803c5246>] __alloc_skb+0x6f/0x135
kernel: [<ffffffff803c5be7>] dev_alloc_skb+0x16/0x2c
kernel: [<ffffffffa06df7f5>] p54u_rx_cb+0xad/0x1c8 [p54usb]
kernel: [<ffffffffa00fc062>] usb_hcd_giveback_urb+0x7e/0xb1 [usbcore]
kernel: [<ffffffffa0125883>] ehci_urb_done+0xca/0xdf [ehci_hcd]
kernel: [<ffffffffa0126895>] qh_completions+0xaa/0x346 [ehci_hcd]
kernel: [<ffffffffa0126bff>] ehci_work+0xce/0x80a [ehci_hcd]
kernel: [<ffffffffa012ae7d>] ehci_irq+0x1a6/0x1d9 [ehci_hcd]
kernel: [<ffffffff80251aea>] ? ktime_get_ts+0x49/0x4e
kernel: [<ffffffffa00fbb66>] usb_hcd_irq+0x38/0x94 [usbcore]
kernel: [<ffffffff80273ffd>] handle_IRQ_event+0x20/0x55
kernel: [<ffffffff80275501>] handle_fasteoi_irq+0x91/0xd1
kernel: [<ffffffff8020ed39>] do_IRQ+0xfc/0x173
kernel: [<ffffffff8020c54b>] ret_from_intr+0x0/0xf
kernel: <EOI> [<ffffffff80257557>] ? tick_broadcast_oneshot_control+0x1a/0x109
kernel: [<ffffffff802135f1>] ? default_idle+0x30/0x4a
kernel: [<ffffffff802135ef>] ? default_idle+0x2e/0x4a
kernel: [<ffffffff802136ee>] ? c1e_idle+0xd5/0xfc
kernel: [<ffffffff80252b32>] ? atomic_notifier_call_chain+0xf/0x11
kernel: [<ffffffff8020b171>] ? cpu_idle+0x4f/0x90
kernel: [<ffffffff80427e1d>] ? rest_init+0x61/0x63
kernel: Mem-Info:
kernel: Node 0 DMA per-cpu:
kernel: CPU 0: hi: 0, btch: 1 usd: 0
kernel: CPU 1: hi: 0, btch: 1 usd: 0
kernel: Node 0 DMA32 per-cpu:
kernel: CPU 0: hi: 186, btch: 31 usd: 114
kernel: CPU 1: hi: 186, btch: 31 usd: 91
kernel: Active_anon:70899 active_file:172447 inactive_anon:19852
kernel: inactive_file:188543 unevictable:8 dirty:391 writeback:0 unstable:0
kernel: free:25567 slab:261175 mapped:22336 pagetables:2929 bounce:0
kernel: Node 0 DMA free:3556kB min:4kB low:4kB high:4kB active_anon:0kB
inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB
present:2368kB pages_scanned:0 all_unreclaimable? yes
kernel: lowmem_reserve[]: 0 2927 2927 2927
kernel: Node 0 DMA32 free:98712kB min:6916kB low:8644kB high:10372kB
active_anon:283596kB inactive_anon:79408kB active_file:689788kB
inactive_file:754172kB unevictable:32kB present:2997292kB
pages_scanned:0 all_unreclaimable? no
kernel: lowmem_reserve[]: 0 0 0 0
kernel: Node 0 DMA: 3*4kB 1*8kB 3*16kB 3*32kB 3*64kB 3*128kB 3*256kB 0*512kB

2*1024kB 0*2048kB 0*4096kB = 3556kB
kernel: Node 0 DMA32: 24463*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 1*256kB
1*512kB 0*1024kB 0*2048kB 0*4096kB = 98620kB
kernel: 361741 total pagecache pages
kernel: 587 pages in swap cache
kernel: Swap cache stats: add 1474, delete 887, find 41242/41313
kernel: Free swap = 2100784kB
kernel: Total swap = 2104444kB
kernel: 769872 pages RAM
kernel: 20217 pages reserved
kernel: 262368 pages shared
kernel: 519070 pages non-shared

These errors continued until I removed p54usb.

My kernel is wireless-testing 2.6.28-rc8-wl (x86_64) with the "move statistics
timer update", "update ACK failure counter", "remove free on tx" and "more
accurate rssi to dBm" p54 patches applied.

The call that fails in p54u_rx_cb() is in this code block:

if (p54_rx(dev, skb)) {
skb = dev_alloc_skb(priv->common.rx_mtu + 32);
if (unlikely(!skb)) {
/* TODO check rx queue length and refill *somewhere* */
return;
}

If I read this correctly, there are no 8KB DMA32 pages available. Normally, I
would have suspected a memory management problem in the kernel; however, rtl8187
passes this ping test successfully. I suspect a memory leak in p54usb.

I will continue to investigate this problem, but I wanted to give you a warning
that there is a problem.

Larry


2008-12-16 17:50:42

by Larry Finger

[permalink] [raw]
Subject: Re: Memory errors from p54usb

Christian Lamparter wrote:
> On Tuesday 16 December 2008 17:55:22 Larry Finger wrote:
>> It is in station mode. I don't know how to get the radio_enabled flag.
>>> do you see anything else than probe request?
>> just probes.
> does the AP answer the probe requests?

My AP didn't respond to any of them, but my neighbor's AP did to one of them.
That was on channel 6; whereas my AP is on channel 1.

> Is this with WPA-PSK TKIP on? what happens if you disable the
> hardware crypto offload stuff (e.g p54common nohwcrypt=1 )

Yes, WPA-PSK TKIP was on. Same thing happens with hardware crypto disabled.

This time I discovered that ifdown/ifup restores the connection as well as
module unload/load.

Larry

2008-12-16 18:29:27

by Christian Lamparter

[permalink] [raw]
Subject: Re: Memory errors from p54usb

On Tuesday 16 December 2008 18:50:40 Larry Finger wrote:
> Christian Lamparter wrote:
> > On Tuesday 16 December 2008 17:55:22 Larry Finger wrote:
> >> It is in station mode. I don't know how to get the radio_enabled flag.
> >>> do you see anything else than probe request?
> >> just probes.
> > does the AP answer the probe requests?
>
> My AP didn't respond to any of them, but my neighbor's AP did to one of them.
> That was on channel 6; whereas my AP is on channel 1.
>
> > Is this with WPA-PSK TKIP on? what happens if you disable the
> > hardware crypto offload stuff (e.g p54common nohwcrypt=1 )
>
> Yes, WPA-PSK TKIP was on. Same thing happens with hardware crypto disabled.
>
> This time I discovered that ifdown/ifup restores the connection as well as
> module unload/load.
>
well, a first it sounds like TKIP countermeasures...
but since you said it also happens even if you disable the hardware
crytographic offload... I'm a bit "out of clues".

Do you still have kismets captured pcaps?

Regards,
Chr

2008-12-17 16:01:14

by Larry Finger

[permalink] [raw]
Subject: Re: Memory errors from p54usb

Christian Lamparter wrote:
> Ohh, it always gets better. ;)
> Anyway, I'll see what I can do tomorrow.
>
> In the meantime, I remembered that I put a heat sink on my chip a while ago,
> but I'm not sure if this had any effect here... But back in the summer, it allowed me
> fix the iperf stalls I had, but that was after days with 60Gb traffic...

That would be a good idea; however, it is winter in my hemisphere (I don't care
what the calender says.), and the room is cool. In addition, my device is a Dell
Wireless 1450, which comes with a pretty hefty heat sink on its own.

I'll throw printk's into every place I can think of and try to provide more info.

Larry


2008-12-16 17:13:28

by Christian Lamparter

[permalink] [raw]
Subject: Re: Memory errors from p54usb

On Tuesday 16 December 2008 17:55:22 Larry Finger wrote:
> It is in station mode. I don't know how to get the radio_enabled flag.
> >
> > do you see anything else than probe request?
>
> just probes.
does the AP answer the probe requests?
Is this with WPA-PSK TKIP on? what happens if you disable the
hardware crypto offload stuff (e.g p54common nohwcrypt=1 )

Regards,
Chr

2008-12-16 13:31:15

by Christian Lamparter

[permalink] [raw]
Subject: Re: Memory errors from p54usb

On Tuesday 16 December 2008 05:17:13 Larry Finger wrote:
>
> No matter how hard I try, I cannot duplicate these results. They may have been a
> fluke, but I do have another problem fr you to ponder.
>
> If I run the 'ping -f -i 0.2' to my file server in one window, and start a
> kernel compilation with the source on an NFS volume on that same server, p54usb
> will hang. I managed to capture the whole process with kismet running on a third
> computer and have analyzed it with wireshark. I can find the last transmission
> from the server, but no ACK. The next transmission from the p54 is more than 26
> sec later when it sends an 802.11 Probe request. The dmesg log just shows that
> there was a disconnection, and a reconnection attempt.
In which mode is p54? ad-hoc / station? is radio_enabled flag cleared?

do you see anything else than probe request?
(because probe requests are marked as "no cancel" and won't be
delayed / cancelled).

Can you bring back the receiver if you change the mode?
(e.g station to ad-hoc and then back to station,
or ibss to station and bak to ibss again...
and don't forget to do a ifconfig wlanX up all the time...)

do you see if statistic counters in /sys/kernel/debug/ieee80211/phyX/statistics/
increase (especially: FCSErros, RTSCount and RTSFailure), or is the
the noise-level in iwconfig still updated?

> Any idea as to why the receiver would go off-line, but the transmitter is still
> working?
Not really. normally its the other way around and the transmitter is dead.

2008-12-17 16:37:45

by Christian Lamparter

[permalink] [raw]
Subject: Re: Memory errors from p54usb

On Wednesday 17 December 2008 17:01:09 Larry Finger wrote:
> Christian Lamparter wrote:
> > Ohh, it always gets better. ;)
> > Anyway, I'll see what I can do tomorrow.
> >
> > In the meantime, I remembered that I put a heat sink on my chip a while ago,
> > but I'm not sure if this had any effect here... But back in the summer, it allowed me
> > fix the iperf stalls I had, but that was after days with 60Gb traffic...
>
> That would be a good idea; however, it is winter in my hemisphere (I don't care
> what the calender says.), and the room is cool. In addition, my device is a Dell
> Wireless 1450, which comes with a pretty hefty heat sink on its own.

Your had already one? Perhaps, I got a very old one then. (Rev A01)

> I'll throw printk's into every place I can think of and try to provide more info.
maybe this printk could shed some light into the dark...
(In station mode: "status" should be either 0(ACKed) or 1(not ACKed/timeout)
and (2 [ device is sleeping - but we don't set the PSM yet, so this is not expected])

Regards,
Chr


Attachments:
(No filename) (1.03 kB)
p54test.diff (646.00 B)
Download all attachments

2008-12-16 16:55:24

by Larry Finger

[permalink] [raw]
Subject: Re: Memory errors from p54usb

Christian Lamparter wrote:
> In which mode is p54? ad-hoc / station? is radio_enabled flag cleared?

It is in station mode. I don't know how to get the radio_enabled flag.
>
> do you see anything else than probe request?
> (because probe requests are marked as "no cancel" and won't be
> delayed / cancelled).

No, just probes.
>
> Can you bring back the receiver if you change the mode?
> (e.g station to ad-hoc and then back to station,
> or ibss to station and bak to ibss again...
> and don't forget to do a ifconfig wlanX up all the time...)

It normally is under the control of NetworkManager, but when I killed NM and
KNM, I got ifup/ifdown control back. When I downed the interface, switched to
Ad-Hoc mode, then back to Managed and gave an ifup command, it came back.

It continued working under the load conditions for about another 2 minutes, then
froze again. Each cycle seem to last less and less time.
>
> do you see if statistic counters in /sys/kernel/debug/ieee80211/phyX/statistics/
> increase (especially: FCSErros, RTSCount and RTSFailure), or is the
> the noise-level in iwconfig still updated?

FCSerrorcount is being incremented, and the noise level is being updated as well
as quality and signal.

Larry



2008-12-16 21:56:41

by Christian Lamparter

[permalink] [raw]
Subject: Re: Memory errors from p54usb

On Tuesday 16 December 2008 22:25:32 Larry Finger wrote:
> Christian Lamparter wrote:
> >=20
> > and when the traffic dies down, does iwlist wlanX still give you a
> > reasonable output,
>=20
> Yes - the AP scan is normal.
well, so the receiver sort of works.
> >=20
> > or has NM some sort of cli interface (e.g: wpa_cli ) because I woul=
d
> > like to see what
> >=20
> > the station supplicant is doing...?
>=20
> I switched to ifup/down control. It seems that wpa_supplicant is gett=
ing
> confused. Attached is a -ddd dump.
>=20
> BTW, dmesg does not show anything - nothing is logged. Some of those
> wpa_supplicant reconnection events did not kill the connection, but e=
ventually
> it does not recover. I did the same test with rtl8187 and found the s=
ame kind of
> sequences such as
>=20
> EAPOL: SUPP_PAE entering state CONNECTING
> EAPOL: txStart
> WPA: drop TX EAPOL in non-IEEE 802.1X mode (type=3D1 len=3D0)
> ^CCTRL-EVENT-TERMINATING - signal 2 received
> Removing interface wlan1
> State: 4WAY_HANDSHAKE -> DISCONNECTED
> wpa_driver_wext_set_operstate: operstate 0->0 (DORMANT)
> WEXT: Operstate: linkmode=3D-1, operstate=3D5
> wpa_driver_wext_deauthenticate
> No keys have been configured - skip key clearing
> EAPOL: External notification - portEnabled=3D0
> EAPOL: SUPP_PAE entering state DISCONNECTED
> EAPOL: SUPP_BE entering state INITIALIZE
> EAPOL: External notification - portValid=3D0
> wpa_driver_wext_set_wpa
> wpa_driver_wext_set_drop_unencrypted
> wpa_driver_wext_set_countermeasures
> WEXT auth param 4 value 0x0 - ioctl[SIOCSIWAUTH]: Operation not suppo=
rted
> No keys have been configured - skip key clearing
>=20
> When these happen, the flood ping is interrupted with rtl8187 but it =
always
> recovers. I don't remember this behavior from earlier, but it still h=
appens with
> 2.6.25 - it doesn't seem to be a mac80211 regression.
>=20
> Larry

well, looks like TKIP countermeasures after all....

The question is how to "debug" it...

jm, are you aware of any compatibility problems or
regression between broadcom APs and wpa_supplicant
with WPA TKIP?

I just added the extract of wpa_supplicant logs that looked important:

WPA: Invalid EAPOL-Key MIC when using TPTK - ignoring TPTK
WPA: Could not verify EAPOL-Key MIC - dropping packet
RTM_NEWLINK: operstate=3D0 ifi_flags=3D0x11043 ([UP][RUNNING][LOWER_UP]=
)
[...]
WPA: Invalid EAPOL-Key MIC when using TPTK - ignoring TPTK
WPA: Could not verify EAPOL-Key MIC - dropping packet
[...]
Authentication with XX:XX:XX:XX:XX:b1 timed out.
Added BSSID XX:XX:XX:XX:XX:b1 into blacklist
[...]
WPA: Invalid EAPOL-Key MIC when using TPTK - ignoring TPTK
WPA: Could not verify EAPOL-Key MIC - dropping packet
EAPOL: startWhen --> 0
[...]
Authentication with XX:XX:XX:XX:XX:b1 timed out.
BSSID XX:XX:XX:XX:XX:b1 blacklist count incremented to 2
[...]
1: XX:XX:XX:XX:XX:b1 ssid=3D'Larry' wpa_ie_len=3D26 rsn_ie_len=3D0 caps=
=3D0x11
=C2=A0 =C2=A0skip - blacklisted
[...]
No APs found - clear blacklist and try again
Removed BSSID 00:00:00:00:00:00 from blacklist (clear)
Removed BSSID XX:XX:XX:XX:XX:b1 from blacklist (clear)
[...]
State: DISCONNECTED -> ASSOCIATING
wpa_driver_wext_set_operstate: operstate 0->0 (DORMANT)
WEXT: Operstate: linkmode=3D-1, operstate=3D5
wpa_driver_wext_associate
wpa_driver_wext_associate: assoc failed because set_mode failed
Association request to the driver failed
[...]
WPA: Invalid EAPOL-Key MIC when using TPTK - ignoring TPTK
WPA: Could not verify EAPOL-Key MIC - dropping packet
[.......]

Regards,
Chr

2008-12-17 14:20:40

by Christian Lamparter

[permalink] [raw]
Subject: Re: Memory errors from p54usb

On Wednesday 17 December 2008 04:21:28 Larry Finger wrote:
> Christian Lamparter wrote:
> > well, looks like TKIP countermeasures after all....
> >
> > The question is how to "debug" it...
> >
> > jm, are you aware of any compatibility problems or
> > regression between broadcom APs and wpa_supplicant
> > with WPA TKIP?
>
> Unfortunately (?) the problem still occurs with an unencrypted network. I hauled
> out my 801.11b AP/router, and the same problem occurs even when wpa_supplicant
> is not in the equation.
>
Ohh, it always gets better. ;)
Anyway, I'll see what I can do tomorrow.

In the meantime, I remembered that I put a heat sink on my chip a while ago,
but I'm not sure if this had any effect here... But back in the summer, it allowed me
fix the iperf stalls I had, but that was after days with 60Gb traffic...

Regards,
Chr

2008-12-17 03:21:34

by Larry Finger

[permalink] [raw]
Subject: Re: Memory errors from p54usb

Christian Lamparter wrote:
> well, looks like TKIP countermeasures after all....
>
> The question is how to "debug" it...
>
> jm, are you aware of any compatibility problems or
> regression between broadcom APs and wpa_supplicant
> with WPA TKIP?

Unfortunately (?) the problem still occurs with an unencrypted network. I hauled
out my 801.11b AP/router, and the same problem occurs even when wpa_supplicant
is not in the equation.

Larry

2008-12-16 04:17:20

by Larry Finger

[permalink] [raw]
Subject: Re: Memory errors from p54usb

Christian Lamparter wrote:
>
> look at the high 4KB count?! it looks like your memory is highly fragmented...
>
> rx_mtu is usually around 3240 bytes on new firmwares...
> but for a quick test it can limited to something much less like the
> RTS/Fragmentation threshold (somewhere around 2356?)...
> just add:
> priv->rx_mtu = 2356; at line 198 in p54common.c
>
> In the mean time, I've to get rid of my RAM ;-)

No matter how hard I try, I cannot duplicate these results. They may have been a
fluke, but I do have another problem fr you to ponder.

If I run the 'ping -f -i 0.2' to my file server in one window, and start a
kernel compilation with the source on an NFS volume on that same server, p54usb
will hang. I managed to capture the whole process with kismet running on a third
computer and have analyzed it with wireshark. I can find the last transmission
from the server, but no ACK. The next transmission from the p54 is more than 26
sec later when it sends an 802.11 Probe request. The dmesg log just shows that
there was a disconnection, and a reconnection attempt.

Any idea as to why the receiver would go off-line, but the transmitter is still
working?

Larry

2008-12-15 19:51:46

by Christian Lamparter

[permalink] [raw]
Subject: Re: Memory errors from p54usb

On Monday 15 December 2008 15:53:10 Larry Finger wrote:
> Christian,
>
> While stressing p54usb with a 'ping -f -i 0.2 AP' to my router as well as
> running normal operations, I encountered the following errors:

> 5 minutes later:
>
> kernel: swapper: page allocation failure. order:1, mode:0x4020
> kernel: Pid: 0, comm: swapper Not tainted 2.6.28-rc8-wl #52
> kernel: Call Trace:
> kernel: <IRQ> [<ffffffff8028074e>] __alloc_pages_internal+0x41c/0x43e
> kernel: [<ffffffff8029b595>] alloc_pages_current+0xbe/0xc6
> kernel: [<ffffffff802a2d5a>] new_slab+0xd5/0x28d
> kernel: [<ffffffff802a1766>] ? unfreeze_slab+0x4c/0xbb
> kernel: [<ffffffff802a31ad>] __slab_alloc+0x215/0x43e
> kernel: [<ffffffff803c5be7>] ? dev_alloc_skb+0x16/0x2c
> kernel: [<ffffffff802a3347>] ? __slab_alloc+0x3af/0x43e
> kernel: [<ffffffff802a40c0>] __kmalloc_node_track_caller+0x90/0xe4
> kernel: [<ffffffff803c5be7>] ? dev_alloc_skb+0x16/0x2c
> kernel: [<ffffffff803c5246>] __alloc_skb+0x6f/0x135
> kernel: [<ffffffff803c5be7>] dev_alloc_skb+0x16/0x2c
> kernel: [<ffffffffa06df7f5>] p54u_rx_cb+0xad/0x1c8 [p54usb]
> kernel: [<ffffffffa00fc062>] usb_hcd_giveback_urb+0x7e/0xb1 [usbcore]
> kernel: [<ffffffffa0125883>] ehci_urb_done+0xca/0xdf [ehci_hcd]
> kernel: [<ffffffffa0126895>] qh_completions+0xaa/0x346 [ehci_hcd]
> kernel: [<ffffffffa0126bff>] ehci_work+0xce/0x80a [ehci_hcd]
> kernel: [<ffffffffa012ae7d>] ehci_irq+0x1a6/0x1d9 [ehci_hcd]
> kernel: [<ffffffff80251aea>] ? ktime_get_ts+0x49/0x4e
> kernel: [<ffffffffa00fbb66>] usb_hcd_irq+0x38/0x94 [usbcore]
> kernel: [<ffffffff80273ffd>] handle_IRQ_event+0x20/0x55
> kernel: [<ffffffff80275501>] handle_fasteoi_irq+0x91/0xd1
> kernel: [<ffffffff8020ed39>] do_IRQ+0xfc/0x173
> kernel: [<ffffffff8020c54b>] ret_from_intr+0x0/0xf
> kernel: <EOI> [<ffffffff80257557>] ? tick_broadcast_oneshot_control+0x1a/0x109
> kernel: [<ffffffff802135f1>] ? default_idle+0x30/0x4a
> kernel: [<ffffffff802135ef>] ? default_idle+0x2e/0x4a
> kernel: [<ffffffff802136ee>] ? c1e_idle+0xd5/0xfc
> kernel: [<ffffffff80252b32>] ? atomic_notifier_call_chain+0xf/0x11
> kernel: [<ffffffff8020b171>] ? cpu_idle+0x4f/0x90
> kernel: [<ffffffff80427e1d>] ? rest_init+0x61/0x63
> kernel: Mem-Info:
> kernel: Node 0 DMA per-cpu:
> kernel: CPU 0: hi: 0, btch: 1 usd: 0
> kernel: CPU 1: hi: 0, btch: 1 usd: 0
> kernel: Node 0 DMA32 per-cpu:
> kernel: CPU 0: hi: 186, btch: 31 usd: 114
> kernel: CPU 1: hi: 186, btch: 31 usd: 91
> kernel: Active_anon:70899 active_file:172447 inactive_anon:19852
> kernel: inactive_file:188543 unevictable:8 dirty:391 writeback:0 unstable:0
> kernel: free:25567 slab:261175 mapped:22336 pagetables:2929 bounce:0
> kernel: Node 0 DMA free:3556kB min:4kB low:4kB high:4kB active_anon:0kB
> inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB
> present:2368kB pages_scanned:0 all_unreclaimable? yes
> kernel: lowmem_reserve[]: 0 2927 2927 2927
> kernel: Node 0 DMA32 free:98712kB min:6916kB low:8644kB high:10372kB
> active_anon:283596kB inactive_anon:79408kB active_file:689788kB
> inactive_file:754172kB unevictable:32kB present:2997292kB
> pages_scanned:0 all_unreclaimable? no
> kernel: lowmem_reserve[]: 0 0 0 0
> kernel: Node 0 DMA: 3*4kB 1*8kB 3*16kB 3*32kB 3*64kB 3*128kB 3*256kB 0*512kB
>
> 2*1024kB 0*2048kB 0*4096kB = 3556kB
> kernel: Node 0 DMA32: 24463*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 1*256kB
> 1*512kB 0*1024kB 0*2048kB 0*4096kB = 98620kB
> kernel: 361741 total pagecache pages
> kernel: 587 pages in swap cache
> kernel: Swap cache stats: add 1474, delete 887, find 41242/41313
> kernel: Free swap = 2100784kB
> kernel: Total swap = 2104444kB
> kernel: 769872 pages RAM
> kernel: 20217 pages reserved
> kernel: 262368 pages shared
> kernel: 519070 pages non-shared
>
> These errors continued until I removed p54usb.
>
> My kernel is wireless-testing 2.6.28-rc8-wl (x86_64) with the "move statistics
> timer update", "update ACK failure counter", "remove free on tx" and "more
> accurate rssi to dBm" p54 patches applied.
>
> The call that fails in p54u_rx_cb() is in this code block:
>
> if (p54_rx(dev, skb)) {
> skb = dev_alloc_skb(priv->common.rx_mtu + 32);
> if (unlikely(!skb)) {
> /* TODO check rx queue length and refill *somewhere* */
> return;
> }
>
> If I read this correctly, there are no 8KB DMA32 pages available. Normally, I
> would have suspected a memory management problem in the kernel; however, rtl8187
> passes this ping test successfully. I suspect a memory leak in p54usb.
>
> I will continue to investigate this problem, but I wanted to give you a warning
> that there is a problem.
>

Alright, I'll look in that too... just one thing:
> kernel: Node 0 DMA32: 24463*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 1*256kB
> 1*512kB 0*1024kB 0*2048kB 0*4096kB = 98620kB

look at the high 4KB count?! it looks like your memory is highly fragmented...

rx_mtu is usually around 3240 bytes on new firmwares...
but for a quick test it can limited to something much less like the
RTS/Fragmentation threshold (somewhere around 2356?)...
just add:
priv->rx_mtu = 2356; at line 198 in p54common.c

In the mean time, I've to get rid of my RAM ;-)

Regards,
Chr