From: Christian Lamparter
To: Max Filippov
Cc: "linux-wireless", Larry Finger
Subject: Re: [WIP] p54: deal with allocation failures in rx path
Date: Sun, 5 Jul 2009 16:00:55 +0200
References: <200907040053.05654.chunkeey@web.de> <200907050457.00689.jcmvbkbc@gmail.com>
In-Reply-To: <200907050457.00689.jcmvbkbc@gmail.com>
Message-Id: <200907051600.55958.chunkeey@web.de>

On Sunday 05 July 2009 02:56:59 Max Filippov wrote:
> > This patch tries to address a long standing issue:
> > how to survive severe memory starvation situations,
> > without losing the device due to missing transfer-buffers.
> >
> > And with a flick of __GFP_NOWARN, we're able to handle ?all? memory
> > allocation failures on the rx-side during operation without much fuss.
> >
> > However, there is still an issue within the xmit-part.
> > This is likely due to p54's demand for a large free headroom for
> > every outgoing frame:
> >
> > + transport header (differs from device to device)
> >   -> 16 bytes transport header (USB 1st gen)
> >   -> 8 bytes for (USB 2nd gen)
> >   -> 0 bytes for spi & pci
> > + 12 bytes for p54_hdr
> > + 44 bytes for p54_tx_data
> > + up to 3 bytes for alignment
> > (+ 802.11 header as well? )
> >
> > and this is where ieee80211_skb_resize comes into play...
> > which will try to _relocate_ (alloc new, copy, free old) frame data,
> > as the headroom is most of the time simply not enough.
> > =>
> > Call Trace: (from Larry - Bug #13319 )
> > [] __alloc_pages_internal+0x43d/0x45e
> > [] alloc_pages_current+0xbe/0xc6
> > [] new_slab+0xcf/0x28b
> > [] ? unfreeze_slab+0x4c/0xbd
> > [] __slab_alloc+0x210/0x44c
> > [] ? pskb_expand_head+0x52/0x166
> > [] ? pskb_expand_head+0x52/0x166
> > [] __kmalloc+0x119/0x194
> > [] pskb_expand_head+0x52/0x166
> > [] ieee80211_skb_resize+0x91/0xc7 [mac80211]
> > [] ieee80211_master_start_xmit+0x298/0x319 [mac80211]
> > [] dev_hard_start_xmit+0x229/0x2a8
> > (sl*b debug option will help to bloat even more.)
> >
> > So?! how to prevent ieee80211_skb_resize from eating up
> > the bits of memory left?
> >
> > the simplest answer is probably this one:
> > https://dev.openwrt.org/changeset/15761
> > --
> >
> > back to rx failures.
> > the code attached below has only been tested on usb so far!
> > you have been warned!
> >
> > regards,
> > chr
> >
> > btw: max what do you think about the p54spi changes, are they total ****?
>
> Christian, I'm trying to test it, but it seems that many things have changed since 2.6.28.
> Right now I see this:
>
> [ 416.738586] Freeing init memory: 140K
> [ 417.208801] cx3110x spi2.0: firmware: requesting 3826.arm
> [ 417.272094] hub 1-0:1.0: hub_suspend
> [ 417.272155] usb usb1: bus auto-suspend
> [ 417.295501] phy0: p54 detected a LM20 firmware
> [ 417.298034] p54: rx_mtu reduced from 3240 to 2376
> [ 417.300598] phy0: FW rev 2.13.0.0.a.22.8 - Softmac protocol 5.6
> [ 417.303558] phy0: cryptographic accelerator WEP:YES, TKIP:YES, CCMP:YES
> [ 417.306732] cx3110x spi2.0: firmware: requesting 3826.eeprom
> [ 417.385742] firmware spi2.0: firmware_loading_store: vmap() failed
> [ 417.391540] cx3110x spi2.0: loading default eeprom...
> [ 417.395568] phy0: hwaddr 00:02:ee:c0:ff:ee, MAC:isl3820 RF:Longbow
> [ 417.468841] phy0: Selected rate control algorithm 'minstrel'
> [ 417.473693] cx3110x spi2.0: is registered as 'phy0'
> [ 419.150909] g_ether gadget: notify connect false
> [ 419.182891] g_ether gadget: notify speed 425984000
> [ 420.409210] usb0: eth_open
> [ 420.409240] usb0: eth_start
> [ 420.409423] g_ether gadget: ecm_open
> [ 420.409454] g_ether gadget: notify connect true
> [ 420.430908] g_ether gadget: notify speed 425984000
> [ 421.186340] phy0: device now idle
> [ 421.200958] skb_over_panic: text:bf000498 len:2 put:2 head:c793a200 data:c793a220 tail:0xc793a222 end:0xc793a220 dev:
> [ 421.211669] kernel BUG at net/core/skbuff.c:127!
> [ 421.217407] Unable to handle kernel NULL pointer dereference at virtual address 00000000
> [ 421.223571] pgd = c0004000
> [ 421.229797] [00000000] *pgd=00000000
> [ 421.236236] Internal error: Oops: 817 [#1]
> [ 421.242736] Modules linked in: p54spi
> [ 421.249420] CPU: 0 Not tainted (2.6.31-rc1-omap1-wl #4)
> [ 421.256378] PC is at __bug+0x1c/0x28
> [ 421.263458] LR is at __bug+0x18/0x28
> [ 421.270538] pc : [] lr : [] psr: 60000113
> [ 421.270568] sp : c798ff20 ip : 00000000 fp : 00000000
> [ 421.284851] r10: 00000000 r9 : 00000000 r8 : c7976b34
> [ 421.291870] r7 : c793a220 r6 : c793a222 r5 : c793a220 r4 : c793a200
> [ 421.298980] r3 : 00000000 r2 : c033cb84 r1 : 000045b2 r0 : 0000003a
> [ 421.306091] Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment kernel
> [ 421.313323] Control: 00c5387d Table: 87fe0000 DAC: 00000017
> [ 421.320526] Process phy0 (pid: 426, stack limit = 0xc798e268)
> [ 421.327697] Stack: (0xc798ff20 to 0xc7990000)
> [ 421.334747] ff20: 00000002 c01dc03c c793a200 c793a220 c793a222 c793a220 c02fba60 00000000
> [ 421.342346] ff40: c798ff40 c784abc0 c793a220 bf000498 c784abc0 c01dd1a8 00000058 c7976940
> [ 421.349975] ff60: c798ff6e bf000498 c798e000 80000058 c7976afc c7976940 50000000 c7976b0c
> [ 421.357574] ff80: 00000000 bf0008ec c796fd20 10000000 c0060510 bf000808 c796fd20 c798e000
> [ 421.365173] ffa0: c0060510 c0060650 c798ffd4 00000000 c78cc9a0 c00636ac c798ffb8 c798ffb8
> [ 421.372558] ffc0: c798ffd4 c7951d98 c796fd20 c0063440 00000000 00000000 c798ffd8 c798ffd8
> [ 421.379730] ffe0: 00000000 00000000 00000000 00000000 00000000 c002cca8 53384842 4e86725f
> [ 421.386993] Code: e1a01000 e59f000c eb0088c3 e3a03000 (e5833000)
> [ 421.394104] ---[ end trace 75ac12f5b28efc30 ]---
>
> Looks like something's wrong with firmware loading.
> I hope to fix it tomorrow and see how your changes work.

hmm, looks like someone tries to skb_push on a NULL skb.
hmmmm, can you please enable ksym, it's a bit hard to see the
obvious bug here.

Regards,
	Chr
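
For reference, the pointer fields on the skb_over_panic line in the quoted log are enough to check the buffer geometry by hand. In the kernel, skb_over_panic is reached from skb_put() when tail moves past end. The sketch below is only an illustration of that arithmetic: `struct skb_view` and its helpers are hypothetical names, not kernel API, and the values are copied from the log line at 421.200958.

```c
#include <stdint.h>

/* Hypothetical view of the sk_buff pointers printed by skb_over_panic.
 * This is not the kernel's struct sk_buff, only the four logged fields. */
struct skb_view {
    uint32_t head;  /* start of the allocated buffer */
    uint32_t data;  /* start of the frame data */
    uint32_t tail;  /* end of the frame data (after the failed put) */
    uint32_t end;   /* end of the allocated buffer */
};

/* Bytes reserved in front of the frame data. */
static uint32_t skb_view_headroom(const struct skb_view *s)
{
    return s->data - s->head;
}

/* Bytes that were free behind the data before `put` bytes were appended. */
static uint32_t skb_view_tailroom_before(const struct skb_view *s, uint32_t put)
{
    uint32_t old_tail = s->tail - put;
    return s->end > old_tail ? s->end - old_tail : 0;
}

/* Bytes by which tail has run past end; 0 if the data still fits. */
static uint32_t skb_view_overrun(const struct skb_view *s)
{
    return s->tail > s->end ? s->tail - s->end : 0;
}

/* Values copied from the log: head:c793a200 data:c793a220
 * tail:0xc793a222 end:0xc793a220 (put:2). */
static const struct skb_view logged = {
    0xc793a200u, 0xc793a220u, 0xc793a222u, 0xc793a220u
};
```

With the logged values this gives 32 bytes of headroom, no tailroom before the put, and an overrun of 2 bytes, which matches the put:2 field: the whole 32-byte buffer was reserved as headroom when 2 bytes were appended.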