Return-path:
Received: from ey-out-2122.google.com ([74.125.78.24]:1886 "EHLO
    ey-out-2122.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
    with ESMTP id S1751677AbYK0I5N (ORCPT );
    Thu, 27 Nov 2008 03:57:13 -0500
Received: by ey-out-2122.google.com with SMTP id 6so370979eyi.37
    for ; Thu, 27 Nov 2008 00:57:11 -0800 (PST)
Message-ID: (sfid-20081127_095718_603571_E6C59845)
Date: Thu, 27 Nov 2008 09:57:11 +0100
From: "Stefan Steuerwald"
To: "Christian Lamparter"
Subject: Re: p54: AP mode: no data frame despite traffic indication set in TIM
Cc: "Johannes Berg" , linux-wireless@vger.kernel.org, "John W Linville"
In-Reply-To: <200811262213.03751.chunkeey@web.de>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
References: <200811242124.16358.chunkeey@web.de> <200811262213.03751.chunkeey@web.de>
Sender: linux-wireless-owner@vger.kernel.org
List-ID:

Seems solved!
Maybe a little premature, but I need to blurt this out...

Don't know why, but I think I was not thorough enough in my kernel config:
- I have a so-called alix board featuring a Geode LX 800
- I had tried to set processor type to Geode GX/LX, but that did not
  boot (hangs somewhere)
- I didn't bother to find out why, but compiled for 486 instead (worked)
- In the meantime, I copied/merged a kernel .config from another
  branch with processor type = 586/686/etc, which went unnoticed by
  me, but seemed to work all the time (except maybe for that last
  kernel BUG)
- Now I compiled for Geode GX/LX again, and set
  CONFIG_GEODE_MFGPT_TIMER=n (as per this info here:
  https://kerneltrap.org/mailarchive/linux-kernel/2008/1/20/585236)
  which makes my kernel boot and SEEMS TO MAKE THAT CRASH GO AWAY!!!

At least I haven't observed the crash in the last 60 minutes, whereas
before it took only 1-2 minutes every time to turn it up. Will test
this all day.

The three patches mentioned before are applied, and my app-level
timeout is still gone, and the "dropped filtered TX" messages are gone
as well.
Christian, should I actually test your p54-sta-flags-v3 patch?

Regards, Stefan.

2008/11/26 Christian Lamparter :
> On Wednesday 26 November 2008 14:38:59 Stefan Steuerwald wrote:
>> console [netcon0] enabled
>> netconsole: network logging started
>> BUG: unable to handle kernel NULL pointer dereference at 00000038
>> IP: [] p54_assign_address+0x67/0x14b [p54common]
>> *pde = 00000000
>> Oops: 0000 [#1]
>> last sysfs file: /sys/class/net/lo/operstate
>> Modules linked in: netconsole ipv6 loop evdev ehci_hcd ohci_hcd
>> rtc_cmos rtc_core pcspkr rtc_lib p54pci usbcore via_rhine p54common
>> geode_aes mii [last unloaded: netconsole]
>>
>> Pid: 0, comm: swapper Not tainted (2.6.28-rc6-wl #16)
>> EIP: 0060:[] EFLAGS: 00010002 CPU: 0
>> EIP is at p54_assign_address+0x67/0x14b [p54common]
>> EAX: cf98b178 EBX: cf86ee40 ECX: 00000000 EDX: 00000000
>> ESI: 000000f8 EDI: 00000000 EBP: 0002027c ESP: c03f9c4c
>> DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
>> Process swapper (pid: 0, ti=c03f8000 task=c03c4380 task.ti=c03f8000)
>> Stack:
>> 00000002 ce4d5880 ce4c48b4 cf86e1a0 00000000 00000038 00020200 00000286
>> cf86ee40 00000004 ce4d58b2 ce4d588c d0826fd7 00000090 014c48d4 ce4c48b4
>> cf86e1a0 0086ee40 00000004 02000282 ce4c48d4 cf86ef10 cf86ee40 ce4d5880
>> Call Trace:
>> [] p54_tx+0x416/0x482 [p54common]
>> [] __ieee80211_tx+0x35/0xf8
>> [] ieee80211_master_start_xmit+0x2ab/0x396
>> [] common_interrupt+0x23/0x30
>> [] dev_hard_start_xmit+0x16e/0x1c9
>> [] __qdisc_run+0xa2/0x15c
>> [] dev_queue_xmit+0x2f5/0x3c5
>> [] ieee80211_invoke_rx_handlers+0x488/0x1486
>> [] bictcp_cong_avoid+0x10/0x160
>> [] tcp_ack+0x16f0/0x1850
>> [] enqueue_task_fair+0x12a/0x16b
>> [] tcp_current_mss+0x6b/0xe4
>> [] __ieee80211_rx_handle_packet+0x54a/0x56d
>> [] __ieee80211_rx+0x491/0x4e3
>> [] ieee80211_tasklet_handler+0x60/0xd6
>> [] tasklet_action+0x3e/0x64
>> [] __do_softirq+0x4a/0xbc
>> [] do_softirq+0x22/0x26
>> [] irq_exit+0x25/0x55
>> [] do_IRQ+0x5a/0x6c
>> [] common_interrupt+0x23/0x30
>> [] default_idle+0x25/0x38
>> [] cpu_idle+0x41/0x5b
>> Code: 0f 84 01 01 00 00 9c 8f 44 24 1c fa 8b 53 10 31 ff 89 6c 24 18
>> 89 14 24 31 d2 eb 3f 8b 4c 24 10 83 c1 38 89 4c 24 14 8b 4c 24 10 <8b>
>> 41 38 29 e8 85 d2 75 0d 39 f0 72 09 8b 51 04 29 f0 89 6c 24
>> EIP: [] p54_assign_address+0x67/0x14b [p54common] SS:ESP 0068:c03f9c4c
>> Kernel panic - not syncing: Fatal exception in interrupt
>>
> wt*, this bug is "impossible":
>
> The bug happens when p54_assign_address looks for free space for a new frame.
> Here's the code:
> [...]
>         if (!skb)
>                 return -EINVAL; <--- we don't accept "null" skbs
>
>         spin_lock_irqsave(&priv->tx_queue.lock, flags); <--- we are under a spin_lock with irqs disabled
>         left = skb_queue_len(&priv->tx_queue);
>         while (left--) {
>                 u32 hole_size;
>                 info = IEEE80211_SKB_CB(entry); <--- Here it BUGs,
> [...]
>
> Your binary module says that skb->cb is at 0x38,
> so our "entry" is really NULL right when it BUGs.
> And this can only mean that the queue was
> modified "outside" of our driver, since we always take the
> spin_lock_irqsave (of course, only of "our" tx_queue) if we need to
> do anything with the data in the queue.
>
> Of course, the packet was queued while the station was sleeping
> somewhere in mac80211, so maybe mac80211 still holds a reference to it,
> but then other drivers would have spotted this misbehaviour a long
> time ago...
>
> So? Back to square one... I guess.
>
> Regards,
> Chr
>