Return-path:
Received: from mail.candelatech.com ([208.74.158.172]:52462 "EHLO ns3.lanforge.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751486Ab0JGVwI (ORCPT ); Thu, 7 Oct 2010 17:52:08 -0400
Message-ID: <4CAE4101.2020600@candelatech.com>
Date: Thu, 07 Oct 2010 14:52:01 -0700
From: Ben Greear
MIME-Version: 1.0
To: "Luis R. Rodriguez"
CC: Johannes Berg, "linux-wireless@vger.kernel.org"
Subject: Re: memory clobber in rx path, maybe related to ath9k.
References: <4CAB59B2.5050106@candelatech.com> <4CAB5F3D.9060201@candelatech.com> <4CAB627F.8020804@candelatech.com> <4CAB64AD.4080105@candelatech.com> <4CAB6B08.4050801@candelatech.com> <4CAE0474.4090605@candelatech.com> <1286475250.20974.22.camel@jlt3.sipsolutions.net> <4CAE13F6.2010003@candelatech.com> <4CAE1DFB.303@candelatech.com> <1286479642.20974.32.camel@jlt3.sipsolutions.net>
In-Reply-To:
Content-Type: text/plain; charset=UTF-8; format=flowed
Sender: linux-wireless-owner@vger.kernel.org
List-ID:

On 10/07/2010 02:31 PM, Luis R. Rodriguez wrote:
> On Thu, Oct 7, 2010 at 12:27 PM, Johannes Berg wrote:
>> On Thu, 2010-10-07 at 12:22 -0700, Ben Greear wrote:
>>
>>> After reboot, and re-run of the script,
>>> I saw this in the logs, and shortly after,
>>> the SLUB poison warning dumped to screen.
>>>
>>> Maybe those DMA errors are serious?
>>
>>> ath: Failed to stop TX DMA in 100 msec after killing last frame
>>> ath: Failed to stop TX DMA. Resetting hardware!
>>
>> That's TX DMA, it can hardly result in invalid memory writes like the
>> ones you've been seeing.
>>
>> I'm still convinced something is wrong with ath9k RX DMA, as you've seen
>> the contents of frames written to already freed memory regions. Since I
>> don't know anything about ath9k, you should probably not rely on me
>> though :-)
>
> I'm on this now. Lets play.
>
> I had to remove /lib/udev/rules.d/75-persistent-net-generator.rules
> to avoid Ubuntu trying to remember the device names and it creating
> stax_rename names.

Right, we disable udev for 'sta*' devices.

> I just ran your script with some modifications. You can find it here:
>
> http://www.kernel.org/pub/linux/kernel/people/mcgrof/scripts/poo.pl

Can you post your kernel .config somewhere, and confirm which kernel
you are using? Also, what ath9k NIC, platform, etc?

We see the problem on two different systems (haven't tried more). I can
figure out the brands of the NICs if that helps, and have included lspci
information below.

I've uploaded my kernel config to here:
http://www.candelatech.com/~greearb/ctwl_kernel.cfg

* Dual core Intel Pentium-D
  32-bit
  2GB RAM
  Fedora 13 (but with custom compiled top-of-tree iw, hostap, libnl)

Atheros NIC: from lspci -vv:

08:01.0 Network controller: Atheros Communications Inc. AR922X Wireless Network Adapter (rev 01)
	Subsystem: Device 0777:4002
	Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR+ FastB2B+ DisINTx-
	Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- SERR- TAbort- SERR-

> I then ran:
>
> for i in $(seq 0 31) ; do sudo dhclient seq$i; done
>
> It took about 10 minutes to get IP addresses for all interfaces but it
> got there eventually. Odd enough I was unable to ping the AP from any
> interface though. Not sure what that was about. But I got no oops, no
> slub dump. All I got was some more delba warnings which seems to
> indicate we haven't caught all the cases for its use:

If you just create one or two interfaces, can you ping as expected?

Thanks,
Ben

-- 
Ben Greear
Candela Technologies Inc
http://www.candelatech.com