Return-path:
Received: from mail.candelatech.com ([208.74.158.172]:49944 "EHLO
	ns3.lanforge.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1754337Ab0JLD10 (ORCPT );
	Mon, 11 Oct 2010 23:27:26 -0400
Message-ID: <4CB3D598.7050904@candelatech.com>
Date: Mon, 11 Oct 2010 20:27:20 -0700
From: Ben Greear
MIME-Version: 1.0
To: "Luis R. Rodriguez"
CC: Johannes Berg, "linux-wireless@vger.kernel.org"
Subject: Re: memory clobber in rx path, maybe related to ath9k.
References: <4CAB59B2.5050106@candelatech.com> <4CAB5F3D.9060201@candelatech.com>
	<4CAB627F.8020804@candelatech.com> <4CAB64AD.4080105@candelatech.com>
	<4CAB6B08.4050801@candelatech.com> <4CAE0474.4090605@candelatech.com>
	<1286475250.20974.22.camel@jlt3.sipsolutions.net>
	<4CAE13F6.2010003@candelatech.com> <4CAE1DFB.303@candelatech.com>
	<1286479642.20974.32.camel@jlt3.sipsolutions.net>
	<4CB378CD.1080800@candelatech.com>
In-Reply-To:
Content-Type: text/plain; charset=UTF-8; format=flowed
Sender: linux-wireless-owner@vger.kernel.org
List-ID:

On 10/11/2010 06:03 PM, Luis R. Rodriguez wrote:
> On Mon, Oct 11, 2010 at 1:51 PM, Ben Greear wrote:
>> On 10/07/2010 02:59 PM, Luis R. Rodriguez wrote:
>>>
>>> On Thu, Oct 7, 2010 at 2:36 PM, Luis R. Rodriguez
>>> wrote:
>>
>>>>> But other than this I got nothing. I left the box sit there for about
>>>>> 1 hour and came back and it was still going with no issues. Mind you,
>>>>> I can't ping but that seems like another issue.
>>>>>
>>>>> You can find my logs here:
>>>>>
>>>>>
>>>>> http://www.kernel.org/pub/linux/kernel/people/mcgrof/logs/2010/10-07-stress-sta-01/
>>>>
>>>> Doh, I did not have CONFIG_SLUB_DEBUG_ON=y so building now with that.
>>>
>>> Yay I can reproduce now. I'll be back, going to dig now.
>>
>> Any luck tracking this down?
>
> No, today for example I just finished reading e-mail and it's already
> 6pm PST... But Friday I did get to do a lot of work and testing on
> this. The only pattern I see so far is that skb_copy() is used on the
> poison all the time. I am not sure if it's because skb_copy() happens
> to run the poison check or what. I'll work on this tomorrow.

I know how that goes.

Do you happen to have any magic tools that could be instrumented to
show when DMA was happening in the chip, and to see if it somehow
happens to DMA to something after it is supposedly unmapped?

Another thing I was thinking about: Maybe the queue of skbs and DMA
addresses in ath9k is getting corrupted by multiple VIFs trying to write
at once? Maybe some locking is needed in the xmit path?

Thanks,
Ben

-- 
Ben Greear
Candela Technologies Inc  http://www.candelatech.com
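
For readers following the thread, the poison check mentioned in the quoted text can be illustrated with a small userspace sketch. This is not kernel or ath9k code. The helper names poison_buffer() and first_clobbered_byte() are hypothetical, and the real SLUB check also covers an end marker and red zones that are left out here. The only value taken from the kernel is the 0x6b fill byte (POISON_FREE in include/linux/poison.h). The idea is that a freed buffer is filled with a known pattern, and a later check reports any byte that no longer matches.

```c
#include <stddef.h>
#include <string.h>

/* Same fill byte the kernel uses for freed objects (POISON_FREE). */
#define POISON_FREE 0x6b

/* Hypothetical helper: fill a released buffer with the poison pattern. */
void poison_buffer(unsigned char *buf, size_t len)
{
    memset(buf, POISON_FREE, len);
}

/*
 * Hypothetical helper: return the offset of the first byte that no longer
 * holds the poison value, or -1 if the buffer is still intact.
 */
long first_clobbered_byte(const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (buf[i] != POISON_FREE)
            return (long)i;
    }
    return -1;
}
```

A check like this only reports a clobber when it runs, not when the stray write happens, so the code path that trips it (for example an allocation made on behalf of skb_copy()) is not necessarily the one that did the writing. That would be consistent with a late DMA write landing in an already freed buffer, though the sketch itself proves nothing about ath9k.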