From: "Luis R. Rodriguez"
Date: Wed, 13 Oct 2010 10:29:10 -0700
Subject: Re: memory clobber in rx path, maybe related to ath9k.
To: Ben Greear
Cc: Johannes Berg, linux-wireless@vger.kernel.org
In-Reply-To: <4CB5E885.7090305@candelatech.com>
References: <4CAB59B2.5050106@candelatech.com>
 <4CAB64AD.4080105@candelatech.com>
 <4CAB6B08.4050801@candelatech.com>
 <4CAE0474.4090605@candelatech.com>
 <1286475250.20974.22.camel@jlt3.sipsolutions.net>
 <4CAE13F6.2010003@candelatech.com>
 <4CAE1DFB.303@candelatech.com>
 <1286479642.20974.32.camel@jlt3.sipsolutions.net>
 <4CB378CD.1080800@candelatech.com>
 <4CB3D598.7050904@candelatech.com>
 <4CB4AA89.1070009@candelatech.com>
 <4CB5E885.7090305@candelatech.com>

On Wed, Oct 13, 2010 at 10:12 AM, Ben Greear wrote:
> On 10/12/2010 11:40 AM, Luis R. Rodriguez wrote:
>> On Tue, Oct 12, 2010 at 11:35 AM, Ben Greear wrote:
>>> On 10/11/2010 11:10 PM, Luis R. Rodriguez wrote:
>>>> On Mon, Oct 11, 2010 at 8:27 PM, Ben Greear wrote:
>>>>> Another thing I was thinking about: Maybe the queue of skbs and dma
>>>>> addresses in ath9k is getting corrupted by multiple VIFs trying to
>>>>> write at once? Maybe some locking is needed in the xmit path?
>>>>
>>>> That was my second hunch. My first shot was to use spin_lock_irqsave()
>>>> over the uses of the rxbuf list and that seemed to help but I
>>>> still managed to get a poison eventually. My next item to check for is
>>>> of the permissibility of creating too much pressure to the point we
>>>> end up looping over the rxbuf list and race against mac80211 free'ing
>>>> a buffer. Will test that tomorrow if nothing else comes up creeping my
>>>> priority queue.
>>>
>>> This code looks weird to me. One of the paprd branches
>>> deletes the skb, the other doesn't appear to. Neither
>>> null out bf->bf_mpdu, which would appear to leave a dangling
>>> pointer in at least the dev_kfree_skb_any() branch.
>>>
>>> ath_tx_complete frees its skb in all cases, so another
>>> bf->bf_mpdu dangling pointer issue.
>>>
>>> Maybe at the least we should null out bf->bf_mpdu when
>>> skb is consumed?
>>
>> You're reading my mind, that was what I was going to test today. Still
>> doing e-mail sweep though.
>
> At least in the xmit path, it seems cards that have EDMA support do
> things a bit different. Out of curiosity, on the system(s), you reproduce
> this, are any of yours supporting EDMA? Mine appear to not support EDMA.

EDMA is used on >= AR9003 families by Atheros. And no, I am not
testing with an EDMA card, I am testing with an AR9002 family card,
the AR9280 card. I am going to disregard the TX stuff as the bug is an
RX issue :)

I was able to more easily reproduce by doing an skb_copy() and
free'ing the buffer right afterwards on the ath_send_to_mac80211()
thingy. So it does appear that the poison check just happens more often
when we do an skb_copy(). One reason this is easy to reproduce with
multiple STAs is mac80211 uses skb_copy() to process each received skb
for each STA.

In my tests so far, protecting the rxbuf list with spin_lock_irqsave()
did not help, and the wmb(); didn't either, something else is going on
here. It would be nice to hack slab to keep an entire trace of the
place the buffer was last free'd at instead of just the caller that
freed it. I haven't found a pattern on how this happens yet.

  Luis
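
For readers following the thread: the suggestion quoted above, to null out
bf->bf_mpdu once the skb is consumed, can be sketched in plain userspace C.
This is only an illustration under stated assumptions, not ath9k code.
fake_skb, fake_buf and the fake_buf_* helpers are hypothetical names, and
free() stands in for dev_kfree_skb_any().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the driver structures; not the real ath9k types. */
struct fake_skb {
    unsigned char data[64];
};

struct fake_buf {
    struct fake_skb *mpdu;  /* plays the role of bf->bf_mpdu */
};

/*
 * Release the buffer's skb and clear the pointer, so any later user
 * sees NULL instead of a stale address into freed memory.
 */
static void fake_buf_consume(struct fake_buf *bf)
{
    free(bf->mpdu);         /* free(NULL) is a no-op, so a repeat call is safe */
    bf->mpdu = NULL;
}

/* Returns 1 if the buffer still owns an skb, 0 otherwise. */
static int fake_buf_has_skb(const struct fake_buf *bf)
{
    return bf->mpdu != NULL;
}

/* Small self-check; returns 0 on success, -1 on failure. */
static int fake_buf_selftest(void)
{
    struct fake_buf bf = { .mpdu = malloc(sizeof(struct fake_skb)) };

    if (!bf.mpdu)
        return -1;

    assert(fake_buf_has_skb(&bf));
    fake_buf_consume(&bf);
    assert(!fake_buf_has_skb(&bf));

    /* A second consume must not double-free, because the pointer is NULL. */
    fake_buf_consume(&bf);
    return fake_buf_has_skb(&bf) ? -1 : 0;
}
```

Clearing the pointer does not by itself explain the RX poison reports
discussed above; it only turns a silent use-after-free into a NULL that
is easier to catch.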