Return-path: Received: from mail-yw0-f46.google.com ([209.85.213.46]:39563 "EHLO mail-yw0-f46.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752167Ab1K0T7U convert rfc822-to-8bit (ORCPT ); Sun, 27 Nov 2011 14:59:20 -0500 Received: by ywa9 with SMTP id 9so2307091ywa.19 for ; Sun, 27 Nov 2011 11:59:19 -0800 (PST) MIME-Version: 1.0 In-Reply-To: References: <1322246431-10825-1-git-send-email-mickflemm@gmail.com> <1322246431-10825-2-git-send-email-mickflemm@gmail.com> Date: Sun, 27 Nov 2011 21:59:18 +0200 Message-ID: (sfid-20111127_205930_511299_CD370874) Subject: Re: [PATCH v2 01/12] ath5k: Switch from read-and-clear to write-to-clear method when handling PISR/SISR registers From: Nick Kossifidis To: Adrian Chadd Cc: ath5k-devel@lists.ath5k.org, linux-wireless@vger.kernel.org, linville@tuxdriver.com, me@bobcopeland.com, mcgrof@gmail.com, nbd@openwrt.org, jirislaby@gmail.com Content-Type: text/plain; charset=UTF-8 Sender: linux-wireless-owner@vger.kernel.org List-ID: 2011/11/26 Adrian Chadd : > Hi, > > Review #2: > > One thing that I discovered whilst debugging some silly races in > FreeBSD was that the update of the txq active mask wasn't done > atomically. > > Ie, when you check this: > > +++ b/drivers/net/wireless/ath/ath5k/base.c > @@ -1689,7 +1689,7 @@ ath5k_tasklet_tx(unsigned long data) >       struct ath5k_hw *ah = (void *)data; > >       for (i = 0; i < AR5K_NUM_TX_QUEUES; i++) > -               if (ah->txqs[i].setup && (ah->ah_txq_isr & BIT(i))) > +               if (ah->txqs[i].setup && (ah->ah_txq_isr_txok_all & BIT(i))) >                       ath5k_tx_processq(ah, &ah->txqs[i]); > > The way that the original code did it was: > > * update the HAL TX mask bits inside the HAL getisr routine, so it > would just update the HAL idea of txqactive; > * Then, ath_hal_gettxintrtxqs() would pass back to the driver the set > of txq bits currently active, and reset those that are active (with a > mask, so you can read only a few) > * the ath_intr() routine would simply schedule a tasklet call when a > TX interrupt came in; > * .. and then the TX completion tasklet would call > ath_hal_gettxintrtxqs() for each task bit to see if it needed > servicing. I'm not aware of ath_hal_gettxintrqs etc, this was the patch I wrote on this... http://www.mail-archive.com/ath5k-devel@lists.ath5k.org/msg01312.html > Now, in previous generations of hardware/kernel, this may not have > raced - ath_intr() may have been a "fast" interrupt handler and thus > would occur atomically on a single CPU - so you don't have any races. > But these days, ath_intr() is just a software interrupt and can occur > in parallel with another CPU executing the completion handler. > > So since there was no locking around the TXQ bit updates and clear, it > was very possible that a very busy TX device would "miss" TX queue > notifications, as the CPU running the TX completion would > read-and-clear the TXQ active bits in the HAL, whilst another CPU was > running ath_intr() which was trying to do a read/modify/write of those > same bits. > > I _think_ felix fixed this in ath9k by hiding this stuff behind the > PCU lock. I've also fixed it in FreeBSD. I think it's worth > re-evaluating what is going on in ath5k and maybe add some similar > locking. > > Oh wait. Where are you clearing these bits? It doesn't look like > you're even clearing the ah_txq_isr bits in the most recent ath5k tree > I have here. This means you're likely not missing events; but you're > likely always scanning those queues. Making this code a bit pointless. > :) Yup we don't clean them but that's not a bug, we still only check active queues that produced interrupts, not everything. Anyway we have txq->lock already so it's not a big deal to implement this I'll post a patch on top of this patchset. > Anyway, I'd tidy this all up in a subsequent patchset. Get the > interrupt changes in, then figure out how to properly lock the > interrupt TXQ bit set path (the isr) and the clear path (once you've > tested whether the TXQ needed servicing and then cleared the bit.) > > HTH, > > > adrian > Thanks again for your comments ;-) -- GPG ID: 0xEE878588 As you read this post global entropy rises. Have Fun ;-) Nick