MIME-Version: 1.0
In-Reply-To: <CAJ-Vmom6LyS173ZgH_6kODx5pmG0k_8CCgBKmCPFd_8=vwiJPw@mail.gmail.com>
References: <1322246431-10825-1-git-send-email-mickflemm@gmail.com>
	<1322246431-10825-2-git-send-email-mickflemm@gmail.com>
	<CAJ-Vmom6LyS173ZgH_6kODx5pmG0k_8CCgBKmCPFd_8=vwiJPw@mail.gmail.com>
Date: Sun, 27 Nov 2011 21:59:18 +0200
Message-ID: <CAFtRNNwqhUXQqxnNKq_Cm7dC_mkw_RSa8DUMXh9y=YYk_yDb2g@mail.gmail.com> (sfid-20111127_205930_511299_CD370874)
Subject: Re: [PATCH v2 01/12] ath5k: Switch from read-and-clear to
 write-to-clear method when handling PISR/SISR registers
From: Nick Kossifidis <mickflemm@gmail.com>
To: Adrian Chadd <adrian@freebsd.org>
Cc: ath5k-devel@lists.ath5k.org, linux-wireless@vger.kernel.org,
	linville@tuxdriver.com, me@bobcopeland.com, mcgrof@gmail.com,
	nbd@openwrt.org, jirislaby@gmail.com
Content-Type: text/plain; charset=UTF-8
Sender: linux-wireless-owner@vger.kernel.org

2011/11/26 Adrian Chadd <adrian@freebsd.org>:
> Hi,
>
> Review #2:
>
> One thing that I discovered whilst debugging some silly races in
> FreeBSD was that the update of the txq active mask wasn't done
> atomically.
>
> Ie, when you check this:
>
> +++ b/drivers/net/wireless/ath/ath5k/base.c
> @@ -1689,7 +1689,7 @@ ath5k_tasklet_tx(unsigned long data)
>       struct ath5k_hw *ah = (void *)data;
>
>       for (i = 0; i < AR5K_NUM_TX_QUEUES; i++)
> -               if (ah->txqs[i].setup && (ah->ah_txq_isr & BIT(i)))
> +               if (ah->txqs[i].setup && (ah->ah_txq_isr_txok_all & BIT(i)))
>                       ath5k_tx_processq(ah, &ah->txqs[i]);
>
> The way that the original code did it was:
>
> * update the HAL TX mask bits inside the HAL getisr routine, so it
> would just update the HAL idea of txqactive;
> * Then, ath_hal_gettxintrtxqs() would pass back to the driver the set
> of txq bits currently active, and reset those that are active (with a
> mask, so you can read only a few)
> * the ath_intr() routine would simply schedule a tasklet call when a
> TX interrupt came in;
> * .. and then the TX completion tasklet would call
> ath_hal_gettxintrtxqs() for each task bit to see if it needed
> servicing.

I'm not aware of ath_hal_gettxintrqs etc, this was the patch I wrote on this...

http://www.mail-archive.com/ath5k-devel@lists.ath5k.org/msg01312.html

> Now, in previous generations of hardware/kernel, this may not have
> raced - ath_intr() may have been a "fast" interrupt handler and thus
> would occur atomically on a single CPU - so you don't have any races.
> But these days, ath_intr() is just a software interrupt and can occur
> in parallel with another CPU executing the completion handler.
>
> So since there was no locking around the TXQ bit updates and clear, it
> was very possible that a very busy TX device would "miss" TX queue
> notifications, as the CPU running the TX completion would
> read-and-clear the TXQ active bits in the HAL, whilst another CPU was
> running ath_intr() which was trying to do a read/modify/write of those
> same bits.
>
> I _think_ felix fixed this in ath9k by hiding this stuff behind the
> PCU lock. I've also fixed it in FreeBSD. I think it's worth
> re-evaluating what is going on in ath5k and maybe add some similar
> locking.
>
> Oh wait. Where are you clearing these bits? It doesn't look like
> you're even clearing the ah_txq_isr bits in the most recent ath5k tree
> I have here. This means you're likely not missing events; but you're
> likely always scanning those queues. Making this code a bit pointless.
> :)

Yup we don't clean them but that's not a bug, we still only check
active queues that produced interrupts, not everything. Anyway we have
txq->lock already so it's not a big deal to implement this I'll post a
patch on top of this patchset.

> Anyway, I'd tidy this all up in a subsequent patchset. Get the
> interrupt changes in, then figure out how to properly lock the
> interrupt TXQ bit set path (the isr) and the clear path (once you've
> tested whether the TXQ needed servicing and then cleared the bit.)
>
> HTH,
>
>
> adrian
>

Thanks again for your comments ;-)


-- 
GPG ID: 0xEE878588
As you read this post global entropy rises. Have Fun ;-)
Nick