Subject: Re: rt2800 tx frame dropping issue.
From: Johannes Berg
To: Daniel Santos, Stanislaw Gruszka, linux-wireless@vger.kernel.org
Date: Fri, 23 Nov 2018 20:45:54 +0100

On Tue, 2018-11-20 at 15:20 -0600, Daniel Santos wrote:
> I believe I have the answer as to why we're getting frames after we stop
> the queue. I had a little chat about this in #kernelnewbies and some
> other devs believe it is intentional.
>
> There is a race in ieee80211_tx_frags (net/mac80211/tx.c) between
> releasing local->queue_stop_reason_lock and calling the driver's tx
> until we lock queue->tx_lock in rt2x00queue_write_tx_frame -- in between
> these two points the thread can be preempted. So while we stop the
> queue in one thread, there can be 20 other threads in between that have
> already checked that the queue was running and have a frame buffer
> sitting on their stacks.

Not 20, only 1 per netdev queue. I suspect that means just 1 per
hardware queue, but I don't know how rt2x00 maps netdev queues to
hardware queues. If you have lots of netdevs, that might actually be
20, but I suspect that's not what's going on here.

> I think it could be fixed with the below
> patch, but I'm being told that it doesn't need it and that the driver
> should just *quietly* drop the frame:

[snip patch]

> Could anybody illuminate for me why this isn't done? I know that there
> has to be a point where we realize there are too many items in the queue
> and we can't keep up, but this would seem to be a terrible way to do
> that. Is it one of those things where it isn't worth the performance
> degradation? Any insights would be most appreciated!!
:) There's just not much point, and doing the locking here would mean
it's across _all_ queues, whereas if you have multiple hardware queues
you wouldn't really need it.

Note that with internal TXQs with fq/codel support etc. we actually
have the fq->lock, and it's global, and it turns out that's still a
better deal than actually doing parallel TX. So this may not matter so
much.

In any case, while this might solve the problem for the specific case
you're thinking about, as soon as you have something else starting or
stopping queues from outside the TX path it still wouldn't actually
help.

By the way, one thing that can likely exacerbate the problem is
fragmentation: once you have that, you're more likely to trigger lots
of frames in close succession.

What I would suggest doing in the driver is to actually stop the queues
once a *threshold* is reached, rather than when they're full. Say if
you have 128 entries in the HW queue, you can stop once you reach 120
(you probably don't really need the other 8 places). If you get a 121st
frame, you can still put it on the queue (until you've filled all 128),
but you can also stop the queue again (stopping queues is idempotent).

johannes
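
[Editor's sketch] A rough illustration of what that threshold scheme
could look like in a driver's TX path. This is not rt2x00 code: the
example_queue struct, the example_hw_queue_frame() helper and the
128/120 split are made-up assumptions; only ieee80211_stop_queue() and
ieee80211_wake_queue() are real mac80211 calls.

#include <linux/netdevice.h>
#include <linux/skbuff.h>
#include <linux/spinlock.h>
#include <net/mac80211.h>

#define EXAMPLE_HW_QUEUE_SIZE		128	/* real size of the HW ring */
#define EXAMPLE_HW_QUEUE_THRESHOLD	120	/* stop here, keep 8 slots of headroom */

/* Illustrative only, not an actual rt2x00 structure. */
struct example_queue {
	struct ieee80211_hw *hw;
	spinlock_t lock;
	unsigned int qid;	/* mac80211 queue index */
	unsigned int len;	/* frames currently on the HW ring */
};

/* Hypothetical helper standing in for the driver's descriptor/DMA setup. */
static void example_hw_queue_frame(struct example_queue *queue,
				   struct sk_buff *skb);

static int example_write_tx_frame(struct example_queue *queue,
				  struct sk_buff *skb)
{
	unsigned long flags;
	int ret = 0;

	spin_lock_irqsave(&queue->lock, flags);

	if (queue->len >= EXAMPLE_HW_QUEUE_SIZE) {
		/* Even the headroom is gone; nothing left but to drop. */
		ret = -ENOSPC;
		goto out;
	}

	example_hw_queue_frame(queue, skb);
	queue->len++;

	/*
	 * Stop the mac80211 queue once the threshold is reached. Frames
	 * that already passed mac80211's "queue is running" check still
	 * fit into the spare slots, and stopping is idempotent, so
	 * calling this again for the 121st, 122nd, ... frame is fine.
	 */
	if (queue->len >= EXAMPLE_HW_QUEUE_THRESHOLD)
		ieee80211_stop_queue(queue->hw, queue->qid);

out:
	spin_unlock_irqrestore(&queue->lock, flags);
	if (ret)
		dev_kfree_skb_any(skb);
	return ret;
}

/* On TX completion, wake the queue once it is back below the threshold. */
static void example_tx_done(struct example_queue *queue)
{
	unsigned long flags;

	spin_lock_irqsave(&queue->lock, flags);
	if (--queue->len < EXAMPLE_HW_QUEUE_THRESHOLD)
		ieee80211_wake_queue(queue->hw, queue->qid);
	spin_unlock_irqrestore(&queue->lock, flags);
}

The point of the headroom is that any frame that was already past
mac80211's queue-running check when the queue got stopped still has a
slot waiting for it, so the driver should not normally have to drop
frames silently.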