2008-07-23 21:42:00

by Tomas Winkler

[permalink] [raw]
Subject: Another fragmentation multiqueue kludge

Here is the scenario. This is happening with what we have in
wireless-testing code

When hw queue get full we call
ieeee80211_stop_queue(dev, q);

When this is happens in the middle of the fragmentation . This is
easiest to reproduces with fragmentation but probably happens also in
other cases.

int __ieee80211_tx()
....

if (netif_subqueue_stopped(local->mdev,
tx->extra_frag[i]))
return IEEE80211_TX_FRAG_AGAIN;


The fragment should stored into local->queues_pending[queue]
(Actually I'm not sure it is)

now when we HW queue got cleared a bit we call wake queues where
tx_pending_tasklet is woken up ( queue is not woken up here)

void ieee80211_wake_queue(struct ieee80211_hw *hw, int queue)
{
struct ieee80211_local *local = hw_to_local(hw);

if (test_bit(queue, local->queues_pending)) {
tasklet_schedule(&local->tx_pending_tasklet);
}
..... <snip>
}

The check if the queue is open causes that queue is actually never
woken up so traffic is stalled. However removing this line causes some
other panic I didn't have time to process.

void ieee80211_tx_pending(unsigned long data)
{
<snip>

netif_tx_lock_bh(dev);
for (i = 0; i < ieee80211_num_regular_queues(&local->hw); i++) {
/* Check that this queue is ok */
if (__netif_subqueue_stopped(local->mdev, i))
--- This line causes that fact that the actual wake queue is never
called.
continue;

if (!test_bit(i, local->queues_pending)) {
ieee80211_wake_queue(&local->hw, i);
continue;
}
</snip>


2008-07-24 14:46:30

by Johannes Berg

[permalink] [raw]
Subject: Re: Another fragmentation multiqueue kludge

On Thu, 2008-07-24 at 17:37 +0300, Tomas Winkler wrote:

> >> - struct ieee80211_tx_info *info = IEEE80211_SKB_CB(skb);
> >> + struct ieee80211_tx_info *info;
> >> int ret, i;
> >>
> >> - if (netif_subqueue_stopped(local->mdev, skb))
> >> - return IEEE80211_TX_AGAIN;
> >>
> >> if (skb) {
> >> + if (netif_subqueue_stopped(local->mdev, skb))
> >> + return IEEE80211_TX_AGAIN;
> >> + info = IEEE80211_SKB_CB(skb);
> >> +
> >
> > This isn't right. It means that if you have a stopped queue and pending
> > fragments, you still transmit them, which is not something the driver
> > should have to expect.
>
> This logic was wrong you cannot access skb and then ask if'ts not
> NULL. (it actually creates nice panic in fragmentation)

Oh, indeed, and since the same check is actually in the if
(tx->extra_frag) part, this is fine.

> >> if (test_bit(queue, local->queues_pending)) {
> >> + /* don't call scheduler just open the queue */
> >> + if (ieee80211_is_multiqueue(local)) {
> >> + netif_start_subqueue(local->mdev, queue);
> >> + } else {
> >> + WARN_ON(queue != 0);
> >> + netif_start_queue(local->mdev);
> >> + }
> >> tasklet_schedule(&local->tx_pending_tasklet);
> >> } else {
> >> if (ieee80211_is_multiqueue(local)) {
> >
> > That's not right either, if you have pending fragments/packets then you
> > cannot start the queue.
>
> Exactly this what I wrote are my concern on the other hand
> that's the kludge because the __ieee80211_tx checks whether the queue
> is started so there is no way
> the pending packet is transmitted if you don't start the queue. I have
> new version of the patch that starts
> the queue only in the tasklet. Tasklet is locks tx so the next packet
> will come only after the pending packet is transmitted.

What happens if you just invert the
if (__netif_subqueue_stopped(local->mdev, i))
continue;

check in ieee80211_tx_pending to read
if (!__netif_subqueue_stopped(local->mdev, i))
continue;

as I suggested yesterday?

> I actually can do. I can leave extra 16 TFDs (4 bits for frag num) in
> the queue that wouldn't be a problem, I'm just not sure that iwlwifi
> is the only driers that does it. I didn't encountered such scenario
> but this code looks like also non fragments can be put to pending. Am
> I wrong?

non-fragments can only be put in there if your driver is buggy, because
if you don't have fragmentation turned on then you should know when your
queue is filling up and be able to stop it before you get another frame.

The problem with fragmentation is that mac80211 creates multiple frames
from a single one, while if you don't have fragmentation that doesn't
happen since once you stop the queue there are no pending packets.

If we require drivers to always keep space for 8 or 9 fragments then we
can remove all this requeuing stuff, the only trouble with that may be
in adm8211 since that has to send frames one by one, but we can fix it
in that driver, it only has a single queue so it's much easier.

johannes


Attachments:
signature.asc (836.00 B)
This is a digitally signed message part

2008-07-24 14:59:35

by Johannes Berg

[permalink] [raw]
Subject: Re: Another fragmentation multiqueue kludge

On Thu, 2008-07-24 at 17:55 +0300, Tomas Winkler wrote:

> > What happens if you just invert the
> > if (__netif_subqueue_stopped(local->mdev, i))
> > continue;
>
> > check in ieee80211_tx_pending to read
> > if (!__netif_subqueue_stopped(local->mdev, i))
> > continue;
> >
> > as I suggested yesterday?
>
> This is wrong since you it will actually starts queues that driver
> didn't ask to start in the next line.

Ok.

> As I see it any failure in the driver's tx path will cause moving to
> pending queue, except packets on AMPDU
> queues that will be dropped.

> That's correct for stopping because of queue overhead, not for errors.

Tough luck, just drop packets on errors. If that matters to you, fix the
driver to not run into these errors.

> So till someone fix adm driver consider this patch It worked quite
> well. I will also send patch that fixes the behavior in the iwlwifi
> driver.

If you fix iwlwifi we don't need this patch, and then I can instead
remove the code from mac80211, but we can just as well merge this first
since we'll get rid of it again.

johannes


Attachments:
signature.asc (836.00 B)
This is a digitally signed message part

2008-07-24 15:02:42

by Tomas Winkler

[permalink] [raw]
Subject: Re: Another fragmentation multiqueue kludge

On Thu, Jul 24, 2008 at 5:59 PM, Johannes Berg
<[email protected]> wrote:
> On Thu, 2008-07-24 at 17:55 +0300, Tomas Winkler wrote:
>
>> > What happens if you just invert the
>> > if (__netif_subqueue_stopped(local->mdev, i))
>> > continue;
>>
>> > check in ieee80211_tx_pending to read
>> > if (!__netif_subqueue_stopped(local->mdev, i))
>> > continue;
>> >
>> > as I suggested yesterday?
>>
>> This is wrong since you it will actually starts queues that driver
>> didn't ask to start in the next line.
>
> Ok.
>
>> As I see it any failure in the driver's tx path will cause moving to
>> pending queue, except packets on AMPDU
>> queues that will be dropped.
>
>> That's correct for stopping because of queue overhead, not for errors.
>
> Tough luck, just drop packets on errors. If that matters to you, fix the
> driver to not run into these errors.

I don't really mind. I just want to be environment friendly and don't
want to be asked by other driver maintainers to fix their drivers :)

>> So till someone fix adm driver consider this patch It worked quite
>> well. I will also send patch that fixes the behavior in the iwlwifi
>> driver.
>
> If you fix iwlwifi we don't need this patch, and then I can instead
> remove the code from mac80211, but we can just as well merge this first
> since we'll get rid of it again.

Okay I will post both patches.
Thanks for review
Tomas

2008-07-24 14:37:01

by Tomas Winkler

[permalink] [raw]
Subject: Re: Another fragmentation multiqueue kludge

On Thu, Jul 24, 2008 at 5:16 PM, Johannes Berg
<[email protected]> wrote:
> On Thu, 2008-07-24 at 16:01 +0300, Tomas Winkler wrote:
>
>> diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
>> index 2b912cf..db6540e 100644
>> --- a/net/mac80211/tx.c
>> +++ b/net/mac80211/tx.c
>> @@ -1061,13 +1061,15 @@ static int ieee80211_tx_prepare(struct
>> ieee80211_tx_data *tx,
>> static int __ieee80211_tx(struct ieee80211_local *local, struct sk_buff *skb,
>> struct ieee80211_tx_data *tx)
>> {
>> - struct ieee80211_tx_info *info = IEEE80211_SKB_CB(skb);
>> + struct ieee80211_tx_info *info;
>> int ret, i;
>>
>> - if (netif_subqueue_stopped(local->mdev, skb))
>> - return IEEE80211_TX_AGAIN;
>>
>> if (skb) {
>> + if (netif_subqueue_stopped(local->mdev, skb))
>> + return IEEE80211_TX_AGAIN;
>> + info = IEEE80211_SKB_CB(skb);
>> +
>
> This isn't right. It means that if you have a stopped queue and pending
> fragments, you still transmit them, which is not something the driver
> should have to expect.

This logic was wrong you cannot access skb and then ask if'ts not
NULL. (it actually creates nice panic in fragmentation)

>
>> if (test_bit(queue, local->queues_pending)) {
>> + /* don't call scheduler just open the queue */
>> + if (ieee80211_is_multiqueue(local)) {
>> + netif_start_subqueue(local->mdev, queue);
>> + } else {
>> + WARN_ON(queue != 0);
>> + netif_start_queue(local->mdev);
>> + }
>> tasklet_schedule(&local->tx_pending_tasklet);
>> } else {
>> if (ieee80211_is_multiqueue(local)) {
>
> That's not right either, if you have pending fragments/packets then you
> cannot start the queue.

Exactly this what I wrote are my concern on the other hand
that's the kludge because the __ieee80211_tx checks whether the queue
is started so there is no way
the pending packet is transmitted if you don't start the queue. I have
new version of the patch that starts
the queue only in the tasklet. Tasklet is locks tx so the next packet
will come only after the pending packet is transmitted.

>
> I think what you're trying to do here is implement, in mac80211, to not
> stop queues in the middle of a fragmented MSDU, but you really should do
> that in the driver and we just remove all this code.

I actually can do. I can leave extra 16 TFDs (4 bits for frag num) in
the queue that wouldn't be a problem, I'm just not sure that iwlwifi
is the only driers that does it. I didn't encountered such scenario
but this code looks like also non fragments can be put to pending. Am
I wrong?

Thanks
Tomas

2008-07-24 14:55:48

by Tomas Winkler

[permalink] [raw]
Subject: Re: Another fragmentation multiqueue kludge

On Thu, Jul 24, 2008 at 5:46 PM, Johannes Berg
<[email protected]> wrote:
> On Thu, 2008-07-24 at 17:37 +0300, Tomas Winkler wrote:
>
>> >> - struct ieee80211_tx_info *info = IEEE80211_SKB_CB(skb);
>> >> + struct ieee80211_tx_info *info;
>> >> int ret, i;
>> >>
>> >> - if (netif_subqueue_stopped(local->mdev, skb))
>> >> - return IEEE80211_TX_AGAIN;
>> >>
>> >> if (skb) {
>> >> + if (netif_subqueue_stopped(local->mdev, skb))
>> >> + return IEEE80211_TX_AGAIN;
>> >> + info = IEEE80211_SKB_CB(skb);
>> >> +
>> >
>> > This isn't right. It means that if you have a stopped queue and pending
>> > fragments, you still transmit them, which is not something the driver
>> > should have to expect.
>>
>> This logic was wrong you cannot access skb and then ask if'ts not
>> NULL. (it actually creates nice panic in fragmentation)
>
> Oh, indeed, and since the same check is actually in the if
> (tx->extra_frag) part, this is fine.
>
>> >> if (test_bit(queue, local->queues_pending)) {
>> >> + /* don't call scheduler just open the queue */
>> >> + if (ieee80211_is_multiqueue(local)) {
>> >> + netif_start_subqueue(local->mdev, queue);
>> >> + } else {
>> >> + WARN_ON(queue != 0);
>> >> + netif_start_queue(local->mdev);
>> >> + }
>> >> tasklet_schedule(&local->tx_pending_tasklet);
>> >> } else {
>> >> if (ieee80211_is_multiqueue(local)) {
>> >
>> > That's not right either, if you have pending fragments/packets then you
>> > cannot start the queue.
>>
>> Exactly this what I wrote are my concern on the other hand
>> that's the kludge because the __ieee80211_tx checks whether the queue
>> is started so there is no way
>> the pending packet is transmitted if you don't start the queue. I have
>> new version of the patch that starts
>> the queue only in the tasklet. Tasklet is locks tx so the next packet
>> will come only after the pending packet is transmitted.
>
> What happens if you just invert the
> if (__netif_subqueue_stopped(local->mdev, i))
> continue;

> check in ieee80211_tx_pending to read
> if (!__netif_subqueue_stopped(local->mdev, i))
> continue;
>
> as I suggested yesterday?

This is wrong since you it will actually starts queues that driver
didn't ask to start in the next line.



>> I actually can do. I can leave extra 16 TFDs (4 bits for frag num) in
>> the queue that wouldn't be a problem, I'm just not sure that iwlwifi
>> is the only driers that does it. I didn't encountered such scenario
>> but this code looks like also non fragments can be put to pending. Am
>> I wrong?
>
> non-fragments can only be put in there if your driver is buggy, because
> if you don't have fragmentation turned on then you should know when your
> queue is filling up and be able to stop it before you get another frame.
>

As I see it any failure in the driver's tx path will cause moving to
pending queue, except packets on AMPDU
queues that will be dropped.

ret = __ieee80211_tx(local, skb, &tx);
if (ret) {
struct ieee80211_tx_stored_packet *store;

/*
* Since there are no fragmented frames on A-MPDU
* queues, there's no reason for a driver to reject
* a frame there, warn and drop it.
*/
if (WARN_ON(queue >= ieee80211_num_regular_queues(&local->hw)))
goto drop;

store = &local->pending_packet[queue];

> The problem with fragmentation is that mac80211 creates multiple frames
> from a single one, while if you don't have fragmentation that doesn't
> happen since once you stop the queue there are no pending packets.

That's correct for stopping because of queue overhead, not for errors.

> If we require drivers to always keep space for 8 or 9 fragments then we
> can remove all this requeuing stuff, the only trouble with that may be
> in adm8211 since that has to send frames one by one, but we can fix it
> in that driver, it only has a single queue so it's much easier.

So till someone fix adm driver consider this patch It worked quite
well. I will also send patch that fixes the behavior in the iwlwifi
driver.

diff --git a/net/mac80211/ieee80211_i.h b/net/mac80211/ieee80211_i.h
index 3ac23b8..98e0283 100644
--- a/net/mac80211/ieee80211_i.h
+++ b/net/mac80211/ieee80211_i.h
@@ -580,6 +580,7 @@ struct ieee80211_local {
struct timer_list sta_cleanup;

unsigned long queues_pending[BITS_TO_LONGS(IEEE80211_MAX_QUEUES)];
+ unsigned long queues_pending_run[BITS_TO_LONGS(IEEE80211_MAX_QUEUES)];
struct ieee80211_tx_stored_packet pending_packet[IEEE80211_MAX_QUEUES];
struct tasklet_struct tx_pending_tasklet;

diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
index 2b912cf..4d9f9cb 100644
--- a/net/mac80211/tx.c
+++ b/net/mac80211/tx.c
@@ -1061,13 +1061,15 @@ static int ieee80211_tx_prepare(struct
ieee80211_tx_data *tx,
static int __ieee80211_tx(struct ieee80211_local *local, struct sk_buff *skb,
struct ieee80211_tx_data *tx)
{
- struct ieee80211_tx_info *info = IEEE80211_SKB_CB(skb);
+ struct ieee80211_tx_info *info;
int ret, i;

- if (netif_subqueue_stopped(local->mdev, skb))
- return IEEE80211_TX_AGAIN;

if (skb) {
+ if (netif_subqueue_stopped(local->mdev, skb))
+ return IEEE80211_TX_AGAIN;
+ info = IEEE80211_SKB_CB(skb);
+
ieee80211_dump_frame(wiphy_name(local->hw.wiphy),
"TX to low-level driver", skb);
ret = local->ops->tx(local_to_hw(local), skb);
@@ -1721,14 +1725,19 @@ void ieee80211_tx_pending(unsigned long data)
netif_tx_lock_bh(dev);
for (i = 0; i < ieee80211_num_regular_queues(&local->hw); i++) {
/* Check that this queue is ok */
- if (__netif_subqueue_stopped(local->mdev, i))
+ if (__netif_subqueue_stopped(local->mdev, i) &&
+ !test_bit(i, local->queues_pending_run))
continue;

if (!test_bit(i, local->queues_pending)) {
+ clear_bit(i, local->queues_pending_run);
ieee80211_wake_queue(&local->hw, i);
continue;
}

+ clear_bit(i, local->queues_pending_run);
+ netif_start_subqueue(local->mdev, i);
+
store = &local->pending_packet[i];

tx.extra_frag = store->extra_frag;
tx.num_extra_frag = store->num_extra_frag;
diff --git a/net/mac80211/util.c b/net/mac80211/util.c
index 89ce4e0..02af3e8 100644
--- a/net/mac80211/util.c
+++ b/net/mac80211/util.c
@@ -361,6 +361,7 @@ void ieee80211_wake_queue(struct ieee80211_hw *hw,
int queue)
struct ieee80211_local *local = hw_to_local(hw);

if (test_bit(queue, local->queues_pending)) {
+ set_bit(queue, local->queues_pending_run);
tasklet_schedule(&local->tx_pending_tasklet);
} else {
if (ieee80211_is_multiqueue(local)) {
--
1.5.4.1

Tomas.

2008-07-24 13:01:05

by Tomas Winkler

[permalink] [raw]
Subject: Re: Another fragmentation multiqueue kludge

On Thu, Jul 24, 2008 at 12:41 AM, Tomas Winkler <[email protected]> wrote:
> Here is the scenario. This is happening with what we have in
> wireless-testing code
>
> When hw queue get full we call
> ieeee80211_stop_queue(dev, q);
>
> When this is happens in the middle of the fragmentation . This is
> easiest to reproduces with fragmentation but probably happens also in
> other cases.
>
> int __ieee80211_tx()
> ....
>
> if (netif_subqueue_stopped(local->mdev,
> tx->extra_frag[i]))
> return IEEE80211_TX_FRAG_AGAIN;
>
>
> The fragment should stored into local->queues_pending[queue]
> (Actually I'm not sure it is)
>
> now when we HW queue got cleared a bit we call wake queues where
> tx_pending_tasklet is woken up ( queue is not woken up here)
>
> void ieee80211_wake_queue(struct ieee80211_hw *hw, int queue)
> {
> struct ieee80211_local *local = hw_to_local(hw);
>
> if (test_bit(queue, local->queues_pending)) {
> tasklet_schedule(&local->tx_pending_tasklet);
> }
> ..... <snip>
> }
>
> The check if the queue is open causes that queue is actually never
> woken up so traffic is stalled. However removing this line causes some
> other panic I didn't have time to process.
>
> void ieee80211_tx_pending(unsigned long data)
> {
> <snip>
>
> netif_tx_lock_bh(dev);
> for (i = 0; i < ieee80211_num_regular_queues(&local->hw); i++) {
> /* Check that this queue is ok */
> if (__netif_subqueue_stopped(local->mdev, i))
> --- This line causes that fact that the actual wake queue is never
> called.
> continue;
>
> if (!test_bit(i, local->queues_pending)) {
> ieee80211_wake_queue(&local->hw, i);
> continue;
> }
> </snip>



This patch solves the problem. Just I'm not sure about not locking a cross
netif_start_subqueue(local->mdev, queue);
pending assumes that no new packets are to be transmitted before
pending packets are cleared out.


This patch make mac80211 transmit correctly fragmented packet after
stopping queues.

Signed-off-by: Tomas Winkler <[email protected]>
---
net/mac80211/tx.c | 8 +++++---
net/mac80211/util.c | 7 +++++++
2 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
index 2b912cf..db6540e 100644
--- a/net/mac80211/tx.c
+++ b/net/mac80211/tx.c
@@ -1061,13 +1061,15 @@ static int ieee80211_tx_prepare(struct
ieee80211_tx_data *tx,
static int __ieee80211_tx(struct ieee80211_local *local, struct sk_buff *skb,
struct ieee80211_tx_data *tx)
{
- struct ieee80211_tx_info *info = IEEE80211_SKB_CB(skb);
+ struct ieee80211_tx_info *info;
int ret, i;

- if (netif_subqueue_stopped(local->mdev, skb))
- return IEEE80211_TX_AGAIN;

if (skb) {
+ if (netif_subqueue_stopped(local->mdev, skb))
+ return IEEE80211_TX_AGAIN;
+ info = IEEE80211_SKB_CB(skb);
+
ieee80211_dump_frame(wiphy_name(local->hw.wiphy),
"TX to low-level driver", skb);
ret = local->ops->tx(local_to_hw(local), skb);
diff --git a/net/mac80211/util.c b/net/mac80211/util.c
index 89ce4e0..8028744 100644
--- a/net/mac80211/util.c
+++ b/net/mac80211/util.c
@@ -361,6 +361,13 @@ void ieee80211_wake_queue(struct ieee80211_hw
*hw, int queue)
struct ieee80211_local *local = hw_to_local(hw);

if (test_bit(queue, local->queues_pending)) {
+ /* don't call scheduler just open the queue */
+ if (ieee80211_is_multiqueue(local)) {
+ netif_start_subqueue(local->mdev, queue);
+ } else {
+ WARN_ON(queue != 0);
+ netif_start_queue(local->mdev);
+ }
tasklet_schedule(&local->tx_pending_tasklet);
} else {
if (ieee80211_is_multiqueue(local)) {




Tomas

2008-07-24 14:16:23

by Johannes Berg

[permalink] [raw]
Subject: Re: Another fragmentation multiqueue kludge

On Thu, 2008-07-24 at 16:01 +0300, Tomas Winkler wrote:

> diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
> index 2b912cf..db6540e 100644
> --- a/net/mac80211/tx.c
> +++ b/net/mac80211/tx.c
> @@ -1061,13 +1061,15 @@ static int ieee80211_tx_prepare(struct
> ieee80211_tx_data *tx,
> static int __ieee80211_tx(struct ieee80211_local *local, struct sk_buff *skb,
> struct ieee80211_tx_data *tx)
> {
> - struct ieee80211_tx_info *info = IEEE80211_SKB_CB(skb);
> + struct ieee80211_tx_info *info;
> int ret, i;
>
> - if (netif_subqueue_stopped(local->mdev, skb))
> - return IEEE80211_TX_AGAIN;
>
> if (skb) {
> + if (netif_subqueue_stopped(local->mdev, skb))
> + return IEEE80211_TX_AGAIN;
> + info = IEEE80211_SKB_CB(skb);
> +

This isn't right. It means that if you have a stopped queue and pending
fragments, you still transmit them, which is not something the driver
should have to expect.

> if (test_bit(queue, local->queues_pending)) {
> + /* don't call scheduler just open the queue */
> + if (ieee80211_is_multiqueue(local)) {
> + netif_start_subqueue(local->mdev, queue);
> + } else {
> + WARN_ON(queue != 0);
> + netif_start_queue(local->mdev);
> + }
> tasklet_schedule(&local->tx_pending_tasklet);
> } else {
> if (ieee80211_is_multiqueue(local)) {

That's not right either, if you have pending fragments/packets then you
cannot start the queue.

I think what you're trying to do here is implement, in mac80211, to not
stop queues in the middle of a fragmented MSDU, but you really should do
that in the driver and we just remove all this code.

johannes


Attachments:
signature.asc (836.00 B)
This is a digitally signed message part