2012-12-07 15:27:42

by Cedric VONCKEN

[permalink] [raw]
Subject: [RFC] ATH9K: infinite loop in Tasklet

??????????????? Dear mailling list,
???????????????
??????????????? I think there is a possible infinite loop in ATH9K tasklet, and this loop block the linux kernel.

In my test based on freescale MPC8315 cpu, the Wireless card is bridged to Ethernet driver (Gianfar). When I receive data from Wifi (iperf -c xx -b50M) to Ethernet, the Gianfar NAPI poll function is never called because the ath_rx_tasklet always loops.

??????????????? In the ath_rx_tasklet(...), the "while" loops until ath_get_next_rx_buf() or ath_edma_get_next_rx_buf returns a buffer.

??????????????? If these functions always ?return a buffer, this tasklet always runs. While this tasklet runs, all other tasklets are blocked.
To fix it I applied this patch:

--- recv.c???????????? 2012-12-07 14:30:26.000000000 +0100
+++ recv.c.new??????????????? 2012-12-07 14:30:05.364591961 +0100
@@ -1067,6 +1067,7 @@
?????????????? u64 tsf = 0;
?????????????? u32 tsf_lower = 0;
?????????????? unsigned long flags;
+???????????? int count = 150;

??????????????? if (edma)
????????????????????????????? dma_type = DMA_BIDIRECTIONAL;
@@ -1085,6 +1086,10 @@
????????????????????????????? if (test_bit(SC_OP_RXFLUSH, &sc->sc_flags) && (flush == 0))
????????????????????????????????????????????? break;

+???????????????????????????? if(count <= 0)
+??????????????????????????????????????????? break;
+
+???????????????????????????? count --;
????????????????????????????? memset(&rs, 0, sizeof(rs));
????????????????????????????? if (edma)
????????????????????????????????????????????? bf = ath_edma_get_next_rx_buf(sc, &rs, qtype);

I think this solution is not correct, because the tasklet is not rescheduled and we always have data in Rx buffer.

??????????????? Have you any suggestion to fix this issue correctly?

??????????????? Thanks for your help.

??????????????? Best regards.



2012-12-30 14:19:29

by Mohammed Shafi

[permalink] [raw]
Subject: Re: [RFC] ATH9K: infinite loop in Tasklet

On Tue, Dec 11, 2012 at 3:31 PM, Cedric VONCKEN
<[email protected]> wrote:
> Any suggestion to fix correctly this issue ?

having a count to break out from the loop is bit hacky. I would
suggest to enable as many prints and see
why the infinite loop counts(ofcourse you can have a loop count to
figure out and break the loop).

>
> Best regards.
>
> -----Message d'origine-----
> De : Cedric VONCKEN
> Envoy? : vendredi 7 d?cembre 2012 16:28
> ? : '[email protected]'
> Objet : [RFC] ATH9K: infinite loop in Tasklet
>
> Dear mailling list,
>
> I think there is a possible infinite loop in ATH9K tasklet, and this loop block the linux kernel.
>
> In my test based on freescale MPC8315 cpu, the Wireless card is bridged to Ethernet driver (Gianfar). When I receive data from Wifi (iperf -c xx -b50M) to Ethernet, the Gianfar NAPI poll function is never called because the ath_rx_tasklet always loops.
>
> In the ath_rx_tasklet(...), the "while" loops until ath_get_next_rx_buf() or ath_edma_get_next_rx_buf returns a buffer.
>
> If these functions always return a buffer, this tasklet always runs. While this tasklet runs, all other tasklets are blocked.
> To fix it I applied this patch:
>
> --- recv.c 2012-12-07 14:30:26.000000000 +0100
> +++ recv.c.new 2012-12-07 14:30:05.364591961 +0100
> @@ -1067,6 +1067,7 @@
> u64 tsf = 0;
> u32 tsf_lower = 0;
> unsigned long flags;
> + int count = 150;
>
> if (edma)
> dma_type = DMA_BIDIRECTIONAL;
> @@ -1085,6 +1086,10 @@
> if (test_bit(SC_OP_RXFLUSH, &sc->sc_flags) && (flush == 0))
> break;
>
> + if(count <= 0)
> + break;
> +
> + count --;
> memset(&rs, 0, sizeof(rs));
> if (edma)
> bf = ath_edma_get_next_rx_buf(sc, &rs, qtype);
>
> I think this solution is not correct, because the tasklet is not rescheduled and we always have data in Rx buffer.
>
> Have you any suggestion to fix this issue correctly?
>
> Thanks for your help.
>
> Best regards.
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html



--
thanks,
shafi

2012-12-11 10:01:39

by Cedric VONCKEN

[permalink] [raw]
Subject: RE: [RFC] ATH9K: infinite loop in Tasklet

Any suggestion to fix correctly this issue ?

Best regards.

-----Message d'origine-----
De?: Cedric VONCKEN
Envoy??: vendredi 7 d?cembre 2012 16:28
??: '[email protected]'
Objet?: [RFC] ATH9K: infinite loop in Tasklet

??????????????? Dear mailling list,
???????????????
??????????????? I think there is a possible infinite loop in ATH9K tasklet, and this loop block the linux kernel.

In my test based on freescale MPC8315 cpu, the Wireless card is bridged to Ethernet driver (Gianfar). When I receive data from Wifi (iperf -c xx -b50M) to Ethernet, the Gianfar NAPI poll function is never called because the ath_rx_tasklet always loops.

??????????????? In the ath_rx_tasklet(...), the "while" loops until ath_get_next_rx_buf() or ath_edma_get_next_rx_buf returns a buffer.

??????????????? If these functions always ?return a buffer, this tasklet always runs. While this tasklet runs, all other tasklets are blocked.
To fix it I applied this patch:

--- recv.c???????????? 2012-12-07 14:30:26.000000000 +0100
+++ recv.c.new??????????????? 2012-12-07 14:30:05.364591961 +0100
@@ -1067,6 +1067,7 @@
?????????????? u64 tsf = 0;
?????????????? u32 tsf_lower = 0;
?????????????? unsigned long flags;
+???????????? int count = 150;

??????????????? if (edma)
????????????????????????????? dma_type = DMA_BIDIRECTIONAL;
@@ -1085,6 +1086,10 @@
????????????????????????????? if (test_bit(SC_OP_RXFLUSH, &sc->sc_flags) && (flush == 0))
????????????????????????????????????????????? break;

+???????????????????????????? if(count <= 0)
+??????????????????????????????????????????? break;
+
+???????????????????????????? count --;
????????????????????????????? memset(&rs, 0, sizeof(rs));
????????????????????????????? if (edma)
????????????????????????????????????????????? bf = ath_edma_get_next_rx_buf(sc, &rs, qtype);

I think this solution is not correct, because the tasklet is not rescheduled and we always have data in Rx buffer.

??????????????? Have you any suggestion to fix this issue correctly?

??????????????? Thanks for your help.

??????????????? Best regards.


2013-01-02 20:13:06

by Adrian Chadd

[permalink] [raw]
Subject: Re: [RFC] ATH9K: infinite loop in Tasklet

On 2 January 2013 09:15, voncken <[email protected]> wrote:
> Yes,
>
> I used an mpc8314 at 400Mhz, with 128 Mbit of RAM for my test.

Ah!

> To find it, I measured the time elapsing in ath_rxbuf_alloc(..) with
> the kernel function local_clock().
>
> With SLOB I found around 123 us (and I have an infinite loop in
> ath9k tasklet, because when I have consumed one packet the next packet is
> ready to rx process)
> With SLAB I found around 22 us
> With SLUB I found around 10 us

Cool. That's significantly more CPU.. :-)

So yes, the right thing to do here is to break out after a limit is
reached, and if we hit that limit, re-schedule that tasklet to run.


Adrian

2013-01-02 20:48:05

by Felix Fietkau

[permalink] [raw]
Subject: Re: [RFC] ATH9K: infinite loop in Tasklet

On 2013-01-02 9:13 PM, Adrian Chadd wrote:
> On 2 January 2013 09:15, voncken <[email protected]> wrote:
>> Yes,
>>
>> I used an mpc8314 at 400Mhz, with 128 Mbit of RAM for my test.
>
> Ah!
>
>> To find it, I measured the time elapsing in ath_rxbuf_alloc(..) with
>> the kernel function local_clock().
>>
>> With SLOB I found around 123 us (and I have an infinite loop in
>> ath9k tasklet, because when I have consumed one packet the next packet is
>> ready to rx process)
>> With SLAB I found around 22 us
>> With SLUB I found around 10 us
>
> Cool. That's significantly more CPU.. :-)
>
> So yes, the right thing to do here is to break out after a limit is
> reached, and if we hit that limit, re-schedule that tasklet to run.
I don't think it's worth adding such a band-aid. NAPI handles this much
better, as the network stack specifies a limit for the number of frames
to be accepted and doesn't poll if it doesn't have room for more, thus
solving this in a much better way.

- Felix


2013-01-02 16:21:58

by Adrian Chadd

[permalink] [raw]
Subject: Re: [RFC] ATH9K: infinite loop in Tasklet

... interesting. The allocator choice is enough to cause your CPU to
run too far behind the workload?



Adrian

On 2 January 2013 05:17, voncken <[email protected]> wrote:
> Thanks for your answer.
>
> I found a workaround for my problem.
>
> In my Linux kernel, the default SLAB allocator was changed in
> default settings.
> SLOB was used in place of SLAB.
> I select the SLAB allocator and that fix the default because it runs
> faster.
>
> However the infinite loop can still happen if the cpu is heavily
> loaded with other tasklets.
>
> Best regards.
>
> Cedric Voncken
> -----Message d'origine-----
> De : [email protected] [mailto:[email protected]] De la part de
> Adrian Chadd
> Envoy? : mercredi 2 janvier 2013 00:41
> ? : Felix Fietkau
> Cc : Mohammed Shafi; Cedric VONCKEN; [email protected]
> Objet : Re: [RFC] ATH9K: infinite loop in Tasklet
>
> On 1 January 2013 07:18, Felix Fietkau <[email protected]> wrote:
>
>> I think the best way to properly fix this issue is to implement NAPI
>> support (which mac80211 already supports).
>
> Sure, but there may be some underlying issues (maybe even on just that
> particular platform) that need to be addressed.
>
> Migrating to NAPI / breaking out of the tasklet early should also be done,
> but it's potentially orthogonal to the OP's problem.
>
>
> Adrian
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2013-01-01 23:40:55

by Adrian Chadd

[permalink] [raw]
Subject: Re: [RFC] ATH9K: infinite loop in Tasklet

On 1 January 2013 07:18, Felix Fietkau <[email protected]> wrote:

> I think the best way to properly fix this issue is to implement NAPI
> support (which mac80211 already supports).

Sure, but there may be some underlying issues (maybe even on just that
particular platform) that need to be addressed.

Migrating to NAPI / breaking out of the tasklet early should also be
done, but it's potentially orthogonal to the OP's problem.


Adrian

2013-01-02 17:15:46

by Cedric VONCKEN

[permalink] [raw]
Subject: RE: [RFC] ATH9K: infinite loop in Tasklet

Yes,

I used an mpc8314 at 400Mhz, with 128 Mbit of RAM for my test.

To find it, I measured the time elapsing in ath_rxbuf_alloc(..) with
the kernel function local_clock().

With SLOB I found around 123 us (and I have an infinite loop in
ath9k tasklet, because when I have consumed one packet the next packet is
ready to rx process)
With SLAB I found around 22 us
With SLUB I found around 10 us

Cedric Voncken

-----Message d'origine-----
De?: [email protected] [mailto:[email protected]] De la part de
Adrian Chadd
Envoy??: mercredi 2 janvier 2013 17:22
??: voncken
Cc?: Felix Fietkau; Mohammed Shafi; [email protected]
Objet?: Re: [RFC] ATH9K: infinite loop in Tasklet

... interesting. The allocator choice is enough to cause your CPU to run too
far behind the workload?



Adrian

On 2 January 2013 05:17, voncken <[email protected]> wrote:
> Thanks for your answer.
>
> I found a workaround for my problem.
>
> In my Linux kernel, the default SLAB allocator was changed in
> default settings.
> SLOB was used in place of SLAB.
> I select the SLAB allocator and that fix the default because
> it runs faster.
>
> However the infinite loop can still happen if the cpu is
> heavily loaded with other tasklets.
>
> Best regards.
>
> Cedric Voncken
> -----Message d'origine-----
> De : [email protected] [mailto:[email protected]] De la part
> de Adrian Chadd Envoy? : mercredi 2 janvier 2013 00:41 ? : Felix
> Fietkau Cc : Mohammed Shafi; Cedric VONCKEN;
> [email protected] Objet : Re: [RFC] ATH9K: infinite loop
> in Tasklet
>
> On 1 January 2013 07:18, Felix Fietkau <[email protected]> wrote:
>
>> I think the best way to properly fix this issue is to implement NAPI
>> support (which mac80211 already supports).
>
> Sure, but there may be some underlying issues (maybe even on just that
> particular platform) that need to be addressed.
>
> Migrating to NAPI / breaking out of the tasklet early should also be
> done, but it's potentially orthogonal to the OP's problem.
>
>
> Adrian
>
> --
> To unsubscribe from this list: send the line "unsubscribe
> linux-wireless" in the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html


2013-01-02 13:17:53

by Cedric VONCKEN

[permalink] [raw]
Subject: RE: [RFC] ATH9K: infinite loop in Tasklet

Thanks for your answer.

I found a workaround for my problem.

In my Linux kernel, the default SLAB allocator was changed in
default settings.
SLOB was used in place of SLAB.
I select the SLAB allocator and that fix the default because it runs
faster.

However the infinite loop can still happen if the cpu is heavily
loaded with other tasklets.

Best regards.

Cedric Voncken
-----Message d'origine-----
De?: [email protected] [mailto:[email protected]] De la part de
Adrian Chadd
Envoy??: mercredi 2 janvier 2013 00:41
??: Felix Fietkau
Cc?: Mohammed Shafi; Cedric VONCKEN; [email protected]
Objet?: Re: [RFC] ATH9K: infinite loop in Tasklet

On 1 January 2013 07:18, Felix Fietkau <[email protected]> wrote:

> I think the best way to properly fix this issue is to implement NAPI
> support (which mac80211 already supports).

Sure, but there may be some underlying issues (maybe even on just that
particular platform) that need to be addressed.

Migrating to NAPI / breaking out of the tasklet early should also be done,
but it's potentially orthogonal to the OP's problem.


Adrian


2013-01-01 15:19:00

by Felix Fietkau

[permalink] [raw]
Subject: Re: [RFC] ATH9K: infinite loop in Tasklet

On 2013-01-01 5:20 AM, Adrian Chadd wrote:
> On 30 December 2012 06:19, Mohammed Shafi <[email protected]> wrote:
>> On Tue, Dec 11, 2012 at 3:31 PM, Cedric VONCKEN
>> <[email protected]> wrote:
>>> Any suggestion to fix correctly this issue ?
>>
>> having a count to break out from the loop is bit hacky. I would
>> suggest to enable as many prints and see
>> why the infinite loop counts(ofcourse you can have a loop count to
>> figure out and break the loop).
>
> IMHO both need to be done. It's possible that it's taking way too long
> to handle frames in the driver layer. The ath/ath9k driver layer
> assumes that it can dequeue frames from the hardware FASTER than it
> can queue those frames to the network stack layer. If the latter falls
> behind (or both, really) then you can hit this condition.
>
> FWIW, I've done this in FreeBSD's ath(4) driver in both the TX and RX
> path, to ensure that I give the tasklets all a fair crack at being
> scheduled.
I think the best way to properly fix this issue is to implement NAPI
support (which mac80211 already supports).

- Felix


2013-01-02 23:22:31

by Felix Fietkau

[permalink] [raw]
Subject: Re: [RFC] ATH9K: infinite loop in Tasklet

On 2013-01-03 12:12 AM, Adrian Chadd wrote:
> .. in the longer term, sure. Right now, is the driver setup for the
> interrupt hijinx you may need for NAPI? (ie, killing and restarting RX
> interrupts as needed?)
Should be easy enough to add. I'll look into this when I find the time.

- Felix


2013-01-02 23:12:20

by Adrian Chadd

[permalink] [raw]
Subject: Re: [RFC] ATH9K: infinite loop in Tasklet

.. in the longer term, sure. Right now, is the driver setup for the
interrupt hijinx you may need for NAPI? (ie, killing and restarting RX
interrupts as needed?)


Adrian

2013-01-03 01:36:35

by Adrian Chadd

[permalink] [raw]
Subject: Re: [RFC] ATH9K: infinite loop in Tasklet

On 2 January 2013 15:22, Felix Fietkau <[email protected]> wrote:
> On 2013-01-03 12:12 AM, Adrian Chadd wrote:
>> .. in the longer term, sure. Right now, is the driver setup for the
>> interrupt hijinx you may need for NAPI? (ie, killing and restarting RX
>> interrupts as needed?)
> Should be easy enough to add. I'll look into this when I find the time.

Cool, thanks. I'll go back and lurk, at least until there's more
spectral/DFS/IBSS stuff to figure out.



Adrian

2013-01-01 04:25:42

by Adrian Chadd

[permalink] [raw]
Subject: Re: [RFC] ATH9K: infinite loop in Tasklet

On 30 December 2012 06:19, Mohammed Shafi <[email protected]> wrote:
> On Tue, Dec 11, 2012 at 3:31 PM, Cedric VONCKEN
> <[email protected]> wrote:
>> Any suggestion to fix correctly this issue ?
>
> having a count to break out from the loop is bit hacky. I would
> suggest to enable as many prints and see
> why the infinite loop counts(ofcourse you can have a loop count to
> figure out and break the loop).

IMHO both need to be done. It's possible that it's taking way too long
to handle frames in the driver layer. The ath/ath9k driver layer
assumes that it can dequeue frames from the hardware FASTER than it
can queue those frames to the network stack layer. If the latter falls
behind (or both, really) then you can hit this condition.

FWIW, I've done this in FreeBSD's ath(4) driver in both the TX and RX
path, to ensure that I give the tasklets all a fair crack at being
scheduled.



Adrian