2018-02-06 23:05:28

by Ben Greear

[permalink] [raw]
Subject: [PATCH] ath9k: break out of irq handler after 5 jiffies

From: Ben Greear <[email protected]>

In case where the system is sluggish, we should probably break out
early. Maybe this will fix issues where the OS thinks the IRQ handler
is not responding and disables the IRQ because 'nobody cared'

Signed-off-by: Ben Greear <[email protected]>
---
drivers/net/wireless/ath/ath9k/recv.c | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/drivers/net/wireless/ath/ath9k/recv.c b/drivers/net/wireless/ath/ath9k/recv.c
index b90ea2b..274814c 100644
--- a/drivers/net/wireless/ath/ath9k/recv.c
+++ b/drivers/net/wireless/ath/ath9k/recv.c
@@ -1084,6 +1084,7 @@ int ath_rx_tasklet(struct ath_softc *sc, int flush, bool hp)
dma_addr_t new_buf_addr;
unsigned int budget = 512;
struct ieee80211_hdr *hdr;
+ unsigned long expires_jiffies = jiffies + 5;

if (edma)
dma_type = DMA_BIDIRECTIONAL;
@@ -1241,6 +1242,9 @@ int ath_rx_tasklet(struct ath_softc *sc, int flush, bool hp)

if (!budget--)
break;
+
+ if (time_is_before_jiffies(expires_jiffies))
+ break;
} while (1);

if (!(ah->imask & ATH9K_INT_RXEOL)) {
--
2.4.11


2018-02-26 21:39:09

by Ben Greear

[permalink] [raw]
Subject: Re: [PATCH] ath9k: break out of irq handler after 5 jiffies

On 02/07/2018 07:39 AM, Ben Greear wrote:
>
>
> On 02/07/2018 02:55 AM, Johannes Berg wrote:
>> On Wed, 2018-02-07 at 10:16 +0100, Felix Fietkau wrote:
>>> On 2018-02-07 00:05, [email protected] wrote:
>>>> From: Ben Greear <[email protected]>
>>>>
>>>> In case where the system is sluggish, we should probably break out
>>>> early. Maybe this will fix issues where the OS thinks the IRQ handler
>>>> is not responding and disables the IRQ because 'nobody cared'
>>>>
>>>> Signed-off-by: Ben Greear <[email protected]>
>>>
>>> 5 jiffies as a hardcoded value is a bad idea, since it produces
>>> different behavior based on CONFIG_HZ.
>
> I figured that was a benefit since it would run shorter duration on systems with
> a faster HZ clock.
>
>>
>> Also, err, NAPI? Or is something else is going on here?
>
> I don't really know, but part of my test was running traffic while creating
> 1200 stations, so likely there were lots of higher-level lock contention that
> slowed down sending pkts up the stack.
>
> I got a bunch of errors about IRQs being ignored because nobody cared. I noticed
> that the ath9k loop could handle up to 500 or so frames, and that seemed like too
> many for my particular test case.
>
> Once I put in this patch, I did not see the 'nobody cared' error again.
>
> There could easily be a better fix. If you all want me to use a fixed time instead
> of HZ, then please suggest a value. I was testing with HZ of 1000, btw.

Hello,

I don't mind changing this patch, but I could use some guidance as to what
values you all want me to use.

Should I use a millisecond based clock instead of jiffies?

What time duration do you want if 5 Jiffies (or 5ms) is not desired?

Thanks,
Ben


--
Ben Greear <[email protected]>
Candela Technologies Inc http://www.candelatech.com

2018-02-07 09:16:15

by Felix Fietkau

[permalink] [raw]
Subject: Re: [PATCH] ath9k: break out of irq handler after 5 jiffies

On 2018-02-07 00:05, [email protected] wrote:
> From: Ben Greear <[email protected]>
>
> In case where the system is sluggish, we should probably break out
> early. Maybe this will fix issues where the OS thinks the IRQ handler
> is not responding and disables the IRQ because 'nobody cared'
>
> Signed-off-by: Ben Greear <[email protected]>
5 jiffies as a hardcoded value is a bad idea, since it produces
different behavior based on CONFIG_HZ.

- Felix

2018-02-07 10:56:00

by Johannes Berg

[permalink] [raw]
Subject: Re: [PATCH] ath9k: break out of irq handler after 5 jiffies

On Wed, 2018-02-07 at 10:16 +0100, Felix Fietkau wrote:
> On 2018-02-07 00:05, [email protected] wrote:
> > From: Ben Greear <[email protected]>
> >
> > In case where the system is sluggish, we should probably break out
> > early. Maybe this will fix issues where the OS thinks the IRQ handler
> > is not responding and disables the IRQ because 'nobody cared'
> >
> > Signed-off-by: Ben Greear <[email protected]>
>
> 5 jiffies as a hardcoded value is a bad idea, since it produces
> different behavior based on CONFIG_HZ.

Also, err, NAPI? Or is something else is going on here?

johannes

2018-02-26 22:08:23

by Arend Van Spriel

[permalink] [raw]
Subject: Re: [PATCH] ath9k: break out of irq handler after 5 jiffies

On 2/26/2018 10:39 PM, Ben Greear wrote:
> On 02/07/2018 07:39 AM, Ben Greear wrote:
>>
>>
>> On 02/07/2018 02:55 AM, Johannes Berg wrote:
>>> On Wed, 2018-02-07 at 10:16 +0100, Felix Fietkau wrote:
>>>> On 2018-02-07 00:05, [email protected] wrote:
>>>>> From: Ben Greear <[email protected]>
>>>>>
>>>>> In case where the system is sluggish, we should probably break out
>>>>> early. Maybe this will fix issues where the OS thinks the IRQ handler
>>>>> is not responding and disables the IRQ because 'nobody cared'
>>>>>
>>>>> Signed-off-by: Ben Greear <[email protected]>
>>>>
>>>> 5 jiffies as a hardcoded value is a bad idea, since it produces
>>>> different behavior based on CONFIG_HZ.
>>
>> I figured that was a benefit since it would run shorter duration on
>> systems with
>> a faster HZ clock.
>>
>>>
>>> Also, err, NAPI? Or is something else is going on here?
>>
>> I don't really know, but part of my test was running traffic while
>> creating
>> 1200 stations, so likely there were lots of higher-level lock
>> contention that
>> slowed down sending pkts up the stack.
>>
>> I got a bunch of errors about IRQs being ignored because nobody
>> cared. I noticed
>> that the ath9k loop could handle up to 500 or so frames, and that
>> seemed like too
>> many for my particular test case.
>>
>> Once I put in this patch, I did not see the 'nobody cared' error again.
>>
>> There could easily be a better fix. If you all want me to use a fixed
>> time instead
>> of HZ, then please suggest a value. I was testing with HZ of 1000, btw.
>
> Hello,
>
> I don't mind changing this patch, but I could use some guidance as to what
> values you all want me to use.
>
> Should I use a millisecond based clock instead of jiffies?
>
> What time duration do you want if 5 Jiffies (or 5ms) is not desired?

Hi Ben,

Instead of using some time unit you could consider breaking out after
handing 'x' number of frames and make 'x' configurable through debugfs.

Regards,
Arend

2018-02-26 22:39:29

by Ben Greear

[permalink] [raw]
Subject: Re: [PATCH] ath9k: break out of irq handler after 5 jiffies

On 02/26/2018 02:08 PM, Arend van Spriel wrote:
> On 2/26/2018 10:39 PM, Ben Greear wrote:
>> On 02/07/2018 07:39 AM, Ben Greear wrote:
>>>
>>>
>>> On 02/07/2018 02:55 AM, Johannes Berg wrote:
>>>> On Wed, 2018-02-07 at 10:16 +0100, Felix Fietkau wrote:
>>>>> On 2018-02-07 00:05, [email protected] wrote:
>>>>>> From: Ben Greear <[email protected]>
>>>>>>
>>>>>> In case where the system is sluggish, we should probably break out
>>>>>> early. Maybe this will fix issues where the OS thinks the IRQ handler
>>>>>> is not responding and disables the IRQ because 'nobody cared'
>>>>>>
>>>>>> Signed-off-by: Ben Greear <[email protected]>
>>>>>
>>>>> 5 jiffies as a hardcoded value is a bad idea, since it produces
>>>>> different behavior based on CONFIG_HZ.
>>>
>>> I figured that was a benefit since it would run shorter duration on
>>> systems with
>>> a faster HZ clock.
>>>
>>>>
>>>> Also, err, NAPI? Or is something else is going on here?
>>>
>>> I don't really know, but part of my test was running traffic while
>>> creating
>>> 1200 stations, so likely there were lots of higher-level lock
>>> contention that
>>> slowed down sending pkts up the stack.
>>>
>>> I got a bunch of errors about IRQs being ignored because nobody
>>> cared. I noticed
>>> that the ath9k loop could handle up to 500 or so frames, and that
>>> seemed like too
>>> many for my particular test case.
>>>
>>> Once I put in this patch, I did not see the 'nobody cared' error again.
>>>
>>> There could easily be a better fix. If you all want me to use a fixed
>>> time instead
>>> of HZ, then please suggest a value. I was testing with HZ of 1000, btw.
>>
>> Hello,
>>
>> I don't mind changing this patch, but I could use some guidance as to what
>> values you all want me to use.
>>
>> Should I use a millisecond based clock instead of jiffies?
>>
>> What time duration do you want if 5 Jiffies (or 5ms) is not desired?
>
> Hi Ben,
>
> Instead of using some time unit you could consider breaking out after handing 'x' number of frames and make 'x' configurable through debugfs.

I don't see why you would care about number of pkts...it is just a proxy for time, right?

So, in that case, then using jiffies (or some other fast timer) seems the most useful.

Thanks,
Ben

>
> Regards,
> Arend
>


--
Ben Greear <[email protected]>
Candela Technologies Inc http://www.candelatech.com

2018-02-07 15:39:21

by Ben Greear

[permalink] [raw]
Subject: Re: [PATCH] ath9k: break out of irq handler after 5 jiffies



On 02/07/2018 02:55 AM, Johannes Berg wrote:
> On Wed, 2018-02-07 at 10:16 +0100, Felix Fietkau wrote:
>> On 2018-02-07 00:05, [email protected] wrote:
>>> From: Ben Greear <[email protected]>
>>>
>>> In case where the system is sluggish, we should probably break out
>>> early. Maybe this will fix issues where the OS thinks the IRQ handler
>>> is not responding and disables the IRQ because 'nobody cared'
>>>
>>> Signed-off-by: Ben Greear <[email protected]>
>>
>> 5 jiffies as a hardcoded value is a bad idea, since it produces
>> different behavior based on CONFIG_HZ.

I figured that was a benefit since it would run shorter duration on systems with
a faster HZ clock.

>
> Also, err, NAPI? Or is something else is going on here?

I don't really know, but part of my test was running traffic while creating
1200 stations, so likely there were lots of higher-level lock contention that
slowed down sending pkts up the stack.

I got a bunch of errors about IRQs being ignored because nobody cared. I noticed
that the ath9k loop could handle up to 500 or so frames, and that seemed like too
many for my particular test case.

Once I put in this patch, I did not see the 'nobody cared' error again.

There could easily be a better fix. If you all want me to use a fixed time instead
of HZ, then please suggest a value. I was testing with HZ of 1000, btw.

Thanks,
Ben

>
> johannes
>

--
Ben Greear <[email protected]>
Candela Technologies Inc http://www.candelatech.com