2020-10-08 18:23:30

by John Keeping

[permalink] [raw]
Subject: [PATCH] net: stmmac: Don't call _irqoff() with hardirqs enabled

With threadirqs, stmmac_interrupt() is called on a thread with hardirqs
enabled so we cannot call __napi_schedule_irqoff(). Under lockdep it
leads to:

------------[ cut here ]------------
WARNING: CPU: 0 PID: 285 at kernel/softirq.c:598 __raise_softirq_irqoff+0x6c/0x1c8
IRQs not disabled as expected
Modules linked in: brcmfmac hci_uart btbcm cfg80211 brcmutil
CPU: 0 PID: 285 Comm: irq/41-eth0 Not tainted 5.4.69-rt39 #1
Hardware name: Rockchip (Device Tree)
[<c0110d3c>] (unwind_backtrace) from [<c010c284>] (show_stack+0x10/0x14)
[<c010c284>] (show_stack) from [<c0855504>] (dump_stack+0xa8/0xe0)
[<c0855504>] (dump_stack) from [<c0120a9c>] (__warn+0xe0/0xfc)
[<c0120a9c>] (__warn) from [<c0120e80>] (warn_slowpath_fmt+0x7c/0xa4)
[<c0120e80>] (warn_slowpath_fmt) from [<c01278c8>] (__raise_softirq_irqoff+0x6c/0x1c8)
[<c01278c8>] (__raise_softirq_irqoff) from [<c056bccc>] (stmmac_interrupt+0x388/0x4e0)
[<c056bccc>] (stmmac_interrupt) from [<c0178714>] (irq_forced_thread_fn+0x28/0x64)
[<c0178714>] (irq_forced_thread_fn) from [<c0178924>] (irq_thread+0x124/0x260)
[<c0178924>] (irq_thread) from [<c0142ee8>] (kthread+0x154/0x164)
[<c0142ee8>] (kthread) from [<c01010bc>] (ret_from_fork+0x14/0x38)
Exception stack(0xeb7b5fb0 to 0xeb7b5ff8)
5fa0: 00000000 00000000 00000000 00000000
5fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
5fe0: 00000000 00000000 00000000 00000000 00000013 00000000
irq event stamp: 48
hardirqs last enabled at (50): [<c085c200>] prb_unlock+0x7c/0x8c
hardirqs last disabled at (51): [<c085c0dc>] prb_lock+0x58/0x100
softirqs last enabled at (0): [<c011e770>] copy_process+0x550/0x1654
softirqs last disabled at (25): [<c01786ec>] irq_forced_thread_fn+0x0/0x64
---[ end trace 0000000000000002 ]---

Use __napi_schedule() instead which will save & restore the interrupt
state.

Fixes: 4ccb45857c2c ("net: stmmac: Fix NAPI poll in TX path when in multi-queue")
Signed-off-by: John Keeping <[email protected]>
---
drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 220626a8d499..c331b829f60a 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -2145,7 +2145,7 @@ static int stmmac_napi_check(struct stmmac_priv *priv, u32 chan)
spin_lock_irqsave(&ch->lock, flags);
stmmac_disable_dma_irq(priv, priv->ioaddr, chan, 1, 0);
spin_unlock_irqrestore(&ch->lock, flags);
- __napi_schedule_irqoff(&ch->rx_napi);
+ __napi_schedule(&ch->rx_napi);
}
}

@@ -2154,7 +2154,7 @@ static int stmmac_napi_check(struct stmmac_priv *priv, u32 chan)
spin_lock_irqsave(&ch->lock, flags);
stmmac_disable_dma_irq(priv, priv->ioaddr, chan, 0, 1);
spin_unlock_irqrestore(&ch->lock, flags);
- __napi_schedule_irqoff(&ch->tx_napi);
+ __napi_schedule(&ch->tx_napi);
}
}

--
2.28.0


2020-10-09 02:32:09

by Vladimir Oltean

[permalink] [raw]
Subject: Re: [PATCH] net: stmmac: Don't call _irqoff() with hardirqs enabled

On Thu, Oct 08, 2020 at 05:27:49PM +0100, John Keeping wrote:
> With threadirqs, stmmac_interrupt() is called on a thread with hardirqs
> enabled so we cannot call __napi_schedule_irqoff(). Under lockdep it
> leads to:
>
> ------------[ cut here ]------------
> WARNING: CPU: 0 PID: 285 at kernel/softirq.c:598 __raise_softirq_irqoff+0x6c/0x1c8
> IRQs not disabled as expected
> Modules linked in: brcmfmac hci_uart btbcm cfg80211 brcmutil
> CPU: 0 PID: 285 Comm: irq/41-eth0 Not tainted 5.4.69-rt39 #1
> Hardware name: Rockchip (Device Tree)
> [<c0110d3c>] (unwind_backtrace) from [<c010c284>] (show_stack+0x10/0x14)
> [<c010c284>] (show_stack) from [<c0855504>] (dump_stack+0xa8/0xe0)
> [<c0855504>] (dump_stack) from [<c0120a9c>] (__warn+0xe0/0xfc)
> [<c0120a9c>] (__warn) from [<c0120e80>] (warn_slowpath_fmt+0x7c/0xa4)
> [<c0120e80>] (warn_slowpath_fmt) from [<c01278c8>] (__raise_softirq_irqoff+0x6c/0x1c8)
> [<c01278c8>] (__raise_softirq_irqoff) from [<c056bccc>] (stmmac_interrupt+0x388/0x4e0)
> [<c056bccc>] (stmmac_interrupt) from [<c0178714>] (irq_forced_thread_fn+0x28/0x64)
> [<c0178714>] (irq_forced_thread_fn) from [<c0178924>] (irq_thread+0x124/0x260)
> [<c0178924>] (irq_thread) from [<c0142ee8>] (kthread+0x154/0x164)
> [<c0142ee8>] (kthread) from [<c01010bc>] (ret_from_fork+0x14/0x38)
> Exception stack(0xeb7b5fb0 to 0xeb7b5ff8)
> 5fa0: 00000000 00000000 00000000 00000000
> 5fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> 5fe0: 00000000 00000000 00000000 00000000 00000013 00000000
> irq event stamp: 48
> hardirqs last enabled at (50): [<c085c200>] prb_unlock+0x7c/0x8c
> hardirqs last disabled at (51): [<c085c0dc>] prb_lock+0x58/0x100
> softirqs last enabled at (0): [<c011e770>] copy_process+0x550/0x1654
> softirqs last disabled at (25): [<c01786ec>] irq_forced_thread_fn+0x0/0x64
> ---[ end trace 0000000000000002 ]---
>
> Use __napi_schedule() instead which will save & restore the interrupt
> state.
>
> Fixes: 4ccb45857c2c ("net: stmmac: Fix NAPI poll in TX path when in multi-queue")
> Signed-off-by: John Keeping <[email protected]>
> ---

Don't get me wrong, this is so cool that the new lockdep warning is really
helping out finding real bugs, but the patch that adds that warning
(https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/commit/?id=cdabce2e3dff7e4bcef73473987618569d178af3)
isn't in 5.4.69-rt39, is it?

2020-10-09 12:33:46

by Vladimir Oltean

[permalink] [raw]
Subject: Re: [PATCH] net: stmmac: Don't call _irqoff() with hardirqs enabled

On Fri, Oct 09, 2020 at 10:59:45AM +0100, John Keeping wrote:
> No, it's not, although I would have saved several days debugging if it
> was! I backported the lockdep warning to prove that it caught this
> issue.
>
> The evidence it is possible to see on vanilla 5.4.x is:
>
> $ trace-cmd report -l
> irq/43-e-280 0....2 74.017658: softirq_raise: vec=3 [action=NET_RX]
>
> Note the missing "d" where this should be "0d...2" to indicate hardirqs
> disabled.

Cool, makes sense.

2020-10-09 14:57:54

by Heiner Kallweit

[permalink] [raw]
Subject: Re: [PATCH] net: stmmac: Don't call _irqoff() with hardirqs enabled

On 08.10.2020 18:27, John Keeping wrote:
> With threadirqs, stmmac_interrupt() is called on a thread with hardirqs
> enabled so we cannot call __napi_schedule_irqoff(). Under lockdep it
> leads to:
>
> ------------[ cut here ]------------
> WARNING: CPU: 0 PID: 285 at kernel/softirq.c:598 __raise_softirq_irqoff+0x6c/0x1c8
> IRQs not disabled as expected
> Modules linked in: brcmfmac hci_uart btbcm cfg80211 brcmutil
> CPU: 0 PID: 285 Comm: irq/41-eth0 Not tainted 5.4.69-rt39 #1
> Hardware name: Rockchip (Device Tree)
> [<c0110d3c>] (unwind_backtrace) from [<c010c284>] (show_stack+0x10/0x14)
> [<c010c284>] (show_stack) from [<c0855504>] (dump_stack+0xa8/0xe0)
> [<c0855504>] (dump_stack) from [<c0120a9c>] (__warn+0xe0/0xfc)
> [<c0120a9c>] (__warn) from [<c0120e80>] (warn_slowpath_fmt+0x7c/0xa4)
> [<c0120e80>] (warn_slowpath_fmt) from [<c01278c8>] (__raise_softirq_irqoff+0x6c/0x1c8)
> [<c01278c8>] (__raise_softirq_irqoff) from [<c056bccc>] (stmmac_interrupt+0x388/0x4e0)
> [<c056bccc>] (stmmac_interrupt) from [<c0178714>] (irq_forced_thread_fn+0x28/0x64)
> [<c0178714>] (irq_forced_thread_fn) from [<c0178924>] (irq_thread+0x124/0x260)
> [<c0178924>] (irq_thread) from [<c0142ee8>] (kthread+0x154/0x164)
> [<c0142ee8>] (kthread) from [<c01010bc>] (ret_from_fork+0x14/0x38)
> Exception stack(0xeb7b5fb0 to 0xeb7b5ff8)
> 5fa0: 00000000 00000000 00000000 00000000
> 5fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> 5fe0: 00000000 00000000 00000000 00000000 00000013 00000000
> irq event stamp: 48
> hardirqs last enabled at (50): [<c085c200>] prb_unlock+0x7c/0x8c
> hardirqs last disabled at (51): [<c085c0dc>] prb_lock+0x58/0x100
> softirqs last enabled at (0): [<c011e770>] copy_process+0x550/0x1654
> softirqs last disabled at (25): [<c01786ec>] irq_forced_thread_fn+0x0/0x64
> ---[ end trace 0000000000000002 ]---
>
> Use __napi_schedule() instead which will save & restore the interrupt
> state.
>
I'm thinking about a __napi_schedule version that disables hard irq's
conditionally, based on variable force_irqthreads, exported by the irq
subsystem. This would allow to behave correctly with threadirqs set,
whilst not loosing the _irqoff benefit with threadirqs unset.
Let me come up with a proposal.

2020-10-09 16:00:00

by Jakub Kicinski

[permalink] [raw]
Subject: Re: [PATCH] net: stmmac: Don't call _irqoff() with hardirqs enabled

On Fri, 9 Oct 2020 16:54:06 +0200 Heiner Kallweit wrote:
> I'm thinking about a __napi_schedule version that disables hard irq's
> conditionally, based on variable force_irqthreads, exported by the irq
> subsystem. This would allow to behave correctly with threadirqs set,
> whilst not loosing the _irqoff benefit with threadirqs unset.
> Let me come up with a proposal.

I think you'd need to make napi_schedule_irqoff() behave like that,
right? Are there any uses of napi_schedule_irqoff() that are disabling
irqs and not just running from an irq handler?

2020-10-09 16:08:35

by Heiner Kallweit

[permalink] [raw]
Subject: Re: [PATCH] net: stmmac: Don't call _irqoff() with hardirqs enabled

On 09.10.2020 17:58, Jakub Kicinski wrote:
> On Fri, 9 Oct 2020 16:54:06 +0200 Heiner Kallweit wrote:
>> I'm thinking about a __napi_schedule version that disables hard irq's
>> conditionally, based on variable force_irqthreads, exported by the irq
>> subsystem. This would allow to behave correctly with threadirqs set,
>> whilst not loosing the _irqoff benefit with threadirqs unset.
>> Let me come up with a proposal.
>
> I think you'd need to make napi_schedule_irqoff() behave like that,
> right? Are there any uses of napi_schedule_irqoff() that are disabling
> irqs and not just running from an irq handler?
>
Right, the best approach depends on the answer to the latter question.
I didn't check this yet, therefore I described the least intrusive approach.

2020-10-09 17:38:48

by John Keeping

[permalink] [raw]
Subject: Re: [PATCH] net: stmmac: Don't call _irqoff() with hardirqs enabled

On Fri, 9 Oct 2020 02:46:09 +0300
Vladimir Oltean <[email protected]> wrote:

> On Thu, Oct 08, 2020 at 05:27:49PM +0100, John Keeping wrote:
> > With threadirqs, stmmac_interrupt() is called on a thread with hardirqs
> > enabled so we cannot call __napi_schedule_irqoff(). Under lockdep it
> > leads to:
> >
> > ------------[ cut here ]------------
> > WARNING: CPU: 0 PID: 285 at kernel/softirq.c:598 __raise_softirq_irqoff+0x6c/0x1c8
> > IRQs not disabled as expected
> > Modules linked in: brcmfmac hci_uart btbcm cfg80211 brcmutil
> > CPU: 0 PID: 285 Comm: irq/41-eth0 Not tainted 5.4.69-rt39 #1
> > Hardware name: Rockchip (Device Tree)
> > [<c0110d3c>] (unwind_backtrace) from [<c010c284>] (show_stack+0x10/0x14)
> > [<c010c284>] (show_stack) from [<c0855504>] (dump_stack+0xa8/0xe0)
> > [<c0855504>] (dump_stack) from [<c0120a9c>] (__warn+0xe0/0xfc)
> > [<c0120a9c>] (__warn) from [<c0120e80>] (warn_slowpath_fmt+0x7c/0xa4)
> > [<c0120e80>] (warn_slowpath_fmt) from [<c01278c8>] (__raise_softirq_irqoff+0x6c/0x1c8)
> > [<c01278c8>] (__raise_softirq_irqoff) from [<c056bccc>] (stmmac_interrupt+0x388/0x4e0)
> > [<c056bccc>] (stmmac_interrupt) from [<c0178714>] (irq_forced_thread_fn+0x28/0x64)
> > [<c0178714>] (irq_forced_thread_fn) from [<c0178924>] (irq_thread+0x124/0x260)
> > [<c0178924>] (irq_thread) from [<c0142ee8>] (kthread+0x154/0x164)
> > [<c0142ee8>] (kthread) from [<c01010bc>] (ret_from_fork+0x14/0x38)
> > Exception stack(0xeb7b5fb0 to 0xeb7b5ff8)
> > 5fa0: 00000000 00000000 00000000 00000000
> > 5fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> > 5fe0: 00000000 00000000 00000000 00000000 00000013 00000000
> > irq event stamp: 48
> > hardirqs last enabled at (50): [<c085c200>] prb_unlock+0x7c/0x8c
> > hardirqs last disabled at (51): [<c085c0dc>] prb_lock+0x58/0x100
> > softirqs last enabled at (0): [<c011e770>] copy_process+0x550/0x1654
> > softirqs last disabled at (25): [<c01786ec>] irq_forced_thread_fn+0x0/0x64
> > ---[ end trace 0000000000000002 ]---
> >
> > Use __napi_schedule() instead which will save & restore the interrupt
> > state.
> >
> > Fixes: 4ccb45857c2c ("net: stmmac: Fix NAPI poll in TX path when in multi-queue")
> > Signed-off-by: John Keeping <[email protected]>
> > ---
>
> Don't get me wrong, this is so cool that the new lockdep warning is really
> helping out finding real bugs, but the patch that adds that warning
> (https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/commit/?id=cdabce2e3dff7e4bcef73473987618569d178af3)
> isn't in 5.4.69-rt39, is it?

No, it's not, although I would have saved several days debugging if it
was! I backported the lockdep warning to prove that it caught this
issue.

The evidence it is possible to see on vanilla 5.4.x is:

$ trace-cmd report -l
irq/43-e-280 0....2 74.017658: softirq_raise: vec=3 [action=NET_RX]

Note the missing "d" where this should be "0d...2" to indicate hardirqs
disabled.


Regards,
John

2020-10-10 23:10:34

by Heiner Kallweit

[permalink] [raw]
Subject: Re: [PATCH] net: stmmac: Don't call _irqoff() with hardirqs enabled

On 09.10.2020 18:06, Heiner Kallweit wrote:
> On 09.10.2020 17:58, Jakub Kicinski wrote:
>> On Fri, 9 Oct 2020 16:54:06 +0200 Heiner Kallweit wrote:
>>> I'm thinking about a __napi_schedule version that disables hard irq's
>>> conditionally, based on variable force_irqthreads, exported by the irq
>>> subsystem. This would allow to behave correctly with threadirqs set,
>>> whilst not loosing the _irqoff benefit with threadirqs unset.
>>> Let me come up with a proposal.
>>
>> I think you'd need to make napi_schedule_irqoff() behave like that,
>> right? Are there any uses of napi_schedule_irqoff() that are disabling
>> irqs and not just running from an irq handler?
>>
> Right, the best approach depends on the answer to the latter question.
> I didn't check this yet, therefore I described the least intrusive approach.
>

With some help from coccinelle I identified the following functions that
call napi_schedule_irqoff() or __napi_schedule_irqoff() and do not run
from an irq handler (at least not at the first glance).

dpaa2_caam_fqdan_cb
qede_simd_fp_handler
mlx4_en_rx_irq
mlx4_en_tx_irq
qeth_qdio_poll
netvsc_channel_cb
napi_watchdog

2020-10-10 23:12:50

by Jakub Kicinski

[permalink] [raw]
Subject: Re: [PATCH] net: stmmac: Don't call _irqoff() with hardirqs enabled

On Sat, 10 Oct 2020 15:08:15 +0200 Heiner Kallweit wrote:
> On 09.10.2020 18:06, Heiner Kallweit wrote:
> > On 09.10.2020 17:58, Jakub Kicinski wrote:
> >> On Fri, 9 Oct 2020 16:54:06 +0200 Heiner Kallweit wrote:
> >>> I'm thinking about a __napi_schedule version that disables hard irq's
> >>> conditionally, based on variable force_irqthreads, exported by the irq
> >>> subsystem. This would allow to behave correctly with threadirqs set,
> >>> whilst not loosing the _irqoff benefit with threadirqs unset.
> >>> Let me come up with a proposal.
> >>
> >> I think you'd need to make napi_schedule_irqoff() behave like that,
> >> right? Are there any uses of napi_schedule_irqoff() that are disabling
> >> irqs and not just running from an irq handler?
> >>
> > Right, the best approach depends on the answer to the latter question.
> > I didn't check this yet, therefore I described the least intrusive approach.
> >
>
> With some help from coccinelle I identified the following functions that
> call napi_schedule_irqoff() or __napi_schedule_irqoff() and do not run
> from an irq handler (at least not at the first glance).
>
> dpaa2_caam_fqdan_cb
> qede_simd_fp_handler
> mlx4_en_rx_irq
> mlx4_en_tx_irq

Don't know the others but FWIW the mlx4 ones run from an IRQ handler,
AFAICT:

static irqreturn_t mlx4_interrupt(int irq, void *dev_ptr)
static irqreturn_t mlx4_msi_x_interrupt(int irq, void *eq_ptr)
mlx4_eq_int()
mlx4_cq_completion
cq->comp()

> qeth_qdio_poll
> netvsc_channel_cb
> napi_watchdog

This one runs from a hrtimer, which I believe will be a hard irq
context on anything but RT. I could be wrong.

2020-10-11 18:05:37

by Heiner Kallweit

[permalink] [raw]
Subject: Re: [PATCH] net: stmmac: Don't call _irqoff() with hardirqs enabled

On 10.10.2020 17:22, Jakub Kicinski wrote:
> On Sat, 10 Oct 2020 15:08:15 +0200 Heiner Kallweit wrote:
>> On 09.10.2020 18:06, Heiner Kallweit wrote:
>>> On 09.10.2020 17:58, Jakub Kicinski wrote:
>>>> On Fri, 9 Oct 2020 16:54:06 +0200 Heiner Kallweit wrote:
>>>>> I'm thinking about a __napi_schedule version that disables hard irq's
>>>>> conditionally, based on variable force_irqthreads, exported by the irq
>>>>> subsystem. This would allow to behave correctly with threadirqs set,
>>>>> whilst not loosing the _irqoff benefit with threadirqs unset.
>>>>> Let me come up with a proposal.
>>>>
>>>> I think you'd need to make napi_schedule_irqoff() behave like that,
>>>> right? Are there any uses of napi_schedule_irqoff() that are disabling
>>>> irqs and not just running from an irq handler?
>>>>
>>> Right, the best approach depends on the answer to the latter question.
>>> I didn't check this yet, therefore I described the least intrusive approach.
>>>
>>
>> With some help from coccinelle I identified the following functions that
>> call napi_schedule_irqoff() or __napi_schedule_irqoff() and do not run
>> from an irq handler (at least not at the first glance).
>>
>> dpaa2_caam_fqdan_cb
>> qede_simd_fp_handler
>> mlx4_en_rx_irq
>> mlx4_en_tx_irq
>
> Don't know the others but FWIW the mlx4 ones run from an IRQ handler,
> AFAICT:
>
> static irqreturn_t mlx4_interrupt(int irq, void *dev_ptr)
> static irqreturn_t mlx4_msi_x_interrupt(int irq, void *eq_ptr)
> mlx4_eq_int()
> mlx4_cq_completion
> cq->comp()
>
>> qeth_qdio_poll
>> netvsc_channel_cb
>> napi_watchdog
>
> This one runs from a hrtimer, which I believe will be a hard irq
> context on anything but RT. I could be wrong.
>

Typically forced irq threading will not be enabled, therefore going
back to use napi_schedule() in drivers in most cases will cause
losing the benefit of the irqoff version. Something like the following
should be better. Only small drawback I see is that in case of forced
irq threading hrtimers will still run in hardirq context and we lose
the benefit of the irqoff version in napi_watchdog().

diff --git a/net/core/dev.c b/net/core/dev.c
index a146bac84..7d18560b2 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6393,7 +6393,11 @@ EXPORT_SYMBOL(napi_schedule_prep);
*/
void __napi_schedule_irqoff(struct napi_struct *n)
{
- ____napi_schedule(this_cpu_ptr(&softnet_data), n);
+ /* hard irqs may not be masked in case of forced irq threading */
+ if (force_irqthreads)
+ __napi_schedule(n);
+ else
+ ____napi_schedule(this_cpu_ptr(&softnet_data), n);
}
EXPORT_SYMBOL(__napi_schedule_irqoff);

--
2.28.0


2020-10-11 18:09:02

by Jakub Kicinski

[permalink] [raw]
Subject: Re: [PATCH] net: stmmac: Don't call _irqoff() with hardirqs enabled

On Sun, 11 Oct 2020 11:24:41 +0200 Heiner Kallweit wrote:
> >> qeth_qdio_poll
> >> netvsc_channel_cb
> >> napi_watchdog
> >
> > This one runs from a hrtimer, which I believe will be a hard irq
> > context on anything but RT. I could be wrong.
> >
>
> A similar discussion can be found e.g. here:
> https://lore.kernel.org/netdev/[email protected]/
> However I don't see any actual outcome.

Interesting, hopefully Eric will chime in. I think the hrtimer issue
was solved. But I'm not actually seeing a lockdep_assert_irqs_disabled()
in __raise_softirq_irqoff() in net, so IDK what that's for?

In any case if NAPI thinks it has irqs off while they're not, and
interacts with other parts of the kernel we may be in for a game of
whack-a-mole.

Perhaps a way around touching force_irqthreads directly in net/ would
be some form of a helper like "theaded_local_irq_save" or such that'd
disable IRQs only if force_irqthreads == 1? Is that cheating? :)

2020-10-11 18:10:27

by Jakub Kicinski

[permalink] [raw]
Subject: Re: [PATCH] net: stmmac: Don't call _irqoff() with hardirqs enabled

On Sun, 11 Oct 2020 15:42:24 +0200 Heiner Kallweit wrote:
> On 10.10.2020 17:22, Jakub Kicinski wrote:
> > On Sat, 10 Oct 2020 15:08:15 +0200 Heiner Kallweit wrote:
> >> On 09.10.2020 18:06, Heiner Kallweit wrote:
> >>> On 09.10.2020 17:58, Jakub Kicinski wrote:
> >>>> On Fri, 9 Oct 2020 16:54:06 +0200 Heiner Kallweit wrote:
> >>>>> I'm thinking about a __napi_schedule version that disables hard irq's
> >>>>> conditionally, based on variable force_irqthreads, exported by the irq
> >>>>> subsystem. This would allow to behave correctly with threadirqs set,
> >>>>> whilst not loosing the _irqoff benefit with threadirqs unset.
> >>>>> Let me come up with a proposal.
> >>>>
> >>>> I think you'd need to make napi_schedule_irqoff() behave like that,
> >>>> right? Are there any uses of napi_schedule_irqoff() that are disabling
> >>>> irqs and not just running from an irq handler?
> >>>>
> >>> Right, the best approach depends on the answer to the latter question.
> >>> I didn't check this yet, therefore I described the least intrusive approach.
> >>>
> >>
> >> With some help from coccinelle I identified the following functions that
> >> call napi_schedule_irqoff() or __napi_schedule_irqoff() and do not run
> >> from an irq handler (at least not at the first glance).
> >>
> >> dpaa2_caam_fqdan_cb
> >> qede_simd_fp_handler
> >> mlx4_en_rx_irq
> >> mlx4_en_tx_irq
> >
> > Don't know the others but FWIW the mlx4 ones run from an IRQ handler,
> > AFAICT:
> >
> > static irqreturn_t mlx4_interrupt(int irq, void *dev_ptr)
> > static irqreturn_t mlx4_msi_x_interrupt(int irq, void *eq_ptr)
> > mlx4_eq_int()
> > mlx4_cq_completion
> > cq->comp()
> >
> >> qeth_qdio_poll
> >> netvsc_channel_cb
> >> napi_watchdog
> >
> > This one runs from a hrtimer, which I believe will be a hard irq
> > context on anything but RT. I could be wrong.
> >
>
> Typically forced irq threading will not be enabled, therefore going
> back to use napi_schedule() in drivers in most cases will cause
> losing the benefit of the irqoff version. Something like the following
> should be better. Only small drawback I see is that in case of forced
> irq threading hrtimers will still run in hardirq context and we lose
> the benefit of the irqoff version in napi_watchdog().
>
> diff --git a/net/core/dev.c b/net/core/dev.c
> index a146bac84..7d18560b2 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -6393,7 +6393,11 @@ EXPORT_SYMBOL(napi_schedule_prep);
> */
> void __napi_schedule_irqoff(struct napi_struct *n)
> {
> - ____napi_schedule(this_cpu_ptr(&softnet_data), n);
> + /* hard irqs may not be masked in case of forced irq threading */
> + if (force_irqthreads)
> + __napi_schedule(n);
> + else
> + ____napi_schedule(this_cpu_ptr(&softnet_data), n);
> }
> EXPORT_SYMBOL(__napi_schedule_irqoff);

Does

if (force_irqthreads)
local_irq_save(flags);
____napi_schedule(this_cpu_ptr(&softnet_data), n);
if (force_irqthreads)
local_irq_restore(flags);

not produce more concise assembly?

2020-10-11 19:47:00

by Heiner Kallweit

[permalink] [raw]
Subject: Re: [PATCH] net: stmmac: Don't call _irqoff() with hardirqs enabled

On 10.10.2020 17:22, Jakub Kicinski wrote:
> On Sat, 10 Oct 2020 15:08:15 +0200 Heiner Kallweit wrote:
>> On 09.10.2020 18:06, Heiner Kallweit wrote:
>>> On 09.10.2020 17:58, Jakub Kicinski wrote:
>>>> On Fri, 9 Oct 2020 16:54:06 +0200 Heiner Kallweit wrote:
>>>>> I'm thinking about a __napi_schedule version that disables hard irq's
>>>>> conditionally, based on variable force_irqthreads, exported by the irq
>>>>> subsystem. This would allow to behave correctly with threadirqs set,
>>>>> whilst not loosing the _irqoff benefit with threadirqs unset.
>>>>> Let me come up with a proposal.
>>>>
>>>> I think you'd need to make napi_schedule_irqoff() behave like that,
>>>> right? Are there any uses of napi_schedule_irqoff() that are disabling
>>>> irqs and not just running from an irq handler?
>>>>
>>> Right, the best approach depends on the answer to the latter question.
>>> I didn't check this yet, therefore I described the least intrusive approach.
>>>
>>
>> With some help from coccinelle I identified the following functions that
>> call napi_schedule_irqoff() or __napi_schedule_irqoff() and do not run
>> from an irq handler (at least not at the first glance).
>>
>> dpaa2_caam_fqdan_cb
>> qede_simd_fp_handler
>> mlx4_en_rx_irq
>> mlx4_en_tx_irq
>
> Don't know the others but FWIW the mlx4 ones run from an IRQ handler,
> AFAICT:
>
> static irqreturn_t mlx4_interrupt(int irq, void *dev_ptr)
> static irqreturn_t mlx4_msi_x_interrupt(int irq, void *eq_ptr)
> mlx4_eq_int()
> mlx4_cq_completion
> cq->comp()
>
>> qeth_qdio_poll
>> netvsc_channel_cb
>> napi_watchdog
>
> This one runs from a hrtimer, which I believe will be a hard irq
> context on anything but RT. I could be wrong.
>

A similar discussion can be found e.g. here:
https://lore.kernel.org/netdev/[email protected]/
However I don't see any actual outcome.