2009-09-11 14:48:49

by Michael Büsch

[permalink] [raw]
Subject: mac80211: NOHZ: local_softirq_pending 08

Hi,

mac80211 (or some other part of the networking stack) triggers this warning
in the NOHZ code:
NOHZ: local_softirq_pending 08

08 seems to be NET_RX_SOFTIRQ.

It happens, because my test driver b43 handles all RX and TX-status callbacks in
process context. I guess some part of the networking stack expects RX to be in
tasklet and/or softirq context.

We also have a report of this warning in wl1251, so it's probably not a b43 problem.

--
Greetings, Michael.


2009-09-12 16:41:29

by Oliver Hartkopp

[permalink] [raw]
Subject: Re: mac80211: NOHZ: local_softirq_pending 08

Michael Buesch wrote:

>> As there are several users in the kernel do exact this test and call the
>> appropriate netif_rx() function, i would suggest to create a static inline
>> function:
>>
>> static inline int netif_rx_ti(struct sk_buff *skb)
>> {
>> if (in_interrupt())
>> return netif_rx(skb);
>> return netif_rx_ni(skb);
>> }
>>
>> ('ti' for test in_interrupt())
>>
>> in include/linux/netdevice.h
>>
>> What do you think about that?
>
> Yeah, I'm fine with that.
>

Hi Michael,

i cooked a patch that introduces netif_rx_ti() and fixes up the problems in
mac80211 and the CAN subsystem.

Currently i'm pondering whether netif_rx_ti() is needed in all cases or if
there are code sections that'll never be executed from irq-context.

In theses cases netif_rx_ni() should be prefered to netif_rx_ti() to prevent
the obsolete check ...

Is there any of your changes that should better use netif_rx_ni() ?

Regards,
Oliver


Attachments:
net-NOHZ-local_softirq_pending-08.patch (3.53 kB)

2009-09-30 14:54:27

by Johannes Berg

[permalink] [raw]
Subject: Re: mac80211: NOHZ: local_softirq_pending 08

On Wed, 2009-09-30 at 17:47 +0300, Kalle Valo wrote:

> I agree with Michael. The bug is real and I have verified that
> Michael's patch fixes the issue. Better to apply the patch now, it's
> trivial to change the implementation if/when the network stack has
> support for this.

FWIW, I think in mac80211 the in_interrupt() check can never return true
since we postpone all RX to the tasklet. But the tasklet seems to be ok
-- so should it really be in_interrupt()?

johannes


Attachments:
signature.asc (801.00 B)
This is a digitally signed message part

2009-09-29 19:30:53

by John W. Linville

[permalink] [raw]
Subject: Re: mac80211: NOHZ: local_softirq_pending 08

On Sat, Sep 12, 2009 at 06:41:12PM +0200, Oliver Hartkopp wrote:

> i cooked a patch that introduces netif_rx_ti() and fixes up the problems in
> mac80211 and the CAN subsystem.

Oliver,

Are you going to send this patch to Dave? If you want me to carry
it instead, please resend it with a proper changelog including a
Signed-off-by line. For that matter, Dave will most certainly want
that as well...

Thanks!

John
--
John W. Linville Someday the world will need a hero, and you
[email protected] might be all we have. Be ready.

2009-09-11 16:07:20

by Kalle Valo

[permalink] [raw]
Subject: Re: mac80211: NOHZ: local_softirq_pending 08

Michael Buesch <[email protected]> writes:

>> > mac80211 (or some other part of the networking stack) triggers this
>> > warning in the NOHZ code: NOHZ: local_softirq_pending 08
>> >
>> > 08 seems to be NET_RX_SOFTIRQ.
>> >
>> > It happens, because my test driver b43 handles all RX and TX-status
>> > callbacks in process context. I guess some part of the networking
>> > stack expects RX to be in tasklet and/or softirq context.
>> >
>> > We also have a report of this warning in wl1251, so it's probably not
>> > a b43 problem.
>>
>> Yes, I see this with wl1251. It uses workqueues everywhere.
>>
>
> This patch seems to fix it.

Yes, it does. Thanks.

> Signed-off-by: Michael Buesch <[email protected]>

Tested-by: Kalle Valo <[email protected]>

--
Kalle Valo

2009-09-30 17:51:10

by Oliver Hartkopp

[permalink] [raw]
Subject: Re: mac80211: NOHZ: local_softirq_pending 08

Johannes Berg wrote:
> On Wed, 2009-09-30 at 17:10 +0200, Michael Buesch wrote:
>> On Wednesday 30 September 2009 16:54:26 Johannes Berg wrote:
>>> On Wed, 2009-09-30 at 17:47 +0300, Kalle Valo wrote:
>>>
>>>> I agree with Michael. The bug is real and I have verified that
>>>> Michael's patch fixes the issue. Better to apply the patch now, it's
>>>> trivial to change the implementation if/when the network stack has
>>>> support for this.
>>> FWIW, I think in mac80211 the in_interrupt() check can never return true
>>> since we postpone all RX to the tasklet. But the tasklet seems to be ok
>>> -- so should it really be in_interrupt()?
>> I think a tasklet is also in_interrupt(), because it's a softirq.
>
> Ah, yes, indeed, in_interrupt() vs. in_irq().
>

Oops!

I missed that for my previous patch i added for two occurrences in the CAN
sources.

I'm currently compiling the patch for netif_rx_ti() and will post it in some
minutes (for CAN and mac80211) when it runs without probs.

Regards,
Oliver

2009-09-30 14:47:33

by Kalle Valo

[permalink] [raw]
Subject: Re: mac80211: NOHZ: local_softirq_pending 08

Michael Buesch <[email protected]> writes:

>> I don't know how expensive in_interrupt() is, but it IMO should be
>> avoided when the context for a code section can be determined in
>> another way.
>
> What if we just get the fix merged and discuss later whether it's
> worth to optimize a picosecond or not?? My patch fixes the _bug_.
> You can merge a more "efficient" fix later that saves one or two CPU
> cycles.

I agree with Michael. The bug is real and I have verified that
Michael's patch fixes the issue. Better to apply the patch now, it's
trivial to change the implementation if/when the network stack has
support for this.

--
Kalle Valo

2009-09-11 14:57:55

by Kalle Valo

[permalink] [raw]
Subject: Re: mac80211: NOHZ: local_softirq_pending 08

Michael Buesch <[email protected]> writes:

> Hi,

Hallo,

> mac80211 (or some other part of the networking stack) triggers this
> warning in the NOHZ code: NOHZ: local_softirq_pending 08
>
> 08 seems to be NET_RX_SOFTIRQ.
>
> It happens, because my test driver b43 handles all RX and TX-status
> callbacks in process context. I guess some part of the networking
> stack expects RX to be in tasklet and/or softirq context.
>
> We also have a report of this warning in wl1251, so it's probably not
> a b43 problem.

Yes, I see this with wl1251. It uses workqueues everywhere.

--
Kalle Valo

2009-09-30 11:56:20

by Oliver Hartkopp

[permalink] [raw]
Subject: Re: mac80211: NOHZ: local_softirq_pending 08

John W. Linville wrote:
> On Sat, Sep 12, 2009 at 06:41:12PM +0200, Oliver Hartkopp wrote:
>
>> i cooked a patch that introduces netif_rx_ti() and fixes up the problems in
>> mac80211 and the CAN subsystem.
>
> Oliver,
>
> Are you going to send this patch to Dave? If you want me to carry
> it instead, please resend it with a proper changelog including a
> Signed-off-by line. For that matter, Dave will most certainly want
> that as well...

Hello John,

as i wrote here

http://marc.info/?l=linux-netdev&m=125277885910179&w=2

there are currently only three occurrences of checks that use netif_rx() and
netif_rx_ni() depending on in_interrupt().

And regarding the suggested fix from Michael, that checked every(!) netif_rx()
whether it is in interrupt or not, i was unsure if a netif_tx_ti() would make
sense for only three cases?!?

If you think it makes sense, i can post a patch for that ... but:

Indeed it costs some additional investigation to prove whether netif_rx() or
netif_rx_ni() should be used in each case. But IMHO this has to be done before
providing a pump-gun function that solves the problem without thinking if we
are in irq-context or not. I want to avoid that people are using netif_rx_ti()
as some kind of default ...

I don't know how expensive in_interrupt() is, but it IMO should be avoided
when the context for a code section can be determined in another way.

Regards,
Oliver


2009-09-30 15:21:33

by Johannes Berg

[permalink] [raw]
Subject: Re: mac80211: NOHZ: local_softirq_pending 08

On Wed, 2009-09-30 at 17:10 +0200, Michael Buesch wrote:
> On Wednesday 30 September 2009 16:54:26 Johannes Berg wrote:
> > On Wed, 2009-09-30 at 17:47 +0300, Kalle Valo wrote:
> >
> > > I agree with Michael. The bug is real and I have verified that
> > > Michael's patch fixes the issue. Better to apply the patch now, it's
> > > trivial to change the implementation if/when the network stack has
> > > support for this.
> >
> > FWIW, I think in mac80211 the in_interrupt() check can never return true
> > since we postpone all RX to the tasklet. But the tasklet seems to be ok
> > -- so should it really be in_interrupt()?
>
> I think a tasklet is also in_interrupt(), because it's a softirq.

Ah, yes, indeed, in_interrupt() vs. in_irq().

johannes


Attachments:
signature.asc (801.00 B)
This is a digitally signed message part

2009-09-30 19:00:54

by John W. Linville

[permalink] [raw]
Subject: Re: [PATCH] net: fix NOHZ: local_softirq_pending 08

On Wed, Sep 30, 2009 at 08:18:25PM +0200, Oliver Hartkopp wrote:
> Socket buffers that are generated and received inside softirqs or from process
> context must not use netif_rx() that's intended to be used from irq context only.
>
> This patch introduces a new helper function netif_rx_ti(skb) that tests for
> in_interrupt() before invoking netif_rx() or netif_rx_ni().
>
> It fixes the ratelimited kernel warning
>
> NOHZ: local_softirq_pending 08
>
> in the mac80211 and can subsystems.
>
> Signed-off-by: Oliver Hartkopp <[email protected]>

http://bugzilla.kernel.org/show_bug.cgi?id=14278

Acked-by: John W. Linville <[email protected]>

--
John W. Linville Someday the world will need a hero, and you
[email protected] might be all we have. Be ready.

2009-09-30 15:10:32

by Michael Büsch

[permalink] [raw]
Subject: Re: mac80211: NOHZ: local_softirq_pending 08

On Wednesday 30 September 2009 16:54:26 Johannes Berg wrote:
> On Wed, 2009-09-30 at 17:47 +0300, Kalle Valo wrote:
>
> > I agree with Michael. The bug is real and I have verified that
> > Michael's patch fixes the issue. Better to apply the patch now, it's
> > trivial to change the implementation if/when the network stack has
> > support for this.
>
> FWIW, I think in mac80211 the in_interrupt() check can never return true
> since we postpone all RX to the tasklet. But the tasklet seems to be ok
> -- so should it really be in_interrupt()?

I think a tasklet is also in_interrupt(), because it's a softirq.
in_interrupt() returns false in process context. The problem appeared when
the b43 driver started passing RX frames while being in process context (threaded IRQ).
It previously was in tasklet (= softirq) context.

--
Greetings, Michael.

2009-09-11 16:07:54

by Oliver Hartkopp

[permalink] [raw]
Subject: Re: mac80211: NOHZ: local_softirq_pending 08

Michael Buesch wrote:
> On Friday 11 September 2009 16:57:54 Kalle Valo wrote:
>> Michael Buesch <[email protected]> writes:
>>
>>> Hi,
>> Hallo,
>>
>>> mac80211 (or some other part of the networking stack) triggers this
>>> warning in the NOHZ code: NOHZ: local_softirq_pending 08
>>>
>>> 08 seems to be NET_RX_SOFTIRQ.
>>>
>>> It happens, because my test driver b43 handles all RX and TX-status
>>> callbacks in process context. I guess some part of the networking
>>> stack expects RX to be in tasklet and/or softirq context.
>>>
>>> We also have a report of this warning in wl1251, so it's probably not
>>> a b43 problem.
>> Yes, I see this with wl1251. It uses workqueues everywhere.
>>
>
> This patch seems to fix it.
>
> Signed-off-by: Michael Buesch <[email protected]>
>
> Index: wireless-testing/net/mac80211/cfg.c
> ===================================================================
> --- wireless-testing.orig/net/mac80211/cfg.c 2009-08-09 18:47:11.000000000 +0200
> +++ wireless-testing/net/mac80211/cfg.c 2009-09-11 16:59:12.000000000 +0200
> @@ -606,7 +606,7 @@ static void ieee80211_send_layer2_update
> skb->dev = sta->sdata->dev;
> skb->protocol = eth_type_trans(skb, sta->sdata->dev);
> memset(skb->cb, 0, sizeof(skb->cb));
> - netif_rx(skb);
> + ieee80211_netif_rx(skb);
> }
>
> static void sta_apply_parameters(struct ieee80211_local *local,
> Index: wireless-testing/net/mac80211/ieee80211_i.h
> ===================================================================
> --- wireless-testing.orig/net/mac80211/ieee80211_i.h 2009-08-23 00:06:41.000000000 +0200
> +++ wireless-testing/net/mac80211/ieee80211_i.h 2009-09-11 17:02:05.000000000 +0200
> @@ -1053,6 +1053,14 @@ void ieee80211_tx_pending(unsigned long
> int ieee80211_monitor_start_xmit(struct sk_buff *skb, struct net_device *dev);
> int ieee80211_subif_start_xmit(struct sk_buff *skb, struct net_device *dev);
>
> +/* rx handling */
> +static inline int ieee80211_netif_rx(struct sk_buff *skb)
> +{
> + if (in_interrupt())
> + return netif_rx(skb);
> + return netif_rx_ni(skb);
> +}
> +

Hello Michael,

i know this NOHZ warning from the CAN stack also - but now, i know what caused
this warning. I fixed it in my local tree and it works. Thanks!

As there are several users in the kernel do exact this test and call the
appropriate netif_rx() function, i would suggest to create a static inline
function:

static inline int netif_rx_ti(struct sk_buff *skb)
{
if (in_interrupt())
return netif_rx(skb);
return netif_rx_ni(skb);
}

('ti' for test in_interrupt())

in include/linux/netdevice.h

What do you think about that?

Regards,
Oliver

2009-09-30 18:18:23

by Oliver Hartkopp

[permalink] [raw]
Subject: [PATCH] net: fix NOHZ: local_softirq_pending 08

Socket buffers that are generated and received inside softirqs or from process
context must not use netif_rx() that's intended to be used from irq context only.

This patch introduces a new helper function netif_rx_ti(skb) that tests for
in_interrupt() before invoking netif_rx() or netif_rx_ni().

It fixes the ratelimited kernel warning

NOHZ: local_softirq_pending 08

in the mac80211 and can subsystems.

Signed-off-by: Oliver Hartkopp <[email protected]>

---




Attachments:
net-NOHZ-local_softirq_pending-08.patch (3.94 kB)

2009-09-30 14:33:20

by Michael Büsch

[permalink] [raw]
Subject: Re: mac80211: NOHZ: local_softirq_pending 08

On Wednesday 30 September 2009 13:56:12 Oliver Hartkopp wrote:
> John W. Linville wrote:
> > On Sat, Sep 12, 2009 at 06:41:12PM +0200, Oliver Hartkopp wrote:
> >
> >> i cooked a patch that introduces netif_rx_ti() and fixes up the problems in
> >> mac80211 and the CAN subsystem.
> >
> > Oliver,
> >
> > Are you going to send this patch to Dave? If you want me to carry
> > it instead, please resend it with a proper changelog including a
> > Signed-off-by line. For that matter, Dave will most certainly want
> > that as well...
>
> Hello John,
>
> as i wrote here
>
> http://marc.info/?l=linux-netdev&m=125277885910179&w=2
>
> there are currently only three occurrences of checks that use netif_rx() and
> netif_rx_ni() depending on in_interrupt().
>
> And regarding the suggested fix from Michael, that checked every(!) netif_rx()
> whether it is in interrupt or not, i was unsure if a netif_tx_ti() would make
> sense for only three cases?!?
>
> If you think it makes sense, i can post a patch for that ... but:
>
> Indeed it costs some additional investigation to prove whether netif_rx() or
> netif_rx_ni() should be used in each case. But IMHO this has to be done before
> providing a pump-gun function that solves the problem without thinking if we
> are in irq-context or not. I want to avoid that people are using netif_rx_ti()
> as some kind of default ...
>
> I don't know how expensive in_interrupt() is, but it IMO should be avoided
> when the context for a code section can be determined in another way.

What if we just get the fix merged and discuss later whether it's worth to optimize a picosecond or not??
My patch fixes the _bug_. You can merge a more "efficient" fix later that saves one or two CPU cycles.

--
Greetings, Michael.

2009-09-11 15:08:13

by Michael Büsch

[permalink] [raw]
Subject: Re: mac80211: NOHZ: local_softirq_pending 08

On Friday 11 September 2009 16:57:54 Kalle Valo wrote:
> Michael Buesch <[email protected]> writes:
>
> > Hi,
>
> Hallo,
>
> > mac80211 (or some other part of the networking stack) triggers this
> > warning in the NOHZ code: NOHZ: local_softirq_pending 08
> >
> > 08 seems to be NET_RX_SOFTIRQ.
> >
> > It happens, because my test driver b43 handles all RX and TX-status
> > callbacks in process context. I guess some part of the networking
> > stack expects RX to be in tasklet and/or softirq context.
> >
> > We also have a report of this warning in wl1251, so it's probably not
> > a b43 problem.
>
> Yes, I see this with wl1251. It uses workqueues everywhere.
>

This patch seems to fix it.

Signed-off-by: Michael Buesch <[email protected]>

Index: wireless-testing/net/mac80211/cfg.c
===================================================================
--- wireless-testing.orig/net/mac80211/cfg.c 2009-08-09 18:47:11.000000000 +0200
+++ wireless-testing/net/mac80211/cfg.c 2009-09-11 16:59:12.000000000 +0200
@@ -606,7 +606,7 @@ static void ieee80211_send_layer2_update
skb->dev = sta->sdata->dev;
skb->protocol = eth_type_trans(skb, sta->sdata->dev);
memset(skb->cb, 0, sizeof(skb->cb));
- netif_rx(skb);
+ ieee80211_netif_rx(skb);
}

static void sta_apply_parameters(struct ieee80211_local *local,
Index: wireless-testing/net/mac80211/ieee80211_i.h
===================================================================
--- wireless-testing.orig/net/mac80211/ieee80211_i.h 2009-08-23 00:06:41.000000000 +0200
+++ wireless-testing/net/mac80211/ieee80211_i.h 2009-09-11 17:02:05.000000000 +0200
@@ -1053,6 +1053,14 @@ void ieee80211_tx_pending(unsigned long
int ieee80211_monitor_start_xmit(struct sk_buff *skb, struct net_device *dev);
int ieee80211_subif_start_xmit(struct sk_buff *skb, struct net_device *dev);

+/* rx handling */
+static inline int ieee80211_netif_rx(struct sk_buff *skb)
+{
+ if (in_interrupt())
+ return netif_rx(skb);
+ return netif_rx_ni(skb);
+}
+
/* HT */
void ieee80211_ht_cap_ie_to_sta_ht_cap(struct ieee80211_supported_band *sband,
struct ieee80211_ht_cap *ht_cap_ie,
Index: wireless-testing/net/mac80211/main.c
===================================================================
--- wireless-testing.orig/net/mac80211/main.c 2009-08-23 00:06:41.000000000 +0200
+++ wireless-testing/net/mac80211/main.c 2009-09-11 16:59:35.000000000 +0200
@@ -591,7 +591,7 @@ void ieee80211_tx_status(struct ieee8021
skb2 = skb_clone(skb, GFP_ATOMIC);
if (skb2) {
skb2->dev = prev_dev;
- netif_rx(skb2);
+ ieee80211_netif_rx(skb2);
}
}

@@ -600,7 +600,7 @@ void ieee80211_tx_status(struct ieee8021
}
if (prev_dev) {
skb->dev = prev_dev;
- netif_rx(skb);
+ ieee80211_netif_rx(skb);
skb = NULL;
}
rcu_read_unlock();
Index: wireless-testing/net/mac80211/rx.c
===================================================================
--- wireless-testing.orig/net/mac80211/rx.c 2009-09-04 19:08:05.000000000 +0200
+++ wireless-testing/net/mac80211/rx.c 2009-09-11 17:00:08.000000000 +0200
@@ -309,7 +309,7 @@ ieee80211_rx_monitor(struct ieee80211_lo
skb2 = skb_clone(skb, GFP_ATOMIC);
if (skb2) {
skb2->dev = prev_dev;
- netif_rx(skb2);
+ ieee80211_netif_rx(skb2);
}
}

@@ -320,7 +320,7 @@ ieee80211_rx_monitor(struct ieee80211_lo

if (prev_dev) {
skb->dev = prev_dev;
- netif_rx(skb);
+ ieee80211_netif_rx(skb);
} else
dev_kfree_skb(skb);

@@ -1349,7 +1349,7 @@ ieee80211_deliver_skb(struct ieee80211_r
/* deliver to local stack */
skb->protocol = eth_type_trans(skb, dev);
memset(skb->cb, 0, sizeof(skb->cb));
- netif_rx(skb);
+ ieee80211_netif_rx(skb);
}
}

@@ -1943,7 +1943,7 @@ static void ieee80211_rx_cooked_monitor(
skb2 = skb_clone(skb, GFP_ATOMIC);
if (skb2) {
skb2->dev = prev_dev;
- netif_rx(skb2);
+ ieee80211_netif_rx(skb2);
}
}

@@ -1954,7 +1954,7 @@ static void ieee80211_rx_cooked_monitor(

if (prev_dev) {
skb->dev = prev_dev;
- netif_rx(skb);
+ ieee80211_netif_rx(skb);
skb = NULL;
} else
goto out_free_skb;


--
Greetings, Michael.

2009-09-11 16:13:34

by Michael Büsch

[permalink] [raw]
Subject: Re: mac80211: NOHZ: local_softirq_pending 08

On Friday 11 September 2009 18:07:39 Oliver Hartkopp wrote:
> Michael Buesch wrote:
> > On Friday 11 September 2009 16:57:54 Kalle Valo wrote:
> >> Michael Buesch <[email protected]> writes:
> >>
> >>> Hi,
> >> Hallo,
> >>
> >>> mac80211 (or some other part of the networking stack) triggers this
> >>> warning in the NOHZ code: NOHZ: local_softirq_pending 08
> >>>
> >>> 08 seems to be NET_RX_SOFTIRQ.
> >>>
> >>> It happens, because my test driver b43 handles all RX and TX-status
> >>> callbacks in process context. I guess some part of the networking
> >>> stack expects RX to be in tasklet and/or softirq context.
> >>>
> >>> We also have a report of this warning in wl1251, so it's probably not
> >>> a b43 problem.
> >> Yes, I see this with wl1251. It uses workqueues everywhere.
> >>
> >
> > This patch seems to fix it.
> >
> > Signed-off-by: Michael Buesch <[email protected]>
> >
> > Index: wireless-testing/net/mac80211/cfg.c
> > ===================================================================
> > --- wireless-testing.orig/net/mac80211/cfg.c 2009-08-09 18:47:11.000000000 +0200
> > +++ wireless-testing/net/mac80211/cfg.c 2009-09-11 16:59:12.000000000 +0200
> > @@ -606,7 +606,7 @@ static void ieee80211_send_layer2_update
> > skb->dev = sta->sdata->dev;
> > skb->protocol = eth_type_trans(skb, sta->sdata->dev);
> > memset(skb->cb, 0, sizeof(skb->cb));
> > - netif_rx(skb);
> > + ieee80211_netif_rx(skb);
> > }
> >
> > static void sta_apply_parameters(struct ieee80211_local *local,
> > Index: wireless-testing/net/mac80211/ieee80211_i.h
> > ===================================================================
> > --- wireless-testing.orig/net/mac80211/ieee80211_i.h 2009-08-23 00:06:41.000000000 +0200
> > +++ wireless-testing/net/mac80211/ieee80211_i.h 2009-09-11 17:02:05.000000000 +0200
> > @@ -1053,6 +1053,14 @@ void ieee80211_tx_pending(unsigned long
> > int ieee80211_monitor_start_xmit(struct sk_buff *skb, struct net_device *dev);
> > int ieee80211_subif_start_xmit(struct sk_buff *skb, struct net_device *dev);
> >
> > +/* rx handling */
> > +static inline int ieee80211_netif_rx(struct sk_buff *skb)
> > +{
> > + if (in_interrupt())
> > + return netif_rx(skb);
> > + return netif_rx_ni(skb);
> > +}
> > +
>
> Hello Michael,
>
> i know this NOHZ warning from the CAN stack also - but now, i know what caused
> this warning. I fixed it in my local tree and it works. Thanks!
>
> As there are several users in the kernel do exact this test and call the
> appropriate netif_rx() function, i would suggest to create a static inline
> function:
>
> static inline int netif_rx_ti(struct sk_buff *skb)
> {
> if (in_interrupt())
> return netif_rx(skb);
> return netif_rx_ni(skb);
> }
>
> ('ti' for test in_interrupt())
>
> in include/linux/netdevice.h
>
> What do you think about that?

Yeah, I'm fine with that.

--
Greetings, Michael.

2009-09-12 16:51:46

by Michael Büsch

[permalink] [raw]
Subject: Re: mac80211: NOHZ: local_softirq_pending 08

On Saturday 12 September 2009 18:41:12 Oliver Hartkopp wrote:
> Michael Buesch wrote:
>
> >> As there are several users in the kernel do exact this test and call the
> >> appropriate netif_rx() function, i would suggest to create a static inline
> >> function:
> >>
> >> static inline int netif_rx_ti(struct sk_buff *skb)
> >> {
> >> if (in_interrupt())
> >> return netif_rx(skb);
> >> return netif_rx_ni(skb);
> >> }
> >>
> >> ('ti' for test in_interrupt())
> >>
> >> in include/linux/netdevice.h
> >>
> >> What do you think about that?
> >
> > Yeah, I'm fine with that.
> >
>
> Hi Michael,
>
> i cooked a patch that introduces netif_rx_ti() and fixes up the problems in
> mac80211 and the CAN subsystem.
>
> Currently i'm pondering whether netif_rx_ti() is needed in all cases or if
> there are code sections that'll never be executed from irq-context.
>
> In theses cases netif_rx_ni() should be prefered to netif_rx_ti() to prevent
> the obsolete check ...
>
> Is there any of your changes that should better use netif_rx_ni() ?
>
> Regards,
> Oliver
>

Well, I'd say this check does not cost much at all.
If I were the net maintainer, I'd get rid of netif_rx_ni() _and_ netif_rx_ti() and
do the check internally in netif_rx().
But as I don't have to decide that, I just want the mac80211 issue fixed.

--
Greetings, Michael.

2009-09-12 18:07:29

by Oliver Hartkopp

[permalink] [raw]
Subject: Re: mac80211: NOHZ: local_softirq_pending 08

Michael Buesch wrote:
> On Saturday 12 September 2009 18:41:12 Oliver Hartkopp wrote:
>> Michael Buesch wrote:
>>
>>>> As there are several users in the kernel do exact this test and call the
>>>> appropriate netif_rx() function, i would suggest to create a static inline
>>>> function:
>>>>
>>>> static inline int netif_rx_ti(struct sk_buff *skb)
>>>> {
>>>> if (in_interrupt())
>>>> return netif_rx(skb);
>>>> return netif_rx_ni(skb);
>>>> }
>>>>
>>>> ('ti' for test in_interrupt())
>>>>
>>>> in include/linux/netdevice.h
>>>>
>>>> What do you think about that?
>>> Yeah, I'm fine with that.
>>>
>> Hi Michael,
>>
>> i cooked a patch that introduces netif_rx_ti() and fixes up the problems in
>> mac80211 and the CAN subsystem.
>>
>> Currently i'm pondering whether netif_rx_ti() is needed in all cases or if
>> there are code sections that'll never be executed from irq-context.
>>
>> In theses cases netif_rx_ni() should be prefered to netif_rx_ti() to prevent
>> the obsolete check ...
>>
>> Is there any of your changes that should better use netif_rx_ni() ?
>>
>> Regards,
>> Oliver
>>
>
> Well, I'd say this check does not cost much at all.
> If I were the net maintainer, I'd get rid of netif_rx_ni() _and_ netif_rx_ti() and
> do the check internally in netif_rx().
> But as I don't have to decide that, I just want the mac80211 issue fixed.
>

Like this?

int netif_rx(struct sk_buff *skb)
{
int err;

if (likely(in_interrupt()))
err = __netif_rx(skb);
else {
preempt_disable();
err = __netif_rx(skb);
if (local_softirq_pending())
do_softirq();
preempt_enable();
}

return err;
}

I don't know how expensive in_interrupt() is ... checking the code does not
give any answers to *me* ;-)

But i found

356 netif_rx()

but only

18 netif_rx_ni()

in the kernel tree.

And three of them check for in_interrupt() before using netif_rx() or
netif_rx_ni() ...


Finally i would tend to introduce netif_rx_ti() in include/linux/netdevice.h
as described above, for the rare code that can be used inside and outside the
irq context.

I assume the affected code in the CAN stuff has to use netif_rx_ni() - but i
will doublecheck that (and prepare a separate CAN patch).

For the mac80211 i would suggest to check whether you really need
netif_rx()/netif_rx_ni()/netif_rx_ti() in all the regarded cases.

I assume always using netif_rx_ti() (as you proposed in the original patch) is
not the most efficient approach.

Best regards,
Oliver


2009-10-01 19:32:32

by David Miller

[permalink] [raw]
Subject: Re: [PATCH] net: fix NOHZ: local_softirq_pending 08

From: Michael Buesch <[email protected]>
Date: Thu, 1 Oct 2009 21:10:32 +0200

> For the benefit of a much bigger critical section? I don't get it
> why this would be any better.

Think about what you are saying when you introduce things
like this into your code:

if (in_interrupt())
foo();
else
bar();

That thing there means you don't know anything about how you'll need
to do locking properly, because you have no idea about even the
context in which your code is executed.

Sure, you can lock for the most stringent case, but that's silly and
wasteful.

2009-10-01 07:08:06

by Oliver Hartkopp

[permalink] [raw]
Subject: Re: [PATCH] net: fix NOHZ: local_softirq_pending 08

David Miller wrote:
> From: Oliver Hartkopp <[email protected]>
> Date: Wed, 30 Sep 2009 20:18:25 +0200
>
>> Socket buffers that are generated and received inside softirqs or from process
>> context must not use netif_rx() that's intended to be used from irq context only.
>>
>> This patch introduces a new helper function netif_rx_ti(skb) that tests for
>> in_interrupt() before invoking netif_rx() or netif_rx_ni().
>>
>> It fixes the ratelimited kernel warning
>>
>> NOHZ: local_softirq_pending 08
>>
>> in the mac80211 and can subsystems.
>>
>> Signed-off-by: Oliver Hartkopp <[email protected]>
>
> I bet all of these code paths can use netif_receive_skb() or
> don't need this conditional blob at all.
>
> Looking at some specific cases in this patch:
>
> 1) VCAN: This RX routine is only invoked from the drivers
> ->ndo_start_xmit() handler, and therefore like the loopback
> driver we know that BH's are already disabled and therefore
> it can always use netif_rx() safely.
>
> Why did you convert this case?
>
> And if this needs to be converted, why doesn't loopback need
> to be?
>
> 2) af_can.c: In what situation will netif_rx_ni() not be appropriate
> here? It should always execute in softirq or user context, now
> hardirq context.
>
> And the list goes on and on, I don't really like this new conditional
> testing of interrupt state.

Hello Dave,

i'm confused about in_interrupt(), in_softirq() and in_irq() as pointed out by
Johannes here:

http://marc.info/?l=linux-wireless&m=125432410405562&w=2

Indeed in the two cases for the CAN stuff (in vcan.c and af_can.c) the skb's
are received in process-context and softirq-context only.

Therefore i used netif_rx_ni() in my last change of this code.

Now i was reading from Johannes that in_interrupt() is used for
hardirq-context /and/ softirq-context, so i was just *unsure* and used the
newly introduced netif_rx_ti() for that, which tests for in_interrupt().

Indeed i'm not really happy with that, as it is always better to check each
receive case in which context it can be used and used exactly the right
function for that.

So when netif_rx_ni() or netif_receive_skb() is the best i can use when in
process-context or in softirq-context, i'll do it with pleasure.

And if it is like this the problematic netif_rx() calls in mac80211 need to be
sorted out in detail also ...

Regards,
Oliver


2009-10-01 19:10:38

by Michael Büsch

[permalink] [raw]
Subject: Re: [PATCH] net: fix NOHZ: local_softirq_pending 08

On Thursday 01 October 2009 20:42:28 Johannes Berg wrote:
> On Thu, 2009-10-01 at 16:04 +0200, Michael Buesch wrote:
> > On Thursday 01 October 2009 01:33:33 David Miller wrote:
> >
> > > I'm not applying this until all of these details are sorted out
> >
> > John, please apply my fix to wireless-testing to get rid of the regression.
> > You can revert it later, if there's a better fix available.
>
> I agree with davem, don't. Just fix the driver to local_bh_disable()
> around the rx function if necessary.

For the benefit of a much bigger critical section? I don't get it why this would be any better.

I _am_ going to do one thing now, however. That is ignoring any regression bugreport.
(Yes, it _is_ a regression for b43)

--
Greetings, Michael.

2009-10-01 18:42:28

by Johannes Berg

[permalink] [raw]
Subject: Re: [PATCH] net: fix NOHZ: local_softirq_pending 08

On Thu, 2009-10-01 at 16:04 +0200, Michael Buesch wrote:
> On Thursday 01 October 2009 01:33:33 David Miller wrote:
>
> > I'm not applying this until all of these details are sorted out
>
> John, please apply my fix to wireless-testing to get rid of the regression.
> You can revert it later, if there's a better fix available.

I agree with davem, don't. Just fix the driver to local_bh_disable()
around the rx function if necessary.

johannes


Attachments:
signature.asc (801.00 B)
This is a digitally signed message part

2009-10-01 14:24:13

by Kalle Valo

[permalink] [raw]
Subject: Re: [PATCH] net: fix NOHZ: local_softirq_pending 08

Michael Buesch <[email protected]> writes:

> On Thursday 01 October 2009 01:33:33 David Miller wrote:
>
>> I'm not applying this until all of these details are sorted out
>
> John, please apply my fix to wireless-testing to get rid of the
> regression. You can revert it later, if there's a better fix
> available.

I agree, please take Michael's patch. It's trivial to change mac80211
part whenever there's better support available.

But I don't think this is a regression because I see the bug also with
2.6.28, most probably it has been in mac80211 forever. But it's still
a bug which needs to be fixed.

--
Kalle Valo

2009-09-30 23:33:14

by David Miller

[permalink] [raw]
Subject: Re: [PATCH] net: fix NOHZ: local_softirq_pending 08

From: Oliver Hartkopp <[email protected]>
Date: Wed, 30 Sep 2009 20:18:25 +0200

> Socket buffers that are generated and received inside softirqs or from process
> context must not use netif_rx() that's intended to be used from irq context only.
>
> This patch introduces a new helper function netif_rx_ti(skb) that tests for
> in_interrupt() before invoking netif_rx() or netif_rx_ni().
>
> It fixes the ratelimited kernel warning
>
> NOHZ: local_softirq_pending 08
>
> in the mac80211 and can subsystems.
>
> Signed-off-by: Oliver Hartkopp <[email protected]>

I bet all of these code paths can use netif_receive_skb() or
don't need this conditional blob at all.

Looking at some specific cases in this patch:

1) VCAN: This RX routine is only invoked from the drivers
->ndo_start_xmit() handler, and therefore like the loopback
driver we know that BH's are already disabled and therefore
it can always use netif_rx() safely.

Why did you convert this case?

And if this needs to be converted, why doesn't loopback need
to be?

2) af_can.c: In what situation will netif_rx_ni() not be appropriate
here? It should always execute in softirq or user context, now
hardirq context.

And the list goes on and on, I don't really like this new conditional
testing of interrupt state. As always, that's usually a red flag and
as far as I can see these spots where you're changing things are only
trying to receive packets in one of the two possible cases not both.

I'm not applying this until all of these details are sorted out and
you add some verbosity to the commit message explaining each and every
case you are changing, what contexts those cases can be called
from, and from where.

Thanks.

2009-10-01 19:26:36

by Johannes Berg

[permalink] [raw]
Subject: Re: [PATCH] net: fix NOHZ: local_softirq_pending 08

On Thu, 2009-10-01 at 21:10 +0200, Michael Buesch wrote:

> > I agree with davem, don't. Just fix the driver to local_bh_disable()
> > around the rx function if necessary.
>
> For the benefit of a much bigger critical section? I don't get it why this would be any better.

And how do you know mac80211 is actually safe with this change? It uses
tasklets too. At the very least you'd have to require drivers to never
mix & match the regular/irqsafe functions at all.

johannes


Attachments:
signature.asc (801.00 B)
This is a digitally signed message part

2009-10-01 14:04:44

by Michael Büsch

[permalink] [raw]
Subject: Re: [PATCH] net: fix NOHZ: local_softirq_pending 08

On Thursday 01 October 2009 01:33:33 David Miller wrote:

> I'm not applying this until all of these details are sorted out

John, please apply my fix to wireless-testing to get rid of the regression.
You can revert it later, if there's a better fix available.

--
Greetings, Michael.