2006-10-31 22:06:04

by Andy Gospodarek

Subject: [PATCH] 2.6.19-rc4 - netlink messages created with bad flags in soft_irq context

I've got a kernel built where

CONFIG_DEBUG_SPINLOCK_SLEEP=y

is in the config and I've noticed some interesting behavior when
bringing up bonds in balance-alb mode. When I start to enslave devices
to a bond I get the following in the ring buffer:

BUG: sleeping function called from invalid context at mm/slab.c:3007
in_atomic():1, irqs_disabled():0

along with a nice backtrace that pointed to the cause of this message.
The bonding code was asking the device to set its MAC address, and the
netlink message that would be sent as a result of this notification was
being created with the flag GFP_KERNEL instead of GFP_ATOMIC.

After switching these allocations to GFP_ATOMIC, I noticed the error was
not completely cleared (since this call eventually tries to take the
rtnl_lock), but it gets us closer. I'm still trying to decide how best
to approach the remaining problem and will hopefully post a solution
soon, but I wanted to get this in and/or get some feedback on this
patch/direction first.
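
For readers unfamiliar with the warning, here is a minimal userspace
sketch of the check that fires. It is a model only, not kernel code:
every model_* name and the MODEL_GFP_* values are hypothetical
stand-ins, and the real check (might_sleep_if() in the slab allocator)
prints a BUG line rather than bumping a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the kernel's GFP flags; not the real values. */
enum model_gfp {
	MODEL_GFP_ATOMIC,	/* caller must not sleep */
	MODEL_GFP_KERNEL	/* allocator is allowed to sleep */
};

/* Models what in_atomic() reports (e.g. softirq or spinlock held). */
static bool model_in_atomic;

/* Counts how often the debug check would have printed the BUG line. */
static int model_sleep_bugs;

/* Models might_sleep_if(): flag a possibly sleeping call made in atomic context. */
static void model_might_sleep_if(bool may_sleep)
{
	if (may_sleep && model_in_atomic)
		model_sleep_bugs++;
}

/* Models an allocation that runs the debug check before allocating. */
static void *model_alloc(size_t size, enum model_gfp flags)
{
	assert(size > 0);
	model_might_sleep_if(flags == MODEL_GFP_KERNEL);
	return malloc(size);
}
```

In this model, calling model_alloc() with MODEL_GFP_KERNEL while
model_in_atomic is set increments model_sleep_bugs; the same call with
MODEL_GFP_ATOMIC does not. That is the effect the patch aims for, and
it does nothing about other sleeping calls such as taking the
rtnl_lock.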


Signed-off-by: Andy Gospodarek <[email protected]>
---

rtnetlink.c | 8 ++++----
1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 221e403..93d6fb3 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -159,7 +159,7 @@ int rtnetlink_send(struct sk_buff *skb,
 	NETLINK_CB(skb).dst_group = group;
 	if (echo)
 		atomic_inc(&skb->users);
-	netlink_broadcast(rtnl, skb, pid, group, GFP_KERNEL);
+	netlink_broadcast(rtnl, skb, pid, group, GFP_ATOMIC);
 	if (echo)
 		err = netlink_unicast(rtnl, skb, pid, MSG_DONTWAIT);
 	return err;
@@ -589,7 +589,7 @@ #endif /* CONFIG_NET_WIRELESS_RTNETLINK

 	payload = NLMSG_ALIGN(sizeof(struct ifinfomsg) +
 			      nla_total_size(iw_buf_len));
-	nskb = nlmsg_new(nlmsg_total_size(payload), GFP_KERNEL);
+	nskb = nlmsg_new(nlmsg_total_size(payload), GFP_ATOMIC);
 	if (nskb == NULL) {
 		err = -ENOBUFS;
 		goto errout;
@@ -639,7 +639,7 @@ void rtmsg_ifinfo(int type, struct net_d
 	struct sk_buff *skb;
 	int err = -ENOBUFS;

-	skb = nlmsg_new(NLMSG_GOODSIZE, GFP_KERNEL);
+	skb = nlmsg_new(NLMSG_GOODSIZE, GFP_ATOMIC);
 	if (skb == NULL)
 		goto errout;

@@ -649,7 +649,7 @@ void rtmsg_ifinfo(int type, struct net_d
 		goto errout;
 	}

-	err = rtnl_notify(skb, 0, RTNLGRP_LINK, NULL, GFP_KERNEL);
+	err = rtnl_notify(skb, 0, RTNLGRP_LINK, NULL, GFP_ATOMIC);
 errout:
 	if (err < 0)
 		rtnl_set_sk_err(RTNLGRP_LINK, err);


2006-11-01 03:51:15

by Herbert Xu

Subject: Re: [PATCH] 2.6.19-rc4 - netlink messages created with bad flags in soft_irq context

Andy Gospodarek <[email protected]> wrote:
> I've got a kernel built where
>
> CONFIG_DEBUG_SPINLOCK_SLEEP=y
>
> is in the config and I've noticed some interesting behavior when
> bringing up bonds in balance-alb mode. When I start to enslave devices
> to a bond I get the following in the ring buffer:
>
> BUG: sleeping function called from invalid context at mm/slab.c:3007
> in_atomic():1, irqs_disabled():0
>
> along with a nice backtrace of the error that pointed to the cause of
> this message. The bonding code was calling for the device to set its
> MAC address and the netlink message that would be sent as a result of
> this notification was being created with the flag GFP_KERNEL instead of
> GFP_ATOMIC.

The bonding driver is known to be broken in places where it tries to
call into the network stack in atomic contexts where it shouldn't.

So please verify whether this is the case here before changing netlink.

Thanks,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2006-11-01 06:00:47

by David Miller

Subject: Re: [PATCH] 2.6.19-rc4 - netlink messages created with bad flags in soft_irq context

From: Andy Gospodarek <[email protected]>
Date: Tue, 31 Oct 2006 17:06:00 -0500

> I've got a kernel built where
>
> CONFIG_DEBUG_SPINLOCK_SLEEP=y
>
> is in the config and I've noticed some interesting behavior when
> bringing up bonds in balance-alb mode. When I start to enslave devices
> to a bond I get the following in the ring buffer:
>
> BUG: sleeping function called from invalid context at mm/slab.c:3007
> in_atomic():1, irqs_disabled():0

As Herbert mentioned, the bonding layer calls into the networking
stack in atomic contexts, which is illegal.

2006-11-01 13:10:33

by Andy Gospodarek

Subject: Re: [PATCH] 2.6.19-rc4 - netlink messages created with bad flags in soft_irq context

On Tue, Oct 31, 2006 at 10:00:47PM -0800, David Miller wrote:
> From: Andy Gospodarek <[email protected]>
> Date: Tue, 31 Oct 2006 17:06:00 -0500
>
> > I've got a kernel built where
> >
> > CONFIG_DEBUG_SPINLOCK_SLEEP=y
> >
> > is in the config and I've noticed some interesting behavior when
> > bringing up bonds in balance-alb mode. When I start to enslave devices
> > to a bond I get the following in the ring buffer:
> >
> > BUG: sleeping function called from invalid context at mm/slab.c:3007
> > in_atomic():1, irqs_disabled():0
>
> As Herbert mentioned, the bonding layer calls into the networking
> stack in atomic contexts, which is illegal.

Thanks for the feedback. If the bonding driver seems to be one of the
only culprits, I'll investigate a solution that is specific to bonding
(maybe a workqueue for such calls...) rather than one that affects the
entire stack.
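
The workqueue idea can be sketched roughly as below. This is a
userspace model under stated assumptions, not the bonding driver or the
kernel workqueue API: all model_* names are hypothetical, and
model_set_mac_and_notify() merely stands in for a call that may sleep
(for example one that takes the rtnl_lock). The atomic path only
records the request; the possibly sleeping call runs later from process
context.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_WORK 8

typedef void (*model_work_fn)(void *arg);

struct model_work {
	model_work_fn fn;
	void *arg;
};

static struct model_work model_queue[MODEL_MAX_WORK];
static size_t model_queued;

/* Models what in_atomic() reports. */
static bool model_in_atomic;

/* Counts sleeping calls that were made while atomic. */
static int model_sleep_bugs;

/* Hypothetical stand-in for an operation that may sleep. */
static void model_set_mac_and_notify(void *arg)
{
	int *calls = arg;

	if (model_in_atomic)
		model_sleep_bugs++;
	(*calls)++;
}

/* Safe in atomic context: only records the request, never sleeps. */
static bool model_schedule_work(model_work_fn fn, void *arg)
{
	if (model_queued == MODEL_MAX_WORK)
		return false;	/* queue full, caller must handle it */
	model_queue[model_queued].fn = fn;
	model_queue[model_queued].arg = arg;
	model_queued++;
	return true;
}

/* Models the worker: runs queued items from process context. */
static void model_run_pending(void)
{
	size_t i;

	assert(!model_in_atomic);
	for (i = 0; i < model_queued; i++)
		model_queue[i].fn(model_queue[i].arg);
	model_queued = 0;
}
```

The trade-off in this approach is that the deferred call no longer
completes before the atomic section ends, so any code that relied on
the MAC change being visible immediately would need to tolerate the
delay.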