With commit 6146b1a4, the dev field in the RLB ARP packet handler was
set to NULL to wildcard and accommodate balancing VLANs on top of bonds.
This has the side-effect of the packet handler being called against
other, non RLB-enabled bonds, and a kernel oops results when it tries to
dereference rx_hashtbl in rlb_update_entry_from_arp(), which won't be
set for those bonds, e.g. active-backup.
With the __netif_receive_skb() changes from commit 1f3c8804, frames
received on VLANs correctly make their way to the bond's handler,
so we no longer need to wildcard the device.
Signed-off-by: Greg Edwards <[email protected]>
---
Jay,
The oops can be reproduced by:
modprobe bonding
echo active-backup > /sys/class/net/bond0/bonding/mode
echo 100 > /sys/class/net/bond0/bonding/miimon
ifconfig bond0 xxx.xxx.xxx.xxx netmask xxx.xxx.xxx.xxx
echo +eth0 > /sys/class/net/bond0/bonding/slaves
echo +eth1 > /sys/class/net/bond0/bonding/slaves
echo +bond1 > /sys/class/net/bonding_masters
echo balance-alb > /sys/class/net/bond1/bonding/mode
echo 100 > /sys/class/net/bond1/bonding/miimon
ifconfig bond1 xxx.xxx.xxx.xxx netmask xxx.xxx.xxx.xxx
echo +eth2 > /sys/class/net/bond1/bonding/slaves
echo +eth3 > /sys/class/net/bond1/bonding/slaves
Pass some traffic on bond0. Boom.
drivers/net/bonding/bond_alb.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c
index df48307..8d7dfd2 100644
--- a/drivers/net/bonding/bond_alb.c
+++ b/drivers/net/bonding/bond_alb.c
@@ -822,7 +822,7 @@ static int rlb_initialize(struct bonding *bond)
/*initialize packet type*/
pk_type->type = cpu_to_be16(ETH_P_ARP);
- pk_type->dev = NULL;
+ pk_type->dev = bond->dev;
pk_type->func = rlb_arp_recv;
/* register to receive ARPs */
--
1.7.0.4
On Thu, Jul 22, 2010 at 3:52 PM, Greg Edwards <[email protected]> wrote:
> With commit 6146b1a4, the dev field in the RLB ARP packet handler was
> set to NULL to wildcard and accommodate balancing VLANs on top of
> bonds.
>
> This has the side-effect of the packet handler being called against
> other, non RLB-enabled bonds, and a kernel oops results when it tries
> to
> dereference rx_hashtbl in rlb_update_entry_from_arp(), which won't be
> set for those bonds, e.g. active-backup.
>
> With the __netif_receive_skb() changes from commit 1f3c8804, frames
> received on VLANs correctly make their way to the bond's handler,
> so we no longer need to wildcard the device.
>
I see this problem as well, but I would propose to fix it another way to
not alter the receive path so close to the release of 2.6.35 and to
catch this for 802.3ad bonds as well.
> Signed-off-by: Greg Edwards <[email protected]>
> ---
> Jay,
>
> The oops can be reproduced by:
>
> modprobe bonding
>
> echo active-backup > /sys/class/net/bond0/bonding/mode
> echo 100 > /sys/class/net/bond0/bonding/miimon
> ifconfig bond0 xxx.xxx.xxx.xxx netmask xxx.xxx.xxx.xxx
> echo +eth0 > /sys/class/net/bond0/bonding/slaves
> echo +eth1 > /sys/class/net/bond0/bonding/slaves
>
> echo +bond1 > /sys/class/net/bonding_masters
> echo balance-alb > /sys/class/net/bond1/bonding/mode
> echo 100 > /sys/class/net/bond1/bonding/miimon
> ifconfig bond1 xxx.xxx.xxx.xxx netmask xxx.xxx.xxx.xxx
> echo +eth2 > /sys/class/net/bond1/bonding/slaves
> echo +eth3 > /sys/class/net/bond1/bonding/slaves
>
> Pass some traffic on bond0. Boom.
>
bonding: make sure mode-specific handlers handle appropriate frames
This patch will exit out of rlb_arp_recv and bond_3ad_lacpdu_recv early
if the bond receiving the frame isn't using that mode.
Fixes problem reported by Greg Edwards <[email protected]>.
Signed-off-by: Andy Gospodarek <[email protected]>
---
drivers/net/bonding/bond_3ad.c | 3 ++-
drivers/net/bonding/bond_alb.c | 6 +++---
2 files changed, 5 insertions(+), 4 deletions(-)
diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
index 822f586..cf7b08d 100644
--- a/drivers/net/bonding/bond_3ad.c
+++ b/drivers/net/bonding/bond_3ad.c
@@ -2463,7 +2463,8 @@ int bond_3ad_lacpdu_recv(struct sk_buff *skb, struct net_device *dev, struct pac
struct slave *slave = NULL;
int ret = NET_RX_DROP;
- if (!(dev->flags & IFF_MASTER))
+ if (!(dev->flags & IFF_MASTER) ||
+ (bond->params.mode != BOND_MODE_8023AD))
goto out;
read_lock(&bond->lock);
diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c
index df48307..a82e38c 100644
--- a/drivers/net/bonding/bond_alb.c
+++ b/drivers/net/bonding/bond_alb.c
@@ -353,7 +353,7 @@ static void rlb_update_entry_from_arp(struct bonding *bond, struct arp_pkt *arp)
static int rlb_arp_recv(struct sk_buff *skb, struct net_device *bond_dev, struct packet_type *ptype, struct net_device *orig_dev)
{
- struct bonding *bond;
+ struct bonding *bond = netdev_priv(bond_dev);
struct arp_pkt *arp = (struct arp_pkt *)skb->data;
int res = NET_RX_DROP;
@@ -361,7 +361,8 @@ static int rlb_arp_recv(struct sk_buff *skb, struct net_device *bond_dev, struct
bond_dev = vlan_dev_real_dev(bond_dev);
if (!(bond_dev->priv_flags & IFF_BONDING) ||
- !(bond_dev->flags & IFF_MASTER))
+ !(bond_dev->flags & IFF_MASTER) ||
+ (bond->params.mode != BOND_MODE_ALB))
goto out;
if (!arp) {
@@ -376,7 +377,6 @@ static int rlb_arp_recv(struct sk_buff *skb, struct net_device *bond_dev, struct
if (arp->op_code == htons(ARPOP_REPLY)) {
/* update rx hash table for this ARP */
- bond = netdev_priv(bond_dev);
rlb_update_entry_from_arp(bond, arp);
pr_debug("Server received an ARP Reply from client\n");
}
--
1.6.2.5
On Fri, Jul 23, 2010 at 07:34:56PM +0000, Andy Gospodarek wrote:
> On Thu, Jul 22, 2010 at 3:52 PM, Greg Edwards <[email protected]> wrote:
>> With commit 6146b1a4, the dev field in the RLB ARP packet handler was
>> set to NULL to wildcard and accommodate balancing VLANs on top of
>> bonds.
>>
>> This has the side-effect of the packet handler being called against
>> other, non RLB-enabled bonds, and a kernel oops results when it tries
>> to
>> dereference rx_hashtbl in rlb_update_entry_from_arp(), which won't be
>> set for those bonds, e.g. active-backup.
>>
>> With the __netif_receive_skb() changes from commit 1f3c8804, frames
>> received on VLANs correctly make their way to the bond's handler,
>> so we no longer need to wildcard the device.
>
> I see this problem as well, but I would propose to fix it another way to
> not alter the receive path so close to the release of 2.6.35 and to
> catch this for 802.3ad bonds as well.
Is the problem demonstrable with 802.3ad bonds? bond_register_lacpdu()
sets pk_type->dev = bond->dev.
>> Signed-off-by: Greg Edwards <[email protected]>
>> ---
>> Jay,
>>
>> The oops can be reproduced by:
>>
>> modprobe bonding
>>
>> echo active-backup > /sys/class/net/bond0/bonding/mode
>> echo 100 > /sys/class/net/bond0/bonding/miimon
>> ifconfig bond0 xxx.xxx.xxx.xxx netmask xxx.xxx.xxx.xxx
>> echo +eth0 > /sys/class/net/bond0/bonding/slaves
>> echo +eth1 > /sys/class/net/bond0/bonding/slaves
>>
>> echo +bond1 > /sys/class/net/bonding_masters
>> echo balance-alb > /sys/class/net/bond1/bonding/mode
>> echo 100 > /sys/class/net/bond1/bonding/miimon
>> ifconfig bond1 xxx.xxx.xxx.xxx netmask xxx.xxx.xxx.xxx
>> echo +eth2 > /sys/class/net/bond1/bonding/slaves
>> echo +eth3 > /sys/class/net/bond1/bonding/slaves
>>
>> Pass some traffic on bond0. Boom.
>>
>
> bonding: make sure mode-specific handlers handle appropriate frames
>
> This patch will exit out of rlb_arp_recv and bond_3ad_lacpdu_recv early
> if the bond receiving the frame isn't using that mode.
I had originally thought of doing something like this, but it didn't
seem as clean. I don't have strong feelings one way or the other,
though.
Greg
From: Greg Edwards <[email protected]>
After:
commit 6146b1a4da98377e4abddc91ba5856bef8f23f1e
Author: Jay Vosburgh <[email protected]>
Date: Tue Nov 4 17:51:15 2008 -0800
bonding: Fix ALB mode to balance traffic on VLANs
the dev field in the RLB ARP packet handler was set to NULL to wildcard
and accommodate balancing VLANs on top of bonds.
This has the side-effect of the packet handler being called against
other, non RLB-enabled bonds, and a kernel oops results when it tries to
dereference rx_hashtbl in rlb_update_entry_from_arp(), which won't be
set for those bonds, e.g. active-backup.
With the __netif_receive_skb() changes from:
commit 1f3c8804acba841b5573b953f5560d2683d2db0d
Author: Andy Gospodarek <[email protected]>
Date: Mon Dec 14 10:48:58 2009 +0000
bonding: allow arp_ip_targets on separate vlans to use arp validation
frames received on VLANs correctly make their way to the bond's handler,
so we no longer need to wildcard the device.
The oops can be reproduced by:
modprobe bonding
echo active-backup > /sys/class/net/bond0/bonding/mode
echo 100 > /sys/class/net/bond0/bonding/miimon
ifconfig bond0 xxx.xxx.xxx.xxx netmask xxx.xxx.xxx.xxx
echo +eth0 > /sys/class/net/bond0/bonding/slaves
echo +eth1 > /sys/class/net/bond0/bonding/slaves
echo +bond1 > /sys/class/net/bonding_masters
echo balance-alb > /sys/class/net/bond1/bonding/mode
echo 100 > /sys/class/net/bond1/bonding/miimon
ifconfig bond1 xxx.xxx.xxx.xxx netmask xxx.xxx.xxx.xxx
echo +eth2 > /sys/class/net/bond1/bonding/slaves
echo +eth3 > /sys/class/net/bond1/bonding/slaves
Pass some traffic on bond0. Boom.
[ Tested, behaves as advertised. I do not believe a test of the bonding
mode is necessary, as there is no race between the packet handler and
the bonding mode changing (the mode can only change when the device is
closed). Also updated the log message to include the reproduction and
full commit ids. -J ]
Signed-off-by: Greg Edwards <[email protected]>
Signed-off-by: Jay Vosburgh <[email protected]>
---
drivers/net/bonding/bond_alb.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c
index df48307..8d7dfd2 100644
--- a/drivers/net/bonding/bond_alb.c
+++ b/drivers/net/bonding/bond_alb.c
@@ -822,7 +822,7 @@ static int rlb_initialize(struct bonding *bond)
/*initialize packet type*/
pk_type->type = cpu_to_be16(ETH_P_ARP);
- pk_type->dev = NULL;
+ pk_type->dev = bond->dev;
pk_type->func = rlb_arp_recv;
/* register to receive ARPs */
--
1.7.0.4
On Fri, Jul 23, 2010 at 01:59:47PM -0600, Greg Edwards wrote:
> On Fri, Jul 23, 2010 at 07:34:56PM +0000, Andy Gospodarek wrote:
> > On Thu, Jul 22, 2010 at 3:52 PM, Greg Edwards <[email protected]> wrote:
> >> With commit 6146b1a4, the dev field in the RLB ARP packet handler was
> >> set to NULL to wildcard and accommodate balancing VLANs on top of
> >> bonds.
> >>
> >> This has the side-effect of the packet handler being called against
> >> other, non RLB-enabled bonds, and a kernel oops results when it tries
> >> to
> >> dereference rx_hashtbl in rlb_update_entry_from_arp(), which won't be
> >> set for those bonds, e.g. active-backup.
> >>
> >> With the __netif_receive_skb() changes from commit 1f3c8804, frames
> >> received on VLANs correctly make their way to the bond's handler,
> >> so we no longer need to wildcard the device.
> >
> > I see this problem as well, but I would propose to fix it another way to
> > not alter the receive path so close to the release of 2.6.35 and to
> > catch this for 802.3ad bonds as well.
>
> Is the problem demonstrable with 802.3ad bonds? bond_register_lacpdu()
> sets pk_type->dev = bond->dev.
>
Doh! You are right.
Plus Jay just posted your patch again, so we can go with that solution.
> >> Signed-off-by: Greg Edwards <[email protected]>
> >> ---
> >> Jay,
> >>
> >> The oops can be reproduced by:
> >>
> >> modprobe bonding
> >>
> >> echo active-backup > /sys/class/net/bond0/bonding/mode
> >> echo 100 > /sys/class/net/bond0/bonding/miimon
> >> ifconfig bond0 xxx.xxx.xxx.xxx netmask xxx.xxx.xxx.xxx
> >> echo +eth0 > /sys/class/net/bond0/bonding/slaves
> >> echo +eth1 > /sys/class/net/bond0/bonding/slaves
> >>
> >> echo +bond1 > /sys/class/net/bonding_masters
> >> echo balance-alb > /sys/class/net/bond1/bonding/mode
> >> echo 100 > /sys/class/net/bond1/bonding/miimon
> >> ifconfig bond1 xxx.xxx.xxx.xxx netmask xxx.xxx.xxx.xxx
> >> echo +eth2 > /sys/class/net/bond1/bonding/slaves
> >> echo +eth3 > /sys/class/net/bond1/bonding/slaves
> >>
> >> Pass some traffic on bond0. Boom.
> >>
> >
> > bonding: make sure mode-specific handlers handle appropriate frames
> >
> > This patch will exit out of rlb_arp_recv and bond_3ad_lacpdu_recv early
> > if the bond receiving the frame isn't using that mode.
>
> I had originally thought of doing something like this, but it didn't
> seem as clean. I don't have strong feelings one way or the other,
> though.
>
> Greg
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Jul 23, 2010 at 01:02:04PM -0700, Jay Vosburgh wrote:
>
> From: Greg Edwards <[email protected]>
>
> After:
>
> commit 6146b1a4da98377e4abddc91ba5856bef8f23f1e
> Author: Jay Vosburgh <[email protected]>
> Date: Tue Nov 4 17:51:15 2008 -0800
>
> bonding: Fix ALB mode to balance traffic on VLANs
>
> the dev field in the RLB ARP packet handler was set to NULL to wildcard
> and accommodate balancing VLANs on top of bonds.
>
> This has the side-effect of the packet handler being called against
> other, non RLB-enabled bonds, and a kernel oops results when it tries to
> dereference rx_hashtbl in rlb_update_entry_from_arp(), which won't be
> set for those bonds, e.g. active-backup.
>
> With the __netif_receive_skb() changes from:
>
> commit 1f3c8804acba841b5573b953f5560d2683d2db0d
> Author: Andy Gospodarek <[email protected]>
> Date: Mon Dec 14 10:48:58 2009 +0000
>
> bonding: allow arp_ip_targets on separate vlans to use arp validation
>
> frames received on VLANs correctly make their way to the bond's handler,
> so we no longer need to wildcard the device.
>
> The oops can be reproduced by:
>
> modprobe bonding
>
> echo active-backup > /sys/class/net/bond0/bonding/mode
> echo 100 > /sys/class/net/bond0/bonding/miimon
> ifconfig bond0 xxx.xxx.xxx.xxx netmask xxx.xxx.xxx.xxx
> echo +eth0 > /sys/class/net/bond0/bonding/slaves
> echo +eth1 > /sys/class/net/bond0/bonding/slaves
>
> echo +bond1 > /sys/class/net/bonding_masters
> echo balance-alb > /sys/class/net/bond1/bonding/mode
> echo 100 > /sys/class/net/bond1/bonding/miimon
> ifconfig bond1 xxx.xxx.xxx.xxx netmask xxx.xxx.xxx.xxx
> echo +eth2 > /sys/class/net/bond1/bonding/slaves
> echo +eth3 > /sys/class/net/bond1/bonding/slaves
>
> Pass some traffic on bond0. Boom.
>
> [ Tested, behaves as advertised. I do not believe a test of the bonding
> mode is necessary, as there is no race between the packet handler and
> the bonding mode changing (the mode can only change when the device is
> closed). Also updated the log message to include the reproduction and
> full commit ids. -J ]
>
> Signed-off-by: Greg Edwards <[email protected]>
> Signed-off-by: Jay Vosburgh <[email protected]>
Acked-by: Andy Gospodarek <[email protected]>
>
> ---
>
> drivers/net/bonding/bond_alb.c | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c
> index df48307..8d7dfd2 100644
> --- a/drivers/net/bonding/bond_alb.c
> +++ b/drivers/net/bonding/bond_alb.c
> @@ -822,7 +822,7 @@ static int rlb_initialize(struct bonding *bond)
>
> /*initialize packet type*/
> pk_type->type = cpu_to_be16(ETH_P_ARP);
> - pk_type->dev = NULL;
> + pk_type->dev = bond->dev;
> pk_type->func = rlb_arp_recv;
>
> /* register to receive ARPs */
> --
> 1.7.0.4
>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Andy Gospodarek <[email protected]>
Date: Fri, 23 Jul 2010 16:26:48 -0400
> On Fri, Jul 23, 2010 at 01:02:04PM -0700, Jay Vosburgh wrote:
>>
>> From: Greg Edwards <[email protected]>
>>
>> After:
>>
>> commit 6146b1a4da98377e4abddc91ba5856bef8f23f1e
>> Author: Jay Vosburgh <[email protected]>
>> Date: Tue Nov 4 17:51:15 2008 -0800
>>
>> bonding: Fix ALB mode to balance traffic on VLANs
>>
>> the dev field in the RLB ARP packet handler was set to NULL to wildcard
>> and accommodate balancing VLANs on top of bonds.
>>
>> This has the side-effect of the packet handler being called against
>> other, non RLB-enabled bonds, and a kernel oops results when it tries to
>> dereference rx_hashtbl in rlb_update_entry_from_arp(), which won't be
>> set for those bonds, e.g. active-backup.
>>
>> With the __netif_receive_skb() changes from:
>>
>> commit 1f3c8804acba841b5573b953f5560d2683d2db0d
>> Author: Andy Gospodarek <[email protected]>
>> Date: Mon Dec 14 10:48:58 2009 +0000
>>
>> bonding: allow arp_ip_targets on separate vlans to use arp validation
>>
>> frames received on VLANs correctly make their way to the bond's handler,
>> so we no longer need to wildcard the device.
>>
>> The oops can be reproduced by:
>>
>> modprobe bonding
>>
>> echo active-backup > /sys/class/net/bond0/bonding/mode
>> echo 100 > /sys/class/net/bond0/bonding/miimon
>> ifconfig bond0 xxx.xxx.xxx.xxx netmask xxx.xxx.xxx.xxx
>> echo +eth0 > /sys/class/net/bond0/bonding/slaves
>> echo +eth1 > /sys/class/net/bond0/bonding/slaves
>>
>> echo +bond1 > /sys/class/net/bonding_masters
>> echo balance-alb > /sys/class/net/bond1/bonding/mode
>> echo 100 > /sys/class/net/bond1/bonding/miimon
>> ifconfig bond1 xxx.xxx.xxx.xxx netmask xxx.xxx.xxx.xxx
>> echo +eth2 > /sys/class/net/bond1/bonding/slaves
>> echo +eth3 > /sys/class/net/bond1/bonding/slaves
>>
>> Pass some traffic on bond0. Boom.
>>
>> [ Tested, behaves as advertised. I do not believe a test of the bonding
>> mode is necessary, as there is no race between the packet handler and
>> the bonding mode changing (the mode can only change when the device is
>> closed). Also updated the log message to include the reproduction and
>> full commit ids. -J ]
>>
>> Signed-off-by: Greg Edwards <[email protected]>
>> Signed-off-by: Jay Vosburgh <[email protected]>
>
> Acked-by: Andy Gospodarek <[email protected]>
This seems serious enough to put into net-2.6, so that's where I applied
it.
Thanks!