LinuxLists.cc - bond interface arp, vlan and trunk / network question

2009-04-20 17:50:31

by stefan novak

[permalink] [raw]

Subject: bond interface arp, vlan and trunk / network question

Hello list,

I've got a Problem with my bladecenter and bond interfaces.
My Server has 2 interface, each on a seperate switch. Each
serverinterface is connected via a trunk to one of the switches.
Each switch has a trunk to stacked-backbone switch.

nic1 --trunk--> bladesw1 --trunk--> backbone switch
nic2 --trunk--> bladesw2 --trunk--> backbone switch

So far vlan and trunking works as expected. But if one trunk
connection from a bladeswitch to the backbone switch is down the
mii-tool cant recognize this.
I tried to use an arp target on the backbone switch to check the
connection state.

Now i'm running into problems. :(
My bladesw1 is configured with a trunk and a private vlan id of 600.
On the backbone switch is a server in the vlan 600 connected, but i
can't get an arp request.

What can be the problem, or is it possible to add a vlan tag on the arp check?

thx for your help and sorry for my bad english...

Stefan

2009-04-20 18:16:36

by Eric Dumazet

[permalink] [raw]

Subject: Re: bond interface arp, vlan and trunk / network question

stefan novak a ?crit :
> Hello list,
>
> I've got a Problem with my bladecenter and bond interfaces.
> My Server has 2 interface, each on a seperate switch. Each
> serverinterface is connected via a trunk to one of the switches.
> Each switch has a trunk to stacked-backbone switch.
>
> nic1 --trunk--> bladesw1 --trunk--> backbone switch
> nic2 --trunk--> bladesw2 --trunk--> backbone switch
>
> So far vlan and trunking works as expected. But if one trunk
> connection from a bladeswitch to the backbone switch is down the
> mii-tool cant recognize this.

What is the exact problem on this one ?

> I tried to use an arp target on the backbone switch to check the
> connection state.
>
> Now i'm running into problems. :(
> My bladesw1 is configured with a trunk and a private vlan id of 600.
> On the backbone switch is a server in the vlan 600 connected, but i
> can't get an arp request.
>
> What can be the problem, or is it possible to add a vlan tag on the arp check?
>

What driver is in use on your NIC interface ?

Could you post your bonding settings ?

cat /proc/net/bonding/bond0

2009-04-20 18:37:40

by Jay Vosburgh

[permalink] [raw]

Subject: Re: bond interface arp, vlan and trunk / network question

Eric Dumazet <[email protected]> wrote:

>stefan novak a écrit :
>> Hello list,
>>
>> I've got a Problem with my bladecenter and bond interfaces.
>> My Server has 2 interface, each on a seperate switch. Each
>> serverinterface is connected via a trunk to one of the switches.
>> Each switch has a trunk to stacked-backbone switch.
>>
>> nic1 --trunk--> bladesw1 --trunk--> backbone switch
>> nic2 --trunk--> bladesw2 --trunk--> backbone switch
>>
>> So far vlan and trunking works as expected. But if one trunk
>> connection from a bladeswitch to the backbone switch is down the
>> mii-tool cant recognize this.
>
>What is the exact problem on this one ?

This is the expected behavior, the external switch to
bladecenter switch (ESM) link status does not affect the ESM to blade
link status.

Unless... your ESM supports "trunk failover." I believe the
Cisco ESMs do, I'm not sure about others. With trunk failover enabled,
loss of link on an external switch port will in turn drop link on the
corresponding internal switch ports. There is a long-ish delay for
this, on the order of 750 ms, as I recall.

>> I tried to use an arp target on the backbone switch to check the
>> connection state.
>>
>> Now i'm running into problems. :(
>> My bladesw1 is configured with a trunk and a private vlan id of 600.
>> On the backbone switch is a server in the vlan 600 connected, but i
>> can't get an arp request.
>>
>> What can be the problem, or is it possible to add a vlan tag on the arp check?
>>
>
>What driver is in use on your NIC interface ?
>
>Could you post your bonding settings ?
>
>cat /proc/net/bonding/bond0

Sufficiently recent versions of bonding should VLAN tag the ARP
probes, provided you are using a VLAN device configured above the bond.
My recollection is that VLAN tagging of ARP probes was added about three
years ago.

If the switch port is configured as native to a VLAN, then it
should tag everything coming in.

As Eric asks, what are you running?

-J

---
-Jay Vosburgh, IBM Linux Technology Center, [email protected]

2009-04-20 20:01:08

by stefan novak

[permalink] [raw]

Subject: Re: bond interface arp, vlan and trunk / network question

>> nic1 --trunk--> bladesw1 --trunk--> backbone switch
>> nic2 --trunk--> bladesw2 --trunk--> backbone switch
>>
>> So far vlan and trunking works as expected. But if one trunk
>> connection from a bladeswitch to the backbone switch is down the
>> mii-tool cant recognize this.
>
> What is the exact problem on this one ?

The exact problem is that the bonding driver don't switch the
interface because the mii-tool don't recognize that the connection
between the two switches is now broken.

nic1 --trunk--> bladesw1 --trunk--> backbone switch (passive if)
nic2 --trunk--> bladesw2 --xxx--broken-trunk-xx-> backbone switch (active if)

> Sufficiently recent versions of bonding should VLAN tag the ARP
>probes, provided you are using a VLAN device configured above the bond.
>My recollection is that VLAN tagging of ARP probes was added about three
>years ago.
>
> If the switch port is configured as native to a VLAN, then it
>should tag everything coming in.
>
> As Eric asks, what are you running?

I've got a bonding interface over eth0 and eth1 and on that bonding
interface several vlans. The arp probe is in vlan 600 and i can ping
it from the bash.
Also a arpping with -I bond0.600 is working. Arpping with -I bond0 is
not working.
So the arp check is not working for me :(

the nic driver is:
alias eth0 igb
alias eth1 igb

the bond setiings:
cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.2.4 (January 28, 2008)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: eth0
MII Status: up
MII Polling Interval (ms): 0
Up Delay (ms): 0
Down Delay (ms): 0
ARP Polling Interval (ms): 30
ARP IP target/s (n.n.n.n form): 172.21.0.254

Slave Interface: eth0
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:30:48:94:7d:1a

Slave Interface: eth1
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:30:48:94:7d:1b

On Mon, Apr 20, 2009 at 8:37 PM, Jay Vosburgh <[email protected]> wrote:
> Eric Dumazet <[email protected]> wrote:
>
>>stefan novak a ?crit :
>>> Hello list,
>>>
>>> I've got a Problem with my bladecenter and bond interfaces.
>>> My Server has 2 interface, each on a seperate switch. Each
>>> serverinterface is connected via a trunk to one of the switches.
>>> Each switch has a trunk to stacked-backbone switch.
>>>
>>> nic1 --trunk--> bladesw1 --trunk--> backbone switch
>>> nic2 --trunk--> bladesw2 --trunk--> backbone switch
>>>
>>> So far vlan and trunking works as expected. But if one trunk
>>> connection from a bladeswitch to the backbone switch is down the
>>> mii-tool cant recognize this.
>>
>>What is the exact problem on this one ?
>
> ? ? ? ?This is the expected behavior, the external switch to
> bladecenter switch (ESM) link status does not affect the ESM to blade
> link status.
>
> ? ? ? ?Unless... your ESM supports "trunk failover." ?I believe the
> Cisco ESMs do, I'm not sure about others. ?With trunk failover enabled,
> loss of link on an external switch port will in turn drop link on the
> corresponding internal switch ports. ?There is a long-ish delay for
> this, on the order of 750 ms, as I recall.
>
>>> I tried to use an arp target on the backbone switch to check the
>>> connection state.
>>>
>>> Now i'm running into problems. :(
>>> My bladesw1 is configured with a trunk and a private vlan id of 600.
>>> On the backbone switch is a server in the vlan 600 connected, but i
>>> can't get an arp request.
>>>
>>> What can be the problem, or is it possible to add a vlan tag on the arp check?
>>>
>>
>>What driver is in use on your NIC interface ?
>>
>>Could you post your bonding settings ?
>>
>>cat /proc/net/bonding/bond0
>
> ? ? ? ?Sufficiently recent versions of bonding should VLAN tag the ARP
> probes, provided you are using a VLAN device configured above the bond.
> My recollection is that VLAN tagging of ARP probes was added about three
> years ago.
>
> ? ? ? ?If the switch port is configured as native to a VLAN, then it
> should tag everything coming in.
>
> ? ? ? ?As Eric asks, what are you running?
>
> ? ? ? ?-J
>
> ---
> ? ? ? ?-Jay Vosburgh, IBM Linux Technology Center, [email protected]
>

2009-04-20 20:36:42

by Jay Vosburgh

[permalink] [raw]

Subject: Re: bond interface arp, vlan and trunk / network question

stefan novak <[email protected]> wrote:

>>> nic1 --trunk--> bladesw1 --trunk--> backbone switch
>>> nic2 --trunk--> bladesw2 --trunk--> backbone switch
>>>
>>> So far vlan and trunking works as expected. But if one trunk
>>> connection from a bladeswitch to the backbone switch is down the
>>> mii-tool cant recognize this.
>>
>> What is the exact problem on this one ?
>
>The exact problem is that the bonding driver don't switch the
>interface because the mii-tool don't recognize that the connection
>between the two switches is now broken.

No, from your configuration information, you're running the ARP
monitor, in which case the actual link state ("mii-tool", although that
isn't really how it works) is not used in the failover decision.

For the ARP monitor, the decision is based on whether or not
"replies" to the ARP probes come through. More on that in a bit.

>nic1 --trunk--> bladesw1 --trunk--> backbone switch (passive if)
>nic2 --trunk--> bladesw2 --xxx--broken-trunk-xx-> backbone switch (active if)
>
>> Sufficiently recent versions of bonding should VLAN tag the ARP
>>probes, provided you are using a VLAN device configured above the bond.
>>My recollection is that VLAN tagging of ARP probes was added about three
>>years ago.
>>
>> If the switch port is configured as native to a VLAN, then it
>>should tag everything coming in.
>>
>> As Eric asks, what are you running?
>
>I've got a bonding interface over eth0 and eth1 and on that bonding
>interface several vlans. The arp probe is in vlan 600 and i can ping
>it from the bash.
>Also a arpping with -I bond0.600 is working. Arpping with -I bond0 is
>not working.
>So the arp check is not working for me :(

I believe you're seeing the expected behavior from arping here,
and it does not automatically indicate that anything is wrong.

It's very possible that your network topology is such that
arping -I bond0 won't work while arping -I bond0.600 does. If the
target you specify is reachable only on the VLAN, it's expected behavior
that arping -I bond0 of that target won't work (because the interface
bond0 is not attached to the VLAN, only bond0.600 is). That doesn't
mean that the ARPs generated internally by bonding are untagged /
failing, as bonding itself adds VLAN tags to its own ARP probes as
needed.

On the other hand, if you specify different targets to the
arping -I bond0 and arping -I bond0.600 (so that the "bond0" target
isn't a VLAN destination), then something unusual may be occuring.

Also, are you running multiple blades with bonding behind the
same set of switches? If you are, you probably want to set the
arp_validate option to either "active" or "all", as the default setting
(none) relies only on the existance of traffic on the slaves, and
doesn't check the source of that traffic. The end result of that is the
probes from multiple bonding instances fool one another into thinking
the path is up, when it is not. With arp_validate enabled, it'll check
that the slaves are actually receiving their own ARP traffic.

-J

---
-Jay Vosburgh, IBM Linux Technology Center, [email protected]

>the nic driver is:
>alias eth0 igb
>alias eth1 igb
>
>the bond setiings:
>cat /proc/net/bonding/bond0
>Ethernet Channel Bonding Driver: v3.2.4 (January 28, 2008)
>
>Bonding Mode: fault-tolerance (active-backup)
>Primary Slave: None
>Currently Active Slave: eth0
>MII Status: up
>MII Polling Interval (ms): 0
>Up Delay (ms): 0
>Down Delay (ms): 0
>ARP Polling Interval (ms): 30
>ARP IP target/s (n.n.n.n form): 172.21.0.254
>
>Slave Interface: eth0
>MII Status: up
>Link Failure Count: 0
>Permanent HW addr: 00:30:48:94:7d:1a
>
>Slave Interface: eth1
>MII Status: up
>Link Failure Count: 0
>Permanent HW addr: 00:30:48:94:7d:1b
>
>
>On Mon, Apr 20, 2009 at 8:37 PM, Jay Vosburgh <[email protected]> wrote:
>> Eric Dumazet <[email protected]> wrote:
>>
>>>stefan novak a écrit :
>>>> Hello list,
>>>>
>>>> I've got a Problem with my bladecenter and bond interfaces.
>>>> My Server has 2 interface, each on a seperate switch. Each
>>>> serverinterface is connected via a trunk to one of the switches.
>>>> Each switch has a trunk to stacked-backbone switch.
>>>>
>>>> nic1 --trunk--> bladesw1 --trunk--> backbone switch
>>>> nic2 --trunk--> bladesw2 --trunk--> backbone switch
>>>>
>>>> So far vlan and trunking works as expected. But if one trunk
>>>> connection from a bladeswitch to the backbone switch is down the
>>>> mii-tool cant recognize this.
>>>
>>>What is the exact problem on this one ?
>>
>> This is the expected behavior, the external switch to
>> bladecenter switch (ESM) link status does not affect the ESM to blade
>> link status.
>>
>> Unless... your ESM supports "trunk failover." I believe the
>> Cisco ESMs do, I'm not sure about others. With trunk failover enabled,
>> loss of link on an external switch port will in turn drop link on the
>> corresponding internal switch ports. There is a long-ish delay for
>> this, on the order of 750 ms, as I recall.
>>
>>>> I tried to use an arp target on the backbone switch to check the
>>>> connection state.
>>>>
>>>> Now i'm running into problems. :(
>>>> My bladesw1 is configured with a trunk and a private vlan id of 600.
>>>> On the backbone switch is a server in the vlan 600 connected, but i
>>>> can't get an arp request.
>>>>
>>>> What can be the problem, or is it possible to add a vlan tag on the arp check?
>>>>
>>>
>>>What driver is in use on your NIC interface ?
>>>
>>>Could you post your bonding settings ?
>>>
>>>cat /proc/net/bonding/bond0
>>
>> Sufficiently recent versions of bonding should VLAN tag the ARP
>> probes, provided you are using a VLAN device configured above the bond.
>> My recollection is that VLAN tagging of ARP probes was added about three
>> years ago.
>>
>> If the switch port is configured as native to a VLAN, then it
>> should tag everything coming in.
>>
>> As Eric asks, what are you running?
>>
>> -J
>>
>> ---
>> -Jay Vosburgh, IBM Linux Technology Center, [email protected]
>>
>--
>To unsubscribe from this list: send the line "unsubscribe netdev" in
>the body of a message to [email protected]
>More majordomo info at http://vger.kernel.org/majordomo-info.html

2009-04-20 21:03:24

by stefan novak

[permalink] [raw]

Subject: Re: bond interface arp, vlan and trunk / network question

>
> ? ? ? ?I believe you're seeing the expected behavior from arping here,
> and it does not automatically indicate that anything is wrong.
>
> ? ? ? ?It's very possible that your network topology is such that
> arping -I bond0 won't work while arping -I bond0.600 does. ?If the
> target you specify is reachable only on the VLAN, it's expected behavior
> that arping -I bond0 of that target won't work (because the interface
> bond0 is not attached to the VLAN, only bond0.600 is). ?That doesn't
> mean that the ARPs generated internally by bonding are untagged /
> failing, as bonding itself adds VLAN tags to its own ARP probes as
> needed.

Ok. I've checked the tcpdump's on the machines and I think something is working.

tcpdump -v -i eth0 arp
tcpdump: WARNING: eth0: no IPv4 address assigned
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 96 bytes
22:56:38.817599 arp who-has 172.21.0.254 tell 172.21.0.1
22:56:38.847597 arp who-has 172.21.0.254 tell 172.21.0.1
22:56:38.877598 arp who-has 172.21.0.254 tell 172.21.0.1
22:56:38.907596 arp who-has 172.21.0.254 tell 172.21.0.1

tcpdump -v -i bond0.600 arp
tcpdump: listening on bond0.600, link-type EN10MB (Ethernet), capture
size 96 bytes
22:56:49.167157 arp reply 172.21.0.254 is-at 00:1d:70:d1:ad:83 (oui Unknown)
22:56:49.197162 arp reply 172.21.0.254 is-at 00:1d:70:d1:ad:83 (oui Unknown)
22:56:49.227130 arp reply 172.21.0.254 is-at 00:1d:70:d1:ad:83 (oui Unknown)
22:56:49.257144 arp reply 172.21.0.254 is-at 00:1d:70:d1:ad:83 (oui Unknown)

the arp's are sent out on eth0 and recieved via bond0.600. When they
are sent on eth0 then the switch must tag the vlan600 (private vlan).
Then they come in at the right interface. Is it normal that so many
arp's are sent?
Is there a way to check if the arp check is working right in the proc
fs oder something like that?

> ? ? ? ?Also, are you running multiple blades with bonding behind the
> same set of switches?

Yes, 14 blades with 2 seperate(not connected) switches.

> ?If you are, you probably want to set the
> arp_validate option to either "active" or "all", as the default setting
> (none) relies only on the existance of traffic on the slaves, and
> doesn't check the source of that traffic. ?The end result of that is the
> probes from multiple bonding instances fool one another into thinking
> the path is up, when it is not. ?With arp_validate enabled, it'll check
> that the slaves are actually receiving their own ARP traffic.

Ok, sounds right for me. I've set the arp_validate option to "all".

2009-04-20 21:16:33

by Eric Dumazet

[permalink] [raw]

Subject: Re: bond interface arp, vlan and trunk / network question

stefan novak a ?crit :
>> I believe you're seeing the expected behavior from arping here,
>> and it does not automatically indicate that anything is wrong.
>>
>> It's very possible that your network topology is such that
>> arping -I bond0 won't work while arping -I bond0.600 does. If the
>> target you specify is reachable only on the VLAN, it's expected behavior
>> that arping -I bond0 of that target won't work (because the interface
>> bond0 is not attached to the VLAN, only bond0.600 is). That doesn't
>> mean that the ARPs generated internally by bonding are untagged /
>> failing, as bonding itself adds VLAN tags to its own ARP probes as
>> needed.
>
> Ok. I've checked the tcpdump's on the machines and I think something is working.
>
> tcpdump -v -i eth0 arp
> tcpdump: WARNING: eth0: no IPv4 address assigned
> tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 96 bytes
> 22:56:38.817599 arp who-has 172.21.0.254 tell 172.21.0.1
> 22:56:38.847597 arp who-has 172.21.0.254 tell 172.21.0.1
> 22:56:38.877598 arp who-has 172.21.0.254 tell 172.21.0.1
> 22:56:38.907596 arp who-has 172.21.0.254 tell 172.21.0.1
>
> tcpdump -v -i bond0.600 arp
> tcpdump: listening on bond0.600, link-type EN10MB (Ethernet), capture
> size 96 bytes
> 22:56:49.167157 arp reply 172.21.0.254 is-at 00:1d:70:d1:ad:83 (oui Unknown)
> 22:56:49.197162 arp reply 172.21.0.254 is-at 00:1d:70:d1:ad:83 (oui Unknown)
> 22:56:49.227130 arp reply 172.21.0.254 is-at 00:1d:70:d1:ad:83 (oui Unknown)
> 22:56:49.257144 arp reply 172.21.0.254 is-at 00:1d:70:d1:ad:83 (oui Unknown)
>
> the arp's are sent out on eth0 and recieved via bond0.600. When they
> are sent on eth0 then the switch must tag the vlan600 (private vlan).

Ah, you setup eth0 or bond0 with an IP ?

bond driver does a route loookup to find out if a vlan tag is necessary or not
when issuing an arp request.

So check result of : "ip route get 172.21.0.254"

> Then they come in at the right interface. Is it normal that so many
> arp's are sent?

you setup a 30 ms interval, so you get what you asked for :)

> Is there a way to check if the arp check is working right in the proc
> fs oder something like that?
>
>> Also, are you running multiple blades with bonding behind the
>> same set of switches?
>
> Yes, 14 blades with 2 seperate(not connected) switches.
>
>> If you are, you probably want to set the
>> arp_validate option to either "active" or "all", as the default setting
>> (none) relies only on the existance of traffic on the slaves, and
>> doesn't check the source of that traffic. The end result of that is the
>> probes from multiple bonding instances fool one another into thinking
>> the path is up, when it is not. With arp_validate enabled, it'll check
>> that the slaves are actually receiving their own ARP traffic.
>
> Ok, sounds right for me. I've set the arp_validate option to "all".

Please give us :

ifconfig -a

2009-04-20 21:23:22

by Jay Vosburgh

[permalink] [raw]

Subject: Re: bond interface arp, vlan and trunk / network question

stefan novak <[email protected]> wrote:

>>
>> I believe you're seeing the expected behavior from arping here,
>> and it does not automatically indicate that anything is wrong.
>>
>> It's very possible that your network topology is such that
>> arping -I bond0 won't work while arping -I bond0.600 does. If the
>> target you specify is reachable only on the VLAN, it's expected behavior
>> that arping -I bond0 of that target won't work (because the interface
>> bond0 is not attached to the VLAN, only bond0.600 is). That doesn't
>> mean that the ARPs generated internally by bonding are untagged /
>> failing, as bonding itself adds VLAN tags to its own ARP probes as
>> needed.
>
>Ok. I've checked the tcpdump's on the machines and I think something is working.
>
>tcpdump -v -i eth0 arp
>tcpdump: WARNING: eth0: no IPv4 address assigned
>tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 96 bytes
>22:56:38.817599 arp who-has 172.21.0.254 tell 172.21.0.1
>22:56:38.847597 arp who-has 172.21.0.254 tell 172.21.0.1
>22:56:38.877598 arp who-has 172.21.0.254 tell 172.21.0.1
>22:56:38.907596 arp who-has 172.21.0.254 tell 172.21.0.1
>
>tcpdump -v -i bond0.600 arp
>tcpdump: listening on bond0.600, link-type EN10MB (Ethernet), capture
>size 96 bytes
>22:56:49.167157 arp reply 172.21.0.254 is-at 00:1d:70:d1:ad:83 (oui Unknown)
>22:56:49.197162 arp reply 172.21.0.254 is-at 00:1d:70:d1:ad:83 (oui Unknown)
>22:56:49.227130 arp reply 172.21.0.254 is-at 00:1d:70:d1:ad:83 (oui Unknown)
>22:56:49.257144 arp reply 172.21.0.254 is-at 00:1d:70:d1:ad:83 (oui Unknown)
>
>the arp's are sent out on eth0 and recieved via bond0.600. When they
>are sent on eth0 then the switch must tag the vlan600 (private vlan).
>Then they come in at the right interface. Is it normal that so many
>arp's are sent?

You set arp_interval to 30, so you get one probe every 30
milliseconds, 33 per second.

As far as seeing the ARPs on eth0 and bond0.600, I believe
that's normal, as the interfaces are logically stacked, so you can see
the traffic in both places. If the device supports VLAN acceleration
(and maybe even if it doesn't; I'd have to look), I don't think you'll
see the tag in the tcpdumped traffic, as the tagging is done by the
hardware, not in software.

It's also normal on bonding to see transmits on the slave and
received traffic on the bond device; that has to do with how the
transmit and receive paths differ.

>Is there a way to check if the arp check is working right in the proc
>fs oder something like that?

There's no "self test" or anything like that, if that's what you
mean. If the ARPs work (make the round trip) the link is up, if they
don't, the link is down. That's subject to some details related to
arp_validate, but is basically it.

>> Also, are you running multiple blades with bonding behind the
>> same set of switches?
>
>Yes, 14 blades with 2 seperate(not connected) switches.
>
>> If you are, you probably want to set the
>> arp_validate option to either "active" or "all", as the default setting
>> (none) relies only on the existance of traffic on the slaves, and
>> doesn't check the source of that traffic. The end result of that is the
>> probes from multiple bonding instances fool one another into thinking
>> the path is up, when it is not. With arp_validate enabled, it'll check
>> that the slaves are actually receiving their own ARP traffic.
>
>Ok, sounds right for me. I've set the arp_validate option to "all".

That may make the phantom "up" link go away.

-J

---
-Jay Vosburgh, IBM Linux Technology Center, [email protected]

2009-04-20 21:39:36

by stefan novak

[permalink] [raw]

Subject: Re: bond interface arp, vlan and trunk / network question

ip route get 172.21.0.254
172.21.0.254 dev bond0.600 src 172.21.0.1
cache mtu 1500 advmss 1460 hoplimit 64

ifconfig -a :
bond0 Link encap:Ethernet HWaddr 00:30:48:94:7D:1A
UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1
RX packets:62221072 errors:0 dropped:0 overruns:0 frame:0
TX packets:1152153 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:4037489679 (3.7 GiB) TX bytes:124861822 (119.0 MiB)

bond0.200 Link encap:Ethernet HWaddr 00:30:48:94:7D:1A
inet addr:172.22.0.1 Bcast:172.22.0.255 Mask:255.255.255.0
UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1
RX packets:5 errors:0 dropped:0 overruns:0 frame:0
TX packets:6 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:250 (250.0 b) TX bytes:252 (252.0 b)

bond0.500 Link encap:Ethernet HWaddr 00:30:48:94:7D:1A
inet addr:172.20.0.1 Bcast:172.20.0.255 Mask:255.255.255.0
UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1
RX packets:26448 errors:0 dropped:0 overruns:0 frame:0
TX packets:33570 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:6278388 (5.9 MiB) TX bytes:6429404 (6.1 MiB)

bond0.600 Link encap:Ethernet HWaddr 00:30:48:94:7D:1A
inet addr:172.21.0.1 Bcast:172.21.0.255 Mask:255.255.255.0
UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1
RX packets:1911 errors:0 dropped:0 overruns:0 frame:0
TX packets:8665 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:98604 (96.2 KiB) TX bytes:1208395 (1.1 MiB)

eth0 Link encap:Ethernet HWaddr 00:30:48:94:7D:1A
UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1
RX packets:881723 errors:0 dropped:0 overruns:0 frame:0
TX packets:1074651 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:108550619 (103.5 MiB) TX bytes:115574099 (110.2 MiB)
Memory:f8220000-f8240000

eth1 Link encap:Ethernet HWaddr 00:30:48:94:7D:1A
UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1
RX packets:61339352 errors:0 dropped:0 overruns:0 frame:0
TX packets:77510 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:3928939240 (3.6 GiB) TX bytes:9290795 (8.8 MiB)
Memory:f8260000-f8280000

lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:105468 errors:0 dropped:0 overruns:0 frame:0
TX packets:105468 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:7016087 (6.6 MiB) TX bytes:7016087 (6.6 MiB)

> ? ? ? ?You set arp_interval to 30, so you get one probe every 30
> milliseconds, 33 per second.

Yes, sure. my mistake....

> ? ? ? ?There's no "self test" or anything like that, if that's what you
> mean. ?If the ARPs work (make the round trip) the link is up, if they
> don't, the link is down. ?That's subject to some details related to
> arp_validate, but is basically it.

Ok. my interfaces are now up with arp_validate set to 0/none.
Everything works as expected, thank you.
But with arp_validate set to all the interfaces are always down. :(
I think that i have to use the all option because i have many servers
in that vlan?

2009-04-21 00:02:03

by Jay Vosburgh

[permalink] [raw]

Subject: Re: bond interface arp, vlan and trunk / network question

stefan novak <[email protected]> wrote:

>ip route get 172.21.0.254
>172.21.0.254 dev bond0.600 src 172.21.0.1
> cache mtu 1500 advmss 1460 hoplimit 64
>
>ifconfig -a :
>bond0 Link encap:Ethernet HWaddr 00:30:48:94:7D:1A
> UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1
> RX packets:62221072 errors:0 dropped:0 overruns:0 frame:0
> TX packets:1152153 errors:0 dropped:0 overruns:0 carrier:0
> collisions:0 txqueuelen:0
> RX bytes:4037489679 (3.7 GiB) TX bytes:124861822 (119.0 MiB)
>
>bond0.200 Link encap:Ethernet HWaddr 00:30:48:94:7D:1A
> inet addr:172.22.0.1 Bcast:172.22.0.255 Mask:255.255.255.0
> UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1
> RX packets:5 errors:0 dropped:0 overruns:0 frame:0
> TX packets:6 errors:0 dropped:0 overruns:0 carrier:0
> collisions:0 txqueuelen:0
> RX bytes:250 (250.0 b) TX bytes:252 (252.0 b)
>
>bond0.500 Link encap:Ethernet HWaddr 00:30:48:94:7D:1A
> inet addr:172.20.0.1 Bcast:172.20.0.255 Mask:255.255.255.0
> UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1
> RX packets:26448 errors:0 dropped:0 overruns:0 frame:0
> TX packets:33570 errors:0 dropped:0 overruns:0 carrier:0
> collisions:0 txqueuelen:0
> RX bytes:6278388 (5.9 MiB) TX bytes:6429404 (6.1 MiB)
>
>bond0.600 Link encap:Ethernet HWaddr 00:30:48:94:7D:1A
> inet addr:172.21.0.1 Bcast:172.21.0.255 Mask:255.255.255.0
> UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1
> RX packets:1911 errors:0 dropped:0 overruns:0 frame:0
> TX packets:8665 errors:0 dropped:0 overruns:0 carrier:0
> collisions:0 txqueuelen:0
> RX bytes:98604 (96.2 KiB) TX bytes:1208395 (1.1 MiB)
>
>eth0 Link encap:Ethernet HWaddr 00:30:48:94:7D:1A
> UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1
> RX packets:881723 errors:0 dropped:0 overruns:0 frame:0
> TX packets:1074651 errors:0 dropped:0 overruns:0 carrier:0
> collisions:0 txqueuelen:1000
> RX bytes:108550619 (103.5 MiB) TX bytes:115574099 (110.2 MiB)
> Memory:f8220000-f8240000
>
>eth1 Link encap:Ethernet HWaddr 00:30:48:94:7D:1A
> UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1
> RX packets:61339352 errors:0 dropped:0 overruns:0 frame:0
> TX packets:77510 errors:0 dropped:0 overruns:0 carrier:0
> collisions:0 txqueuelen:1000
> RX bytes:3928939240 (3.6 GiB) TX bytes:9290795 (8.8 MiB)
> Memory:f8260000-f8280000
>
>lo Link encap:Local Loopback
> inet addr:127.0.0.1 Mask:255.0.0.0
> UP LOOPBACK RUNNING MTU:16436 Metric:1
> RX packets:105468 errors:0 dropped:0 overruns:0 frame:0
> TX packets:105468 errors:0 dropped:0 overruns:0 carrier:0
> collisions:0 txqueuelen:0
> RX bytes:7016087 (6.6 MiB) TX bytes:7016087 (6.6 MiB)

This looks pretty much as expected.

[...]
>> There's no "self test" or anything like that, if that's what you
>> mean. If the ARPs work (make the round trip) the link is up, if they
>> don't, the link is down. That's subject to some details related to
>> arp_validate, but is basically it.
>
>Ok. my interfaces are now up with arp_validate set to 0/none.
>Everything works as expected, thank you.
>But with arp_validate set to all the interfaces are always down. :(
>I think that i have to use the all option because i have many servers
>in that vlan?

Well, you'll probably want to run with arp_validate, otherwise
the probe traffic can fool the arp monitor into thinking the path is up
when it isn't.

But...

I know what your problem with arp_validate is, though, and it's
something I've been working on as time permits (and forgot to mention
previously). Basically, the VLAN receive path assigns the VLAN device
to the received packet before doing receive processing on it, so the
"slave" identity is lost before the bonding ARP receive function looks
at it, so it never counts the ARP (for validate purposes).

I've been chewing on the least bad way to fix this, mostly
trying to figure out if its possible to resolve without adding a hook
into bonding that basically replaces and extends skb_bond_should_drop
(which does work, it's just not very elegant).

-J

---
-Jay Vosburgh, IBM Linux Technology Center, [email protected]

2009-04-22 08:29:20

by stefan novak

[permalink] [raw]

Subject: Re: bond interface arp, vlan and trunk / network question

so its a bug in kernel?

2009-04-23 01:12:42

by Jay Vosburgh

[permalink] [raw]

Subject: Re: bond interface arp, vlan and trunk / network question

stefan novak <[email protected]> wrote:

>so its a bug in kernel?

Yes. I think the following patch should add support for
arp_validate over VLANs, could you test this? This is still a work in
progress, so it's pretty rough around the edges, but the core
functionality should be there.

This works by moving the skb_bond_should_drop logic around, and
replaces the current inline code with a hook into bonding to do all of
that, plus additional logic. The reason for a hook is to get the
skb_bond_should_drop callers from the VLAN input path before the input
device is assigned to the VLAN device.

-J

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 99610f3..cc367a3 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1607,6 +1607,7 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
read_lock(&bond->lock);

new_slave->last_arp_rx = jiffies;
+ bond_fix_slave_validate_flag(bond, new_slave);

if (bond->params.miimon && !bond->params.use_carrier) {
link_reporting = bond_check_dev_link(bond, slave_dev, 1);
@@ -2538,8 +2539,8 @@ static void bond_arp_send(struct net_device *slave_dev, int arp_op, __be32 dest_
{
struct sk_buff *skb;

- pr_debug("arp %d on slave %s: dst %x src %x vid %d\n", arp_op,
- slave_dev->name, dest_ip, src_ip, vlan_id);
+ pr_debug("arp %d on slave %s: dst %x src %x vid %d j %lu\n", arp_op,
+ slave_dev->name, dest_ip, src_ip, vlan_id, jiffies);

skb = arp_create(arp_op, ETH_P_ARP, dest_ip, slave_dev, src_ip,
NULL, slave_dev->dev_addr, NULL);
@@ -2679,8 +2680,6 @@ static void bond_validate_arp(struct bonding *bond, struct slave *slave, __be32

targets = bond->params.arp_targets;
for (i = 0; (i < BOND_MAX_ARP_TARGETS) && targets[i]; i++) {
- pr_debug("bva: sip %pI4 tip %pI4 t[%d] %pI4 bhti(tip) %d\n",
- &sip, &tip, i, &targets[i], bond_has_this_ip(bond, tip));
if (sip == targets[i]) {
if (bond_has_this_ip(bond, tip))
slave->last_arp_rx = jiffies;
@@ -2689,35 +2688,36 @@ static void bond_validate_arp(struct bonding *bond, struct slave *slave, __be32
}
}

-static int bond_arp_rcv(struct sk_buff *skb, struct net_device *dev, struct packet_type *pt, struct net_device *orig_dev)
+static void bond_handle_arp(struct sk_buff *skb, struct net_device *dev)
{
struct arphdr *arp;
struct slave *slave;
+ struct net_device *bond_dev = dev->master;
struct bonding *bond;
unsigned char *arp_ptr;
__be32 sip, tip;

if (dev_net(dev) != &init_net)
- goto out;
+ return;

- if (!(dev->priv_flags & IFF_BONDING) || !(dev->flags & IFF_MASTER))
- goto out;
+ bond = netdev_priv(bond_dev);

- bond = netdev_priv(dev);
read_lock(&bond->lock);

- pr_debug("bond_arp_rcv: bond %s skb->dev %s orig_dev %s\n",
- bond->dev->name, skb->dev ? skb->dev->name : "NULL",
- orig_dev ? orig_dev->name : "NULL");
-
- slave = bond_get_slave_by_dev(bond, orig_dev);
+ slave = bond_get_slave_by_dev(bond, dev);
if (!slave || !slave_do_arp_validate(bond, slave))
goto out_unlock;

if (!pskb_may_pull(skb, arp_hdr_len(dev)))
goto out_unlock;

- arp = arp_hdr(skb);
+ /* Can't use arp_hdr; it's not initialized yet. */
+ arp = (struct arphdr *)skb->data;
+
+ pr_debug("arp: hln %d p_t %d hrd %x pro %x pln %d\n",
+ arp->ar_hln, skb->pkt_type, ntohs(arp->ar_hrd),
+ ntohs(arp->ar_pro), arp->ar_pln);
+
if (arp->ar_hln != dev->addr_len ||
skb->pkt_type == PACKET_OTHERHOST ||
skb->pkt_type == PACKET_LOOPBACK ||
@@ -2732,29 +2732,67 @@ static int bond_arp_rcv(struct sk_buff *skb, struct net_device *dev, struct pack
arp_ptr += 4 + dev->addr_len;
memcpy(&tip, arp_ptr, 4);

- pr_debug("bond_arp_rcv: %s %s/%d av %d sv %d sip %pI4 tip %pI4\n",
- bond->dev->name, slave->dev->name, slave->state,
- bond->params.arp_validate, slave_do_arp_validate(bond, slave),
- &sip, &tip);
-
- /*
- * Backup slaves won't see the ARP reply, but do come through
- * here for each ARP probe (so we swap the sip/tip to validate
- * the probe). In a "redundant switch, common router" type of
- * configuration, the ARP probe will (hopefully) travel from
- * the active, through one switch, the router, then the other
- * switch before reaching the backup.
- */
- if (slave->state == BOND_STATE_ACTIVE)
+ switch (ntohs(arp->ar_op)) {
+ case ARPOP_REPLY:
bond_validate_arp(bond, slave, sip, tip);
- else
+ break;
+ case ARPOP_REQUEST:
bond_validate_arp(bond, slave, tip, sip);
+ break;
+ }

out_unlock:
read_unlock(&bond->lock);
-out:
- dev_kfree_skb(skb);
- return NET_RX_SUCCESS;
+}
+
+/*
+ * Called by core packet processing in netif_receive_skb and in VLAN fast
+ * path to (a) determine if packet should be dropped, and (b) to perform
+ * ARP monitor processing (last_rx, validation).
+ *
+ * For the VLAN case, called before the skb->dev is reassigned to the
+ * VLAN.
+ *
+ * Returns 1 if frame should nominally be dropped; 0 if it should be kept.
+ *
+ * We want to keep:
+ * - all traffic on active slaves
+ * - some traffic on inactive slaves: non-broadcast and non-multicast in
+ * ALB/TLB mode and LACP frames in 802.3ad mode.
+ *
+ * ARP frames are handled as normal traffic; the ARP monitor handling
+ * takes place here, so they need not be kept on inactive slaves.
+ */
+static int bond_handle_frame(struct sk_buff *skb)
+{
+ struct net_device *dev;
+ struct net_device *master;
+
+ dev = skb->dev;
+ master = dev->master;
+ if (!master || !master->priv_flags & IFF_BONDING)
+ return 0;
+
+ if (master->priv_flags & IFF_MASTER_ARPMON) {
+ dev->last_rx = jiffies;
+ if ((dev->priv_flags & IFF_SLAVE_NEEDARP) &&
+ skb->protocol == __constant_htons(ETH_P_ARP))
+ bond_handle_arp(skb, dev);
+ }
+
+ if (dev->priv_flags & IFF_SLAVE_INACTIVE) {
+ if (master->priv_flags & IFF_MASTER_ALB) {
+ if (skb->pkt_type != PACKET_BROADCAST &&
+ skb->pkt_type != PACKET_MULTICAST)
+ return 0;
+ }
+ if (master->priv_flags & IFF_MASTER_8023AD &&
+ skb->protocol == __constant_htons(ETH_P_SLOW))
+ return 0;
+
+ return 1;
+ }
+ return 0;
}

/*
@@ -3688,6 +3726,7 @@ static void bond_unregister_lacpdu(struct bonding *bond)

void bond_register_arp(struct bonding *bond)
{
+#if 0
struct packet_type *pt = &bond->arp_mon_pt;

if (pt->type)
@@ -3697,14 +3736,17 @@ void bond_register_arp(struct bonding *bond)
pt->dev = bond->dev;
pt->func = bond_arp_rcv;
dev_add_pack(pt);
+#endif
}

void bond_unregister_arp(struct bonding *bond)
{
+#if 0
struct packet_type *pt = &bond->arp_mon_pt;

dev_remove_pack(pt);
pt->type = 0;
+#endif
}

/*---------------------------- Hashing Policies -----------------------------*/
@@ -5188,6 +5230,8 @@ out_rtnl:
return res;
}

+extern int (*bond_handle_frame_hook)(struct sk_buff *skb);
+
static int __init bonding_init(void)
{
int i;
@@ -5221,6 +5265,8 @@ static int __init bonding_init(void)
register_inetaddr_notifier(&bond_inetaddr_notifier);
bond_register_ipv6_notifier();

+ bond_handle_frame_hook = bond_handle_frame;
+
goto out;
err:
list_for_each_entry(bond, &bond_dev_list, bond_list) {
@@ -5249,6 +5295,9 @@ static void __exit bonding_exit(void)
rtnl_lock();
bond_free_all();
rtnl_unlock();
+
+ bond_handle_frame_hook = NULL;
+
}

module_init(bonding_init);
diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c
index 18cf478..0e9f0e9 100644
--- a/drivers/net/bonding/bond_sysfs.c
+++ b/drivers/net/bonding/bond_sysfs.c
@@ -484,8 +484,9 @@ static ssize_t bonding_store_arp_validate(struct device *d,
struct device_attribute *attr,
const char *buf, size_t count)
{
- int new_value;
+ int new_value, i;
struct bonding *bond = to_bond(d);
+ struct slave *slave;

new_value = bond_parse_parm(buf, arp_validate_tbl);
if (new_value < 0) {
@@ -504,13 +505,12 @@ static ssize_t bonding_store_arp_validate(struct device *d,
bond->dev->name, arp_validate_tbl[new_value].modename,
new_value);

- if (!bond->params.arp_validate && new_value) {
- bond_register_arp(bond);
- } else if (bond->params.arp_validate && !new_value) {
- bond_unregister_arp(bond);
- }
+ if (bond->params.arp_validate != new_value) {
+ bond->params.arp_validate = new_value;

- bond->params.arp_validate = new_value;
+ bond_for_each_slave(bond, slave, i)
+ bond_fix_slave_validate_flag(bond, slave);
+ }

return count;
}
diff --git a/drivers/net/bonding/bonding.h b/drivers/net/bonding/bonding.h
index ca849d2..275f08d 100644
--- a/drivers/net/bonding/bonding.h
+++ b/drivers/net/bonding/bonding.h
@@ -283,6 +283,15 @@ static inline unsigned long slave_last_rx(struct bonding *bond,
return slave->dev->last_rx;
}

+static inline void bond_fix_slave_validate_flag(struct bonding *bond,
+ struct slave *slave)
+{
+ if (slave_do_arp_validate(bond, slave))
+ slave->dev->priv_flags |= IFF_SLAVE_NEEDARP;
+ else
+ slave->dev->priv_flags &= ~IFF_SLAVE_NEEDARP;
+}
+
static inline void bond_set_slave_inactive_flags(struct slave *slave)
{
struct bonding *bond = netdev_priv(slave->dev->master);
@@ -290,14 +299,16 @@ static inline void bond_set_slave_inactive_flags(struct slave *slave)
bond->params.mode != BOND_MODE_ALB)
slave->state = BOND_STATE_BACKUP;
slave->dev->priv_flags |= IFF_SLAVE_INACTIVE;
- if (slave_do_arp_validate(bond, slave))
- slave->dev->priv_flags |= IFF_SLAVE_NEEDARP;
+ bond_fix_slave_validate_flag(bond, slave);
}

static inline void bond_set_slave_active_flags(struct slave *slave)
{
+ struct bonding *bond = netdev_priv(slave->dev->master);
+
slave->state = BOND_STATE_ACTIVE;
- slave->dev->priv_flags &= ~(IFF_SLAVE_INACTIVE | IFF_SLAVE_NEEDARP);
+ slave->dev->priv_flags &= ~IFF_SLAVE_INACTIVE;
+ bond_fix_slave_validate_flag(bond, slave);
}

static inline void bond_set_master_3ad_flags(struct bonding *bond)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 2e7783f..3dafd8f 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1874,6 +1874,7 @@ static inline void netif_set_gso_max_size(struct net_device *dev,
dev->gso_max_size = size;
}

+#if 0
/* On bonding slaves other than the currently active slave, suppress
* duplicates except for 802.3ad ETH_P_SLOW, alb non-mcast/bcast, and
* ARP on active-backup slaves with arp_validate enabled.
@@ -1906,6 +1907,8 @@ static inline int skb_bond_should_drop(struct sk_buff *skb)
}
return 0;
}
+#endif
+extern int skb_bond_should_drop(struct sk_buff *skb);

extern struct pernet_operations __net_initdata loopback_net_ops;
#endif /* __KERNEL__ */
diff --git a/net/core/dev.c b/net/core/dev.c
index 91d792d..ac5a37e 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2098,6 +2098,56 @@ static inline struct sk_buff *handle_macvlan(struct sk_buff *skb,
#define handle_macvlan(skb, pt_prev, ret, orig_dev) (skb)
#endif

+#if defined(CONFIG_BONDING) || defined(CONFIG_BONDING_MODULE)
+int (*bond_handle_frame_hook)(struct sk_buff *skb);
+EXPORT_SYMBOL_GPL(bond_handle_frame_hook);
+
+/* On bonding slaves other than the currently active slave, suppress
+ * duplicates except for 802.3ad ETH_P_SLOW, alb non-mcast/bcast, and
+ * ARP on active-backup slaves with arp_validate enabled.
+ */
+int skb_bond_should_drop(struct sk_buff *skb)
+{
+ struct net_device *dev = skb->dev;
+ struct net_device *master = dev->master;
+
+ if (master)
+ return bond_handle_frame_hook(skb);
+
+ return 0;
+
+#if 0
+ if (master->priv_flags & IFF_MASTER_ARPMON)
+ dev->last_rx = jiffies;
+
+ if (dev->priv_flags & IFF_SLAVE_INACTIVE) {
+ if ((dev->priv_flags & IFF_SLAVE_NEEDARP) &&
+ skb->protocol == __constant_htons(ETH_P_ARP))
+ return 0;
+
+ if (master->priv_flags & IFF_MASTER_ALB) {
+ if (skb->pkt_type != PACKET_BROADCAST &&
+ skb->pkt_type != PACKET_MULTICAST)
+ return 0;
+ }
+ if (master->priv_flags & IFF_MASTER_8023AD &&
+ skb->protocol == __constant_htons(ETH_P_SLOW))
+ return 0;
+
+ return 1;
+ }
+ }
+ return 0;
+#endif /* 0 */
+}
+#else
+int skb_bond_should_drop(struct sk_buff *skb)
+{
+ return 0;
+}
+#endif
+EXPORT_SYMBOL_GPL(skb_bond_should_drop);
+
#ifdef CONFIG_NET_CLS_ACT
/* TODO: Maybe we should just force sch_ingress to be compiled in
* when CONFIG_NET_CLS_ACT is? otherwise some useless instructions

---
-Jay Vosburgh, IBM Linux Technology Center, [email protected]

2009-04-23 05:59:20

by Eric Dumazet

[permalink] [raw]

Subject: Re: bond interface arp, vlan and trunk / network question

Jay Vosburgh a ?crit :
> stefan novak <[email protected]> wrote:
>
>> so its a bug in kernel?
>
> Yes. I think the following patch should add support for
> arp_validate over VLANs, could you test this? This is still a work in
> progress, so it's pretty rough around the edges, but the core
> functionality should be there.
>
> This works by moving the skb_bond_should_drop logic around, and
> replaces the current inline code with a hook into bonding to do all of
> that, plus additional logic. The reason for a hook is to get the
> skb_bond_should_drop callers from the VLAN input path before the input
> device is assigned to the VLAN device.
>
> -J

Hi Jay

I wanted to test your patch and setup such VLAN config, and not yet applied your patch.

# cat /proc/net/bonding/bond1
Ethernet Channel Bonding Driver: v3.5.0 (November 4, 2008)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: eth2
MII Status: up
MII Polling Interval (ms): 0
Up Delay (ms): 0
Down Delay (ms): 0
ARP Polling Interval (ms): 250
ARP IP target/s (n.n.n.n form): 192.168.20.254

Slave Interface: eth1
MII Status: up
Link Failure Count: 8
Permanent HW addr: 00:1e:0b:ec:d3:d2

Slave Interface: eth2
MII Status: up
Link Failure Count: 11
Permanent HW addr: 00:1e:0b:92:78:50

# grep . /sys/class/net/bond1/bonding/*
/sys/class/net/bond1/bonding/active_slave:eth2
/sys/class/net/bond1/bonding/ad_select:stable 0
/sys/class/net/bond1/bonding/arp_interval:250
/sys/class/net/bond1/bonding/arp_ip_target:192.168.20.254
/sys/class/net/bond1/bonding/arp_validate:none 0
/sys/class/net/bond1/bonding/downdelay:0
/sys/class/net/bond1/bonding/fail_over_mac:none 0
/sys/class/net/bond1/bonding/lacp_rate:slow 0
/sys/class/net/bond1/bonding/miimon:0
/sys/class/net/bond1/bonding/mii_status:up
/sys/class/net/bond1/bonding/mode:active-backup 1
/sys/class/net/bond1/bonding/num_grat_arp:1
/sys/class/net/bond1/bonding/num_unsol_na:1
/sys/class/net/bond1/bonding/slaves:eth1 eth2
/sys/class/net/bond1/bonding/updelay:0
/sys/class/net/bond1/bonding/use_carrier:1
/sys/class/net/bond1/bonding/xmit_hash_policy:layer2 0

So active slave is eth2. Still I receive trafic on eth1, according to ifconfig :
# ifconfig eth1|grep packets;sleep 10;ifconfig eth1|grep packets
RX packets:2098122 errors:0 dropped:0 overruns:0 frame:0
TX packets:2053085 errors:0 dropped:0 overruns:0 carrier:0
RX packets:2098162 errors:0 dropped:0 overruns:0 frame:0
TX packets:2053085 errors:0 dropped:0 overruns:0 carrier:0

exactly 4 packets per second.

# ifconfig eth2|grep packets;sleep 10;ifconfig eth2|grep packets
RX packets:3695512 errors:0 dropped:0 overruns:0 frame:0
TX packets:3683799 errors:0 dropped:0 overruns:0 carrier:0
RX packets:3695554 errors:0 dropped:0 overruns:0 frame:0
TX packets:3683841 errors:0 dropped:0 overruns:0 carrier:0

I also receive arp answers on eth2 (quite normal)

I wanted to tcpdump on eth1 but got nothing :

# tcpdump -p -n -s 0 -i eth1
tcpdump: WARNING: eth1: no IPv4 address assigned
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth1, link-type EN10MB (Ethernet), capture size 65535 bytes
^C

# tcpdump -p -n -s 0 -i eth2
tcpdump: WARNING: eth2: no IPv4 address assigned
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth2, link-type EN10MB (Ethernet), capture size 65535 bytes
07:54:48.430982 arp who-has 192.168.20.254 tell 192.168.20.110
07:54:48.680980 arp who-has 192.168.20.254 tell 192.168.20.110
07:54:48.930981 arp who-has 192.168.20.254 tell 192.168.20.110
07:54:49.180980 arp who-has 192.168.20.254 tell 192.168.20.110
07:54:49.430980 arp who-has 192.168.20.254 tell 192.168.20.110
07:54:49.680980 arp who-has 192.168.20.254 tell 192.168.20.110

# tcpdump -p -n -s 0 -i bond1
tcpdump: WARNING: bond1: no IPv4 address assigned
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on bond1, link-type EN10MB (Ethernet), capture size 65535 bytes
07:55:28.681491 arp reply 192.168.20.254 is-at 00:00:0c:07:ac:01
07:55:28.931493 arp reply 192.168.20.254 is-at 00:00:0c:07:ac:01
07:55:29.181466 arp reply 192.168.20.254 is-at 00:00:0c:07:ac:01
07:55:29.431496 arp reply 192.168.20.254 is-at 00:00:0c:07:ac:01
07:55:29.681487 arp reply 192.168.20.254 is-at 00:00:0c:07:ac:01
07:55:29.931492 arp reply 192.168.20.254 is-at 00:00:0c:07:ac:01

# tcpdump -p -n -s 0 -i bond1.103
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on bond1.103, link-type EN10MB (Ethernet), capture size 65535 bytes
07:55:58.681521 arp reply 192.168.20.254 is-at 00:00:0c:07:ac:01
07:55:58.931636 arp reply 192.168.20.254 is-at 00:00:0c:07:ac:01
07:55:59.181495 arp reply 192.168.20.254 is-at 00:00:0c:07:ac:01
07:55:59.431496 arp reply 192.168.20.254 is-at 00:00:0c:07:ac:01
07:55:59.681499 arp reply 192.168.20.254 is-at 00:00:0c:07:ac:01
07:55:59.931499 arp reply 192.168.20.254 is-at 00:00:0c:07:ac:01

Configuration script is

ip link set eth1 up
ip link set eth2 up

ip addr flush dev eth1
ip addr flush dev eth2

ip link set eth1 up
ip link set eth2 up

modprobe bond1

ifconfig bond1 down

# test du arp_check toutes les 250ms, en choissant l'HSRP du vlan 103 comme IP
echo +192.168.20.254 >/sys/class/net/bond1/bonding/arp_ip_target
echo 250 >/sys/class/net/bond1/bonding/arp_interval

# mode actif/passif (active-backup)
echo 1 >/sys/class/net/bond1/bonding/mode

ifconfig bond1 up

ifenslave bond1 eth1 eth2

ip link set eth1 up
ip link set eth2 up

ip link add link bond1 bond1.103 txqueuelen 100 type vlan id 103
ip addr add 192.168.20.110/24 dev bond1.103
ip link set bond1.103 up

ip link add link bond1 bond1.825 txqueuelen 1000 type vlan id 825
ip addr add 10.170.73.123/25 dev bond1.825
ip link set bond1.825 up

Is this setup OK to test your patch ?

2009-04-23 08:06:10

[permalink] [raw]

Subject: Re: bond interface arp, vlan and trunk / network question

Thu, Apr 23, 2009 at 03:12:29AM CEST, [email protected] wrote:
>stefan novak <[email protected]> wrote:
>
>>so its a bug in kernel?
>
> Yes. I think the following patch should add support for
>arp_validate over VLANs, could you test this? This is still a work in
>progress, so it's pretty rough around the edges, but the core
>functionality should be there.
>
> This works by moving the skb_bond_should_drop logic around, and
>replaces the current inline code with a hook into bonding to do all of
>that, plus additional logic. The reason for a hook is to get the
>skb_bond_should_drop callers from the VLAN input path before the input
>device is assigned to the VLAN device.
>
> -J
>
>diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>index 99610f3..cc367a3 100644
>--- a/drivers/net/bonding/bond_main.c
>+++ b/drivers/net/bonding/bond_main.c
>@@ -1607,6 +1607,7 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
> read_lock(&bond->lock);
>
> new_slave->last_arp_rx = jiffies;
>+ bond_fix_slave_validate_flag(bond, new_slave);
>
> if (bond->params.miimon && !bond->params.use_carrier) {
> link_reporting = bond_check_dev_link(bond, slave_dev, 1);
>@@ -2538,8 +2539,8 @@ static void bond_arp_send(struct net_device *slave_dev, int arp_op, __be32 dest_
> {
> struct sk_buff *skb;
>
>- pr_debug("arp %d on slave %s: dst %x src %x vid %d\n", arp_op,
>- slave_dev->name, dest_ip, src_ip, vlan_id);
>+ pr_debug("arp %d on slave %s: dst %x src %x vid %d j %lu\n", arp_op,
>+ slave_dev->name, dest_ip, src_ip, vlan_id, jiffies);
>
> skb = arp_create(arp_op, ETH_P_ARP, dest_ip, slave_dev, src_ip,
> NULL, slave_dev->dev_addr, NULL);
>@@ -2679,8 +2680,6 @@ static void bond_validate_arp(struct bonding *bond, struct slave *slave, __be32
>
> targets = bond->params.arp_targets;
> for (i = 0; (i < BOND_MAX_ARP_TARGETS) && targets[i]; i++) {
>- pr_debug("bva: sip %pI4 tip %pI4 t[%d] %pI4 bhti(tip) %d\n",
>- &sip, &tip, i, &targets[i], bond_has_this_ip(bond, tip));
> if (sip == targets[i]) {
> if (bond_has_this_ip(bond, tip))
> slave->last_arp_rx = jiffies;
>@@ -2689,35 +2688,36 @@ static void bond_validate_arp(struct bonding *bond, struct slave *slave, __be32
> }
> }
>
>-static int bond_arp_rcv(struct sk_buff *skb, struct net_device *dev, struct packet_type *pt, struct net_device *orig_dev)
>+static void bond_handle_arp(struct sk_buff *skb, struct net_device *dev)
> {
> struct arphdr *arp;
> struct slave *slave;
>+ struct net_device *bond_dev = dev->master;
> struct bonding *bond;
> unsigned char *arp_ptr;
> __be32 sip, tip;
>
> if (dev_net(dev) != &init_net)
>- goto out;
>+ return;
>
>- if (!(dev->priv_flags & IFF_BONDING) || !(dev->flags & IFF_MASTER))
>- goto out;
>+ bond = netdev_priv(bond_dev);
>
>- bond = netdev_priv(dev);
> read_lock(&bond->lock);
>
>- pr_debug("bond_arp_rcv: bond %s skb->dev %s orig_dev %s\n",
>- bond->dev->name, skb->dev ? skb->dev->name : "NULL",
>- orig_dev ? orig_dev->name : "NULL");
>-
>- slave = bond_get_slave_by_dev(bond, orig_dev);
>+ slave = bond_get_slave_by_dev(bond, dev);
> if (!slave || !slave_do_arp_validate(bond, slave))
> goto out_unlock;
>
> if (!pskb_may_pull(skb, arp_hdr_len(dev)))
> goto out_unlock;
>
>- arp = arp_hdr(skb);
>+ /* Can't use arp_hdr; it's not initialized yet. */
>+ arp = (struct arphdr *)skb->data;
>+
>+ pr_debug("arp: hln %d p_t %d hrd %x pro %x pln %d\n",
>+ arp->ar_hln, skb->pkt_type, ntohs(arp->ar_hrd),
>+ ntohs(arp->ar_pro), arp->ar_pln);
>+
> if (arp->ar_hln != dev->addr_len ||
> skb->pkt_type == PACKET_OTHERHOST ||
> skb->pkt_type == PACKET_LOOPBACK ||
>@@ -2732,29 +2732,67 @@ static int bond_arp_rcv(struct sk_buff *skb, struct net_device *dev, struct pack
> arp_ptr += 4 + dev->addr_len;
> memcpy(&tip, arp_ptr, 4);
>
>- pr_debug("bond_arp_rcv: %s %s/%d av %d sv %d sip %pI4 tip %pI4\n",
>- bond->dev->name, slave->dev->name, slave->state,
>- bond->params.arp_validate, slave_do_arp_validate(bond, slave),
>- &sip, &tip);
>-
>- /*
>- * Backup slaves won't see the ARP reply, but do come through
>- * here for each ARP probe (so we swap the sip/tip to validate
>- * the probe). In a "redundant switch, common router" type of
>- * configuration, the ARP probe will (hopefully) travel from
>- * the active, through one switch, the router, then the other
>- * switch before reaching the backup.
>- */
>- if (slave->state == BOND_STATE_ACTIVE)
>+ switch (ntohs(arp->ar_op)) {
>+ case ARPOP_REPLY:
> bond_validate_arp(bond, slave, sip, tip);
>- else
>+ break;
>+ case ARPOP_REQUEST:
> bond_validate_arp(bond, slave, tip, sip);
>+ break;
>+ }
>
> out_unlock:
> read_unlock(&bond->lock);
>-out:
>- dev_kfree_skb(skb);
>- return NET_RX_SUCCESS;
>+}
>+
>+/*
>+ * Called by core packet processing in netif_receive_skb and in VLAN fast
>+ * path to (a) determine if packet should be dropped, and (b) to perform
>+ * ARP monitor processing (last_rx, validation).
>+ *
>+ * For the VLAN case, called before the skb->dev is reassigned to the
>+ * VLAN.
>+ *
>+ * Returns 1 if frame should nominally be dropped; 0 if it should be kept.
>+ *
>+ * We want to keep:
>+ * - all traffic on active slaves
>+ * - some traffic on inactive slaves: non-broadcast and non-multicast in
>+ * ALB/TLB mode and LACP frames in 802.3ad mode.
>+ *
>+ * ARP frames are handled as normal traffic; the ARP monitor handling
>+ * takes place here, so they need not be kept on inactive slaves.
>+ */
>+static int bond_handle_frame(struct sk_buff *skb)
>+{
>+ struct net_device *dev;
>+ struct net_device *master;
>+
>+ dev = skb->dev;
>+ master = dev->master;
>+ if (!master || !master->priv_flags & IFF_BONDING)
>+ return 0;
>+
>+ if (master->priv_flags & IFF_MASTER_ARPMON) {
>+ dev->last_rx = jiffies;
>+ if ((dev->priv_flags & IFF_SLAVE_NEEDARP) &&
>+ skb->protocol == __constant_htons(ETH_P_ARP))
>+ bond_handle_arp(skb, dev);
>+ }
>+
>+ if (dev->priv_flags & IFF_SLAVE_INACTIVE) {
>+ if (master->priv_flags & IFF_MASTER_ALB) {
>+ if (skb->pkt_type != PACKET_BROADCAST &&
>+ skb->pkt_type != PACKET_MULTICAST)
>+ return 0;
>+ }
>+ if (master->priv_flags & IFF_MASTER_8023AD &&
>+ skb->protocol == __constant_htons(ETH_P_SLOW))
>+ return 0;
>+
>+ return 1;
>+ }
>+ return 0;
> }
>
> /*
>@@ -3688,6 +3726,7 @@ static void bond_unregister_lacpdu(struct bonding *bond)
>
> void bond_register_arp(struct bonding *bond)
> {
>+#if 0
> struct packet_type *pt = &bond->arp_mon_pt;
>
> if (pt->type)
>@@ -3697,14 +3736,17 @@ void bond_register_arp(struct bonding *bond)
> pt->dev = bond->dev;
> pt->func = bond_arp_rcv;
> dev_add_pack(pt);
>+#endif
> }
>
> void bond_unregister_arp(struct bonding *bond)
> {
>+#if 0
> struct packet_type *pt = &bond->arp_mon_pt;
>
> dev_remove_pack(pt);
> pt->type = 0;
>+#endif
> }
>
> /*---------------------------- Hashing Policies -----------------------------*/
>@@ -5188,6 +5230,8 @@ out_rtnl:
> return res;
> }
>
>+extern int (*bond_handle_frame_hook)(struct sk_buff *skb);
>+
> static int __init bonding_init(void)
> {
> int i;
>@@ -5221,6 +5265,8 @@ static int __init bonding_init(void)
> register_inetaddr_notifier(&bond_inetaddr_notifier);
> bond_register_ipv6_notifier();
>
>+ bond_handle_frame_hook = bond_handle_frame;
>+
> goto out;
> err:
> list_for_each_entry(bond, &bond_dev_list, bond_list) {
>@@ -5249,6 +5295,9 @@ static void __exit bonding_exit(void)
> rtnl_lock();
> bond_free_all();
> rtnl_unlock();
>+
>+ bond_handle_frame_hook = NULL;
>+
> }
>
> module_init(bonding_init);
>diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c
>index 18cf478..0e9f0e9 100644
>--- a/drivers/net/bonding/bond_sysfs.c
>+++ b/drivers/net/bonding/bond_sysfs.c
>@@ -484,8 +484,9 @@ static ssize_t bonding_store_arp_validate(struct device *d,
> struct device_attribute *attr,
> const char *buf, size_t count)
> {
>- int new_value;
>+ int new_value, i;
> struct bonding *bond = to_bond(d);
>+ struct slave *slave;
>
> new_value = bond_parse_parm(buf, arp_validate_tbl);
> if (new_value < 0) {
>@@ -504,13 +505,12 @@ static ssize_t bonding_store_arp_validate(struct device *d,
> bond->dev->name, arp_validate_tbl[new_value].modename,
> new_value);
>
>- if (!bond->params.arp_validate && new_value) {
>- bond_register_arp(bond);
>- } else if (bond->params.arp_validate && !new_value) {
>- bond_unregister_arp(bond);
>- }
>+ if (bond->params.arp_validate != new_value) {
>+ bond->params.arp_validate = new_value;
>
>- bond->params.arp_validate = new_value;
>+ bond_for_each_slave(bond, slave, i)
>+ bond_fix_slave_validate_flag(bond, slave);
>+ }
>
> return count;
> }
>diff --git a/drivers/net/bonding/bonding.h b/drivers/net/bonding/bonding.h
>index ca849d2..275f08d 100644
>--- a/drivers/net/bonding/bonding.h
>+++ b/drivers/net/bonding/bonding.h
>@@ -283,6 +283,15 @@ static inline unsigned long slave_last_rx(struct bonding *bond,
> return slave->dev->last_rx;
> }
>
>+static inline void bond_fix_slave_validate_flag(struct bonding *bond,
>+ struct slave *slave)
>+{
>+ if (slave_do_arp_validate(bond, slave))
>+ slave->dev->priv_flags |= IFF_SLAVE_NEEDARP;
>+ else
>+ slave->dev->priv_flags &= ~IFF_SLAVE_NEEDARP;
>+}
>+
> static inline void bond_set_slave_inactive_flags(struct slave *slave)
> {
> struct bonding *bond = netdev_priv(slave->dev->master);
>@@ -290,14 +299,16 @@ static inline void bond_set_slave_inactive_flags(struct slave *slave)
> bond->params.mode != BOND_MODE_ALB)
> slave->state = BOND_STATE_BACKUP;
> slave->dev->priv_flags |= IFF_SLAVE_INACTIVE;
>- if (slave_do_arp_validate(bond, slave))
>- slave->dev->priv_flags |= IFF_SLAVE_NEEDARP;
>+ bond_fix_slave_validate_flag(bond, slave);
> }
>
> static inline void bond_set_slave_active_flags(struct slave *slave)
> {
>+ struct bonding *bond = netdev_priv(slave->dev->master);
>+
> slave->state = BOND_STATE_ACTIVE;
>- slave->dev->priv_flags &= ~(IFF_SLAVE_INACTIVE | IFF_SLAVE_NEEDARP);
>+ slave->dev->priv_flags &= ~IFF_SLAVE_INACTIVE;
>+ bond_fix_slave_validate_flag(bond, slave);
> }
>
> static inline void bond_set_master_3ad_flags(struct bonding *bond)
>diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>index 2e7783f..3dafd8f 100644
>--- a/include/linux/netdevice.h
>+++ b/include/linux/netdevice.h
>@@ -1874,6 +1874,7 @@ static inline void netif_set_gso_max_size(struct net_device *dev,
> dev->gso_max_size = size;
> }
>
>+#if 0
> /* On bonding slaves other than the currently active slave, suppress
> * duplicates except for 802.3ad ETH_P_SLOW, alb non-mcast/bcast, and
> * ARP on active-backup slaves with arp_validate enabled.
>@@ -1906,6 +1907,8 @@ static inline int skb_bond_should_drop(struct sk_buff *skb)
> }
> return 0;
> }
>+#endif
>+extern int skb_bond_should_drop(struct sk_buff *skb);
>
> extern struct pernet_operations __net_initdata loopback_net_ops;
> #endif /* __KERNEL__ */
>diff --git a/net/core/dev.c b/net/core/dev.c
>index 91d792d..ac5a37e 100644
>--- a/net/core/dev.c
>+++ b/net/core/dev.c
>@@ -2098,6 +2098,56 @@ static inline struct sk_buff *handle_macvlan(struct sk_buff *skb,
> #define handle_macvlan(skb, pt_prev, ret, orig_dev) (skb)
> #endif
>
>+#if defined(CONFIG_BONDING) || defined(CONFIG_BONDING_MODULE)
>+int (*bond_handle_frame_hook)(struct sk_buff *skb);
>+EXPORT_SYMBOL_GPL(bond_handle_frame_hook);
>+
>+/* On bonding slaves other than the currently active slave, suppress
>+ * duplicates except for 802.3ad ETH_P_SLOW, alb non-mcast/bcast, and
>+ * ARP on active-backup slaves with arp_validate enabled.
>+ */
>+int skb_bond_should_drop(struct sk_buff *skb)
>+{
>+ struct net_device *dev = skb->dev;
>+ struct net_device *master = dev->master;
>+
>+ if (master)
>+ return bond_handle_frame_hook(skb);

Maybe this hook can be called from netif_receive_skb() directly. You would safe
at least 2 dereferences, 1 if check. You would also need to add
"skb->dev->master &&" to if in __vlan_hwaccel_rx() and vlan_gro_common().
>+
>+ return 0;
>+
>+#if 0
>+ if (master->priv_flags & IFF_MASTER_ARPMON)
>+ dev->last_rx = jiffies;
>+
>+ if (dev->priv_flags & IFF_SLAVE_INACTIVE) {
>+ if ((dev->priv_flags & IFF_SLAVE_NEEDARP) &&
>+ skb->protocol == __constant_htons(ETH_P_ARP))
>+ return 0;
>+
>+ if (master->priv_flags & IFF_MASTER_ALB) {
>+ if (skb->pkt_type != PACKET_BROADCAST &&
>+ skb->pkt_type != PACKET_MULTICAST)
>+ return 0;
>+ }
>+ if (master->priv_flags & IFF_MASTER_8023AD &&
>+ skb->protocol == __constant_htons(ETH_P_SLOW))
>+ return 0;
>+
>+ return 1;
>+ }
>+ }
>+ return 0;
>+#endif /* 0 */
>+}
>+#else
>+int skb_bond_should_drop(struct sk_buff *skb)
>+{
>+ return 0;
>+}
>+#endif
>+EXPORT_SYMBOL_GPL(skb_bond_should_drop);
>+
> #ifdef CONFIG_NET_CLS_ACT
> /* TODO: Maybe we should just force sch_ingress to be compiled in
> * when CONFIG_NET_CLS_ACT is? otherwise some useless instructions
>
>
>---
> -Jay Vosburgh, IBM Linux Technology Center, [email protected]
>--
>To unsubscribe from this list: send the line "unsubscribe netdev" in
>the body of a message to [email protected]
>More majordomo info at http://vger.kernel.org/majordomo-info.html

2009-04-23 15:00:26

by Jay Vosburgh

[permalink] [raw]

Subject: Re: bond interface arp, vlan and trunk / network question

Jiri Pirko <[email protected]> wrote:

>Thu, Apr 23, 2009 at 03:12:29AM CEST, [email protected] wrote:
>>stefan novak <[email protected]> wrote:
>>
>>>so its a bug in kernel?
>>
>> Yes. I think the following patch should add support for
>>arp_validate over VLANs, could you test this? This is still a work in
>>progress, so it's pretty rough around the edges, but the core
>>functionality should be there.
>>
>> This works by moving the skb_bond_should_drop logic around, and
>>replaces the current inline code with a hook into bonding to do all of
>>that, plus additional logic. The reason for a hook is to get the
>>skb_bond_should_drop callers from the VLAN input path before the input
>>device is assigned to the VLAN device.
>>
>> -J
>>
>>diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>>index 99610f3..cc367a3 100644
>>--- a/drivers/net/bonding/bond_main.c
>>+++ b/drivers/net/bonding/bond_main.c
>>@@ -1607,6 +1607,7 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
>> read_lock(&bond->lock);
>>
>> new_slave->last_arp_rx = jiffies;
>>+ bond_fix_slave_validate_flag(bond, new_slave);
>>
>> if (bond->params.miimon && !bond->params.use_carrier) {
>> link_reporting = bond_check_dev_link(bond, slave_dev, 1);
>>@@ -2538,8 +2539,8 @@ static void bond_arp_send(struct net_device *slave_dev, int arp_op, __be32 dest_
>> {
>> struct sk_buff *skb;
>>
>>- pr_debug("arp %d on slave %s: dst %x src %x vid %d\n", arp_op,
>>- slave_dev->name, dest_ip, src_ip, vlan_id);
>>+ pr_debug("arp %d on slave %s: dst %x src %x vid %d j %lu\n", arp_op,
>>+ slave_dev->name, dest_ip, src_ip, vlan_id, jiffies);
>>
>> skb = arp_create(arp_op, ETH_P_ARP, dest_ip, slave_dev, src_ip,
>> NULL, slave_dev->dev_addr, NULL);
>>@@ -2679,8 +2680,6 @@ static void bond_validate_arp(struct bonding *bond, struct slave *slave, __be32
>>
>> targets = bond->params.arp_targets;
>> for (i = 0; (i < BOND_MAX_ARP_TARGETS) && targets[i]; i++) {
>>- pr_debug("bva: sip %pI4 tip %pI4 t[%d] %pI4 bhti(tip) %d\n",
>>- &sip, &tip, i, &targets[i], bond_has_this_ip(bond, tip));
>> if (sip == targets[i]) {
>> if (bond_has_this_ip(bond, tip))
>> slave->last_arp_rx = jiffies;
>>@@ -2689,35 +2688,36 @@ static void bond_validate_arp(struct bonding *bond, struct slave *slave, __be32
>> }
>> }
>>
>>-static int bond_arp_rcv(struct sk_buff *skb, struct net_device *dev, struct packet_type *pt, struct net_device *orig_dev)
>>+static void bond_handle_arp(struct sk_buff *skb, struct net_device *dev)
>> {
>> struct arphdr *arp;
>> struct slave *slave;
>>+ struct net_device *bond_dev = dev->master;
>> struct bonding *bond;
>> unsigned char *arp_ptr;
>> __be32 sip, tip;
>>
>> if (dev_net(dev) != &init_net)
>>- goto out;
>>+ return;
>>
>>- if (!(dev->priv_flags & IFF_BONDING) || !(dev->flags & IFF_MASTER))
>>- goto out;
>>+ bond = netdev_priv(bond_dev);
>>
>>- bond = netdev_priv(dev);
>> read_lock(&bond->lock);
>>
>>- pr_debug("bond_arp_rcv: bond %s skb->dev %s orig_dev %s\n",
>>- bond->dev->name, skb->dev ? skb->dev->name : "NULL",
>>- orig_dev ? orig_dev->name : "NULL");
>>-
>>- slave = bond_get_slave_by_dev(bond, orig_dev);
>>+ slave = bond_get_slave_by_dev(bond, dev);
>> if (!slave || !slave_do_arp_validate(bond, slave))
>> goto out_unlock;
>>
>> if (!pskb_may_pull(skb, arp_hdr_len(dev)))
>> goto out_unlock;
>>
>>- arp = arp_hdr(skb);
>>+ /* Can't use arp_hdr; it's not initialized yet. */
>>+ arp = (struct arphdr *)skb->data;
>>+
>>+ pr_debug("arp: hln %d p_t %d hrd %x pro %x pln %d\n",
>>+ arp->ar_hln, skb->pkt_type, ntohs(arp->ar_hrd),
>>+ ntohs(arp->ar_pro), arp->ar_pln);
>>+
>> if (arp->ar_hln != dev->addr_len ||
>> skb->pkt_type == PACKET_OTHERHOST ||
>> skb->pkt_type == PACKET_LOOPBACK ||
>>@@ -2732,29 +2732,67 @@ static int bond_arp_rcv(struct sk_buff *skb, struct net_device *dev, struct pack
>> arp_ptr += 4 + dev->addr_len;
>> memcpy(&tip, arp_ptr, 4);
>>
>>- pr_debug("bond_arp_rcv: %s %s/%d av %d sv %d sip %pI4 tip %pI4\n",
>>- bond->dev->name, slave->dev->name, slave->state,
>>- bond->params.arp_validate, slave_do_arp_validate(bond, slave),
>>- &sip, &tip);
>>-
>>- /*
>>- * Backup slaves won't see the ARP reply, but do come through
>>- * here for each ARP probe (so we swap the sip/tip to validate
>>- * the probe). In a "redundant switch, common router" type of
>>- * configuration, the ARP probe will (hopefully) travel from
>>- * the active, through one switch, the router, then the other
>>- * switch before reaching the backup.
>>- */
>>- if (slave->state == BOND_STATE_ACTIVE)
>>+ switch (ntohs(arp->ar_op)) {
>>+ case ARPOP_REPLY:
>> bond_validate_arp(bond, slave, sip, tip);
>>- else
>>+ break;
>>+ case ARPOP_REQUEST:
>> bond_validate_arp(bond, slave, tip, sip);
>>+ break;
>>+ }
>>
>> out_unlock:
>> read_unlock(&bond->lock);
>>-out:
>>- dev_kfree_skb(skb);
>>- return NET_RX_SUCCESS;
>>+}
>>+
>>+/*
>>+ * Called by core packet processing in netif_receive_skb and in VLAN fast
>>+ * path to (a) determine if packet should be dropped, and (b) to perform
>>+ * ARP monitor processing (last_rx, validation).
>>+ *
>>+ * For the VLAN case, called before the skb->dev is reassigned to the
>>+ * VLAN.
>>+ *
>>+ * Returns 1 if frame should nominally be dropped; 0 if it should be kept.
>>+ *
>>+ * We want to keep:
>>+ * - all traffic on active slaves
>>+ * - some traffic on inactive slaves: non-broadcast and non-multicast in
>>+ * ALB/TLB mode and LACP frames in 802.3ad mode.
>>+ *
>>+ * ARP frames are handled as normal traffic; the ARP monitor handling
>>+ * takes place here, so they need not be kept on inactive slaves.
>>+ */
>>+static int bond_handle_frame(struct sk_buff *skb)
>>+{
>>+ struct net_device *dev;
>>+ struct net_device *master;
>>+
>>+ dev = skb->dev;
>>+ master = dev->master;
>>+ if (!master || !master->priv_flags & IFF_BONDING)
>>+ return 0;
>>+
>>+ if (master->priv_flags & IFF_MASTER_ARPMON) {
>>+ dev->last_rx = jiffies;
>>+ if ((dev->priv_flags & IFF_SLAVE_NEEDARP) &&
>>+ skb->protocol == __constant_htons(ETH_P_ARP))
>>+ bond_handle_arp(skb, dev);
>>+ }
>>+
>>+ if (dev->priv_flags & IFF_SLAVE_INACTIVE) {
>>+ if (master->priv_flags & IFF_MASTER_ALB) {
>>+ if (skb->pkt_type != PACKET_BROADCAST &&
>>+ skb->pkt_type != PACKET_MULTICAST)
>>+ return 0;
>>+ }
>>+ if (master->priv_flags & IFF_MASTER_8023AD &&
>>+ skb->protocol == __constant_htons(ETH_P_SLOW))
>>+ return 0;
>>+
>>+ return 1;
>>+ }
>>+ return 0;
>> }
>>
>> /*
>>@@ -3688,6 +3726,7 @@ static void bond_unregister_lacpdu(struct bonding *bond)
>>
>> void bond_register_arp(struct bonding *bond)
>> {
>>+#if 0
>> struct packet_type *pt = &bond->arp_mon_pt;
>>
>> if (pt->type)
>>@@ -3697,14 +3736,17 @@ void bond_register_arp(struct bonding *bond)
>> pt->dev = bond->dev;
>> pt->func = bond_arp_rcv;
>> dev_add_pack(pt);
>>+#endif
>> }
>>
>> void bond_unregister_arp(struct bonding *bond)
>> {
>>+#if 0
>> struct packet_type *pt = &bond->arp_mon_pt;
>>
>> dev_remove_pack(pt);
>> pt->type = 0;
>>+#endif
>> }
>>
>> /*---------------------------- Hashing Policies -----------------------------*/
>>@@ -5188,6 +5230,8 @@ out_rtnl:
>> return res;
>> }
>>
>>+extern int (*bond_handle_frame_hook)(struct sk_buff *skb);
>>+
>> static int __init bonding_init(void)
>> {
>> int i;
>>@@ -5221,6 +5265,8 @@ static int __init bonding_init(void)
>> register_inetaddr_notifier(&bond_inetaddr_notifier);
>> bond_register_ipv6_notifier();
>>
>>+ bond_handle_frame_hook = bond_handle_frame;
>>+
>> goto out;
>> err:
>> list_for_each_entry(bond, &bond_dev_list, bond_list) {
>>@@ -5249,6 +5295,9 @@ static void __exit bonding_exit(void)
>> rtnl_lock();
>> bond_free_all();
>> rtnl_unlock();
>>+
>>+ bond_handle_frame_hook = NULL;
>>+
>> }
>>
>> module_init(bonding_init);
>>diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c
>>index 18cf478..0e9f0e9 100644
>>--- a/drivers/net/bonding/bond_sysfs.c
>>+++ b/drivers/net/bonding/bond_sysfs.c
>>@@ -484,8 +484,9 @@ static ssize_t bonding_store_arp_validate(struct device *d,
>> struct device_attribute *attr,
>> const char *buf, size_t count)
>> {
>>- int new_value;
>>+ int new_value, i;
>> struct bonding *bond = to_bond(d);
>>+ struct slave *slave;
>>
>> new_value = bond_parse_parm(buf, arp_validate_tbl);
>> if (new_value < 0) {
>>@@ -504,13 +505,12 @@ static ssize_t bonding_store_arp_validate(struct device *d,
>> bond->dev->name, arp_validate_tbl[new_value].modename,
>> new_value);
>>
>>- if (!bond->params.arp_validate && new_value) {
>>- bond_register_arp(bond);
>>- } else if (bond->params.arp_validate && !new_value) {
>>- bond_unregister_arp(bond);
>>- }
>>+ if (bond->params.arp_validate != new_value) {
>>+ bond->params.arp_validate = new_value;
>>
>>- bond->params.arp_validate = new_value;
>>+ bond_for_each_slave(bond, slave, i)
>>+ bond_fix_slave_validate_flag(bond, slave);
>>+ }
>>
>> return count;
>> }
>>diff --git a/drivers/net/bonding/bonding.h b/drivers/net/bonding/bonding.h
>>index ca849d2..275f08d 100644
>>--- a/drivers/net/bonding/bonding.h
>>+++ b/drivers/net/bonding/bonding.h
>>@@ -283,6 +283,15 @@ static inline unsigned long slave_last_rx(struct bonding *bond,
>> return slave->dev->last_rx;
>> }
>>
>>+static inline void bond_fix_slave_validate_flag(struct bonding *bond,
>>+ struct slave *slave)
>>+{
>>+ if (slave_do_arp_validate(bond, slave))
>>+ slave->dev->priv_flags |= IFF_SLAVE_NEEDARP;
>>+ else
>>+ slave->dev->priv_flags &= ~IFF_SLAVE_NEEDARP;
>>+}
>>+
>> static inline void bond_set_slave_inactive_flags(struct slave *slave)
>> {
>> struct bonding *bond = netdev_priv(slave->dev->master);
>>@@ -290,14 +299,16 @@ static inline void bond_set_slave_inactive_flags(struct slave *slave)
>> bond->params.mode != BOND_MODE_ALB)
>> slave->state = BOND_STATE_BACKUP;
>> slave->dev->priv_flags |= IFF_SLAVE_INACTIVE;
>>- if (slave_do_arp_validate(bond, slave))
>>- slave->dev->priv_flags |= IFF_SLAVE_NEEDARP;
>>+ bond_fix_slave_validate_flag(bond, slave);
>> }
>>
>> static inline void bond_set_slave_active_flags(struct slave *slave)
>> {
>>+ struct bonding *bond = netdev_priv(slave->dev->master);
>>+
>> slave->state = BOND_STATE_ACTIVE;
>>- slave->dev->priv_flags &= ~(IFF_SLAVE_INACTIVE | IFF_SLAVE_NEEDARP);
>>+ slave->dev->priv_flags &= ~IFF_SLAVE_INACTIVE;
>>+ bond_fix_slave_validate_flag(bond, slave);
>> }
>>
>> static inline void bond_set_master_3ad_flags(struct bonding *bond)
>>diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>>index 2e7783f..3dafd8f 100644
>>--- a/include/linux/netdevice.h
>>+++ b/include/linux/netdevice.h
>>@@ -1874,6 +1874,7 @@ static inline void netif_set_gso_max_size(struct net_device *dev,
>> dev->gso_max_size = size;
>> }
>>
>>+#if 0
>> /* On bonding slaves other than the currently active slave, suppress
>> * duplicates except for 802.3ad ETH_P_SLOW, alb non-mcast/bcast, and
>> * ARP on active-backup slaves with arp_validate enabled.
>>@@ -1906,6 +1907,8 @@ static inline int skb_bond_should_drop(struct sk_buff *skb)
>> }
>> return 0;
>> }
>>+#endif
>>+extern int skb_bond_should_drop(struct sk_buff *skb);
>>
>> extern struct pernet_operations __net_initdata loopback_net_ops;
>> #endif /* __KERNEL__ */
>>diff --git a/net/core/dev.c b/net/core/dev.c
>>index 91d792d..ac5a37e 100644
>>--- a/net/core/dev.c
>>+++ b/net/core/dev.c
>>@@ -2098,6 +2098,56 @@ static inline struct sk_buff *handle_macvlan(struct sk_buff *skb,
>> #define handle_macvlan(skb, pt_prev, ret, orig_dev) (skb)
>> #endif
>>
>>+#if defined(CONFIG_BONDING) || defined(CONFIG_BONDING_MODULE)
>>+int (*bond_handle_frame_hook)(struct sk_buff *skb);
>>+EXPORT_SYMBOL_GPL(bond_handle_frame_hook);
>>+
>>+/* On bonding slaves other than the currently active slave, suppress
>>+ * duplicates except for 802.3ad ETH_P_SLOW, alb non-mcast/bcast, and
>>+ * ARP on active-backup slaves with arp_validate enabled.
>>+ */
>>+int skb_bond_should_drop(struct sk_buff *skb)
>>+{
>>+ struct net_device *dev = skb->dev;
>>+ struct net_device *master = dev->master;
>>+
>>+ if (master)
>>+ return bond_handle_frame_hook(skb);
>
>Maybe this hook can be called from netif_receive_skb() directly. You would safe
>at least 2 dereferences, 1 if check. You would also need to add
>"skb->dev->master &&" to if in __vlan_hwaccel_rx() and vlan_gro_common().

This won't work, because the VLAN code reassigns skb->dev to the
VLAN device before calling netif_receive_skb.

In the VLAN path, the second call to skb_bond_should_drop (the
first is within the VLAN code itself, __vlan_hwaccel_rx or
vlan_gro_common, the second is netif_receive_skb) won't call the hook,
because the VLAN device has no dev->master.

This is the whole reason for the hook: to process the ARP before
VLAN reassigns skb->dev. Once that happens, the actual device the ARP
arrived on is lost.

-J

>>+
>>+ return 0;
>>+
>>+#if 0
>>+ if (master->priv_flags & IFF_MASTER_ARPMON)
>>+ dev->last_rx = jiffies;
>>+
>>+ if (dev->priv_flags & IFF_SLAVE_INACTIVE) {
>>+ if ((dev->priv_flags & IFF_SLAVE_NEEDARP) &&
>>+ skb->protocol == __constant_htons(ETH_P_ARP))
>>+ return 0;
>>+
>>+ if (master->priv_flags & IFF_MASTER_ALB) {
>>+ if (skb->pkt_type != PACKET_BROADCAST &&
>>+ skb->pkt_type != PACKET_MULTICAST)
>>+ return 0;
>>+ }
>>+ if (master->priv_flags & IFF_MASTER_8023AD &&
>>+ skb->protocol == __constant_htons(ETH_P_SLOW))
>>+ return 0;
>>+
>>+ return 1;
>>+ }
>>+ }
>>+ return 0;
>>+#endif /* 0 */
>>+}
>>+#else
>>+int skb_bond_should_drop(struct sk_buff *skb)
>>+{
>>+ return 0;
>>+}
>>+#endif
>>+EXPORT_SYMBOL_GPL(skb_bond_should_drop);
>>+
>> #ifdef CONFIG_NET_CLS_ACT
>> /* TODO: Maybe we should just force sch_ingress to be compiled in
>> * when CONFIG_NET_CLS_ACT is? otherwise some useless instructions
>>
>>
>>---
>> -Jay Vosburgh, IBM Linux Technology Center, [email protected]
>>--
>>To unsubscribe from this list: send the line "unsubscribe netdev" in
>>the body of a message to [email protected]
>>More majordomo info at http://vger.kernel.org/majordomo-info.html
>--
>To unsubscribe from this list: send the line "unsubscribe netdev" in
>the body of a message to [email protected]
>More majordomo info at http://vger.kernel.org/majordomo-info.html

2009-04-23 15:21:00

[permalink] [raw]

Subject: Re: bond interface arp, vlan and trunk / network question

Thu, Apr 23, 2009 at 04:59:41PM CEST, [email protected] wrote:
>Jiri Pirko <[email protected]> wrote:
>
>>Thu, Apr 23, 2009 at 03:12:29AM CEST, [email protected] wrote:
>>>stefan novak <[email protected]> wrote:
>>>
>>>>so its a bug in kernel?
>>>
>>> Yes. I think the following patch should add support for
>>>arp_validate over VLANs, could you test this? This is still a work in
>>>progress, so it's pretty rough around the edges, but the core
>>>functionality should be there.
>>>
>>> This works by moving the skb_bond_should_drop logic around, and
>>>replaces the current inline code with a hook into bonding to do all of
>>>that, plus additional logic. The reason for a hook is to get the
>>>skb_bond_should_drop callers from the VLAN input path before the input
>>>device is assigned to the VLAN device.
>>>
>>> -J
>>>
>>>diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>>>index 99610f3..cc367a3 100644
>>>--- a/drivers/net/bonding/bond_main.c
>>>+++ b/drivers/net/bonding/bond_main.c
>>>@@ -1607,6 +1607,7 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
>>> read_lock(&bond->lock);
>>>
>>> new_slave->last_arp_rx = jiffies;
>>>+ bond_fix_slave_validate_flag(bond, new_slave);
>>>
>>> if (bond->params.miimon && !bond->params.use_carrier) {
>>> link_reporting = bond_check_dev_link(bond, slave_dev, 1);
>>>@@ -2538,8 +2539,8 @@ static void bond_arp_send(struct net_device *slave_dev, int arp_op, __be32 dest_
>>> {
>>> struct sk_buff *skb;
>>>
>>>- pr_debug("arp %d on slave %s: dst %x src %x vid %d\n", arp_op,
>>>- slave_dev->name, dest_ip, src_ip, vlan_id);
>>>+ pr_debug("arp %d on slave %s: dst %x src %x vid %d j %lu\n", arp_op,
>>>+ slave_dev->name, dest_ip, src_ip, vlan_id, jiffies);
>>>
>>> skb = arp_create(arp_op, ETH_P_ARP, dest_ip, slave_dev, src_ip,
>>> NULL, slave_dev->dev_addr, NULL);
>>>@@ -2679,8 +2680,6 @@ static void bond_validate_arp(struct bonding *bond, struct slave *slave, __be32
>>>
>>> targets = bond->params.arp_targets;
>>> for (i = 0; (i < BOND_MAX_ARP_TARGETS) && targets[i]; i++) {
>>>- pr_debug("bva: sip %pI4 tip %pI4 t[%d] %pI4 bhti(tip) %d\n",
>>>- &sip, &tip, i, &targets[i], bond_has_this_ip(bond, tip));
>>> if (sip == targets[i]) {
>>> if (bond_has_this_ip(bond, tip))
>>> slave->last_arp_rx = jiffies;
>>>@@ -2689,35 +2688,36 @@ static void bond_validate_arp(struct bonding *bond, struct slave *slave, __be32
>>> }
>>> }
>>>
>>>-static int bond_arp_rcv(struct sk_buff *skb, struct net_device *dev, struct packet_type *pt, struct net_device *orig_dev)
>>>+static void bond_handle_arp(struct sk_buff *skb, struct net_device *dev)
>>> {
>>> struct arphdr *arp;
>>> struct slave *slave;
>>>+ struct net_device *bond_dev = dev->master;
>>> struct bonding *bond;
>>> unsigned char *arp_ptr;
>>> __be32 sip, tip;
>>>
>>> if (dev_net(dev) != &init_net)
>>>- goto out;
>>>+ return;
>>>
>>>- if (!(dev->priv_flags & IFF_BONDING) || !(dev->flags & IFF_MASTER))
>>>- goto out;
>>>+ bond = netdev_priv(bond_dev);
>>>
>>>- bond = netdev_priv(dev);
>>> read_lock(&bond->lock);
>>>
>>>- pr_debug("bond_arp_rcv: bond %s skb->dev %s orig_dev %s\n",
>>>- bond->dev->name, skb->dev ? skb->dev->name : "NULL",
>>>- orig_dev ? orig_dev->name : "NULL");
>>>-
>>>- slave = bond_get_slave_by_dev(bond, orig_dev);
>>>+ slave = bond_get_slave_by_dev(bond, dev);
>>> if (!slave || !slave_do_arp_validate(bond, slave))
>>> goto out_unlock;
>>>
>>> if (!pskb_may_pull(skb, arp_hdr_len(dev)))
>>> goto out_unlock;
>>>
>>>- arp = arp_hdr(skb);
>>>+ /* Can't use arp_hdr; it's not initialized yet. */
>>>+ arp = (struct arphdr *)skb->data;
>>>+
>>>+ pr_debug("arp: hln %d p_t %d hrd %x pro %x pln %d\n",
>>>+ arp->ar_hln, skb->pkt_type, ntohs(arp->ar_hrd),
>>>+ ntohs(arp->ar_pro), arp->ar_pln);
>>>+
>>> if (arp->ar_hln != dev->addr_len ||
>>> skb->pkt_type == PACKET_OTHERHOST ||
>>> skb->pkt_type == PACKET_LOOPBACK ||
>>>@@ -2732,29 +2732,67 @@ static int bond_arp_rcv(struct sk_buff *skb, struct net_device *dev, struct pack
>>> arp_ptr += 4 + dev->addr_len;
>>> memcpy(&tip, arp_ptr, 4);
>>>
>>>- pr_debug("bond_arp_rcv: %s %s/%d av %d sv %d sip %pI4 tip %pI4\n",
>>>- bond->dev->name, slave->dev->name, slave->state,
>>>- bond->params.arp_validate, slave_do_arp_validate(bond, slave),
>>>- &sip, &tip);
>>>-
>>>- /*
>>>- * Backup slaves won't see the ARP reply, but do come through
>>>- * here for each ARP probe (so we swap the sip/tip to validate
>>>- * the probe). In a "redundant switch, common router" type of
>>>- * configuration, the ARP probe will (hopefully) travel from
>>>- * the active, through one switch, the router, then the other
>>>- * switch before reaching the backup.
>>>- */
>>>- if (slave->state == BOND_STATE_ACTIVE)
>>>+ switch (ntohs(arp->ar_op)) {
>>>+ case ARPOP_REPLY:
>>> bond_validate_arp(bond, slave, sip, tip);
>>>- else
>>>+ break;
>>>+ case ARPOP_REQUEST:
>>> bond_validate_arp(bond, slave, tip, sip);
>>>+ break;
>>>+ }
>>>
>>> out_unlock:
>>> read_unlock(&bond->lock);
>>>-out:
>>>- dev_kfree_skb(skb);
>>>- return NET_RX_SUCCESS;
>>>+}
>>>+
>>>+/*
>>>+ * Called by core packet processing in netif_receive_skb and in VLAN fast
>>>+ * path to (a) determine if packet should be dropped, and (b) to perform
>>>+ * ARP monitor processing (last_rx, validation).
>>>+ *
>>>+ * For the VLAN case, called before the skb->dev is reassigned to the
>>>+ * VLAN.
>>>+ *
>>>+ * Returns 1 if frame should nominally be dropped; 0 if it should be kept.
>>>+ *
>>>+ * We want to keep:
>>>+ * - all traffic on active slaves
>>>+ * - some traffic on inactive slaves: non-broadcast and non-multicast in
>>>+ * ALB/TLB mode and LACP frames in 802.3ad mode.
>>>+ *
>>>+ * ARP frames are handled as normal traffic; the ARP monitor handling
>>>+ * takes place here, so they need not be kept on inactive slaves.
>>>+ */
>>>+static int bond_handle_frame(struct sk_buff *skb)
>>>+{
>>>+ struct net_device *dev;
>>>+ struct net_device *master;
>>>+
>>>+ dev = skb->dev;
>>>+ master = dev->master;
>>>+ if (!master || !master->priv_flags & IFF_BONDING)
>>>+ return 0;
>>>+
>>>+ if (master->priv_flags & IFF_MASTER_ARPMON) {
>>>+ dev->last_rx = jiffies;
>>>+ if ((dev->priv_flags & IFF_SLAVE_NEEDARP) &&
>>>+ skb->protocol == __constant_htons(ETH_P_ARP))
>>>+ bond_handle_arp(skb, dev);
>>>+ }
>>>+
>>>+ if (dev->priv_flags & IFF_SLAVE_INACTIVE) {
>>>+ if (master->priv_flags & IFF_MASTER_ALB) {
>>>+ if (skb->pkt_type != PACKET_BROADCAST &&
>>>+ skb->pkt_type != PACKET_MULTICAST)
>>>+ return 0;
>>>+ }
>>>+ if (master->priv_flags & IFF_MASTER_8023AD &&
>>>+ skb->protocol == __constant_htons(ETH_P_SLOW))
>>>+ return 0;
>>>+
>>>+ return 1;
>>>+ }
>>>+ return 0;
>>> }
>>>
>>> /*
>>>@@ -3688,6 +3726,7 @@ static void bond_unregister_lacpdu(struct bonding *bond)
>>>
>>> void bond_register_arp(struct bonding *bond)
>>> {
>>>+#if 0
>>> struct packet_type *pt = &bond->arp_mon_pt;
>>>
>>> if (pt->type)
>>>@@ -3697,14 +3736,17 @@ void bond_register_arp(struct bonding *bond)
>>> pt->dev = bond->dev;
>>> pt->func = bond_arp_rcv;
>>> dev_add_pack(pt);
>>>+#endif
>>> }
>>>
>>> void bond_unregister_arp(struct bonding *bond)
>>> {
>>>+#if 0
>>> struct packet_type *pt = &bond->arp_mon_pt;
>>>
>>> dev_remove_pack(pt);
>>> pt->type = 0;
>>>+#endif
>>> }
>>>
>>> /*---------------------------- Hashing Policies -----------------------------*/
>>>@@ -5188,6 +5230,8 @@ out_rtnl:
>>> return res;
>>> }
>>>
>>>+extern int (*bond_handle_frame_hook)(struct sk_buff *skb);
>>>+
>>> static int __init bonding_init(void)
>>> {
>>> int i;
>>>@@ -5221,6 +5265,8 @@ static int __init bonding_init(void)
>>> register_inetaddr_notifier(&bond_inetaddr_notifier);
>>> bond_register_ipv6_notifier();
>>>
>>>+ bond_handle_frame_hook = bond_handle_frame;
>>>+
>>> goto out;
>>> err:
>>> list_for_each_entry(bond, &bond_dev_list, bond_list) {
>>>@@ -5249,6 +5295,9 @@ static void __exit bonding_exit(void)
>>> rtnl_lock();
>>> bond_free_all();
>>> rtnl_unlock();
>>>+
>>>+ bond_handle_frame_hook = NULL;
>>>+
>>> }
>>>
>>> module_init(bonding_init);
>>>diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c
>>>index 18cf478..0e9f0e9 100644
>>>--- a/drivers/net/bonding/bond_sysfs.c
>>>+++ b/drivers/net/bonding/bond_sysfs.c
>>>@@ -484,8 +484,9 @@ static ssize_t bonding_store_arp_validate(struct device *d,
>>> struct device_attribute *attr,
>>> const char *buf, size_t count)
>>> {
>>>- int new_value;
>>>+ int new_value, i;
>>> struct bonding *bond = to_bond(d);
>>>+ struct slave *slave;
>>>
>>> new_value = bond_parse_parm(buf, arp_validate_tbl);
>>> if (new_value < 0) {
>>>@@ -504,13 +505,12 @@ static ssize_t bonding_store_arp_validate(struct device *d,
>>> bond->dev->name, arp_validate_tbl[new_value].modename,
>>> new_value);
>>>
>>>- if (!bond->params.arp_validate && new_value) {
>>>- bond_register_arp(bond);
>>>- } else if (bond->params.arp_validate && !new_value) {
>>>- bond_unregister_arp(bond);
>>>- }
>>>+ if (bond->params.arp_validate != new_value) {
>>>+ bond->params.arp_validate = new_value;
>>>
>>>- bond->params.arp_validate = new_value;
>>>+ bond_for_each_slave(bond, slave, i)
>>>+ bond_fix_slave_validate_flag(bond, slave);
>>>+ }
>>>
>>> return count;
>>> }
>>>diff --git a/drivers/net/bonding/bonding.h b/drivers/net/bonding/bonding.h
>>>index ca849d2..275f08d 100644
>>>--- a/drivers/net/bonding/bonding.h
>>>+++ b/drivers/net/bonding/bonding.h
>>>@@ -283,6 +283,15 @@ static inline unsigned long slave_last_rx(struct bonding *bond,
>>> return slave->dev->last_rx;
>>> }
>>>
>>>+static inline void bond_fix_slave_validate_flag(struct bonding *bond,
>>>+ struct slave *slave)
>>>+{
>>>+ if (slave_do_arp_validate(bond, slave))
>>>+ slave->dev->priv_flags |= IFF_SLAVE_NEEDARP;
>>>+ else
>>>+ slave->dev->priv_flags &= ~IFF_SLAVE_NEEDARP;
>>>+}
>>>+
>>> static inline void bond_set_slave_inactive_flags(struct slave *slave)
>>> {
>>> struct bonding *bond = netdev_priv(slave->dev->master);
>>>@@ -290,14 +299,16 @@ static inline void bond_set_slave_inactive_flags(struct slave *slave)
>>> bond->params.mode != BOND_MODE_ALB)
>>> slave->state = BOND_STATE_BACKUP;
>>> slave->dev->priv_flags |= IFF_SLAVE_INACTIVE;
>>>- if (slave_do_arp_validate(bond, slave))
>>>- slave->dev->priv_flags |= IFF_SLAVE_NEEDARP;
>>>+ bond_fix_slave_validate_flag(bond, slave);
>>> }
>>>
>>> static inline void bond_set_slave_active_flags(struct slave *slave)
>>> {
>>>+ struct bonding *bond = netdev_priv(slave->dev->master);
>>>+
>>> slave->state = BOND_STATE_ACTIVE;
>>>- slave->dev->priv_flags &= ~(IFF_SLAVE_INACTIVE | IFF_SLAVE_NEEDARP);
>>>+ slave->dev->priv_flags &= ~IFF_SLAVE_INACTIVE;
>>>+ bond_fix_slave_validate_flag(bond, slave);
>>> }
>>>
>>> static inline void bond_set_master_3ad_flags(struct bonding *bond)
>>>diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>>>index 2e7783f..3dafd8f 100644
>>>--- a/include/linux/netdevice.h
>>>+++ b/include/linux/netdevice.h
>>>@@ -1874,6 +1874,7 @@ static inline void netif_set_gso_max_size(struct net_device *dev,
>>> dev->gso_max_size = size;
>>> }
>>>
>>>+#if 0
>>> /* On bonding slaves other than the currently active slave, suppress
>>> * duplicates except for 802.3ad ETH_P_SLOW, alb non-mcast/bcast, and
>>> * ARP on active-backup slaves with arp_validate enabled.
>>>@@ -1906,6 +1907,8 @@ static inline int skb_bond_should_drop(struct sk_buff *skb)
>>> }
>>> return 0;
>>> }
>>>+#endif
>>>+extern int skb_bond_should_drop(struct sk_buff *skb);
>>>
>>> extern struct pernet_operations __net_initdata loopback_net_ops;
>>> #endif /* __KERNEL__ */
>>>diff --git a/net/core/dev.c b/net/core/dev.c
>>>index 91d792d..ac5a37e 100644
>>>--- a/net/core/dev.c
>>>+++ b/net/core/dev.c
>>>@@ -2098,6 +2098,56 @@ static inline struct sk_buff *handle_macvlan(struct sk_buff *skb,
>>> #define handle_macvlan(skb, pt_prev, ret, orig_dev) (skb)
>>> #endif
>>>
>>>+#if defined(CONFIG_BONDING) || defined(CONFIG_BONDING_MODULE)
>>>+int (*bond_handle_frame_hook)(struct sk_buff *skb);
>>>+EXPORT_SYMBOL_GPL(bond_handle_frame_hook);
>>>+
>>>+/* On bonding slaves other than the currently active slave, suppress
>>>+ * duplicates except for 802.3ad ETH_P_SLOW, alb non-mcast/bcast, and
>>>+ * ARP on active-backup slaves with arp_validate enabled.
>>>+ */
>>>+int skb_bond_should_drop(struct sk_buff *skb)
>>>+{
>>>+ struct net_device *dev = skb->dev;
>>>+ struct net_device *master = dev->master;
>>>+
>>>+ if (master)
>>>+ return bond_handle_frame_hook(skb);
>>
>>Maybe this hook can be called from netif_receive_skb() directly. You would safe
>>at least 2 dereferences, 1 if check. You would also need to add
>>"skb->dev->master &&" to if in __vlan_hwaccel_rx() and vlan_gro_common().
>
> This won't work, because the VLAN code reassigns skb->dev to the
>VLAN device before calling netif_receive_skb.

Sure, but bond_should_drop is called before it actually reassigns that. So the
check in bond_should_drop tests "original_dev->master".

I had on mind something like following:

Signed-off-by: Jiri Pirko <[email protected]>

diff --git a/net/8021q/vlan_core.c b/net/8021q/vlan_core.c
index c67fe6f..87a7334 100644
--- a/net/8021q/vlan_core.c
+++ b/net/8021q/vlan_core.c
@@ -11,7 +11,7 @@ int __vlan_hwaccel_rx(struct sk_buff *skb, struct vlan_group *grp,
if (netpoll_rx(skb))
return NET_RX_DROP;

- if (skb_bond_should_drop(skb))
+ if (skb->dev->master && bond_handle_frame_hook(skb))
goto drop;

skb->vlan_tci = vlan_tci;
@@ -79,7 +79,7 @@ static int vlan_gro_common(struct napi_struct *napi, struct vlan_group *grp,
{
struct sk_buff *p;

- if (skb_bond_should_drop(skb))
+ if (skb->dev->master && bond_handle_frame_hook(skb))
goto drop;

skb->vlan_tci = vlan_tci;
diff --git a/net/core/dev.c b/net/core/dev.c
index 0590b2a..280e3de 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2282,7 +2282,7 @@ int netif_receive_skb(struct sk_buff *skb)
null_or_orig = NULL;
orig_dev = skb->dev;
if (orig_dev->master) {
- if (skb_bond_should_drop(skb))
+ if (bond_handle_frame_hook(skb))
null_or_orig = orig_dev; /* deliver only exact match */
else
skb->dev = orig_dev->master;

Note I did not even compile that.

>
> In the VLAN path, the second call to skb_bond_should_drop (the
>first is within the VLAN code itself, __vlan_hwaccel_rx or
>vlan_gro_common, the second is netif_receive_skb) won't call the hook,
>because the VLAN device has no dev->master.
>
> This is the whole reason for the hook: to process the ARP before
>VLAN reassigns skb->dev. Once that happens, the actual device the ARP
>arrived on is lost.
>
> -J
>
>
>>>+
>>>+ return 0;
>>>+
>>>+#if 0
>>>+ if (master->priv_flags & IFF_MASTER_ARPMON)
>>>+ dev->last_rx = jiffies;
>>>+
>>>+ if (dev->priv_flags & IFF_SLAVE_INACTIVE) {
>>>+ if ((dev->priv_flags & IFF_SLAVE_NEEDARP) &&
>>>+ skb->protocol == __constant_htons(ETH_P_ARP))
>>>+ return 0;
>>>+
>>>+ if (master->priv_flags & IFF_MASTER_ALB) {
>>>+ if (skb->pkt_type != PACKET_BROADCAST &&
>>>+ skb->pkt_type != PACKET_MULTICAST)
>>>+ return 0;
>>>+ }
>>>+ if (master->priv_flags & IFF_MASTER_8023AD &&
>>>+ skb->protocol == __constant_htons(ETH_P_SLOW))
>>>+ return 0;
>>>+
>>>+ return 1;
>>>+ }
>>>+ }
>>>+ return 0;
>>>+#endif /* 0 */
>>>+}
>>>+#else
>>>+int skb_bond_should_drop(struct sk_buff *skb)
>>>+{
>>>+ return 0;
>>>+}
>>>+#endif
>>>+EXPORT_SYMBOL_GPL(skb_bond_should_drop);
>>>+
>>> #ifdef CONFIG_NET_CLS_ACT
>>> /* TODO: Maybe we should just force sch_ingress to be compiled in
>>> * when CONFIG_NET_CLS_ACT is? otherwise some useless instructions
>>>
>>>
>>>---
>>> -Jay Vosburgh, IBM Linux Technology Center, [email protected]
>>>--
>>>To unsubscribe from this list: send the line "unsubscribe netdev" in
>>>the body of a message to [email protected]
>>>More majordomo info at http://vger.kernel.org/majordomo-info.html
>>--
>>To unsubscribe from this list: send the line "unsubscribe netdev" in
>>the body of a message to [email protected]
>>More majordomo info at http://vger.kernel.org/majordomo-info.html

2009-04-23 15:38:44

by Jay Vosburgh

[permalink] [raw]

Subject: Re: bond interface arp, vlan and trunk / network question

Eric Dumazet <[email protected]> wrote:

>Jay Vosburgh a écrit :
>> stefan novak <[email protected]> wrote:
>>
>>> so its a bug in kernel?
>>
>> Yes. I think the following patch should add support for
>> arp_validate over VLANs, could you test this? This is still a work in
>> progress, so it's pretty rough around the edges, but the core
>> functionality should be there.
>>
>> This works by moving the skb_bond_should_drop logic around, and
>> replaces the current inline code with a hook into bonding to do all of
>> that, plus additional logic. The reason for a hook is to get the
>> skb_bond_should_drop callers from the VLAN input path before the input
>> device is assigned to the VLAN device.
>>
>> -J
>
>Hi Jay
>
>I wanted to test your patch and setup such VLAN config, and not yet applied your patch.
>
># cat /proc/net/bonding/bond1
>Ethernet Channel Bonding Driver: v3.5.0 (November 4, 2008)
>
>Bonding Mode: fault-tolerance (active-backup)
>Primary Slave: None
>Currently Active Slave: eth2
>MII Status: up
>MII Polling Interval (ms): 0
>Up Delay (ms): 0
>Down Delay (ms): 0
>ARP Polling Interval (ms): 250
>ARP IP target/s (n.n.n.n form): 192.168.20.254
>
>Slave Interface: eth1
>MII Status: up
>Link Failure Count: 8
>Permanent HW addr: 00:1e:0b:ec:d3:d2
>
>Slave Interface: eth2
>MII Status: up
>Link Failure Count: 11
>Permanent HW addr: 00:1e:0b:92:78:50
>
># grep . /sys/class/net/bond1/bonding/*
>/sys/class/net/bond1/bonding/active_slave:eth2
>/sys/class/net/bond1/bonding/ad_select:stable 0
>/sys/class/net/bond1/bonding/arp_interval:250
>/sys/class/net/bond1/bonding/arp_ip_target:192.168.20.254
>/sys/class/net/bond1/bonding/arp_validate:none 0

To test this patch, you'll need to set arp_validate to all.
Well, insuring that nothing breaks without arp_validate is good, too,
but the patch is supposed to make arp_validate function with an
arp_ip_target accessed via a VLAN.

>/sys/class/net/bond1/bonding/downdelay:0
>/sys/class/net/bond1/bonding/fail_over_mac:none 0
>/sys/class/net/bond1/bonding/lacp_rate:slow 0
>/sys/class/net/bond1/bonding/miimon:0
>/sys/class/net/bond1/bonding/mii_status:up
>/sys/class/net/bond1/bonding/mode:active-backup 1
>/sys/class/net/bond1/bonding/num_grat_arp:1
>/sys/class/net/bond1/bonding/num_unsol_na:1
>/sys/class/net/bond1/bonding/slaves:eth1 eth2
>/sys/class/net/bond1/bonding/updelay:0
>/sys/class/net/bond1/bonding/use_carrier:1
>/sys/class/net/bond1/bonding/xmit_hash_policy:layer2 0
>
>
>
>So active slave is eth2. Still I receive trafic on eth1, according to ifconfig :
># ifconfig eth1|grep packets;sleep 10;ifconfig eth1|grep packets
> RX packets:2098122 errors:0 dropped:0 overruns:0 frame:0
> TX packets:2053085 errors:0 dropped:0 overruns:0 carrier:0
> RX packets:2098162 errors:0 dropped:0 overruns:0 frame:0
> TX packets:2053085 errors:0 dropped:0 overruns:0 carrier:0
>
>exactly 4 packets per second.

This is expected, these are the broadcast ARP_REQUEST packets
the bond issues as probes, every 250 ms as specified by your
arp_interval. Any broadcasts on the switch or other traffic flooded to
all ports will come into the slave device, and (for the active-backup
inactive slave case) all of it is thrown away. There's some trickery in
netif_receive_skb for sockets that bind directly to the slave, but that
doesn't seem to affect tcpdump.

># ifconfig eth2|grep packets;sleep 10;ifconfig eth2|grep packets
> RX packets:3695512 errors:0 dropped:0 overruns:0 frame:0
> TX packets:3683799 errors:0 dropped:0 overruns:0 carrier:0
> RX packets:3695554 errors:0 dropped:0 overruns:0 frame:0
> TX packets:3683841 errors:0 dropped:0 overruns:0 carrier:0
>
>I also receive arp answers on eth2 (quite normal)
>
>I wanted to tcpdump on eth1 but got nothing :
>
># tcpdump -p -n -s 0 -i eth1
>tcpdump: WARNING: eth1: no IPv4 address assigned
>tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
>listening on eth1, link-type EN10MB (Ethernet), capture size 65535 bytes

This is normal, as incoming traffic is assigned to the master
before the packet capture gets a look.

># tcpdump -p -n -s 0 -i eth2
>tcpdump: WARNING: eth2: no IPv4 address assigned
>tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
>listening on eth2, link-type EN10MB (Ethernet), capture size 65535 bytes
>07:54:48.430982 arp who-has 192.168.20.254 tell 192.168.20.110
>07:54:48.680980 arp who-has 192.168.20.254 tell 192.168.20.110
>07:54:48.930981 arp who-has 192.168.20.254 tell 192.168.20.110
>07:54:49.180980 arp who-has 192.168.20.254 tell 192.168.20.110
>07:54:49.430980 arp who-has 192.168.20.254 tell 192.168.20.110
>07:54:49.680980 arp who-has 192.168.20.254 tell 192.168.20.110

This is also normal (seeing only outbound traffic) for the same
reason as above.

># tcpdump -p -n -s 0 -i bond1
>tcpdump: WARNING: bond1: no IPv4 address assigned
>tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
>listening on bond1, link-type EN10MB (Ethernet), capture size 65535 bytes
>07:55:28.681491 arp reply 192.168.20.254 is-at 00:00:0c:07:ac:01
>07:55:28.931493 arp reply 192.168.20.254 is-at 00:00:0c:07:ac:01
>07:55:29.181466 arp reply 192.168.20.254 is-at 00:00:0c:07:ac:01
>07:55:29.431496 arp reply 192.168.20.254 is-at 00:00:0c:07:ac:01
>07:55:29.681487 arp reply 192.168.20.254 is-at 00:00:0c:07:ac:01
>07:55:29.931492 arp reply 192.168.20.254 is-at 00:00:0c:07:ac:01
>
># tcpdump -p -n -s 0 -i bond1.103
>tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
>listening on bond1.103, link-type EN10MB (Ethernet), capture size 65535 bytes
>07:55:58.681521 arp reply 192.168.20.254 is-at 00:00:0c:07:ac:01
>07:55:58.931636 arp reply 192.168.20.254 is-at 00:00:0c:07:ac:01
>07:55:59.181495 arp reply 192.168.20.254 is-at 00:00:0c:07:ac:01
>07:55:59.431496 arp reply 192.168.20.254 is-at 00:00:0c:07:ac:01
>07:55:59.681499 arp reply 192.168.20.254 is-at 00:00:0c:07:ac:01
>07:55:59.931499 arp reply 192.168.20.254 is-at 00:00:0c:07:ac:01
>
>Configuration script is
>
>ip link set eth1 up
>ip link set eth2 up
>
>ip addr flush dev eth1
>ip addr flush dev eth2
>
>ip link set eth1 up
>ip link set eth2 up
>
>modprobe bond1
>
>ifconfig bond1 down
>
># test du arp_check toutes les 250ms, en choissant l'HSRP du vlan 103 comme IP
>echo +192.168.20.254 >/sys/class/net/bond1/bonding/arp_ip_target
>echo 250 >/sys/class/net/bond1/bonding/arp_interval
>
># mode actif/passif (active-backup)
>echo 1 >/sys/class/net/bond1/bonding/mode
>
>ifconfig bond1 up
>
>ifenslave bond1 eth1 eth2
>
>ip link set eth1 up
>ip link set eth2 up
>
>ip link add link bond1 bond1.103 txqueuelen 100 type vlan id 103
>ip addr add 192.168.20.110/24 dev bond1.103
>ip link set bond1.103 up
>
>ip link add link bond1 bond1.825 txqueuelen 1000 type vlan id 825
>ip addr add 10.170.73.123/25 dev bond1.825
>ip link set bond1.825 up
>
>
>
>Is this setup OK to test your patch ?

I believe so, provided you enable arp_validate as mentioned
above.

-J

---
-Jay Vosburgh, IBM Linux Technology Center, [email protected]

2009-04-23 18:34:31

by Jay Vosburgh

[permalink] [raw]

Subject: Re: bond interface arp, vlan and trunk / network question

Jiri Pirko <[email protected]> wrote:
>Thu, Apr 23, 2009 at 04:59:41PM CEST, [email protected] wrote:
>>Jiri Pirko <[email protected]> wrote:
[...]
>>>>+{
>>>>+ struct net_device *dev = skb->dev;
>>>>+ struct net_device *master = dev->master;
>>>>+
>>>>+ if (master)
>>>>+ return bond_handle_frame_hook(skb);
>>>
>>>Maybe this hook can be called from netif_receive_skb() directly. You would safe
>>>at least 2 dereferences, 1 if check. You would also need to add
>>>"skb->dev->master &&" to if in __vlan_hwaccel_rx() and vlan_gro_common().
>>
>> This won't work, because the VLAN code reassigns skb->dev to the
>>VLAN device before calling netif_receive_skb.
>
>Sure, but bond_should_drop is called before it actually reassigns that. So the
>check in bond_should_drop tests "original_dev->master".
>
>I had on mind something like following:
>
>Signed-off-by: Jiri Pirko <[email protected]>
>
>diff --git a/net/8021q/vlan_core.c b/net/8021q/vlan_core.c
>index c67fe6f..87a7334 100644
>--- a/net/8021q/vlan_core.c
>+++ b/net/8021q/vlan_core.c
>@@ -11,7 +11,7 @@ int __vlan_hwaccel_rx(struct sk_buff *skb, struct vlan_group *grp,
> if (netpoll_rx(skb))
> return NET_RX_DROP;
>
>- if (skb_bond_should_drop(skb))
>+ if (skb->dev->master && bond_handle_frame_hook(skb))
[...]

Yah, ok, I see what you mean. The same could be accomplished by
turning skb_bond_should_drop back into an inline in the header file, and
hiding the fiddly bits from the calling context.

It's pretty grotty no matter how it's done; I'd prefer to avoid
the whole hook business, but I haven't thought of a less bad way.

-J

---
-Jay Vosburgh, IBM Linux Technology Center, [email protected]

2009-04-23 19:23:39

[permalink] [raw]

Subject: Re: bond interface arp, vlan and trunk / network question

Thu, Apr 23, 2009 at 08:34:20PM CEST, [email protected] wrote:
>Jiri Pirko <[email protected]> wrote:
>>Thu, Apr 23, 2009 at 04:59:41PM CEST, [email protected] wrote:
>>>Jiri Pirko <[email protected]> wrote:
>[...]
>>>>>+{
>>>>>+ struct net_device *dev = skb->dev;
>>>>>+ struct net_device *master = dev->master;
>>>>>+
>>>>>+ if (master)
>>>>>+ return bond_handle_frame_hook(skb);
>>>>
>>>>Maybe this hook can be called from netif_receive_skb() directly. You would safe
>>>>at least 2 dereferences, 1 if check. You would also need to add
>>>>"skb->dev->master &&" to if in __vlan_hwaccel_rx() and vlan_gro_common().
>>>
>>> This won't work, because the VLAN code reassigns skb->dev to the
>>>VLAN device before calling netif_receive_skb.
>>
>>Sure, but bond_should_drop is called before it actually reassigns that. So the
>>check in bond_should_drop tests "original_dev->master".
>>
>>I had on mind something like following:
>>
>>Signed-off-by: Jiri Pirko <[email protected]>
>>
>>diff --git a/net/8021q/vlan_core.c b/net/8021q/vlan_core.c
>>index c67fe6f..87a7334 100644
>>--- a/net/8021q/vlan_core.c
>>+++ b/net/8021q/vlan_core.c
>>@@ -11,7 +11,7 @@ int __vlan_hwaccel_rx(struct sk_buff *skb, struct vlan_group *grp,
>> if (netpoll_rx(skb))
>> return NET_RX_DROP;
>>
>>- if (skb_bond_should_drop(skb))
>>+ if (skb->dev->master && bond_handle_frame_hook(skb))
>[...]
>
> Yah, ok, I see what you mean. The same could be accomplished by
>turning skb_bond_should_drop back into an inline in the header file, and

Well no exactly. In vlan_x certainly yes but in netif_receive_skb() you safe
something by not checking ->master twice and 2 dereferences. Just noting when I
see the waste...

>hiding the fiddly bits from the calling context.
>
> It's pretty grotty no matter how it's done; I'd prefer to avoid
>the whole hook business, but I haven't thought of a less bad way.

Yes I agree hooking this sucks :/ Anyway, is there any movement to eliminate
all these hooks from netif_receive_skb() ?

Jirka
>
> -J
>
>---
> -Jay Vosburgh, IBM Linux Technology Center, [email protected]