2007-10-31 18:44:13

by Dave Johnson

[permalink] [raw]
Subject: expected behavior of PF_PACKET on NETIF_F_HW_VLAN_RX device?


Depending on the network driver, I'm seeing different behavior if
a .1q packet is received to an PF_PACKET, SOCK_RAW, ETH_P_ALL socket.


On devices what do not use NETIF_F_HW_VLAN_RX, the packet socket gets
the complete packet with vlan tag included as the driver simply calls
netif_receive_skb() or equivilant. packet_rcv() then gets the whole
thing vlan tag included and sends this through the socket.

vlan_skb_recv() also gets these all and will drop them because there
are no vlans configured.

Example, e100 driver gives this to tcpdump:

# ifconfig eth1 up
# tcpdump -s 2000 -e -n -i eth1
tcpdump: WARNING: eth1: no IPv4 address assigned
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth1, link-type EN10MB (Ethernet), capture size 2000 bytes
14:11:03.707178 00:0b:82:05:22:0a > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 64: vlan 101, p 0, ethertype ARP, arp who-has 192.168.101.191 tell 192.168.101.131
14:11:04.215164 00:0b:82:05:22:05 > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 64: vlan 101, p 0, ethertype ARP, arp who-has 192.168.101.191 tell 192.168.101.130
14:11:04.658940 00:0b:82:05:22:0c > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 64: vlan 101, p 0, ethertype ARP, arp who-has 192.168.101.191 tell 192.168.101.135
14:11:05.706070 00:0b:82:05:22:0a > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 64: vlan 101, p 0, ethertype ARP, arp who-has 192.168.101.191 tell 192.168.101.131
14:11:05.939195 00:b0:c2:e8:d8:1c > 33:33:00:00:00:01, ethertype 802.1Q (0x8100), length 122: vlan 108, p 0, ethertype IPv6, fe80::2b0:c2ff:fee8:d81c > ff02::1: icmp6: router advertisement [class 0xe0]
14:11:07.222302 00:b0:c2:e8:d8:1c > 33:33:00:00:00:01, ethertype 802.1Q (0x8100), length 122: vlan 110, p 0, ethertype IPv6, fe80::2b0:c2ff:fee8:d81c > ff02::1: icmp6: router advertisement [class 0xe0]
14:11:08.486953 00:b0:c2:e8:d8:1c > 01:00:5e:00:00:05, ethertype 802.1Q (0x8100), length 134: vlan 110, p 0, ethertype IPv4, IP 192.168.110.20 > 224.0.0.5: OSPFv2, Hello (1), length: 80
14:11:11.528569 00:30:48:22:63:50 > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 154: vlan 208, p 0, ethertype IPv4, IP 195.180.3.200.33350 > 195.180.3.255.111: UDP, length: 108
14:11:12.642762 00:0b:82:05:22:05 > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 64: vlan 101, p 0, ethertype ARP, arp who-has 192.168.101.191 tell 192.168.101.130
14:11:12.642766 00:0b:82:05:22:05 > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 64: vlan 101, p 0, ethertype ARP, arp who-has 192.168.101.191 tell 192.168.101.130

The packet socket gets everything including the vlan tag as I'd
expect.


But on the bnx2 driver (for example) I get 2 different behaviors:

1)

If no vlan interfaces are configured, it calls netif_receive_skb()
because there isn't a vlan group registered via
bnx2_vlan_rx_register().

# ifconfig eth1 up
# tcpdump -s 2000 -e -n -i eth1
tcpdump: WARNING: eth1: no IPv4 address assigned
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth1, link-type EN10MB (Ethernet), capture size 2000 bytes
14:21:27.170505 00:0b:82:05:22:05 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: arp who-has 192.168.101.191 tell 192.168.101.130
14:21:27.170577 00:0b:82:05:22:05 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: arp who-has 192.168.101.191 tell 192.168.101.130
14:21:27.495814 00:0b:82:05:22:0c > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: arp who-has 192.168.101.191 tell 192.168.101.135
14:21:27.495881 00:0b:82:05:22:0c > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: arp who-has 192.168.101.191 tell 192.168.101.135
14:21:28.151070 00:0b:82:05:22:05 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: arp who-has 192.168.101.191 tell 192.168.101.130
14:21:28.166780 00:b0:c2:e8:d8:1c > 33:33:00:00:00:01, ethertype IPv6 (0x86dd), length 118: fe80::2b0:c2ff:fee8:d81c > ff02::1: icmp6: router advertisement [class 0xe0]
14:21:28.476404 00:0b:82:05:22:0c > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: arp who-has 192.168.101.191 tell 192.168.101.135
14:21:28.492099 00:b0:c2:e8:d8:1c > 01:00:5e:00:00:05, ethertype IPv4 (0x0800), length 130: IP 192.168.110.20 > 224.0.0.5: OSPFv2, Hello (1), length: 80
14:21:28.631439 00:19:b9:e7:8a:d7 > 33:33:ff:e7:8a:d7, ethertype IPv6 (0x86dd), length 78: :: > ff02::1:ffe7:8ad7: icmp6: neighbor sol: who has fd4d:5643:2886:67:219:b9ff:fee7:8ad7
14:21:28.671611 00:0b:82:05:22:0a > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: arp who-has 192.168.101.191 tell 192.168.101.131
14:21:28.671684 00:0b:82:05:22:0a > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: arp who-has 192.168.101.191 tell 192.168.101.131

the packet handed to netif_receive_skb() does not have the vlan tag on
it. this allows all these packets to be processed by not only the
packet ptype handler, but also ip, arp, etc... this seems very wrong
as all vlan packets are stripped of their tag, then processed by the
kernel as if they didn't have a tag to begin with.

Normally vlan_skb_recv() would be the only handler to get these
packets, but ip_rcv(), arp_rcv(), ipv6_rcv() etc.. can be called
instead!

ipv6 for example will pay attention to router advertisements and add
global scope addresses to the base device even though those packets
came in on a vlan that would otherwise be ignored.

the packet socket recipient also has no idea which vlan each packet
came in on.


2)

If there is one or more vlan interfaces configured, it calls
vlan_hwaccel_receive_skb().

# ifconfig eth1 up
# vconfig add eth1 300
# ifconfig eth1.300 up
# tcpdump -s 2000 -e -n -i eth1
tcpdump: WARNING: eth1: no IPv4 address assigned
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth1, link-type EN10MB (Ethernet), capture size 2000 bytes
[Nothing....]

when vlan_hwaccel_receive_skb() gets the packets it discards
everything except vid 300 that I configured above.

Shouldn't it pass these unknown vids back to the base device as tagged
packets, especially if IFF_PROMISC is set?


What is the expected behavior for these? Should
vlan_hwaccel_receive_skb() shim a vlan tag back on the packet and send
it to the base device if there is no vlan device to send to? Also, is
it up to the individual driver to have a vlan tag on the packet if it
uses netif_receive_skb() as in case 1 above?


--
Dave Johnson
Starent Networks


2007-10-31 19:35:09

by Stephen Hemminger

[permalink] [raw]
Subject: Re: expected behavior of PF_PACKET on NETIF_F_HW_VLAN_RX device?

On Wed, 31 Oct 2007 14:43:51 -0400
Dave Johnson <[email protected]> wrote:

>
> Depending on the network driver, I'm seeing different behavior if
> a .1q packet is received to an PF_PACKET, SOCK_RAW, ETH_P_ALL socket.
>
>
> On devices what do not use NETIF_F_HW_VLAN_RX, the packet socket gets
> the complete packet with vlan tag included as the driver simply calls
> netif_receive_skb() or equivilant. packet_rcv() then gets the whole
> thing vlan tag included and sends this through the socket.
>
> vlan_skb_recv() also gets these all and will drop them because there
> are no vlans configured.
>

The VLAN acceleration grabs and hides the tag. It is a design flaw
that should be fixed, feel free to post a patch.


--
Stephen Hemminger <[email protected]>

2007-11-01 01:10:25

by David Miller

[permalink] [raw]
Subject: Re: expected behavior of PF_PACKET on NETIF_F_HW_VLAN_RX device?

From: Ben Greear <[email protected]>
Date: Wed, 31 Oct 2007 18:06:34 -0700

> DaveM did the HW Accel for VLANs if I remember correctly...perhaps he
> has some input?

Not really, I'm busy and also not motivated to work on this, someone
else will need to.

2007-11-01 01:23:57

by Stephen Hemminger

[permalink] [raw]
Subject: Re: expected behavior of PF_PACKET on NETIF_F_HW_VLAN_RX device?

Ben Greear wrote:
> Stephen Hemminger wrote:
>> On Wed, 31 Oct 2007 14:43:51 -0400
>> Dave Johnson <[email protected]> wrote:
>>
>>
>>> Depending on the network driver, I'm seeing different behavior if
>>> a .1q packet is received to an PF_PACKET, SOCK_RAW, ETH_P_ALL socket.
>>>
>>>
>>> On devices what do not use NETIF_F_HW_VLAN_RX, the packet socket gets
>>> the complete packet with vlan tag included as the driver simply calls
>>> netif_receive_skb() or equivilant. packet_rcv() then gets the whole
>>> thing vlan tag included and sends this through the socket.
>>>
>>> vlan_skb_recv() also gets these all and will drop them because there
>>> are no vlans configured.
>>>
>>>
>>
>> The VLAN acceleration grabs and hides the tag. It is a design flaw
>> that should be fixed, feel free to post a patch.
>>
> There may be several ways to 'fix' this. Perhaps it would be worth
> discussing what
> we want the end result to be at least?
>
> Should we always pass the vlan header up to raw sockets as part of the
> data payload?
>
> Or, maybe pass it in an auxiliary message such as how timestamps may
> be passed?
>
> The first option seems cleaner, but maybe there are performance
> problems with this
> approach?
>
> We should also define what a NIC should do with VLANs it doesn't
> explicitly know
> about. I think it should pass them up the stack with VLAN tag
> intact, but again, perhaps
> there are reasons not to do that?
>
> DaveM did the HW Accel for VLANs if I remember correctly...perhaps he
> has some input?

The code in AF_PACKET should fix the skb before passing to user space so
that there is
no difference between accel and non-accel hardware. Internal choices
shouldn't
leak to user space. Ditto, the receive checksum offload should be fixed
up as well.

2007-11-01 01:31:27

by Ben Greear

[permalink] [raw]
Subject: Re: expected behavior of PF_PACKET on NETIF_F_HW_VLAN_RX device?

Stephen Hemminger wrote:
>
> The code in AF_PACKET should fix the skb before passing to user space
> so that there is
> no difference between accel and non-accel hardware. Internal choices
> shouldn't
> leak to user space. Ditto, the receive checksum offload should be
> fixed up as well.
Ok, I guess that will fix the sniffing issues and any user-space
bridging type applications.

Currently, VLAN devices offer the ability to 'reorder' the header and
explicitly remove the VLAN
header. I assume we keep this feature and have the AF_PACKET logic
check the device
flags to see if it should insert the VLAN header for hw-accel vlans?

Either way, if we sniff the underlying device, we should always get the
VLAN header.

What about drivers and filtering VLANs? It seems there is still a
difference between software
vlans and hw-accel in this case.

Thanks,
Ben

--
Ben Greear <[email protected]>
Candela Technologies Inc http://www.candelatech.com


2007-11-01 01:50:27

by Ben Greear

[permalink] [raw]
Subject: Re: expected behavior of PF_PACKET on NETIF_F_HW_VLAN_RX device?

Stephen Hemminger wrote:
> On Wed, 31 Oct 2007 14:43:51 -0400
> Dave Johnson <[email protected]> wrote:
>
>
>> Depending on the network driver, I'm seeing different behavior if
>> a .1q packet is received to an PF_PACKET, SOCK_RAW, ETH_P_ALL socket.
>>
>>
>> On devices what do not use NETIF_F_HW_VLAN_RX, the packet socket gets
>> the complete packet with vlan tag included as the driver simply calls
>> netif_receive_skb() or equivilant. packet_rcv() then gets the whole
>> thing vlan tag included and sends this through the socket.
>>
>> vlan_skb_recv() also gets these all and will drop them because there
>> are no vlans configured.
>>
>>
>
> The VLAN acceleration grabs and hides the tag. It is a design flaw
> that should be fixed, feel free to post a patch.
>
There may be several ways to 'fix' this. Perhaps it would be worth
discussing what
we want the end result to be at least?

Should we always pass the vlan header up to raw sockets as part of the
data payload?

Or, maybe pass it in an auxiliary message such as how timestamps may be
passed?

The first option seems cleaner, but maybe there are performance problems
with this
approach?

We should also define what a NIC should do with VLANs it doesn't
explicitly know
about. I think it should pass them up the stack with VLAN tag intact,
but again, perhaps
there are reasons not to do that?

DaveM did the HW Accel for VLANs if I remember correctly...perhaps he
has some input?

Thanks,
Ben

--
Ben Greear <[email protected]>
Candela Technologies Inc http://www.candelatech.com


2007-11-01 04:50:36

by David Miller

[permalink] [raw]
Subject: Re: expected behavior of PF_PACKET on NETIF_F_HW_VLAN_RX device?

From: Stephen Hemminger <[email protected]>
Date: Wed, 31 Oct 2007 18:23:37 -0700

> The code in AF_PACKET should fix the skb before passing to user
> space so that there is no difference between accel and non-accel
> hardware. Internal choices shouldn't leak to user space. Ditto,
> the receive checksum offload should be fixed up as well.

The hardware has stripped the VLAN header completely and has not
provided it to us at all.

In my opinion trying to cobble one up by hand using the known TAG is
worse than not providing a VLAN header at all.

2007-11-01 15:04:56

by Ben Greear

[permalink] [raw]
Subject: Re: expected behavior of PF_PACKET on NETIF_F_HW_VLAN_RX device?

David Miller wrote:
> From: Stephen Hemminger <[email protected]>
> Date: Wed, 31 Oct 2007 18:23:37 -0700
>
>
>> The code in AF_PACKET should fix the skb before passing to user
>> space so that there is no difference between accel and non-accel
>> hardware. Internal choices shouldn't leak to user space. Ditto,
>> the receive checksum offload should be fixed up as well.
>>
>
> The hardware has stripped the VLAN header completely and has not
> provided it to us at all.
>
Do the NICs not save the QoS bits in the VLAN header anywhere that we could
use to reconstitute the header?

Thanks,
Ben

--
Ben Greear <[email protected]>
Candela Technologies Inc http://www.candelatech.com


2007-11-01 21:35:52

by David Miller

[permalink] [raw]
Subject: Re: expected behavior of PF_PACKET on NETIF_F_HW_VLAN_RX device?

From: Ben Greear <[email protected]>
Date: Thu, 01 Nov 2007 08:04:31 -0700

> David Miller wrote:
> > The hardware has stripped the VLAN header completely and has not
> > provided it to us at all.
> >
> Do the NICs not save the QoS bits in the VLAN header anywhere that we could
> use to reconstitute the header?

You get the 16-bit VLAN tag.

2007-11-01 21:38:52

by Dave Johnson

[permalink] [raw]
Subject: Re: expected behavior of PF_PACKET on NETIF_F_HW_VLAN_RX device?

Ben Greear writes:
> We should also define what a NIC should do with VLANs it doesn't
> explicitly know about. I think it should pass them up the stack
> with VLAN tag intact, but again, perhaps there are reasons not to do
> that?

Unless the device also supports NETIF_F_HW_VLAN_FILTER, it has no idea
which vlans the kernel cares about, it's up to __vlan_hwaccel_rx().


Stephen Hemminger writes:
> The code in AF_PACKET should fix the skb before passing to user
> space so that there is no difference between accel and non-accel
> hardware. Internal choices shouldn't leak to user space. Ditto,
> the receive checksum offload should be fixed up as well.

yep. bad csum on tx packets as reported by tcpdump is also an issue.


Ben Greear writes:
> Currently, VLAN devices offer the ability to 'reorder' the header
> and explicitly remove the VLAN header. I assume we keep this
> feature and have the AF_PACKET logic check the device flags to see
> if it should insert the VLAN header for hw-accel vlans?
>
> Either way, if we sniff the underlying device, we should always get
> the VLAN header.

Yes, but it's more than just a packet socket issue.

A quick look through the hwaccel capable drivers (in 2.6.23) and most
are doing something like:

if (foo->vlgrp && packet_is_tagged)
vlan_hwaccel_receive_skb(skb, foo->vlgrp, vlan_tag);
else
netif_receive_skb(skb);

The important thing here is if the vlan group is NULL, the MAC must
be configured to NOT strip the tag.

users of NETIF_F_HW_VLAN_RX:
---------------------------
./drivers/net/8139cp.c: looks ok
./drivers/net/acenic.c: *1
./drivers/net/amd8111e.c: unsure, probably *1
./drivers/net/atl1/atl1_main.c: looks ok
./drivers/net/bnx2.c: *2
./drivers/net/bonding/bond_main.c: unsure, probably ok
./drivers/net/chelsio/cxgb2.c: looks ok
./drivers/net/cxgb3/cxgb3_main.c: looks ok
./drivers/net/e1000/e1000_main.c: looks ok
./drivers/net/ehea/ehea_main.c: unsure, probably ok
./drivers/net/forcedeth.c: looks ok
./drivers/net/gianfar.c: looks ok
./drivers/net/ixgb/ixgb_main.c: looks ok
./drivers/net/ns83820.c: unsure, probably ok
./drivers/net/r8169.c: looks ok
./drivers/net/s2io.c: *1
./drivers/net/sky2.c: looks ok
./drivers/net/starfire.c: unsure, probably ok
./drivers/net/tg3.c: *2
./drivers/net/typhoon.c: unsure, probably ok
./drivers/s390/net/qeth_main.c: unsure, probably ok

*1: Driver configures the MAC to strip TAGs even if vlan group is
NULL. MAC strips the tag, but driver calls netif_rx() or
netif_receive_skb() with the packet as untagged. Kernel
processes tagged packet as if it was received untagged. Possible
security issue.

*2: If chip supports 'ASF', tag is always stripped (see *1 above).
Looks ok if ASF is not supported.


Ben Greear writes:
> Do the NICs not save the QoS bits in the VLAN header anywhere that
> we could use to reconstitute the header?

Most likely, __vlan_hwaccel_rx() gets the whole 16bit tag and sets
skb->priority from on it.


Besides the accidental removal with the drivers listed above when
there is no vlan group registerd, we're still back to the original
issue.

Having __vlan_hwaccel_rx() send to the base device would likely
require a copy of the skb (at least the head). That completely defeats
the point of hwaccel.

At a minimum, __vlan_hwaccel_rx() should probably add the vlan header
back on if doesn't find a vlan device. This way no copy is needed,
just shove the header back on (we already have the full 16bits). Once
re-added, send to the base device instead of dropping.

That would fix the unknown vlan issue, but known vlans would only go
to the vlan device not the base device. Not sure of an easy fix for
this as af_packet can specifically bind to a specified base device. I
don't this this would be much of an issue and probably doesn't need
fixing.

--
Dave Johnson
Starent Networks

2007-11-01 21:50:22

by Rick Jones

[permalink] [raw]
Subject: Re: expected behavior of PF_PACKET on NETIF_F_HW_VLAN_RX device?

>>The code in AF_PACKET should fix the skb before passing to user
>>space so that there is no difference between accel and non-accel
>>hardware. Internal choices shouldn't leak to user space. Ditto,
>>the receive checksum offload should be fixed up as well.
>
>
> yep. bad csum on tx packets as reported by tcpdump is also an issue.

With TX CKO enabled, there isn't any checksum to fixup when a tx packet is
sniffed, so I'm not sure what can be done in the kernel apart from an
unpalatable "disable CKO and all which depend upon it when entering promiscuous
mode." Having the tap calculate a checksum would be equally bad for
performance, and would frankly be incorrect anyway because it would give the
user the false impression that was the checksum which went-out onto the wire.

One could I suppose try to ammend the information passed to allow tcpdump to say
"oh, this was a tx packet on the same machine on which I am tracing so don't
worry about checksum mismatch" but I have to wonder if it is _really_ worth it.
Already someone has to deal with seeing TCP segments >> the MSS thanks to TSO.
(Actually tcpdump got rather confused about that too since the IP length of
those was 0, but IIRC we got that patched to use the length of zero as a "ah,
this was TSO so wing it" heuristic.)

rick jones

2007-11-01 21:58:40

by David Miller

[permalink] [raw]
Subject: Re: expected behavior of PF_PACKET on NETIF_F_HW_VLAN_RX device?

From: Dave Johnson <[email protected]>
Date: Thu, 1 Nov 2007 17:36:22 -0400

> bad csum on tx packets as reported by tcpdump is also an issue.

We provide a tag to userspace that tcpdump should use to see that the
HW is going to checksum the packet, and therefore it should elide
trying to verify the checksums.

It's not a kernel issue.

2007-11-01 22:00:13

by David Miller

[permalink] [raw]
Subject: Re: expected behavior of PF_PACKET on NETIF_F_HW_VLAN_RX device?

From: Rick Jones <[email protected]>
Date: Thu, 01 Nov 2007 14:48:45 -0700

> One could I suppose try to ammend the information passed to allow
> tcpdump to say "oh, this was a tx packet on the same machine on
> which I am tracing so don't worry about checksum mismatch"

We do this already!

2007-11-01 22:04:25

by Rick Jones

[permalink] [raw]
Subject: Re: expected behavior of PF_PACKET on NETIF_F_HW_VLAN_RX device?

David Miller wrote:
> From: Rick Jones <[email protected]>
> Date: Thu, 01 Nov 2007 14:48:45 -0700
>
>
>>One could I suppose try to ammend the information passed to allow
>>tcpdump to say "oh, this was a tx packet on the same machine on
>>which I am tracing so don't worry about checksum mismatch"
>
>
> We do this already!

I'll try to go pester folks in tcpdump-workers then.

rick

2007-11-01 22:07:40

by David Miller

[permalink] [raw]
Subject: Re: expected behavior of PF_PACKET on NETIF_F_HW_VLAN_RX device?

From: Rick Jones <[email protected]>
Date: Thu, 01 Nov 2007 15:04:12 -0700

> David Miller wrote:
> > From: Rick Jones <[email protected]>
> > Date: Thu, 01 Nov 2007 14:48:45 -0700
> >
> >
> >>One could I suppose try to ammend the information passed to allow
> >>tcpdump to say "oh, this was a tx packet on the same machine on
> >>which I am tracing so don't worry about checksum mismatch"
> >
> >
> > We do this already!
>
> I'll try to go pester folks in tcpdump-workers then.

The thing to check is "TP_STATUS_CSUMNOTREADY".

When using mmap(), it will be provided in the descriptor. When using
recvmsg() it will be provided via a PACKET_AUXDATA control message
when enabled via the PACKET_AUXDATA socket option.


2007-11-01 23:26:22

by Rick Jones

[permalink] [raw]
Subject: Re: expected behavior of PF_PACKET on NETIF_F_HW_VLAN_RX device?

David Miller wrote:
> From: Rick Jones <[email protected]>
>>I'll try to go pester folks in tcpdump-workers then.
>
>
> The thing to check is "TP_STATUS_CSUMNOTREADY".
>
> When using mmap(), it will be provided in the descriptor. When using
> recvmsg() it will be provided via a PACKET_AUXDATA control message
> when enabled via the PACKET_AUXDATA socket option.

Figures... the "dailies" and "weeklies" for tar files of tcpdump and libpcap
source are fubar... again. I've email in to tcpdump-workers on that one. If
that isn't resolved quickly I'll learn how to access their CVS (pick an SCM, any
SCM...)

I did an apt-get of debian lenny's tcpdump and sources:

hpcpc103:~# tcpdump -V
tcpdump version 3.9.8
libpcap version 0.9.8

and that seems to show the false checksum failure and not use the
TP_STATUS_CSUMNOTREADY - at least that didn't appear in a grepping of the
sources. At first I thought it might be, but then I realized that my snaplen
was too short to get the whole TSO'ed frame so tcpdump wasn't even trying to
verify. After disabling TSO on the NIC, leaving CKO on, and making my snaplen >
1500 I could see it was doing undesirable stuff.

I'll see what top of trunk has at some point and what the folks there think of
adding-in a change.

rick jones

2007-11-02 18:08:48

by Dave Johnson

[permalink] [raw]
Subject: Re: expected behavior of PF_PACKET on NETIF_F_HW_VLAN_RX device?

Dave Johnson writes:
> Ben Greear writes:
> > Currently, VLAN devices offer the ability to 'reorder' the header
> > and explicitly remove the VLAN header. I assume we keep this
> > feature and have the AF_PACKET logic check the device flags to see
> > if it should insert the VLAN header for hw-accel vlans?
> >
> > Either way, if we sniff the underlying device, we should always get
> > the VLAN header.
>
> Yes, but it's more than just a packet socket issue.
>
> A quick look through the hwaccel capable drivers (in 2.6.23) and most
> are doing something like:
>
> if (foo->vlgrp && packet_is_tagged)
> vlan_hwaccel_receive_skb(skb, foo->vlgrp, vlan_tag);
> else
> netif_receive_skb(skb);
>
> The important thing here is if the vlan group is NULL, the MAC must
> be configured to NOT strip the tag.
>
> users of NETIF_F_HW_VLAN_RX:
> ---------------------------
> ./drivers/net/8139cp.c: looks ok
> ./drivers/net/acenic.c: *1
> ./drivers/net/amd8111e.c: unsure, probably *1
> ./drivers/net/atl1/atl1_main.c: looks ok
> ./drivers/net/bnx2.c: *2
> ./drivers/net/bonding/bond_main.c: unsure, probably ok
> ./drivers/net/chelsio/cxgb2.c: looks ok
> ./drivers/net/cxgb3/cxgb3_main.c: looks ok
> ./drivers/net/e1000/e1000_main.c: looks ok
> ./drivers/net/ehea/ehea_main.c: unsure, probably ok
> ./drivers/net/forcedeth.c: looks ok
> ./drivers/net/gianfar.c: looks ok
> ./drivers/net/ixgb/ixgb_main.c: looks ok
> ./drivers/net/ns83820.c: unsure, probably ok
> ./drivers/net/r8169.c: looks ok
> ./drivers/net/s2io.c: *1
> ./drivers/net/sky2.c: looks ok
> ./drivers/net/starfire.c: unsure, probably ok
> ./drivers/net/tg3.c: *2
> ./drivers/net/typhoon.c: unsure, probably ok
> ./drivers/s390/net/qeth_main.c: unsure, probably ok
>
> *1: Driver configures the MAC to strip TAGs even if vlan group is
> NULL. MAC strips the tag, but driver calls netif_rx() or
> netif_receive_skb() with the packet as untagged. Kernel
> processes tagged packet as if it was received untagged. Possible
> security issue.
>
> *2: If chip supports 'ASF', tag is always stripped (see *1 above).
> Looks ok if ASF is not supported.

Michael,

These changes seems to cause this issue:

> [BNX2]: Fix VLAN on ASF
>
> Always set up the device to strip incoming VLAN tags when ASF is
> enabled. ASF firmware will not parse packets correctly if VLAN tags
> are not stripped.
>
> Signed-off-by: Michael Chan <[email protected]>
> Signed-off-by: David S. Miller <[email protected]>
>
> GIT: e29054f92d7d575631691865c1b95bee5bc974cc

and

> [email protected], 2003-12-02 02:34:13-08:00, [email protected] +1 -0
> [TG3]: Do not set RX_MODE_KEEP_VLAN_TAG when ASF is enabled.


Could you elaborate if this is really needed, if so is there some
workaround that could be done instead?

Simply removing the check seemed to work for me, but I'm unsure if
this is actually a valid thing to do with these MACs.



--
Dave Johnson
Starent Networks

2007-11-02 21:03:00

by Michael Chan

[permalink] [raw]
Subject: Re: expected behavior of PF_PACKET on NETIF_F_HW_VLAN_RX device?

On Fri, 2007-11-02 at 14:08 -0400, Dave Johnson wrote:

> >
> > *2: If chip supports 'ASF', tag is always stripped (see *1 above).
> > Looks ok if ASF is not supported.
>
> Michael,
>
> These changes seems to cause this issue:
>
> > [BNX2]: Fix VLAN on ASF
> >
> > Always set up the device to strip incoming VLAN tags when ASF is
> > enabled. ASF firmware will not parse packets correctly if VLAN tags
> > are not stripped.
> >
> > Signed-off-by: Michael Chan <[email protected]>
> > Signed-off-by: David S. Miller <[email protected]>
> >
> > GIT: e29054f92d7d575631691865c1b95bee5bc974cc
>
> and
>
> > [email protected], 2003-12-02 02:34:13-08:00, [email protected] +1 -0
> > [TG3]: Do not set RX_MODE_KEEP_VLAN_TAG when ASF is enabled.
>
>
> Could you elaborate if this is really needed, if so is there some
> workaround that could be done instead?

This is needed for management firmware to work properly. Management
firmware expects any VLAN tags to be stripped. Unfortunately, VLAN
stripping cannot be done independently between the driver and the
firmware. The workaround is to disable management firmware.

>
> Simply removing the check seemed to work for me, but I'm unsure if
> this is actually a valid thing to do with these MACs.

Most of these on-board devices are shipped with management firmware
enabled. Removing the check will make the firmware not functional.

We realize this VLAN limitation is causing problems to many users and we
are looking for ways to address it.

2007-11-02 21:20:33

by David Miller

[permalink] [raw]
Subject: Re: expected behavior of PF_PACKET on NETIF_F_HW_VLAN_RX device?

From: Dave Johnson <[email protected]>
Date: Fri, 2 Nov 2007 14:08:32 -0400

> These changes seems to cause this issue:
>
> > [BNX2]: Fix VLAN on ASF
> >
> > Always set up the device to strip incoming VLAN tags when ASF is
> > enabled. ASF firmware will not parse packets correctly if VLAN tags
> > are not stripped.
> >
> > Signed-off-by: Michael Chan <[email protected]>
> > Signed-off-by: David S. Miller <[email protected]>
> >
> > GIT: e29054f92d7d575631691865c1b95bee5bc974cc
>
> and
>
> > [email protected], 2003-12-02 02:34:13-08:00, [email protected] +1 -0
> > [TG3]: Do not set RX_MODE_KEEP_VLAN_TAG when ASF is enabled.
>
>
> Could you elaborate if this is really needed, if so is there some
> workaround that could be done instead?
>
> Simply removing the check seemed to work for me, but I'm unsure if
> this is actually a valid thing to do with these MACs.

Unfortunately the ASF firmware is very picky.

I think were are stuck with this behavior.

2007-11-05 17:47:24

by Dave Johnson

[permalink] [raw]
Subject: [PATCH 1/2] NET: Re-add VLAN tag for devices incapable of keeping it


Some MACs are configured to remove VLAN tags on RX packets even if
the kernel isn't setup to use VLANs.

On these drivers it is bad to simply pass the packets with the VLAN
tag removed to the network stack. This gives the network stack the
impression these packets arrived from the network without a VLAN tag
even though they had one. This has 2 issues: 1, PF_PACKET sockets see
the packet without the VLAN tag when bound to the base device; and 2,
the L3 stacks will process these packets as if they had no tag to
begin with.

This patch changes vlan_hwaccel_rx() and vlan_hwaccel_receive_skb() to
accept a NULL vlan group. This simplifies the drivers as they can
call vlan_hwaccel_rx()/vlan_hwaccel_receive_skb() if the MAC
has removed the VLAN tag, or netif_rx()/netif_receive_skb() if the
MAC hasn't removed a VLAN tag (or the packet arrived untagged).

If the vlan group is NULL or no device for the received VID exists,
vlan_hwaccel_rx()/vlan_hwaccel_receive_skb() will now re-add a VLAN
tag and pass the packet to the base device with tag intact.

It is still preferable for the drivers to disable VLAN tag removal in
the hardware if no vlan group is configured. However for devices that
cannot do this the change will ensure tagged packets remain tagged in
the network stack.

Signed-off-by: Dave Johnson <[email protected]>

---

Note, __vlan_hwaccel_rx() needed to move below __vlan_put_tag() so the
change really isn't as big as it may look below.

===== include/linux/if_vlan.h 1.25 vs edited =====
--- 1.25/include/linux/if_vlan.h 2007-07-14 21:53:28 -04:00
+++ edited/include/linux/if_vlan.h 2007-11-03 14:26:43 -04:00
@@ -164,71 +164,6 @@
(VLAN_TX_SKB_CB(__skb)->magic == VLAN_TX_COOKIE_MAGIC)
#define vlan_tx_tag_get(__skb) (VLAN_TX_SKB_CB(__skb)->vlan_tag)

-/* VLAN rx hw acceleration helper. This acts like netif_{rx,receive_skb}(). */
-static inline int __vlan_hwaccel_rx(struct sk_buff *skb,
- struct vlan_group *grp,
- unsigned short vlan_tag, int polling)
-{
- struct net_device_stats *stats;
-
- if (skb_bond_should_drop(skb)) {
- dev_kfree_skb_any(skb);
- return NET_RX_DROP;
- }
-
- skb->dev = vlan_group_get_device(grp, vlan_tag & VLAN_VID_MASK);
- if (skb->dev == NULL) {
- dev_kfree_skb_any(skb);
-
- /* Not NET_RX_DROP, this is not being dropped
- * due to congestion.
- */
- return 0;
- }
-
- skb->dev->last_rx = jiffies;
-
- stats = vlan_dev_get_stats(skb->dev);
- stats->rx_packets++;
- stats->rx_bytes += skb->len;
-
- skb->priority = vlan_get_ingress_priority(skb->dev, vlan_tag);
- switch (skb->pkt_type) {
- case PACKET_BROADCAST:
- break;
-
- case PACKET_MULTICAST:
- stats->multicast++;
- break;
-
- case PACKET_OTHERHOST:
- /* Our lower layer thinks this is not local, let's make sure.
- * This allows the VLAN to have a different MAC than the underlying
- * device, and still route correctly.
- */
- if (!compare_ether_addr(eth_hdr(skb)->h_dest,
- skb->dev->dev_addr))
- skb->pkt_type = PACKET_HOST;
- break;
- };
-
- return (polling ? netif_receive_skb(skb) : netif_rx(skb));
-}
-
-static inline int vlan_hwaccel_rx(struct sk_buff *skb,
- struct vlan_group *grp,
- unsigned short vlan_tag)
-{
- return __vlan_hwaccel_rx(skb, grp, vlan_tag, 0);
-}
-
-static inline int vlan_hwaccel_receive_skb(struct sk_buff *skb,
- struct vlan_group *grp,
- unsigned short vlan_tag)
-{
- return __vlan_hwaccel_rx(skb, grp, vlan_tag, 1);
-}
-
/**
* __vlan_put_tag - regular VLAN tag inserting
* @skb: skbuff to tag
@@ -243,10 +178,19 @@
static inline struct sk_buff *__vlan_put_tag(struct sk_buff *skb, unsigned short tag)
{
struct vlan_ethhdr *veth;
+ int new_mac_offset;
+
+ if (skb_mac_header_was_set(skb))
+ /* if the MAC header is not at skb->data we need to push to that
+ * place then pull back before returning to the caller.
+ */
+ new_mac_offset = VLAN_HLEN + skb->data - skb_mac_header(skb);
+ else
+ new_mac_offset = VLAN_HLEN;

- if (skb_headroom(skb) < VLAN_HLEN) {
+ if (skb_headroom(skb) < new_mac_offset) {
struct sk_buff *sk_tmp = skb;
- skb = skb_realloc_headroom(sk_tmp, VLAN_HLEN);
+ skb = skb_realloc_headroom(sk_tmp, new_mac_offset);
kfree_skb(sk_tmp);
if (!skb) {
printk(KERN_ERR "vlan: failed to realloc headroom\n");
@@ -260,7 +204,7 @@
}
}

- veth = (struct vlan_ethhdr *)skb_push(skb, VLAN_HLEN);
+ veth = (struct vlan_ethhdr *)skb_push(skb, new_mac_offset);

/* Move the mac addresses to the beginning of the new header. */
memmove(skb->data, skb->data + VLAN_HLEN, 2 * VLAN_ETH_ALEN);
@@ -275,6 +219,8 @@
skb->mac_header -= VLAN_HLEN;
skb->network_header -= VLAN_HLEN;

+ /* back to where we started, minus the new vlan header */
+ skb_pull(skb, new_mac_offset-VLAN_HLEN);
return skb;
}

@@ -372,6 +318,98 @@
} else {
return __vlan_get_tag(skb, tag);
}
+}
+
+/* VLAN rx hw acceleration helper. This acts like netif_{rx,receive_skb}(). */
+static inline int __vlan_hwaccel_rx(struct sk_buff *skb,
+ struct vlan_group *grp,
+ unsigned short vlan_tag, int polling)
+{
+ struct net_device_stats *stats;
+ struct net_device *new_dev;
+
+ if (skb_bond_should_drop(skb)) {
+ dev_kfree_skb_any(skb);
+ return NET_RX_DROP;
+ }
+
+ if (grp)
+ new_dev = vlan_group_get_device(grp, vlan_tag & VLAN_VID_MASK);
+
+ if (!grp || !new_dev) {
+ /* Driver removed vlan tag from the packet, but we have no
+ * vlan device for this VID.
+ *
+ * This can occur for 2 reasons:
+ * 1) Have a vlan group, we want some VIDs, just not this VID.
+ * 2) Have no vlan group, but hardware limitation requires it
+ * to remove vlan tags anyway.
+ *
+ * Normally if no vlan group is configured, the underlying
+ * driver will configure the hardware to NOT strip out the
+ * vlan tag. It can then call netif_receive_skb/netif_rx
+ * with the vlan tag still intact. However some devices
+ * cannot be configured to keep the vlan tag and must not
+ * call netif_receive_skb/netif_rx with the vlan tag
+ * missing. This would allow the normal protocol handlers
+ * to process this tagged packet as if it arived without a
+ * vlan tag.
+ *
+ * We need to re-add the TAG and hand the packet off to the
+ * base device with the TAG present so any protocol handlers
+ * (PF_PACKET, etc...) will get a chance to see it.
+ */
+
+ skb = __vlan_put_tag(skb, vlan_tag);
+
+ if (likely(skb))
+ return (polling ? netif_receive_skb(skb) : netif_rx(skb));
+ else
+ return NET_RX_DROP;
+ }
+
+ skb->dev = new_dev;
+ skb->dev->last_rx = jiffies;
+
+ stats = vlan_dev_get_stats(skb->dev);
+ stats->rx_packets++;
+ stats->rx_bytes += skb->len;
+
+ skb->priority = vlan_get_ingress_priority(skb->dev, vlan_tag);
+ switch (skb->pkt_type) {
+ case PACKET_BROADCAST:
+ break;
+
+ case PACKET_MULTICAST:
+ stats->multicast++;
+ break;
+
+ case PACKET_OTHERHOST:
+ /* Our lower layer thinks this is not local, let's make sure.
+ * This allows the VLAN to have a different MAC than the underlying
+ * device, and still route correctly.
+ */
+ if (!compare_ether_addr(eth_hdr(skb)->h_dest,
+ skb->dev->dev_addr))
+ skb->pkt_type = PACKET_HOST;
+ break;
+ };
+
+ return (polling ? netif_receive_skb(skb) : netif_rx(skb));
+}
+
+static inline int vlan_hwaccel_rx(struct sk_buff *skb,
+ struct vlan_group *grp,
+ unsigned short vlan_tag)
+{
+ return __vlan_hwaccel_rx(skb, grp, vlan_tag, 0);
+}
+
+static inline int vlan_hwaccel_receive_skb(struct sk_buff *skb,
+ struct vlan_group *grp,
+ unsigned short vlan_tag)
+{
+ return __vlan_hwaccel_rx(skb, grp, vlan_tag, 1);
}

#endif /* __KERNEL__ */

2007-11-05 17:47:39

by Dave Johnson

[permalink] [raw]
Subject: [PATCH 2/2] NET: Re-add VLAN tag for devices incapable of keeping it


This patch changes the following drivers to use vlan_hwaccel_rx()
and/or vlan_hwaccel_receive_skb() with a NULL vlan group if needed as
they are not setup to dynamically enable/disable vlan removal in
their MAC based on the vlan group:

drivers/net/tg3.c Tested on BCM5704, looks good
drivers/net/bnx2.c Tested on BCM5708, looks good
drivers/net/acenic.c Not tested, I don't have one of these
drivers/net/s2io.c Not tested, I don't have one of these

In addition, these drivers might also need changes, but should
probably be done by the maintainer as it's not clear exactly what
change is needed:

drivers/net/amd8111e.c
drivers/net/cxgb3/*

Signed-off-by: Dave Johnson <[email protected]>

===== drivers/net/acenic.c 1.77 vs edited =====
--- 1.77/drivers/net/acenic.c 2007-07-24 16:28:41 -04:00
+++ edited/drivers/net/acenic.c 2007-11-03 12:27:40 -04:00
@@ -2036,7 +2036,7 @@

/* send it up */
#if ACENIC_DO_VLAN
- if (ap->vlgrp && (bd_flags & BD_FLG_VLAN_TAG)) {
+ if (bd_flags & BD_FLG_VLAN_TAG) {
vlan_hwaccel_rx(skb, ap->vlgrp, retdesc->vlan);
} else
#endif
===== drivers/net/bnx2.c 1.135 vs edited =====
--- 1.135/drivers/net/bnx2.c 2007-09-20 15:14:21 -04:00
+++ edited/drivers/net/bnx2.c 2007-11-05 09:34:26 -05:00
@@ -2493,7 +2493,8 @@
}

#ifdef BCM_VLAN
- if ((status & L2_FHDR_STATUS_L2_VLAN_TAG) && (bp->vlgrp != 0)) {
+ if ((status & L2_FHDR_STATUS_L2_VLAN_TAG) &&
+ (bp->vlgrp || (bp->flags & ASF_ENABLE_FLAG))) {
vlan_hwaccel_receive_skb(skb, bp->vlgrp,
rx_hdr->l2_fhdr_vlan_tag);
}
===== drivers/net/s2io.c 1.124 vs edited =====
--- 1.124/drivers/net/s2io.c 2007-08-03 18:10:44 -04:00
+++ edited/drivers/net/s2io.c 2007-11-03 12:29:09 -04:00
@@ -6834,8 +6834,7 @@
sp->mac_control.stats_info->sw_stat.mem_freed += skb->truesize;
if (!sp->lro) {
skb->protocol = eth_type_trans(skb, dev);
- if ((sp->vlgrp && RXD_GET_VLAN_TAG(rxdp->Control_2) &&
- vlan_strip_flag)) {
+ if (RXD_GET_VLAN_TAG(rxdp->Control_2) && vlan_strip_flag) {
/* Queueing the vlan frame to the upper layer */
if (napi)
vlan_hwaccel_receive_skb(skb, sp->vlgrp,
===== drivers/net/tg3.c 1.523 vs edited =====
--- 1.523/drivers/net/tg3.c 2007-09-11 04:28:44 -04:00
+++ edited/drivers/net/tg3.c 2007-11-05 09:38:33 -05:00
@@ -3417,7 +3417,7 @@

skb->protocol = eth_type_trans(skb, tp->dev);
#if TG3_VLAN_TAG_USED
- if (tp->vlgrp != NULL &&
+ if ((tp->vlgrp || (tp->tg3_flags & TG3_FLAG_ENABLE_ASF)) &&
desc->type_flags & RXD_FLAG_VLAN) {
tg3_vlan_rx(tp, skb,
desc->err_vlan & RXD_VLAN_MASK);

2007-11-05 18:00:46

by Patrick McHardy

[permalink] [raw]
Subject: Re: [PATCH 1/2] NET: Re-add VLAN tag for devices incapable of keeping it

Dave Johnson wrote:
> +/* VLAN rx hw acceleration helper. This acts like netif_{rx,receive_skb}(). */
> +static inline int __vlan_hwaccel_rx(struct sk_buff *skb,
> + struct vlan_group *grp,
> + unsigned short vlan_tag, int polling)
> +{
> ...
> + /* Driver removed vlan tag from the packet, but we have no
> + * vlan device for this VID.
> + *
> + * This can occur for 2 reasons:
> + * 1) Have a vlan group, we want some VIDs, just not this VID.
> + * 2) Have no vlan group, but hardware limitation requires it
> + * to remove vlan tags anyway.
> + *
> + * Normally if no vlan group is configured, the underlying
> + * driver will configure the hardware to NOT strip out the
> + * vlan tag. It can then call netif_receive_skb/netif_rx
> + * with the vlan tag still intact. However some devices
> + * cannot be configured to keep the vlan tag and must not
> + * call netif_receive_skb/netif_rx with the vlan tag
> + * missing. This would allow the normal protocol handlers
> + * to process this tagged packet as if it arived without a
> + * vlan tag.
> + *
> + * We need to re-add the TAG and hand the packet off to the
> + * base device with the TAG present so any protocol handlers
> + * (PF_PACKET, etc...) will get a chance to see it.
> + */


This looks like a rather expensive operation for the unlikely case
that packets will be received by a packet socket. IMO it should only
be reconstructed if actually needed, by af_packet itself.

As we discussed some time back storing the VLAN tag in the CB on
TX clashes with other users of the CB like qdiscs, so we need a
new field in the skb for this anyway.

2007-11-05 23:15:55

by David Miller

[permalink] [raw]
Subject: Re: [PATCH 1/2] NET: Re-add VLAN tag for devices incapable of keeping it

From: Patrick McHardy <[email protected]>
Date: Mon, 05 Nov 2007 19:00:19 +0100

> This looks like a rather expensive operation for the unlikely case
> that packets will be received by a packet socket. IMO it should only
> be reconstructed if actually needed, by af_packet itself.

Completely agreed. We should not do this by default when %99
of the networking stack simply does not care about this.

> As we discussed some time back storing the VLAN tag in the CB on
> TX clashes with other users of the CB like qdiscs, so we need a
> new field in the skb for this anyway.

Someone will have to find a way to remove some other fields in
sk_buff before I'm going to allow more space to be eaten up
by this completely fringe case feature.

2007-11-06 00:21:37

by Patrick McHardy

[permalink] [raw]
Subject: Re: [PATCH 1/2] NET: Re-add VLAN tag for devices incapable of keeping it

David Miller wrote:
> From: Patrick McHardy <[email protected]>
> Date: Mon, 05 Nov 2007 19:00:19 +0100
>
>> This looks like a rather expensive operation for the unlikely case
>> that packets will be received by a packet socket. IMO it should only
>> be reconstructed if actually needed, by af_packet itself.
>
> Completely agreed. We should not do this by default when %99
> of the networking stack simply does not care about this.


I think there is one more case that matters, which is briding
from a device with VLAN stripping for a VLAN not configured
locally. The tag will be stripped and will be lost for forwarded
packets. But I'm not exactly sure this really can be configured
(time for bed so I'll check tommorrow).

>> As we discussed some time back storing the VLAN tag in the CB on
>> TX clashes with other users of the CB like qdiscs, so we need a
>> new field in the skb for this anyway.
>
> Someone will have to find a way to remove some other fields in
> sk_buff before I'm going to allow more space to be eaten up
> by this completely fringe case feature.


We have a two byte hole after tc_verd where we could fit this in.
But I'm pretty sure we also could reuse some other fields on input,
like queue_mapping or maybe even destructor for unowned skbs.

2007-11-06 00:36:04

by David Miller

[permalink] [raw]
Subject: Re: [PATCH 1/2] NET: Re-add VLAN tag for devices incapable of keeping it

From: Patrick McHardy <[email protected]>
Date: Tue, 06 Nov 2007 01:21:09 +0100

> David Miller wrote:
> > From: Patrick McHardy <[email protected]>
> > Date: Mon, 05 Nov 2007 19:00:19 +0100
> >
> >> This looks like a rather expensive operation for the unlikely case
> >> that packets will be received by a packet socket. IMO it should only
> >> be reconstructed if actually needed, by af_packet itself.
> >
> > Completely agreed. We should not do this by default when %99
> > of the networking stack simply does not care about this.
>
>
> I think there is one more case that matters, which is briding
> from a device with VLAN stripping for a VLAN not configured
> locally. The tag will be stripped and will be lost for forwarded
> packets. But I'm not exactly sure this really can be configured
> (time for bed so I'll check tommorrow).

If so then when such rules are loaded, just like PF_PACKET, it can set
the indication to start reconstituting VLAN headers stripped by HW.

> >> As we discussed some time back storing the VLAN tag in the CB on
> >> TX clashes with other users of the CB like qdiscs, so we need a
> >> new field in the skb for this anyway.
> >
> > Someone will have to find a way to remove some other fields in
> > sk_buff before I'm going to allow more space to be eaten up
> > by this completely fringe case feature.
>
> We have a two byte hole after tc_verd where we could fit this in.
> But I'm pretty sure we also could reuse some other fields on input,
> like queue_mapping or maybe even destructor for unowned skbs.

Two bytes is enough, so if there is a hole we can use it.

2007-11-06 02:54:54

by Ramkrishna Vepa

[permalink] [raw]
Subject: RE: [PATCH 2/2] NET: Re-add VLAN tag for devices incapable of keeping it

The Xframe (S2io) adapter can be programmed dynamically to either,
always strip the vlan tag or not. In this case, if the vlan group is
NULL, it can be programmed at run time to NOT strip the vlan tag.

When a packet with a Vlan id is received that is not added to the group,
should it be dropped or indicated up? Either can be handled by the
hardware. The vlan tag in this particular case will be stripped as the
vlan group is not NULL.

Ram

> -----Original Message-----
> From: Dave Johnson
[mailto:[email protected]]
> Sent: Monday, November 05, 2007 9:47 AM
> To: David Miller; [email protected]; [email protected];
> [email protected]; [email protected];
> [email protected]; [email protected]
> Subject: [PATCH 2/2] NET: Re-add VLAN tag for devices incapable of
keeping
> it
>
>
> This patch changes the following drivers to use vlan_hwaccel_rx()
> and/or vlan_hwaccel_receive_skb() with a NULL vlan group if needed as
> they are not setup to dynamically enable/disable vlan removal in
> their MAC based on the vlan group:
>
> drivers/net/tg3.c Tested on BCM5704, looks good
> drivers/net/bnx2.c Tested on BCM5708, looks good
> drivers/net/acenic.c Not tested, I don't have one of these
> drivers/net/s2io.c Not tested, I don't have one of these
>
> In addition, these drivers might also need changes, but should
> probably be done by the maintainer as it's not clear exactly what
> change is needed:
>
> drivers/net/amd8111e.c
> drivers/net/cxgb3/*
>
> Signed-off-by: Dave Johnson <[email protected]>
>
> ===== drivers/net/acenic.c 1.77 vs edited =====
> --- 1.77/drivers/net/acenic.c 2007-07-24 16:28:41 -04:00
> +++ edited/drivers/net/acenic.c 2007-11-03 12:27:40 -04:00
> @@ -2036,7 +2036,7 @@
>
> /* send it up */
> #if ACENIC_DO_VLAN
> - if (ap->vlgrp && (bd_flags & BD_FLG_VLAN_TAG)) {
> + if (bd_flags & BD_FLG_VLAN_TAG) {
> vlan_hwaccel_rx(skb, ap->vlgrp, retdesc->vlan);
> } else
> #endif
> ===== drivers/net/bnx2.c 1.135 vs edited =====
> --- 1.135/drivers/net/bnx2.c 2007-09-20 15:14:21 -04:00
> +++ edited/drivers/net/bnx2.c 2007-11-05 09:34:26 -05:00
> @@ -2493,7 +2493,8 @@
> }
>
> #ifdef BCM_VLAN
> - if ((status & L2_FHDR_STATUS_L2_VLAN_TAG) && (bp->vlgrp
!= 0))
> {
> + if ((status & L2_FHDR_STATUS_L2_VLAN_TAG) &&
> + (bp->vlgrp || (bp->flags & ASF_ENABLE_FLAG))) {
> vlan_hwaccel_receive_skb(skb, bp->vlgrp,
> rx_hdr->l2_fhdr_vlan_tag);
> }
> ===== drivers/net/s2io.c 1.124 vs edited =====
> --- 1.124/drivers/net/s2io.c 2007-08-03 18:10:44 -04:00
> +++ edited/drivers/net/s2io.c 2007-11-03 12:29:09 -04:00
> @@ -6834,8 +6834,7 @@
> sp->mac_control.stats_info->sw_stat.mem_freed += skb->truesize;
> if (!sp->lro) {
> skb->protocol = eth_type_trans(skb, dev);
> - if ((sp->vlgrp && RXD_GET_VLAN_TAG(rxdp->Control_2) &&
> - vlan_strip_flag)) {
> + if (RXD_GET_VLAN_TAG(rxdp->Control_2) &&
vlan_strip_flag) {
> /* Queueing the vlan frame to the upper layer */
> if (napi)
> vlan_hwaccel_receive_skb(skb, sp->vlgrp,
> ===== drivers/net/tg3.c 1.523 vs edited =====
> --- 1.523/drivers/net/tg3.c 2007-09-11 04:28:44 -04:00
> +++ edited/drivers/net/tg3.c 2007-11-05 09:38:33 -05:00
> @@ -3417,7 +3417,7 @@
>
> skb->protocol = eth_type_trans(skb, tp->dev);
> #if TG3_VLAN_TAG_USED
> - if (tp->vlgrp != NULL &&
> + if ((tp->vlgrp || (tp->tg3_flags & TG3_FLAG_ENABLE_ASF))
&&
> desc->type_flags & RXD_FLAG_VLAN) {
> tg3_vlan_rx(tp, skb,
> desc->err_vlan & RXD_VLAN_MASK);

2007-11-06 18:03:32

by Krzysztof Halasa

[permalink] [raw]
Subject: Re: [PATCH 1/2] NET: Re-add VLAN tag for devices incapable of keeping it

Patrick McHardy <[email protected]> writes:

> I think there is one more case that matters, which is briding
> from a device with VLAN stripping for a VLAN not configured
> locally. The tag will be stripped and will be lost for forwarded
> packets.

I think we should drop such packets on RX. Anyway we shouldn't
forward them.
--
Krzysztof Halasa

2007-11-06 18:29:17

by Ramkrishna Vepa

[permalink] [raw]
Subject: RE: [PATCH 2/2] NET: Re-add VLAN tag for devices incapable of keeping it

Dave,

If you can remove the patch for the s2io driver, we can submit a patch
that can dynamically program Xframe to strip or not strip the vlan tag
based on whether the vlan group is not NULL or NULL respectively. For
this, we have to modify the initialization as well as the vlan
registration.

With regards to the question on the Vlan id received not added to the
group, I think we should drop these packets. This change requires
additional changes to support multiple receive rings. But will add it to
the driver on our website, which supports multiple rx rings, tx fifos,
multiqueue etc, and is a very small incremental change.

Thanks,
Ram

> -----Original Message-----
> From: Ramkrishna Vepa [mailto:[email protected]]
> Sent: Monday, November 05, 2007 6:40 PM
> To: Dave Johnson; David Miller; [email protected];
> [email protected]; [email protected];
[email protected];
> [email protected]; [email protected]
> Subject: RE: [PATCH 2/2] NET: Re-add VLAN tag for devices incapable of
> keeping it
>
> The Xframe (S2io) adapter can be programmed dynamically to either,
> always strip the vlan tag or not. In this case, if the vlan group is
> NULL, it can be programmed at run time to NOT strip the vlan tag.
>
> When a packet with a Vlan id is received that is not added to the
group,
> should it be dropped or indicated up? Either can be handled by the
> hardware. The vlan tag in this particular case will be stripped as the
> vlan group is not NULL.
>
> Ram
>
> > -----Original Message-----
> > From: Dave Johnson
> [mailto:[email protected]]
> > Sent: Monday, November 05, 2007 9:47 AM
> > To: David Miller; [email protected]; [email protected];
> > [email protected]; [email protected];
> > [email protected]; [email protected]
> > Subject: [PATCH 2/2] NET: Re-add VLAN tag for devices incapable of
> keeping
> > it
> >
> >
> > This patch changes the following drivers to use vlan_hwaccel_rx()
> > and/or vlan_hwaccel_receive_skb() with a NULL vlan group if needed
as
> > they are not setup to dynamically enable/disable vlan removal in
> > their MAC based on the vlan group:
> >
> > drivers/net/tg3.c Tested on BCM5704, looks good
> > drivers/net/bnx2.c Tested on BCM5708, looks good
> > drivers/net/acenic.c Not tested, I don't have one of these
> > drivers/net/s2io.c Not tested, I don't have one of these
> >
> > In addition, these drivers might also need changes, but should
> > probably be done by the maintainer as it's not clear exactly what
> > change is needed:
> >
> > drivers/net/amd8111e.c
> > drivers/net/cxgb3/*
> >
> > Signed-off-by: Dave Johnson <[email protected]>
> >
> > ===== drivers/net/acenic.c 1.77 vs edited =====
> > --- 1.77/drivers/net/acenic.c 2007-07-24 16:28:41 -04:00
> > +++ edited/drivers/net/acenic.c 2007-11-03 12:27:40 -04:00
> > @@ -2036,7 +2036,7 @@
> >
> > /* send it up */
> > #if ACENIC_DO_VLAN
> > - if (ap->vlgrp && (bd_flags & BD_FLG_VLAN_TAG)) {
> > + if (bd_flags & BD_FLG_VLAN_TAG) {
> > vlan_hwaccel_rx(skb, ap->vlgrp, retdesc->vlan);
> > } else
> > #endif
> > ===== drivers/net/bnx2.c 1.135 vs edited =====
> > --- 1.135/drivers/net/bnx2.c 2007-09-20 15:14:21 -04:00
> > +++ edited/drivers/net/bnx2.c 2007-11-05 09:34:26 -05:00
> > @@ -2493,7 +2493,8 @@
> > }
> >
> > #ifdef BCM_VLAN
> > - if ((status & L2_FHDR_STATUS_L2_VLAN_TAG) && (bp->vlgrp
> != 0))
> > {
> > + if ((status & L2_FHDR_STATUS_L2_VLAN_TAG) &&
> > + (bp->vlgrp || (bp->flags & ASF_ENABLE_FLAG))) {
> > vlan_hwaccel_receive_skb(skb, bp->vlgrp,
> > rx_hdr->l2_fhdr_vlan_tag);
> > }
> > ===== drivers/net/s2io.c 1.124 vs edited =====
> > --- 1.124/drivers/net/s2io.c 2007-08-03 18:10:44 -04:00
> > +++ edited/drivers/net/s2io.c 2007-11-03 12:29:09 -04:00
> > @@ -6834,8 +6834,7 @@
> > sp->mac_control.stats_info->sw_stat.mem_freed += skb->truesize;
> > if (!sp->lro) {
> > skb->protocol = eth_type_trans(skb, dev);
> > - if ((sp->vlgrp && RXD_GET_VLAN_TAG(rxdp->Control_2) &&
> > - vlan_strip_flag)) {
> > + if (RXD_GET_VLAN_TAG(rxdp->Control_2) &&
> vlan_strip_flag) {
> > /* Queueing the vlan frame to the upper layer */
> > if (napi)
> > vlan_hwaccel_receive_skb(skb, sp->vlgrp,
> > ===== drivers/net/tg3.c 1.523 vs edited =====
> > --- 1.523/drivers/net/tg3.c 2007-09-11 04:28:44 -04:00
> > +++ edited/drivers/net/tg3.c 2007-11-05 09:38:33 -05:00
> > @@ -3417,7 +3417,7 @@
> >
> > skb->protocol = eth_type_trans(skb, tp->dev);
> > #if TG3_VLAN_TAG_USED
> > - if (tp->vlgrp != NULL &&
> > + if ((tp->vlgrp || (tp->tg3_flags & TG3_FLAG_ENABLE_ASF))
> &&
> > desc->type_flags & RXD_FLAG_VLAN) {
> > tg3_vlan_rx(tp, skb,
> > desc->err_vlan & RXD_VLAN_MASK);

2007-11-06 18:34:42

by Dave Johnson

[permalink] [raw]
Subject: RE: [PATCH 2/2] NET: Re-add VLAN tag for devices incapable of keeping it

Ramkrishna Vepa writes:
> Dave,
>
> If you can remove the patch for the s2io driver, we can submit a patch
> that can dynamically program Xframe to strip or not strip the vlan tag
> based on whether the vlan group is not NULL or NULL respectively. For
> this, we have to modify the initialization as well as the vlan
> registration.

Sure, not having this hardware I didn't want to attempt a complicated
change. I'll let you take care of this.

--
Dave Johnson
Starent Networks

2007-11-06 19:01:54

by Ben Greear

[permalink] [raw]
Subject: Re: [PATCH 1/2] NET: Re-add VLAN tag for devices incapable of keeping it

Krzysztof Halasa wrote:
> Patrick McHardy <[email protected]> writes:
>
>> I think there is one more case that matters, which is briding
>> from a device with VLAN stripping for a VLAN not configured
>> locally. The tag will be stripped and will be lost for forwarded
>> packets.
>
> I think we should drop such packets on RX. Anyway we shouldn't
> forward them.

Bridging eth0 to eth1 should not pay attention to VLAN tags
at all (if the pkt comes in on VLAN 7, it should go out on VLAN 7),
in my opinion. If the NIC is stripping the VLAN header, then this
cannot work unless something re-builds the VLAN header. If the stripped
VLAN header is placed into the skb, then any code that does need to
rebuild it can do so. It may be less efficient, but users can just
not use that NIC hardware for high-end solutions, and at any rate,
less efficient is better than broken.

Ben

--
Ben Greear <[email protected]>
Candela Technologies Inc http://www.candelatech.com

2007-11-06 20:08:34

by Krzysztof Halasa

[permalink] [raw]
Subject: Re: [PATCH 1/2] NET: Re-add VLAN tag for devices incapable of keeping it

Ben Greear <[email protected]> writes:

> Bridging eth0 to eth1 should not pay attention to VLAN tags
> at all (if the pkt comes in on VLAN 7, it should go out on VLAN 7),
> in my opinion. If the NIC is stripping the VLAN header, then this
> cannot work unless something re-builds the VLAN header. If the stripped
> VLAN header is placed into the skb, then any code that does need to
> rebuild it can do so. It may be less efficient, but users can just
> not use that NIC hardware for high-end solutions, and at any rate,
> less efficient is better than broken.

That is all true. The problem arises when you receive a tagged frame
on eth0, the chip removes the tag, and then the bridge sends it
out untagged on eth1.

I think there are two valid models for VLAN + bridging:

a) bridging works on "physical" interfaces, all tags are transmitted
unchanged.

b) every VLAN is a different logical interface, packets from unknown
VLANs are dropped on RX (and thus don't show up anywhere, except
counters), bridging uses logical interfaces. VLAN 100 on eth0 may
become VLAN 200 on eth1 and may be untagged on eth2.


"a" requires "soft" VLANs and/or adding the tags back (with
accelerated VLANs). This is how unmanaged switches labeled
"802.1Q - transparent" work. Not very flexible but usually good
enough.

"b" is how switches supporting VLANs (and 802.1Q) usually work.


I know ability to see exactly all packets as they are received
(including tags) is a really nice thing. But maybe we should change
the model? Making the ethX only carry untagged frames (even without
hw VLAN acceleration)?
--
Krzysztof Halasa

2007-11-06 23:55:27

by Patrick McHardy

[permalink] [raw]
Subject: Re: [PATCH 1/2] NET: Re-add VLAN tag for devices incapable of keeping it

Krzysztof Halasa wrote:
> Patrick McHardy <[email protected]> writes:
>
>> I think there is one more case that matters, which is briding
>> from a device with VLAN stripping for a VLAN not configured
>> locally. The tag will be stripped and will be lost for forwarded
>> packets.
>
> I think we should drop such packets on RX. Anyway we shouldn't
> forward them.

I believe you misunderstand the configuration I was thinking about:

eth0.1000 br0
| |
eth0 with VLAN stripping

Its slightly different that I thought though, __vlan_hwaccel_rx
drops packets for unknown vids, so as soon as you configure a VLAN
locally on the same device that is used for a bridge, no VLAN
packets will be forwarded at all.

Slightly inconsistent with how other layered devices work, but
can be avoided by adding the VLAN to br0.