2023-01-10 19:41:38

by Hervé Boisse

[permalink] [raw]
Subject: [PATCH net 1/2] net/af_packet: fix tx skb protocol on SOCK_PACKET sockets

Commit 75c65772c3d1 ("net/packet: Ask driver for protocol if not provided
by user") introduces packet_parse_headers() to extract protocol for
SOCK_RAW sockets.
But, SOCK_PACKET sockets which provide similar behaviour are not considered
so far and packets sent by those sockets will have their protocol unset.

Extract the skb protocol value from the packet for SOCK_PACKET sockets, as
currently done for SOCK_RAW sockets.

Fixes: 75c65772c3d1 ("net/packet: Ask driver for protocol if not provided by user")
Signed-off-by: Hervé Boisse <[email protected]>
---
net/packet/af_packet.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index b5ab98ca2511..c18274f65b17 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -1928,7 +1928,7 @@ static void packet_parse_headers(struct sk_buff *skb, struct socket *sock)
int depth;

if ((!skb->protocol || skb->protocol == htons(ETH_P_ALL)) &&
- sock->type == SOCK_RAW) {
+ (sock->type == SOCK_RAW || sock->type == SOCK_PACKET)) {
skb_reset_mac_header(skb);
skb->protocol = dev_parse_header_protocol(skb);
}
--
2.38.2


2023-01-10 19:58:27

by Hervé Boisse

[permalink] [raw]
Subject: [PATCH net 2/2] net/af_packet: fix tx skb network header on SOCK_RAW sockets over VLAN device

When an application sends a packet on a SOCK_RAW socket over a VLAN device,
there is a possibility that the skb network header is incorrectly set.

The issue happens when the device used to send the packet is a VLAN device
whose underlying device has no VLAN tx hardware offloading support.
In that case, the VLAN driver reports a LL header size increased by 4 bytes
to take into account the tag that will be added in software.

However, the socket user has no clue about that and still provides a normal
LL header without tag.
This results in the network header of the skb being shifted 4 bytes too far
in the packet. This shift makes tc classifiers fail as they point to
incorrect data.

Move network header right after underlying VLAN device LL header size
without tag, regardless of hardware offloading support. That is, the
expected LL header size from socket user.

Signed-off-by: Hervé Boisse <[email protected]>
---
net/packet/af_packet.c | 12 ++++++++++++
1 file changed, 12 insertions(+)

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index c18274f65b17..be892fd498a6 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -1933,6 +1933,18 @@ static void packet_parse_headers(struct sk_buff *skb, struct socket *sock)
skb->protocol = dev_parse_header_protocol(skb);
}

+ /* VLAN device may report bigger LL header size due to reserved room for
+ * tag on devices without hardware offloading support
+ */
+ if (is_vlan_dev(skb->dev) &&
+ (sock->type == SOCK_RAW || sock->type == SOCK_PACKET)) {
+ struct net_device *real_dev = vlan_dev_real_dev(skb->dev);
+
+ depth = real_dev->hard_header_len;
+ if (pskb_may_pull(skb, depth))
+ skb_set_network_header(skb, depth);
+ }
+
/* Move network header to the right position for VLAN tagged packets */
if (likely(skb->dev->type == ARPHRD_ETHER) &&
eth_type_vlan(skb->protocol) &&
--
2.38.2

2023-01-12 13:22:37

by Paolo Abeni

[permalink] [raw]
Subject: Re: [PATCH net 2/2] net/af_packet: fix tx skb network header on SOCK_RAW sockets over VLAN device

Hello,

On Tue, 2023-01-10 at 20:17 +0100, Hervé Boisse wrote:
> When an application sends a packet on a SOCK_RAW socket over a VLAN device,
> there is a possibility that the skb network header is incorrectly set.
>
> The issue happens when the device used to send the packet is a VLAN device
> whose underlying device has no VLAN tx hardware offloading support.
> In that case, the VLAN driver reports a LL header size increased by 4 bytes
> to take into account the tag that will be added in software.
>
> However, the socket user has no clue about that and still provides a normal
> LL header without tag.
> This results in the network header of the skb being shifted 4 bytes too far
> in the packet. This shift makes tc classifiers fail as they point to
> incorrect data.

I'm unsure I read correctly the use case: teh user-space application is
providing an L2 header and is expecting the Linux stack to add a vlan
tag? Or the linux application is sending packets on top of a vlan
device and desire no tag on the egress packet? or something else?

(Note: I think that in the fisrt 2 cases above the fix belong to the
user-space application).

Thanks!

Paolo

2023-01-12 15:56:43

by Hervé Boisse

[permalink] [raw]
Subject: Re: [PATCH net 2/2] net/af_packet: fix tx skb network header on SOCK_RAW sockets over VLAN device


Hello,

On Thu, Jan 12, 2023 at 01:48:51PM +0100, Paolo Abeni wrote:
> I'm unsure I read correctly the use case: teh user-space application is
> providing an L2 header and is expecting the Linux stack to add a vlan
> tag? Or the linux application is sending packets on top of a vlan
> device and desire no tag on the egress packet? or something else?

The userland app does not care about the device being a VLAN one or not. Just a regular Ethernet device on which to send raw frames.
This means the app provides a standard 14 byte Ethernet header on the socket and does not matter about any VLAN tag.

Then, the goal is to be able to alter those packets at the qdisc level with tc filters.
But, when such packets are sent on top of a VLAN device whose real device does not support VLAN tx offloading, the bad position of the skb network header makes this task impossible.

To give a concrete example, here are few commands to show the problem easily:

# modprobe dummy
# ip link add link dummy0 dummy0.832 type vlan id 832
# tc qdisc replace dev dummy0.832 root handle 1: prio
# tc filter del dev dummy0.832
# tc filter add dev dummy0.832 parent 1: prio 1 protocol ip u32 match u8 0 0 action pedit pedit munge ip tos set 0xc0
# ip link set dummy0 up
# ip link set dummy0.832 up

Then start an application that uses AF_PACKET+SOCK_RAW sockets over the VLAN device:

# dhclient -v dummy0.832

If you look at the emitted packets on dummy0, you will see that the 0xc0 byte of the IPv4 TOS/DSCP field is not set.
Instead, the 0xc0 tos byte is written 4 bytes too far, in the last byte of the IPv4 Identification field.


Hervé

2023-01-12 16:05:55

by Paolo Abeni

[permalink] [raw]
Subject: Re: [PATCH net 2/2] net/af_packet: fix tx skb network header on SOCK_RAW sockets over VLAN device

On Thu, 2023-01-12 at 16:27 +0100, Hervé Boisse wrote:
> On Thu, Jan 12, 2023 at 01:48:51PM +0100, Paolo Abeni wrote:
> > I'm unsure I read correctly the use case: teh user-space application is
> > providing an L2 header and is expecting the Linux stack to add a vlan
> > tag? Or the linux application is sending packets on top of a vlan
> > device and desire no tag on the egress packet? or something else?
>
> The userland app does not care about the device being a VLAN one or not. Just a regular Ethernet device on which to send raw frames.
> This means the app provides a standard 14 byte Ethernet header on the socket and does not matter about any VLAN tag.
>
> Then, the goal is to be able to alter those packets at the qdisc level with tc filters.
> But, when such packets are sent on top of a VLAN device whose real device does not support VLAN tx offloading, the bad position of the skb network header makes this task impossible.
>
> To give a concrete example, here are few commands to show the problem easily:
>
> # modprobe dummy
> # ip link add link dummy0 dummy0.832 type vlan id 832
> # tc qdisc replace dev dummy0.832 root handle 1: prio
> # tc filter del dev dummy0.832
> # tc filter add dev dummy0.832 parent 1: prio 1 protocol ip u32 match u8 0 0 action pedit pedit munge ip tos set 0xc0
> # ip link set dummy0 up
> # ip link set dummy0.832 up
>
> Then start an application that uses AF_PACKET+SOCK_RAW sockets over the VLAN device:
>
> # dhclient -v dummy0.832
>
> If you look at the emitted packets on dummy0, you will see that the 0xc0 byte of the IPv4 TOS/DSCP field is not set.
> Instead, the 0xc0 tos byte is written 4 bytes too far, in the last byte of the IPv4 Identification field.

I understand, thanks. Still is not clear why the user-space application
would attach to dummy0.832 instead of dummy0.

With your patch the filter will match, but the dhcp packet will reach
the wire untagged, so the app will behave exactly as it would do
if/when attached to dummy0.

To me it looks like the dhcp client has a bad configuration (wrong
interface) and these patches address the issue in the wrong place
(inside the kernel).

Cheers,

Paolo

2023-01-12 16:07:01

by Willem de Bruijn

[permalink] [raw]
Subject: Re: [PATCH net 1/2] net/af_packet: fix tx skb protocol on SOCK_PACKET sockets

On Tue, Jan 10, 2023 at 2:38 PM Hervé Boisse <[email protected]> wrote:
>
> Commit 75c65772c3d1 ("net/packet: Ask driver for protocol if not provided
> by user") introduces packet_parse_headers() to extract protocol for
> SOCK_RAW sockets.
> But, SOCK_PACKET sockets which provide similar behaviour are not considered
> so far and packets sent by those sockets will have their protocol unset.
>
> Extract the skb protocol value from the packet for SOCK_PACKET sockets, as
> currently done for SOCK_RAW sockets.
>
> Fixes: 75c65772c3d1 ("net/packet: Ask driver for protocol if not provided by user")
> Signed-off-by: Hervé Boisse <[email protected]>

Reviewed-by: Willem de Bruijn <[email protected]>

packet_sendmsg_spkt (SOCK_PACKET) also prepares raw packets and
validates their link layer header length with dev_validate_header
before calling packet_parse_headers. It should be fine to call
dev_parse_protocol.

Technically this adds a feature to packets of type SOCK_PACKET. I would
not qualify it as a bug fix. And not sure we care about adding new features to
SOCK_PACKET. New applications should use SOCK_RAW.

2023-01-12 16:31:30

by Willem de Bruijn

[permalink] [raw]
Subject: Re: [PATCH net 2/2] net/af_packet: fix tx skb network header on SOCK_RAW sockets over VLAN device

On Tue, Jan 10, 2023 at 2:38 PM Hervé Boisse <[email protected]> wrote:
>
> When an application sends a packet on a SOCK_RAW socket over a VLAN device,
> there is a possibility that the skb network header is incorrectly set.
>
> The issue happens when the device used to send the packet is a VLAN device
> whose underlying device has no VLAN tx hardware offloading support.
> In that case, the VLAN driver reports a LL header size increased by 4 bytes
> to take into account the tag that will be added in software.
>
> However, the socket user has no clue about that and still provides a normal
> LL header without tag.

This is arguably a mistake.

A process using PF_PACKET to write to a device must know the link layer type.
SOCK_RAW should prepare a header equivalent to dev_hard_header (which
prepares it for SOCK_DGRAM). vlan_dev_hard_header clearly adds both the
Ethernet header and VLAN tag.

If applications assume virtual VLAN device exposes an Ethernet link layer,
then net/8021q/vlan_dev.c needs to expose that, including that hard_header_len.


> This results in the network header of the skb being shifted 4 bytes too far
> in the packet. This shift makes tc classifiers fail as they point to
> incorrect data.
>
> Move network header right after underlying VLAN device LL header size
> without tag, regardless of hardware offloading support. That is, the
> expected LL header size from socket user.
>
> Signed-off-by: Hervé Boisse <[email protected]>
> ---
> net/packet/af_packet.c | 12 ++++++++++++
> 1 file changed, 12 insertions(+)
>
> diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
> index c18274f65b17..be892fd498a6 100644
> --- a/net/packet/af_packet.c
> +++ b/net/packet/af_packet.c
> @@ -1933,6 +1933,18 @@ static void packet_parse_headers(struct sk_buff *skb, struct socket *sock)
> skb->protocol = dev_parse_header_protocol(skb);
> }
>
> + /* VLAN device may report bigger LL header size due to reserved room for
> + * tag on devices without hardware offloading support
> + */
> + if (is_vlan_dev(skb->dev) &&
> + (sock->type == SOCK_RAW || sock->type == SOCK_PACKET)) {

Let's also try very hard to avoid adding branches in the hot path for
cases this rare.


> + struct net_device *real_dev = vlan_dev_real_dev(skb->dev);
> +
> + depth = real_dev->hard_header_len;
> + if (pskb_may_pull(skb, depth))
> + skb_set_network_header(skb, depth);
> + }
> +
> /* Move network header to the right position for VLAN tagged packets */
> if (likely(skb->dev->type == ARPHRD_ETHER) &&
> eth_type_vlan(skb->protocol) &&
> --
> 2.38.2
>

2023-01-12 17:36:55

by Hervé Boisse

[permalink] [raw]
Subject: Re: [PATCH net 2/2] net/af_packet: fix tx skb network header on SOCK_RAW sockets over VLAN device

On Thu, Jan 12, 2023 at 04:47:38PM +0100, Paolo Abeni wrote:
> I understand, thanks. Still is not clear why the user-space application
> would attach to dummy0.832 instead of dummy0.
>
> With your patch the filter will match, but the dhcp packet will reach
> the wire untagged, so the app will behave exactly as it would do
> if/when attached to dummy0.
>
> To me it looks like the dhcp client has a bad configuration (wrong
> interface) and these patches address the issue in the wrong place
> (inside the kernel).

No, the packet will actually reach the wire as a properly tagged 802.1Q frame.
For devices that do not support VLAN offloading (such as dummy but also the network card I am using), the kernel adds the tag itself in software before transmitting the packet to the real device.

You can verify this with a capture using tcpdump/wireshark on dummy0 versus dummy0.832.
That's why dhclient has to send its packets over dummy0.832 and not dummy0.

The same will happen on a real device. I checked on real hardware, with two boxes and their network cards connected through a cable.
If dhclient is started directly on the first box real device (eth0), the frame is received untagged by the second box, as intended.
But, if dhclient is started on top of the VLAN device (eth0.832), the second box receives a properly tagged frame.

Herv?

2023-01-12 19:46:51

by Willem de Bruijn

[permalink] [raw]
Subject: Re: [PATCH net 2/2] net/af_packet: fix tx skb network header on SOCK_RAW sockets over VLAN device

On Thu, Jan 12, 2023 at 11:25 AM Hervé Boisse <[email protected]> wrote:
>
> On Thu, Jan 12, 2023 at 04:47:38PM +0100, Paolo Abeni wrote:
> > I understand, thanks. Still is not clear why the user-space application
> > would attach to dummy0.832 instead of dummy0.
> >
> > With your patch the filter will match, but the dhcp packet will reach
> > the wire untagged, so the app will behave exactly as it would do
> > if/when attached to dummy0.
> >
> > To me it looks like the dhcp client has a bad configuration (wrong
> > interface) and these patches address the issue in the wrong place
> > (inside the kernel).
>
> No, the packet will actually reach the wire as a properly tagged 802.1Q frame.
> For devices that do not support VLAN offloading (such as dummy but also the network card I am using), the kernel adds the tag itself in software before transmitting the packet to the real device.
>
> You can verify this with a capture using tcpdump/wireshark on dummy0 versus dummy0.832.
> That's why dhclient has to send its packets over dummy0.832 and not dummy0.
>
> The same will happen on a real device. I checked on real hardware, with two boxes and their network cards connected through a cable.
> If dhclient is started directly on the first box real device (eth0), the frame is received untagged by the second box, as intended.
> But, if dhclient is started on top of the VLAN device (eth0.832), the second box receives a properly tagged frame.

SOCK_DGRAM writing the tag and SOCK_RAW not writing it is inconsistent.

The driver clearly anticipates SOCK_RAW writers that write only
Ethernet, and fixes up the difference in its ndo_start_xmit:

/* Handle non-VLAN frames if they are sent to us, for example by DHCP.

That workaround only comes too late for code between dev_queue_xmit
and ndo_start_xmit: tc filters.

Strictly, dhclient is just not writing the right link layer, as
advertised by this device in dev->hard_header_len and
vlan_dev_hard_header. But being pedantic won't make the application
work (I assume it never has).

Perhaps the device can have an optional mode where it does present as
a pure Ethernet device, and handles all the VLAN insertion purely in
the driver code in ndo_start_xmit?