2024-05-20 07:10:08

by Chengen Du

[permalink] [raw]
Subject: [PATCH] af_packet: Handle outgoing VLAN packets without hardware offloading

In the outbound packet path, if hardware VLAN offloading is unavailable,
the VLAN tag is inserted into the payload but then cleared from the
metadata. Consequently, this could lead to a false negative result when
checking for the presence of a VLAN tag using skb_vlan_tag_present(),
causing the packet sniffing outcome to lack VLAN tag information. As a
result, the packet capturing tool may be unable to parse packets as
expected.

Signed-off-by: Chengen Du <[email protected]>
---
net/packet/af_packet.c | 25 +++++++++++++++++++------
1 file changed, 19 insertions(+), 6 deletions(-)

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index ea3ebc160e25..73e9acb1875b 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -1010,12 +1010,15 @@ static void prb_fill_vlan_info(struct tpacket_kbdq_core *pkc,
if (skb_vlan_tag_present(pkc->skb)) {
ppd->hv1.tp_vlan_tci = skb_vlan_tag_get(pkc->skb);
ppd->hv1.tp_vlan_tpid = ntohs(pkc->skb->vlan_proto);
- ppd->tp_status = TP_STATUS_VLAN_VALID | TP_STATUS_VLAN_TPID_VALID;
+ } else if (eth_type_vlan(pkc->skb->protocol)) {
+ ppd->hv1.tp_vlan_tci = ntohs(vlan_eth_hdr(pkc->skb)->h_vlan_TCI);
+ ppd->hv1.tp_vlan_tpid = ntohs(pkc->skb->protocol);
} else {
ppd->hv1.tp_vlan_tci = 0;
ppd->hv1.tp_vlan_tpid = 0;
- ppd->tp_status = TP_STATUS_AVAILABLE;
}
+ ppd->tp_status = (ppd->hv1.tp_vlan_tci || ppd->hv1.tp_vlan_tpid) ?
+ TP_STATUS_VLAN_VALID | TP_STATUS_VLAN_TPID_VALID : TP_STATUS_AVAILABLE;
}

static void prb_run_all_ft_ops(struct tpacket_kbdq_core *pkc,
@@ -2427,11 +2430,15 @@ static int tpacket_rcv(struct sk_buff *skb, struct net_device *dev,
if (skb_vlan_tag_present(skb)) {
h.h2->tp_vlan_tci = skb_vlan_tag_get(skb);
h.h2->tp_vlan_tpid = ntohs(skb->vlan_proto);
- status |= TP_STATUS_VLAN_VALID | TP_STATUS_VLAN_TPID_VALID;
+ } else if (eth_type_vlan(skb->protocol)) {
+ h.h2->tp_vlan_tci = ntohs(vlan_eth_hdr(skb)->h_vlan_TCI);
+ h.h2->tp_vlan_tpid = ntohs(skb->protocol);
} else {
h.h2->tp_vlan_tci = 0;
h.h2->tp_vlan_tpid = 0;
}
+ if (h.h2->tp_vlan_tci || h.h2->tp_vlan_tpid)
+ status |= TP_STATUS_VLAN_VALID | TP_STATUS_VLAN_TPID_VALID;
memset(h.h2->tp_padding, 0, sizeof(h.h2->tp_padding));
hdrlen = sizeof(*h.h2);
break;
@@ -2457,7 +2464,8 @@ static int tpacket_rcv(struct sk_buff *skb, struct net_device *dev,
sll->sll_halen = dev_parse_header(skb, sll->sll_addr);
sll->sll_family = AF_PACKET;
sll->sll_hatype = dev->type;
- sll->sll_protocol = skb->protocol;
+ sll->sll_protocol = eth_type_vlan(skb->protocol) ?
+ vlan_eth_hdr(skb)->h_vlan_encapsulated_proto : skb->protocol;
sll->sll_pkttype = skb->pkt_type;
if (unlikely(packet_sock_flag(po, PACKET_SOCK_ORIGDEV)))
sll->sll_ifindex = orig_dev->ifindex;
@@ -3482,7 +3490,8 @@ static int packet_recvmsg(struct socket *sock, struct msghdr *msg, size_t len,
/* Original length was stored in sockaddr_ll fields */
origlen = PACKET_SKB_CB(skb)->sa.origlen;
sll->sll_family = AF_PACKET;
- sll->sll_protocol = skb->protocol;
+ sll->sll_protocol = eth_type_vlan(skb->protocol) ?
+ vlan_eth_hdr(skb)->h_vlan_encapsulated_proto : skb->protocol;
}

sock_recv_cmsgs(msg, sk, skb);
@@ -3538,11 +3547,15 @@ static int packet_recvmsg(struct socket *sock, struct msghdr *msg, size_t len,
if (skb_vlan_tag_present(skb)) {
aux.tp_vlan_tci = skb_vlan_tag_get(skb);
aux.tp_vlan_tpid = ntohs(skb->vlan_proto);
- aux.tp_status |= TP_STATUS_VLAN_VALID | TP_STATUS_VLAN_TPID_VALID;
+ } else if (eth_type_vlan(skb->protocol)) {
+ aux.tp_vlan_tci = ntohs(vlan_eth_hdr(skb)->h_vlan_TCI);
+ aux.tp_vlan_tpid = ntohs(skb->protocol);
} else {
aux.tp_vlan_tci = 0;
aux.tp_vlan_tpid = 0;
}
+ if (aux.tp_vlan_tci || aux.tp_vlan_tpid)
+ aux.tp_status |= TP_STATUS_VLAN_VALID | TP_STATUS_VLAN_TPID_VALID;
put_cmsg(msg, SOL_PACKET, PACKET_AUXDATA, sizeof(aux), &aux);
}

--
2.40.1



2024-05-20 18:35:41

by Willem de Bruijn

[permalink] [raw]
Subject: Re: [PATCH] af_packet: Handle outgoing VLAN packets without hardware offloading

Chengen Du wrote:
> In the outbound packet path, if hardware VLAN offloading is unavailable,
> the VLAN tag is inserted into the payload but then cleared from the
> metadata. Consequently, this could lead to a false negative result when
> checking for the presence of a VLAN tag using skb_vlan_tag_present(),
> causing the packet sniffing outcome to lack VLAN tag information. As a
> result, the packet capturing tool may be unable to parse packets as
> expected.
>
> Signed-off-by: Chengen Du <[email protected]>

This is changing established behavior, which itself may confuse
existing PF_PACKET receivers.

The contract is that the VLAN tag can be observed in the payload or
as tp_vlan_* fields if it is offloaded.

> @@ -2457,7 +2464,8 @@ static int tpacket_rcv(struct sk_buff *skb, struct net_device *dev,
> sll->sll_halen = dev_parse_header(skb, sll->sll_addr);
> sll->sll_family = AF_PACKET;
> sll->sll_hatype = dev->type;
> - sll->sll_protocol = skb->protocol;
> + sll->sll_protocol = eth_type_vlan(skb->protocol) ?
> + vlan_eth_hdr(skb)->h_vlan_encapsulated_proto : skb->protocol;

This is a particularly subtle change of behavior.

> if (skb_vlan_tag_present(skb)) {
> aux.tp_vlan_tci = skb_vlan_tag_get(skb);
> aux.tp_vlan_tpid = ntohs(skb->vlan_proto);
> - aux.tp_status |= TP_STATUS_VLAN_VALID | TP_STATUS_VLAN_TPID_VALID;
> + } else if (eth_type_vlan(skb->protocol)) {
> + aux.tp_vlan_tci = ntohs(vlan_eth_hdr(skb)->h_vlan_TCI);
> + aux.tp_vlan_tpid = ntohs(skb->protocol);
> } else {
> aux.tp_vlan_tci = 0;
> aux.tp_vlan_tpid = 0;
> }
> + if (aux.tp_vlan_tci || aux.tp_vlan_tpid)
> + aux.tp_status |= TP_STATUS_VLAN_VALID | TP_STATUS_VLAN_TPID_VALID;
> put_cmsg(msg, SOL_PACKET, PACKET_AUXDATA, sizeof(aux), &aux);

vlan_tci 0 is valid identifier. That's the reason explicit field
TP_STATUS_VLAN_VALID was added.


2024-05-21 03:32:25

by Chengen Du

[permalink] [raw]
Subject: Re: [PATCH] af_packet: Handle outgoing VLAN packets without hardware offloading

Hi Willem,

Thank you for your response.

I would appreciate any suggestions you could offer, as I am not as
familiar with this area as you are.

I encountered an issue while capturing packets using tcpdump, which
leverages the libpcap library for sniffing functionalities.
Specifically, when I use "tcpdump -i any" to capture packets and
hardware VLAN offloading is unavailable, some bogus packets appear.
In this scenario, Linux uses cooked-mode capture (SLL) for the "any"
device, reading from a PF_PACKET/SOCK_DGRAM socket instead of the
usual PF_PACKET/SOCK_RAW socket.

Using SOCK_DGRAM instead of SOCK_RAW means that the Linux socket code
does not supply the packet's link-layer header.
Based on the code in af_packet.c, SOCK_DGRAM strips L2 headers from
the original packets and provides SLL for some L2 information.
From the receiver's perspective, the VLAN information can only be
parsed from SLL, which causes issues if the kernel stores VLAN
information in the payload.

As you mentioned, this modification affects existing PF_PACKET receivers.
For example, libpcap needs to change how it parses VLAN packets with
the PF_PACKET/SOCK_RAW socket.
The lack of VLAN information in SLL may prevent the receiver from
properly decoding the L3 frame in cooked mode.

I am new to this area and would appreciate it if you could kindly
correct any misunderstandings I might have about the mechanism.
I would also be grateful for any insights you could share on this issue.
Additionally, I am passionate about contributing to resolving this
issue and am willing to work on patches based on your suggestions.

Thank you for your time and assistance.

Best regards,
Chengen Du

On Tue, May 21, 2024 at 2:35 AM Willem de Bruijn
<[email protected]> wrote:
>
> Chengen Du wrote:
> > In the outbound packet path, if hardware VLAN offloading is unavailable,
> > the VLAN tag is inserted into the payload but then cleared from the
> > metadata. Consequently, this could lead to a false negative result when
> > checking for the presence of a VLAN tag using skb_vlan_tag_present(),
> > causing the packet sniffing outcome to lack VLAN tag information. As a
> > result, the packet capturing tool may be unable to parse packets as
> > expected.
> >
> > Signed-off-by: Chengen Du <[email protected]>
>
> This is changing established behavior, which itself may confuse
> existing PF_PACKET receivers.
>
> The contract is that the VLAN tag can be observed in the payload or
> as tp_vlan_* fields if it is offloaded.
>
> > @@ -2457,7 +2464,8 @@ static int tpacket_rcv(struct sk_buff *skb, struct net_device *dev,
> > sll->sll_halen = dev_parse_header(skb, sll->sll_addr);
> > sll->sll_family = AF_PACKET;
> > sll->sll_hatype = dev->type;
> > - sll->sll_protocol = skb->protocol;
> > + sll->sll_protocol = eth_type_vlan(skb->protocol) ?
> > + vlan_eth_hdr(skb)->h_vlan_encapsulated_proto : skb->protocol;
>
> This is a particularly subtle change of behavior.
>
> > if (skb_vlan_tag_present(skb)) {
> > aux.tp_vlan_tci = skb_vlan_tag_get(skb);
> > aux.tp_vlan_tpid = ntohs(skb->vlan_proto);
> > - aux.tp_status |= TP_STATUS_VLAN_VALID | TP_STATUS_VLAN_TPID_VALID;
> > + } else if (eth_type_vlan(skb->protocol)) {
> > + aux.tp_vlan_tci = ntohs(vlan_eth_hdr(skb)->h_vlan_TCI);
> > + aux.tp_vlan_tpid = ntohs(skb->protocol);
> > } else {
> > aux.tp_vlan_tci = 0;
> > aux.tp_vlan_tpid = 0;
> > }
> > + if (aux.tp_vlan_tci || aux.tp_vlan_tpid)
> > + aux.tp_status |= TP_STATUS_VLAN_VALID | TP_STATUS_VLAN_TPID_VALID;
> > put_cmsg(msg, SOL_PACKET, PACKET_AUXDATA, sizeof(aux), &aux);
>
> vlan_tci 0 is valid identifier. That's the reason explicit field
> TP_STATUS_VLAN_VALID was added.
>

2024-05-21 08:37:03

by Paolo Abeni

[permalink] [raw]
Subject: Re: [PATCH] af_packet: Handle outgoing VLAN packets without hardware offloading

On Tue, 2024-05-21 at 11:31 +0800, Chengen Du wrote:
> I would appreciate any suggestions you could offer, as I am not as
> familiar with this area as you are.
>
> I encountered an issue while capturing packets using tcpdump, which
> leverages the libpcap library for sniffing functionalities.
> Specifically, when I use "tcpdump -i any" to capture packets and
> hardware VLAN offloading is unavailable, some bogus packets appear.
> In this scenario, Linux uses cooked-mode capture (SLL) for the "any"
> device, reading from a PF_PACKET/SOCK_DGRAM socket instead of the
> usual PF_PACKET/SOCK_RAW socket.
>
> Using SOCK_DGRAM instead of SOCK_RAW means that the Linux socket code
> does not supply the packet's link-layer header.
> Based on the code in af_packet.c, SOCK_DGRAM strips L2 headers from
> the original packets and provides SLL for some L2 information.

> From the receiver's perspective, the VLAN information can only be
> parsed from SLL, which causes issues if the kernel stores VLAN
> information in the payload.
>
> As you mentioned, this modification affects existing PF_PACKET receivers.
> For example, libpcap needs to change how it parses VLAN packets with
> the PF_PACKET/SOCK_RAW socket.
> The lack of VLAN information in SLL may prevent the receiver from
> properly decoding the L3 frame in cooked mode.
>
> I am new to this area and would appreciate it if you could kindly
> correct any misunderstandings I might have about the mechanism.
> I would also be grateful for any insights you could share on this issue.
> Additionally, I am passionate about contributing to resolving this
> issue and am willing to work on patches based on your suggestions.

One possible way to address the above in a less invasive manner, could
be allocating a new TP_STATUS_VLAN_HEADER_IS_PRESENT bit, set it for
SLL when the vlan is not stripped by H/W and patch tcpdump to interpret
such info.

Side note: net-next is currently closed, and this patch should target
such tree (in the subj prefix).

The merge window for v6.10 has begun and we have already posted our
pull request. Therefore net-next is closed for new drivers, features,
code refactoring and optimizations. We are currently accepting bug
fixes only.

Please repost when net-next reopens after May 26th.

RFC patches sent for review only are obviously welcome at any time.

See:
https://www.kernel.org/doc/html/next/process/maintainer-netdev.html#development-cycle

Cheers,

Paolo


2024-05-21 13:28:27

by Willem de Bruijn

[permalink] [raw]
Subject: Re: [PATCH] af_packet: Handle outgoing VLAN packets without hardware offloading

Paolo Abeni wrote:
> On Tue, 2024-05-21 at 11:31 +0800, Chengen Du wrote:
> > I would appreciate any suggestions you could offer, as I am not as
> > familiar with this area as you are.
> >
> > I encountered an issue while capturing packets using tcpdump, which
> > leverages the libpcap library for sniffing functionalities.
> > Specifically, when I use "tcpdump -i any" to capture packets and
> > hardware VLAN offloading is unavailable, some bogus packets appear.

Bogus how exactly?

> > In this scenario, Linux uses cooked-mode capture (SLL) for the "any"
> > device, reading from a PF_PACKET/SOCK_DGRAM socket instead of the
> > usual PF_PACKET/SOCK_RAW socket.

Trying to extract L2 or VLAN information from the any device may be
the real issue here.

> >
> > Using SOCK_DGRAM instead of SOCK_RAW means that the Linux socket code
> > does not supply the packet's link-layer header.
> > Based on the code in af_packet.c, SOCK_DGRAM strips L2 headers from
> > the original packets and provides SLL for some L2 information.
>
> > From the receiver's perspective, the VLAN information can only be
> > parsed from SLL, which causes issues if the kernel stores VLAN
> > information in the payload.

ETH_HLEN is pulled, but the VLAN tag is still present, right?

> >
> > As you mentioned, this modification affects existing PF_PACKET receivers.
> > For example, libpcap needs to change how it parses VLAN packets with
> > the PF_PACKET/SOCK_RAW socket.
> > The lack of VLAN information in SLL may prevent the receiver from
> > properly decoding the L3 frame in cooked mode.
> >
> > I am new to this area and would appreciate it if you could kindly
> > correct any misunderstandings I might have about the mechanism.
> > I would also be grateful for any insights you could share on this issue.
> > Additionally, I am passionate about contributing to resolving this
> > issue and am willing to work on patches based on your suggestions.
>
> One possible way to address the above in a less invasive manner, could
> be allocating a new TP_STATUS_VLAN_HEADER_IS_PRESENT bit, set it for
> SLL when the vlan is not stripped by H/W and patch tcpdump to interpret
> such info.

Any change must indeed not break existing users. It's not sufficient
to change pcap/tcpdump. There are lots of other PF_PACKET users out
there. Related, it is helpful to verify that tcpdump agrees to a patch
before we change the ABI for it.

2024-05-22 14:27:58

by Chengen Du

[permalink] [raw]
Subject: Re: [PATCH] af_packet: Handle outgoing VLAN packets without hardware offloading

Hi Paolo,

Thank you for your useful suggestions and information.

Hi Willem,

The issue initially stems from libpcap [1].
Upon their further investigation, another issue was discovered,
leading to a kernel request [2] that describes the problem in detail.

In essence, the kernel does not provide VLAN information if hardware
VLAN offloading is unavailable in cooked mode.
The TCI-TPID is missing because the prb_fill_vlan_info() function in
af_packet.c does not modify the tp_vlan_tci/tp_vlan_tpid values since
the information is in the payload and not in the sk_buff struct.
In cooked mode, the L2 header is stripped, preventing the receiver
from determining the correct TCI-TPID value.
Additionally, the protocol in SLL is incorrect, which means the
receiver cannot parse the L3 header correctly.

To reproduce the issue, please follow these steps:
1. ip link add link ens18 ens18.24 type vlan id 24
2. ifconfig ens18.24 1.0.24.1/24
3. ping -n 1.0.24.3 > /dev/null 2>&1 &
4. tcpdump -nn -i any -Q out not tcp and not udp

The attached experiment results show that the protocol is incorrectly
parsed as IPv4, which leads to inaccurate outcomes.

Thanks to Paolo's suggestion, I propose that we add a new bit in the
status to indicate the presence of VLAN information in the payload and
modify the header's entry (i.e., tp_vlan_tci/tp_vlan_tpid)
accordingly.
For the sll_protocol part, we can introduce a new member in the
sockaddr_ll struct to represent the VLAN-encapsulated protocol, if
applicable.

In my humble opinion, this approach will not affect current users who
rely on the status to handle VLAN parsing, and the sll_protocol will
remain unchanged.
Please kindly provide your feedback on this proposal, as there may be
important points I have overlooked.
If this approach seems feasible, I will submit a new version next week.
Your assistance and opinions on this issue are important to me, and I
truly appreciate them.

Best regards,
Chengen Du

[1] https://github.com/the-tcpdump-group/libpcap/issues/1105
[2] https://marc.info/?l=linux-netdev&m=165074467517201&w=4

On Tue, May 21, 2024 at 9:28 PM Willem de Bruijn
<[email protected]> wrote:
>
> Paolo Abeni wrote:
> > On Tue, 2024-05-21 at 11:31 +0800, Chengen Du wrote:
> > > I would appreciate any suggestions you could offer, as I am not as
> > > familiar with this area as you are.
> > >
> > > I encountered an issue while capturing packets using tcpdump, which
> > > leverages the libpcap library for sniffing functionalities.
> > > Specifically, when I use "tcpdump -i any" to capture packets and
> > > hardware VLAN offloading is unavailable, some bogus packets appear.
>
> Bogus how exactly?
>
> > > In this scenario, Linux uses cooked-mode capture (SLL) for the "any"
> > > device, reading from a PF_PACKET/SOCK_DGRAM socket instead of the
> > > usual PF_PACKET/SOCK_RAW socket.
>
> Trying to extract L2 or VLAN information from the any device may be
> the real issue here.
>
> > >
> > > Using SOCK_DGRAM instead of SOCK_RAW means that the Linux socket code
> > > does not supply the packet's link-layer header.
> > > Based on the code in af_packet.c, SOCK_DGRAM strips L2 headers from
> > > the original packets and provides SLL for some L2 information.
> >
> > > From the receiver's perspective, the VLAN information can only be
> > > parsed from SLL, which causes issues if the kernel stores VLAN
> > > information in the payload.
>
> ETH_HLEN is pulled, but the VLAN tag is still present, right?
>
> > >
> > > As you mentioned, this modification affects existing PF_PACKET receivers.
> > > For example, libpcap needs to change how it parses VLAN packets with
> > > the PF_PACKET/SOCK_RAW socket.
> > > The lack of VLAN information in SLL may prevent the receiver from
> > > properly decoding the L3 frame in cooked mode.
> > >
> > > I am new to this area and would appreciate it if you could kindly
> > > correct any misunderstandings I might have about the mechanism.
> > > I would also be grateful for any insights you could share on this issue.
> > > Additionally, I am passionate about contributing to resolving this
> > > issue and am willing to work on patches based on your suggestions.
> >
> > One possible way to address the above in a less invasive manner, could
> > be allocating a new TP_STATUS_VLAN_HEADER_IS_PRESENT bit, set it for
> > SLL when the vlan is not stripped by H/W and patch tcpdump to interpret
> > such info.
>
> Any change must indeed not break existing users. It's not sufficient
> to change pcap/tcpdump. There are lots of other PF_PACKET users out
> there. Related, it is helpful to verify that tcpdump agrees to a patch
> before we change the ABI for it.


Attachments:
any_sll.pcap (504.00 B)

2024-05-22 18:39:55

by Willem de Bruijn

[permalink] [raw]
Subject: Re: [PATCH] af_packet: Handle outgoing VLAN packets without hardware offloading

Chengen Du wrote:
> Hi Paolo,
>
> Thank you for your useful suggestions and information.
>
> Hi Willem,
>
> The issue initially stems from libpcap [1].
> Upon their further investigation, another issue was discovered,
> leading to a kernel request [2] that describes the problem in detail.
>
> In essence, the kernel does not provide VLAN information if hardware
> VLAN offloading is unavailable in cooked mode.
> The TCI-TPID is missing because the prb_fill_vlan_info() function in
> af_packet.c does not modify the tp_vlan_tci/tp_vlan_tpid values since
> the information is in the payload and not in the sk_buff struct.
> In cooked mode, the L2 header is stripped, preventing the receiver
> from determining the correct TCI-TPID value.
> Additionally, the protocol in SLL is incorrect, which means the
> receiver cannot parse the L3 header correctly.
>
> To reproduce the issue, please follow these steps:
> 1. ip link add link ens18 ens18.24 type vlan id 24
> 2. ifconfig ens18.24 1.0.24.1/24
> 3. ping -n 1.0.24.3 > /dev/null 2>&1 &
> 4. tcpdump -nn -i any -Q out not tcp and not udp
>
> The attached experiment results show that the protocol is incorrectly
> parsed as IPv4, which leads to inaccurate outcomes.
>
> Thanks to Paolo's suggestion, I propose that we add a new bit in the
> status to indicate the presence of VLAN information in the payload and
> modify the header's entry (i.e., tp_vlan_tci/tp_vlan_tpid)
> accordingly.
> For the sll_protocol part, we can introduce a new member in the
> sockaddr_ll struct to represent the VLAN-encapsulated protocol, if
> applicable.
>
> In my humble opinion, this approach will not affect current users who
> rely on the status to handle VLAN parsing, and the sll_protocol will
> remain unchanged.
> Please kindly provide your feedback on this proposal, as there may be
> important points I have overlooked.
> If this approach seems feasible, I will submit a new version next week.
> Your assistance and opinions on this issue are important to me, and I
> truly appreciate them.
>
> Best regards,
> Chengen Du
>
> [1] https://github.com/the-tcpdump-group/libpcap/issues/1105
> [2] https://marc.info/?l=linux-netdev&m=165074467517201&w=4

This is all super helpful context and will have to make it into the
commit message.

So if I understand correctly the issue is inconsistency about whether
VLAN tags are L2 or L3, and information getting lost along the way.

SOCK_DGRAM mode removes everything up to skb_network_offset, which
also removes the VLAN tags. But it does not update skb->protocol.

msg_name includes sockaddr_ll.sll_protocol which is set to
skb->protocol.

So the process gets a packet with purported protocol ETH_P_8021Q
starting beginning at an IP or IPv6 header.

A few alternatives to address this:

1. insert the VLAN tag back into the packet, with an skb_push.
2. prepare the data as if it is a VLAN offloaded packet:
pass the VLAN information through PACKET_AUXDATA.
3. pull not up to skb_network_offset, but pull mac_len.

Your patch does the second.

I think the approach is largely sound, with a few issues to consider:
- QinQ. The current solution just passes the protocol in the outer tag
- Other L2.5, like MPLS. This solution does not work for those.
(if they need a fix, and the same network_offset issue applies.)

3 would solve all these cases, I think. But is a larger diversion from
established behavior.

> On Tue, May 21, 2024 at 9:28 PM Willem de Bruijn
> <[email protected]> wrote:
> >
> > Paolo Abeni wrote:
> > > On Tue, 2024-05-21 at 11:31 +0800, Chengen Du wrote:
> > > > I would appreciate any suggestions you could offer, as I am not as
> > > > familiar with this area as you are.
> > > >
> > > > I encountered an issue while capturing packets using tcpdump, which
> > > > leverages the libpcap library for sniffing functionalities.
> > > > Specifically, when I use "tcpdump -i any" to capture packets and
> > > > hardware VLAN offloading is unavailable, some bogus packets appear.
> >
> > Bogus how exactly?
> >
> > > > In this scenario, Linux uses cooked-mode capture (SLL) for the "any"
> > > > device, reading from a PF_PACKET/SOCK_DGRAM socket instead of the
> > > > usual PF_PACKET/SOCK_RAW socket.
> >
> > Trying to extract L2 or VLAN information from the any device may be
> > the real issue here.
> >
> > > >
> > > > Using SOCK_DGRAM instead of SOCK_RAW means that the Linux socket code
> > > > does not supply the packet's link-layer header.
> > > > Based on the code in af_packet.c, SOCK_DGRAM strips L2 headers from
> > > > the original packets and provides SLL for some L2 information.
> > >
> > > > From the receiver's perspective, the VLAN information can only be
> > > > parsed from SLL, which causes issues if the kernel stores VLAN
> > > > information in the payload.
> >
> > ETH_HLEN is pulled, but the VLAN tag is still present, right?
> >
> > > >
> > > > As you mentioned, this modification affects existing PF_PACKET receivers.
> > > > For example, libpcap needs to change how it parses VLAN packets with
> > > > the PF_PACKET/SOCK_RAW socket.
> > > > The lack of VLAN information in SLL may prevent the receiver from
> > > > properly decoding the L3 frame in cooked mode.
> > > >
> > > > I am new to this area and would appreciate it if you could kindly
> > > > correct any misunderstandings I might have about the mechanism.
> > > > I would also be grateful for any insights you could share on this issue.
> > > > Additionally, I am passionate about contributing to resolving this
> > > > issue and am willing to work on patches based on your suggestions.
> > >
> > > One possible way to address the above in a less invasive manner, could
> > > be allocating a new TP_STATUS_VLAN_HEADER_IS_PRESENT bit, set it for
> > > SLL when the vlan is not stripped by H/W and patch tcpdump to interpret
> > > such info.
> >
> > Any change must indeed not break existing users. It's not sufficient
> > to change pcap/tcpdump. There are lots of other PF_PACKET users out
> > there. Related, it is helpful to verify that tcpdump agrees to a patch
> > before we change the ABI for it.



2024-05-22 19:55:02

by alexandre.ferrieux

[permalink] [raw]
Subject: Re: [PATCH] af_packet: Handle outgoing VLAN packets without hardware offloading

Hi Willem, all.

On 22/05/2024 20:39, Willem de Bruijn wrote:
> Chengen Du wrote:
> >
> > [1] https://github.com/the-tcpdump-group/libpcap/issues/1105
> > [2] https://marc.info/?l=linux-netdev&m=165074467517201&w=4
First, a huge thanks to you guys for digging this out, and for taking it on for
good.
> This is all super helpful context and will have to make it into the
> commit message.
>
> So if I understand correctly the issue is inconsistency about whether
> VLAN tags are L2 or L3, and information getting lost along the way.
Exactly. As you put it, L2.5 is the hot potato nobody wants to handle; each side
assumes the other will take care of it :)
> SOCK_DGRAM mode removes everything up to skb_network_offset, which
> also removes the VLAN tags. But it does not update skb->protocol.
>
> msg_name includes sockaddr_ll.sll_protocol which is set to
> skb->protocol.
>
> So the process gets a packet with purported protocol ETH_P_8021Q
> starting beginning at an IP or IPv6 header.
>
> A few alternatives to address this:
>
> 1. insert the VLAN tag back into the packet, with an skb_push.
> 2. prepare the data as if it is a VLAN offloaded packet:
> pass the VLAN information through PACKET_AUXDATA.
> 3. pull not up to skb_network_offset, but pull mac_len.
>
> Your patch does the second.
>
> I think the approach is largely sound, with a few issues to consider:
> - QinQ. The current solution just passes the protocol in the outer tag
> - Other L2.5, like MPLS. This solution does not work for those.
> (if they need a fix, and the same network_offset issue applies.)
>
> 3 would solve all these cases, I think. But is a larger diversion from
> established behavior.
I had somehow formed the same analysis and list of available options (with the
difference that, being a kernel newbie, I was 10% sure).
A few extra things you might consider before making the decision:

(a) If the absolute highest priority goes to backwards compatibility, then
Chengen's approach (your 2.) is the clear winner.

(b) However, from my standpoint as a protocol-agnostic packet-cruncher
(capturing all sorts of encapsulations at midpoint in promiscuous mode - none of
the packets is really meant for the local stack), the very idea of "special
casing" *one* level of VLAN and burying it inside an obscure, lossy pseudo-L2
(SLL), is completely alien. So, any setting (be it not the default) that would
allow to wipe this wart, would be a huge bonus. So I'd happily go with (1.) or
(3.) which both do that. I'll defer to your appreciation of which is the least
disruptive.

(c) Speaking of backwards compatibility, I would respectfully like to point out
that a behavior that has been utterly broken for so many years, might qualify
for a rather decisive correction. You won't break much anyway :)

TL;DR: all three options are enormously better than the statu quo => by all
means, fix it either way :)

Best regards,

-Alex


____________________________________________________________________________________________________________
Ce message et ses pieces jointes peuvent contenir des informations confidentielles ou privilegiees et ne doivent donc
pas etre diffuses, exploites ou copies sans autorisation. Si vous avez recu ce message par erreur, veuillez le signaler
a l'expediteur et le detruire ainsi que les pieces jointes. Les messages electroniques etant susceptibles d'alteration,
Orange decline toute responsabilite si ce message a ete altere, deforme ou falsifie. Merci.

This message and its attachments may contain confidential or privileged information that may be protected by law;
they should not be distributed, used or copied without authorisation.
If you have received this email in error, please notify the sender and delete this message and its attachments.
As emails may be altered, Orange is not liable for messages that have been modified, changed or falsified.
Thank you.

2024-05-23 14:18:36

by Chengen Du

[permalink] [raw]
Subject: Re: [PATCH] af_packet: Handle outgoing VLAN packets without hardware offloading

Hi Willem, all,

Thank you for highlighting the QinQ and L2.5 issues.
These are areas I am not very familiar with, and I appreciate your guidance.

To address the QinQ and L2.5 issues, the third approach seems like a
promising solution.
If I understand correctly, in the QinQ scenario, we need to preserve
the link layer header because it includes two VLAN tags.
For the L2.5 issue, we can adjust by pulling mac_len instead of
skb_network_offset.
In summary, we may need to retain the link layer header to enable
receivers to parse different protocol scenarios.

Although this approach can resolve all issues, it requires the
receiver's cooperation to implement the parsing logic of the link
layer header.
I am concerned that this implementation may take time and adding it
directly into the kernel could place time pressure on existing users.

I would like to propose some ideas and welcome your thoughts on them.
Firstly, I suggest we address the VLAN issue using the second
approach, as it appears to be a bug and can be resolved without
affecting current users.
Secondly, we could introduce the link layer header preservation via a
new packet socket flag.
Receivers could implement and test this by enabling the socket flag.
This way, we avoid disrupting existing receiver behavior while
providing the capability to parse more complex protocols.

This is a rough idea and may require further discussion.
Please share your opinions if you have any concerns.
Your suggestions and assistance are crucial to resolving this issue,
and I greatly appreciate your input.

Best regards,
Chengen Du

On Thu, May 23, 2024 at 3:54 AM <[email protected]> wrote:
>
> Hi Willem, all.
>
> On 22/05/2024 20:39, Willem de Bruijn wrote:
> > Chengen Du wrote:
> > >
> > > [1] https://github.com/the-tcpdump-group/libpcap/issues/1105
> > > [2] https://marc.info/?l=linux-netdev&m=165074467517201&w=4
> First, a huge thanks to you guys for digging this out, and for taking it on for
> good.
> > This is all super helpful context and will have to make it into the
> > commit message.
> >
> > So if I understand correctly the issue is inconsistency about whether
> > VLAN tags are L2 or L3, and information getting lost along the way.
> Exactly. As you put it, L2.5 is the hot potato nobody wants to handle; each side
> assumes the other will take care of it :)
> > SOCK_DGRAM mode removes everything up to skb_network_offset, which
> > also removes the VLAN tags. But it does not update skb->protocol.
> >
> > msg_name includes sockaddr_ll.sll_protocol which is set to
> > skb->protocol.
> >
> > So the process gets a packet with purported protocol ETH_P_8021Q
> > starting beginning at an IP or IPv6 header.
> >
> > A few alternatives to address this:
> >
> > 1. insert the VLAN tag back into the packet, with an skb_push.
> > 2. prepare the data as if it is a VLAN offloaded packet:
> > pass the VLAN information through PACKET_AUXDATA.
> > 3. pull not up to skb_network_offset, but pull mac_len.
> >
> > Your patch does the second.
> >
> > I think the approach is largely sound, with a few issues to consider:
> > - QinQ. The current solution just passes the protocol in the outer tag
> > - Other L2.5, like MPLS. This solution does not work for those.
> > (if they need a fix, and the same network_offset issue applies.)
> >
> > 3 would solve all these cases, I think. But is a larger diversion from
> > established behavior.
> I had somehow formed the same analysis and list of available options (with the
> difference that, being a kernel newbie, I was 10% sure).
> A few extra things you might consider before making the decision:
>
> (a) If the absolute highest priority goes to backwards compatibility, then
> Chengen's approach (your 2.) is the clear winner.
>
> (b) However, from my standpoint as a protocol-agnostic packet-cruncher
> (capturing all sorts of encapsulations at midpoint in promiscuous mode - none of
> the packets is really meant for the local stack), the very idea of "special
> casing" *one* level of VLAN and burying it inside an obscure, lossy pseudo-L2
> (SLL), is completely alien. So, any setting (be it not the default) that would
> allow to wipe this wart, would be a huge bonus. So I'd happily go with (1) or
> (3.) which both do that. I'll defer to your appreciation of which is the least
> disruptive.
>
> (c) Speaking of backwards compatibility, I would respectfully like to point out
> that a behavior that has been utterly broken for so many years, might qualify
> for a rather decisive correction. You won't break much anyway :)
>
> TL;DR: all three options are enormously better than the statu quo => by all
> means, fix it either way :)
>
> Best regards,
>
> -Alex
>
>
> ____________________________________________________________________________________________________________
> Ce message et ses pieces jointes peuvent contenir des informations confidentielles ou privilegiees et ne doivent donc
> pas etre diffuses, exploites ou copies sans autorisation. Si vous avez recu ce message par erreur, veuillez le signaler
> a l'expediteur et le detruire ainsi que les pieces jointes. Les messages electroniques etant susceptibles d'alteration,
> Orange decline toute responsabilite si ce message a ete altere, deforme ou falsifie. Merci.
>
> This message and its attachments may contain confidential or privileged information that may be protected by law;
> they should not be distributed, used or copied without authorisation.
> If you have received this email in error, please notify the sender and delete this message and its attachments.
> As emails may be altered, Orange is not liable for messages that have been modified, changed or falsified.
> Thank you.

2024-05-23 14:53:21

by Willem de Bruijn

[permalink] [raw]
Subject: Re: [PATCH] af_packet: Handle outgoing VLAN packets without hardware offloading

Chengen Du wrote:
> Hi Willem, all,
>
> Thank you for highlighting the QinQ and L2.5 issues.
> These are areas I am not very familiar with, and I appreciate your guidance.
>
> To address the QinQ and L2.5 issues, the third approach seems like a
> promising solution.
> If I understand correctly, in the QinQ scenario, we need to preserve
> the link layer header because it includes two VLAN tags.
> For the L2.5 issue, we can adjust by pulling mac_len instead of
> skb_network_offset.
> In summary, we may need to retain the link layer header to enable
> receivers to parse different protocol scenarios.
>
> Although this approach can resolve all issues, it requires the
> receiver's cooperation to implement the parsing logic of the link
> layer header.

I was about to bring up the same.

Existing BPF filters may expect L3 headers. A change like this will
have to be opt-in through a socket option.

Since I see no better option to support all L2.5 protocols, if this
is something that tcpdump/pcap would use, we should do it.

> I am concerned that this implementation may take time and adding it
> directly into the kernel could place time pressure on existing users.
>
> I would like to propose some ideas and welcome your thoughts on them.
> Firstly, I suggest we address the VLAN issue using the second
> approach, as it appears to be a bug and can be resolved without
> affecting current users.

Agreed. Let me respond in more detail to your original patch.

> Secondly, we could introduce the link layer header preservation via a
> new packet socket flag.
> Receivers could implement and test this by enabling the socket flag.
> This way, we avoid disrupting existing receiver behavior while
> providing the capability to parse more complex protocols.

2024-05-23 14:57:13

by Willem de Bruijn

[permalink] [raw]
Subject: Re: [PATCH] af_packet: Handle outgoing VLAN packets without hardware offloading

Chengen Du wrote:
> In the outbound packet path, if hardware VLAN offloading is unavailable,
> the VLAN tag is inserted into the payload but then cleared from the
> metadata. Consequently, this could lead to a false negative result when
> checking for the presence of a VLAN tag using skb_vlan_tag_present(),
> causing the packet sniffing outcome to lack VLAN tag information. As a
> result, the packet capturing tool may be unable to parse packets as
> expected.
>
> Signed-off-by: Chengen Du <[email protected]>

Fixes tag and Cc: stable.

As discussed please add more detail to the commit message that
explains the bug. And/or add a Link: for instance to the github
issue.

> ---
> net/packet/af_packet.c | 25 +++++++++++++++++++------
> 1 file changed, 19 insertions(+), 6 deletions(-)
>
> diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
> index ea3ebc160e25..73e9acb1875b 100644
> --- a/net/packet/af_packet.c
> +++ b/net/packet/af_packet.c
> @@ -1010,12 +1010,15 @@ static void prb_fill_vlan_info(struct tpacket_kbdq_core *pkc,
> if (skb_vlan_tag_present(pkc->skb)) {
> ppd->hv1.tp_vlan_tci = skb_vlan_tag_get(pkc->skb);
> ppd->hv1.tp_vlan_tpid = ntohs(pkc->skb->vlan_proto);
> - ppd->tp_status = TP_STATUS_VLAN_VALID | TP_STATUS_VLAN_TPID_VALID;
> + } else if (eth_type_vlan(pkc->skb->protocol)) {
> + ppd->hv1.tp_vlan_tci = ntohs(vlan_eth_hdr(pkc->skb)->h_vlan_TCI);
> + ppd->hv1.tp_vlan_tpid = ntohs(pkc->skb->protocol);
> } else {
> ppd->hv1.tp_vlan_tci = 0;
> ppd->hv1.tp_vlan_tpid = 0;
> - ppd->tp_status = TP_STATUS_AVAILABLE;
> }
> + ppd->tp_status = (ppd->hv1.tp_vlan_tci || ppd->hv1.tp_vlan_tpid) ?
> + TP_STATUS_VLAN_VALID | TP_STATUS_VLAN_TPID_VALID : TP_STATUS_AVAILABLE;

Don't move this out of the original branch and don't make the valid
conditional on the value of tp_vlan_tci. Just duplicating the line
to both branches is fine. Here and below.

> }
>
> static void prb_run_all_ft_ops(struct tpacket_kbdq_core *pkc,
> @@ -2427,11 +2430,15 @@ static int tpacket_rcv(struct sk_buff *skb, struct net_device *dev,
> if (skb_vlan_tag_present(skb)) {
> h.h2->tp_vlan_tci = skb_vlan_tag_get(skb);
> h.h2->tp_vlan_tpid = ntohs(skb->vlan_proto);
> - status |= TP_STATUS_VLAN_VALID | TP_STATUS_VLAN_TPID_VALID;
> + } else if (eth_type_vlan(skb->protocol)) {
> + h.h2->tp_vlan_tci = ntohs(vlan_eth_hdr(skb)->h_vlan_TCI);
> + h.h2->tp_vlan_tpid = ntohs(skb->protocol);
> } else {
> h.h2->tp_vlan_tci = 0;
> h.h2->tp_vlan_tpid = 0;
> }
> + if (h.h2->tp_vlan_tci || h.h2->tp_vlan_tpid)
> + status |= TP_STATUS_VLAN_VALID | TP_STATUS_VLAN_TPID_VALID;
> memset(h.h2->tp_padding, 0, sizeof(h.h2->tp_padding));
> hdrlen = sizeof(*h.h2);
> break;
> @@ -2457,7 +2464,8 @@ static int tpacket_rcv(struct sk_buff *skb, struct net_device *dev,
> sll->sll_halen = dev_parse_header(skb, sll->sll_addr);
> sll->sll_family = AF_PACKET;
> sll->sll_hatype = dev->type;
> - sll->sll_protocol = skb->protocol;
> + sll->sll_protocol = eth_type_vlan(skb->protocol) ?
> + vlan_eth_hdr(skb)->h_vlan_encapsulated_proto : skb->protocol;

For QinQ you probably want the true network protocol, not the inner
VLAN tag.
> sll->sll_pkttype = skb->pkt_type;
> if (unlikely(packet_sock_flag(po, PACKET_SOCK_ORIGDEV)))
> sll->sll_ifindex = orig_dev->ifindex;
> @@ -3482,7 +3490,8 @@ static int packet_recvmsg(struct socket *sock, struct msghdr *msg, size_t len,
> /* Original length was stored in sockaddr_ll fields */
> origlen = PACKET_SKB_CB(skb)->sa.origlen;
> sll->sll_family = AF_PACKET;
> - sll->sll_protocol = skb->protocol;
> + sll->sll_protocol = eth_type_vlan(skb->protocol) ?
> + vlan_eth_hdr(skb)->h_vlan_encapsulated_proto : skb->protocol;
> }
>
> sock_recv_cmsgs(msg, sk, skb);
> @@ -3538,11 +3547,15 @@ static int packet_recvmsg(struct socket *sock, struct msghdr *msg, size_t len,
> if (skb_vlan_tag_present(skb)) {
> aux.tp_vlan_tci = skb_vlan_tag_get(skb);
> aux.tp_vlan_tpid = ntohs(skb->vlan_proto);
> - aux.tp_status |= TP_STATUS_VLAN_VALID | TP_STATUS_VLAN_TPID_VALID;
> + } else if (eth_type_vlan(skb->protocol)) {
> + aux.tp_vlan_tci = ntohs(vlan_eth_hdr(skb)->h_vlan_TCI);
> + aux.tp_vlan_tpid = ntohs(skb->protocol);
> } else {
> aux.tp_vlan_tci = 0;
> aux.tp_vlan_tpid = 0;
> }
> + if (aux.tp_vlan_tci || aux.tp_vlan_tpid)
> + aux.tp_status |= TP_STATUS_VLAN_VALID | TP_STATUS_VLAN_TPID_VALID;
> put_cmsg(msg, SOL_PACKET, PACKET_AUXDATA, sizeof(aux), &aux);
> }
>
> --
> 2.40.1
>



2024-05-24 14:04:47

by Chengen Du

[permalink] [raw]
Subject: Re: [PATCH] af_packet: Handle outgoing VLAN packets without hardware offloading

Hi Willem,

Thank you for your reply.

Please allow me to briefly summarize to ensure we are in agreement on
the solution.

Firstly, I will submit a patch to add a new bit in the status to
indicate the presence of VLAN information in the payload and modify
the header's entry accordingly.
A new member will be added to the sockaddr_ll struct to represent the
VLAN-encapsulated protocol, thus avoiding direct modification of the
sll_protocol.

Following this patch, I will work on enabling the link layer header
via a socket option.

If there are no ambiguities, I will submit the patch next week.
Your assistance and suggestions are highly appreciated.

Best regards,
Chengen Du

On Thu, May 23, 2024 at 10:57 PM Willem de Bruijn
<[email protected]> wrote:
>
> Chengen Du wrote:
> > In the outbound packet path, if hardware VLAN offloading is unavailable,
> > the VLAN tag is inserted into the payload but then cleared from the
> > metadata. Consequently, this could lead to a false negative result when
> > checking for the presence of a VLAN tag using skb_vlan_tag_present(),
> > causing the packet sniffing outcome to lack VLAN tag information. As a
> > result, the packet capturing tool may be unable to parse packets as
> > expected.
> >
> > Signed-off-by: Chengen Du <[email protected]>
>
> Fixes tag and Cc: stable.
>
> As discussed please add more detail to the commit message that
> explains the bug. And/or add a Link: for instance to the github
> issue.
>
> > ---
> > net/packet/af_packet.c | 25 +++++++++++++++++++------
> > 1 file changed, 19 insertions(+), 6 deletions(-)
> >
> > diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
> > index ea3ebc160e25..73e9acb1875b 100644
> > --- a/net/packet/af_packet.c
> > +++ b/net/packet/af_packet.c
> > @@ -1010,12 +1010,15 @@ static void prb_fill_vlan_info(struct tpacket_kbdq_core *pkc,
> > if (skb_vlan_tag_present(pkc->skb)) {
> > ppd->hv1.tp_vlan_tci = skb_vlan_tag_get(pkc->skb);
> > ppd->hv1.tp_vlan_tpid = ntohs(pkc->skb->vlan_proto);
> > - ppd->tp_status = TP_STATUS_VLAN_VALID | TP_STATUS_VLAN_TPID_VALID;
> > + } else if (eth_type_vlan(pkc->skb->protocol)) {
> > + ppd->hv1.tp_vlan_tci = ntohs(vlan_eth_hdr(pkc->skb)->h_vlan_TCI);
> > + ppd->hv1.tp_vlan_tpid = ntohs(pkc->skb->protocol);
> > } else {
> > ppd->hv1.tp_vlan_tci = 0;
> > ppd->hv1.tp_vlan_tpid = 0;
> > - ppd->tp_status = TP_STATUS_AVAILABLE;
> > }
> > + ppd->tp_status = (ppd->hv1.tp_vlan_tci || ppd->hv1.tp_vlan_tpid) ?
> > + TP_STATUS_VLAN_VALID | TP_STATUS_VLAN_TPID_VALID : TP_STATUS_AVAILABLE;
>
> Don't move this out of the original branch and don't make the valid
> conditional on the value of tp_vlan_tci. Just duplicating the line
> to both branches is fine. Here and below.
>
> > }
> >
> > static void prb_run_all_ft_ops(struct tpacket_kbdq_core *pkc,
> > @@ -2427,11 +2430,15 @@ static int tpacket_rcv(struct sk_buff *skb, struct net_device *dev,
> > if (skb_vlan_tag_present(skb)) {
> > h.h2->tp_vlan_tci = skb_vlan_tag_get(skb);
> > h.h2->tp_vlan_tpid = ntohs(skb->vlan_proto);
> > - status |= TP_STATUS_VLAN_VALID | TP_STATUS_VLAN_TPID_VALID;
> > + } else if (eth_type_vlan(skb->protocol)) {
> > + h.h2->tp_vlan_tci = ntohs(vlan_eth_hdr(skb)->h_vlan_TCI);
> > + h.h2->tp_vlan_tpid = ntohs(skb->protocol);
> > } else {
> > h.h2->tp_vlan_tci = 0;
> > h.h2->tp_vlan_tpid = 0;
> > }
> > + if (h.h2->tp_vlan_tci || h.h2->tp_vlan_tpid)
> > + status |= TP_STATUS_VLAN_VALID | TP_STATUS_VLAN_TPID_VALID;
> > memset(h.h2->tp_padding, 0, sizeof(h.h2->tp_padding));
> > hdrlen = sizeof(*h.h2);
> > break;
> > @@ -2457,7 +2464,8 @@ static int tpacket_rcv(struct sk_buff *skb, struct net_device *dev,
> > sll->sll_halen = dev_parse_header(skb, sll->sll_addr);
> > sll->sll_family = AF_PACKET;
> > sll->sll_hatype = dev->type;
> > - sll->sll_protocol = skb->protocol;
> > + sll->sll_protocol = eth_type_vlan(skb->protocol) ?
> > + vlan_eth_hdr(skb)->h_vlan_encapsulated_proto : skb->protocol;
>
> For QinQ you probably want the true network protocol, not the inner
> VLAN tag.
> > sll->sll_pkttype = skb->pkt_type;
> > if (unlikely(packet_sock_flag(po, PACKET_SOCK_ORIGDEV)))
> > sll->sll_ifindex = orig_dev->ifindex;
> > @@ -3482,7 +3490,8 @@ static int packet_recvmsg(struct socket *sock, struct msghdr *msg, size_t len,
> > /* Original length was stored in sockaddr_ll fields */
> > origlen = PACKET_SKB_CB(skb)->sa.origlen;
> > sll->sll_family = AF_PACKET;
> > - sll->sll_protocol = skb->protocol;
> > + sll->sll_protocol = eth_type_vlan(skb->protocol) ?
> > + vlan_eth_hdr(skb)->h_vlan_encapsulated_proto : skb->protocol;
> > }
> >
> > sock_recv_cmsgs(msg, sk, skb);
> > @@ -3538,11 +3547,15 @@ static int packet_recvmsg(struct socket *sock, struct msghdr *msg, size_t len,
> > if (skb_vlan_tag_present(skb)) {
> > aux.tp_vlan_tci = skb_vlan_tag_get(skb);
> > aux.tp_vlan_tpid = ntohs(skb->vlan_proto);
> > - aux.tp_status |= TP_STATUS_VLAN_VALID | TP_STATUS_VLAN_TPID_VALID;
> > + } else if (eth_type_vlan(skb->protocol)) {
> > + aux.tp_vlan_tci = ntohs(vlan_eth_hdr(skb)->h_vlan_TCI);
> > + aux.tp_vlan_tpid = ntohs(skb->protocol);
> > } else {
> > aux.tp_vlan_tci = 0;
> > aux.tp_vlan_tpid = 0;
> > }
> > + if (aux.tp_vlan_tci || aux.tp_vlan_tpid)
> > + aux.tp_status |= TP_STATUS_VLAN_VALID | TP_STATUS_VLAN_TPID_VALID;
> > put_cmsg(msg, SOL_PACKET, PACKET_AUXDATA, sizeof(aux), &aux);
> > }
> >
> > --
> > 2.40.1
> >
>
>

2024-05-25 15:52:23

by Willem de Bruijn

[permalink] [raw]
Subject: Re: [PATCH] af_packet: Handle outgoing VLAN packets without hardware offloading

> Firstly, I will submit a patch to add a new bit in the status to
> indicate the presence of VLAN information in the payload and modify
> the header's entry accordingly.
> A new member will be added to the sockaddr_ll struct to represent the
> VLAN-encapsulated protocol, thus avoiding direct modification of the
> sll_protocol.

See my earlier comments to your v1 patch. It's a minor respin. Please
don't add new members.

> Following this patch, I will work on enabling the link layer header
> via a socket option.

Let's focus on the VLAN for now.

Any expansion of the ABI requires a very clear path to use. I'd like
to see the pcap and tcpdump (or wireshark) changes that use it.

First, we need to even understand better why anything is using
SOCK_DGRAM when access to L2.5 headers is important, and whether the
process can convert to using SOCK_RAW instead.

2024-05-25 20:57:30

by alexandre.ferrieux

[permalink] [raw]
Subject: Re: [PATCH] af_packet: Handle outgoing VLAN packets without hardware offloading

On 25/05/2024 17:51, Willem de Bruijn wrote:
>
> First, we need to even understand better why anything is using
> SOCK_DGRAM when access to L2.5 headers is important, and whether the
> process can convert to using SOCK_RAW instead.

For libpcap, it seems to be linked to the fact that the "any" device can
aggregate links with varied L2 header sizes, which in turn complicates filtering
(see Guy Harris' comment on this [1]).

Given that 99% of useful traffic is Ethernet, such considerations look awkward
now. I for one would love to see an "any2" based on SOCK_RAW. And while you're
at it, please let the new variant of SLL contain the full Ethernet header at the
end, so that a simple offset gives access to the whole linear wire image...

-Alex

[1] https://github.com/the-tcpdump-group/libpcap/issues/1105#issuecomment-1092221785
____________________________________________________________________________________________________________
Ce message et ses pieces jointes peuvent contenir des informations confidentielles ou privilegiees et ne doivent donc
pas etre diffuses, exploites ou copies sans autorisation. Si vous avez recu ce message par erreur, veuillez le signaler
a l'expediteur et le detruire ainsi que les pieces jointes. Les messages electroniques etant susceptibles d'alteration,
Orange decline toute responsabilite si ce message a ete altere, deforme ou falsifie. Merci.

This message and its attachments may contain confidential or privileged information that may be protected by law;
they should not be distributed, used or copied without authorisation.
If you have received this email in error, please notify the sender and delete this message and its attachments.
As emails may be altered, Orange is not liable for messages that have been modified, changed or falsified.
Thank you.

2024-05-26 15:25:56

by Willem de Bruijn

[permalink] [raw]
Subject: Re: [PATCH] af_packet: Handle outgoing VLAN packets without hardware offloading

alexandre.ferrieux@ wrote:
> On 25/05/2024 17:51, Willem de Bruijn wrote:
> >
> > First, we need to even understand better why anything is using
> > SOCK_DGRAM when access to L2.5 headers is important, and whether the
> > process can convert to using SOCK_RAW instead.
>
> For libpcap, it seems to be linked to the fact that the "any" device can
> aggregate links with varied L2 header sizes, which in turn complicates filtering
> (see Guy Harris' comment on this [1]).
>
> Given that 99% of useful traffic is Ethernet, such considerations look awkward
> now. I for one would love to see an "any2" based on SOCK_RAW. And while you're
> at it, please let the new variant of SLL contain the full Ethernet header at the
> end, so that a simple offset gives access to the whole linear wire image...

Complicating factors are loopback and tunnel devices, which are
common. Loopback (ARPHRD_LOOPBACK) is pseudo Ethernet. Libpcap does
convert this to DLT_EN10MB. But it converts ARPHRD_TUNNEL to DLT_RAW,
as can be expected.

I don't think a new DLT_LINUX_SLL3 is a solution. The application
just wants to receive the full L2.5 header, not yet another parsed
version.

Libpcap can conceivably already read with SOCK_RAW and still convert
each frame to SLL internally.

Separate from this, I'd like to see where exactly these L2.5 tags are
inserted and whether any besides VLAN (incl. QinQ special case) are
even susceptible.

For VLAN on the normal egress path, like the ICMP reproducer, this
probably is in vlan_insert_tag_set_proto. Not 100% sure. That indeed
inserts the header and updates skb->protocol, without changing the
network header.

MPLS in mpls_forward does appear to update network_header, and uses
this in mpls_hdr. So perhaps the issue does not extend beyond VLAN.

2024-05-27 00:19:50

by alexandre.ferrieux

[permalink] [raw]
Subject: Re: [PATCH] af_packet: Handle outgoing VLAN packets without hardware offloading

On 26/05/2024 17:25, Willem de Bruijn wrote:
> I don't think a new DLT_LINUX_SLL3 is a solution. The application
> just wants to receive the full L2.5 header, not yet another parsed
> version

Yes, I did not intend any parsing either: the SLLvx should end with the ethtype
(the one that occurs right after the MAC addresses in Ethernet). But I was just
pointing out that it is a pity that SSLv1 and SSLv2 only keep the source MAC
address when all physical interfaces in a system are Ethernet. Why not keep the
whole MAC header ?
____________________________________________________________________________________________________________
Ce message et ses pieces jointes peuvent contenir des informations confidentielles ou privilegiees et ne doivent donc
pas etre diffuses, exploites ou copies sans autorisation. Si vous avez recu ce message par erreur, veuillez le signaler
a l'expediteur et le detruire ainsi que les pieces jointes. Les messages electroniques etant susceptibles d'alteration,
Orange decline toute responsabilite si ce message a ete altere, deforme ou falsifie. Merci.

This message and its attachments may contain confidential or privileged information that may be protected by law;
they should not be distributed, used or copied without authorisation.
If you have received this email in error, please notify the sender and delete this message and its attachments.
As emails may be altered, Orange is not liable for messages that have been modified, changed or falsified.
Thank you.