The issue initially stems from libpcap [1]. In the outbound packet path,
if hardware VLAN offloading is unavailable, the VLAN tag is inserted into
the payload but then cleared from the sk_buff struct. Consequently, this
can lead to a false negative when checking for the presence of a VLAN tag,
causing the packet sniffing outcome to lack VLAN tag information (i.e.,
TCI-TPID). As a result, the packet capturing tool may be unable to parse
packets as expected.
The TCI-TPID is missing because the prb_fill_vlan_info() function does not
modify the tp_vlan_tci/tp_vlan_tpid values, as the information is in the
payload and not in the sk_buff struct. The skb_vlan_tag_present() function
only checks vlan_all in the sk_buff struct. In cooked mode, the L2 header
is stripped, preventing the packet capturing tool from determining the
correct TCI-TPID value. Additionally, the protocol in SLL is incorrect,
which means the packet capturing tool cannot parse the L3 header correctly.
[1] https://github.com/the-tcpdump-group/libpcap/issues/1105
Fixes: f6fb8f100b80 ("af-packet: TPACKET_V3 flexible buffer implementation.")
Cc: [email protected]
Signed-off-by: Chengen Du <[email protected]>
---
net/packet/af_packet.c | 18 ++++++++++++++++--
1 file changed, 16 insertions(+), 2 deletions(-)
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index ea3ebc160e25..82b36e90d73b 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -1011,6 +1011,10 @@ static void prb_fill_vlan_info(struct tpacket_kbdq_core *pkc,
ppd->hv1.tp_vlan_tci = skb_vlan_tag_get(pkc->skb);
ppd->hv1.tp_vlan_tpid = ntohs(pkc->skb->vlan_proto);
ppd->tp_status = TP_STATUS_VLAN_VALID | TP_STATUS_VLAN_TPID_VALID;
+ } else if (eth_type_vlan(pkc->skb->protocol)) {
+ ppd->hv1.tp_vlan_tci = ntohs(vlan_eth_hdr(pkc->skb)->h_vlan_TCI);
+ ppd->hv1.tp_vlan_tpid = ntohs(pkc->skb->protocol);
+ ppd->tp_status = TP_STATUS_VLAN_VALID | TP_STATUS_VLAN_TPID_VALID;
} else {
ppd->hv1.tp_vlan_tci = 0;
ppd->hv1.tp_vlan_tpid = 0;
@@ -2428,6 +2432,10 @@ static int tpacket_rcv(struct sk_buff *skb, struct net_device *dev,
h.h2->tp_vlan_tci = skb_vlan_tag_get(skb);
h.h2->tp_vlan_tpid = ntohs(skb->vlan_proto);
status |= TP_STATUS_VLAN_VALID | TP_STATUS_VLAN_TPID_VALID;
+ } else if (eth_type_vlan(skb->protocol)) {
+ h.h2->tp_vlan_tci = ntohs(vlan_eth_hdr(skb)->h_vlan_TCI);
+ h.h2->tp_vlan_tpid = ntohs(skb->protocol);
+ status |= TP_STATUS_VLAN_VALID | TP_STATUS_VLAN_TPID_VALID;
} else {
h.h2->tp_vlan_tci = 0;
h.h2->tp_vlan_tpid = 0;
@@ -2457,7 +2465,8 @@ static int tpacket_rcv(struct sk_buff *skb, struct net_device *dev,
sll->sll_halen = dev_parse_header(skb, sll->sll_addr);
sll->sll_family = AF_PACKET;
sll->sll_hatype = dev->type;
- sll->sll_protocol = skb->protocol;
+ sll->sll_protocol = (skb->protocol == htons(ETH_P_8021Q)) ?
+ vlan_eth_hdr(skb)->h_vlan_encapsulated_proto : skb->protocol;
sll->sll_pkttype = skb->pkt_type;
if (unlikely(packet_sock_flag(po, PACKET_SOCK_ORIGDEV)))
sll->sll_ifindex = orig_dev->ifindex;
@@ -3482,7 +3491,8 @@ static int packet_recvmsg(struct socket *sock, struct msghdr *msg, size_t len,
/* Original length was stored in sockaddr_ll fields */
origlen = PACKET_SKB_CB(skb)->sa.origlen;
sll->sll_family = AF_PACKET;
- sll->sll_protocol = skb->protocol;
+ sll->sll_protocol = (skb->protocol == htons(ETH_P_8021Q)) ?
+ vlan_eth_hdr(skb)->h_vlan_encapsulated_proto : skb->protocol;
}
sock_recv_cmsgs(msg, sk, skb);
@@ -3539,6 +3549,10 @@ static int packet_recvmsg(struct socket *sock, struct msghdr *msg, size_t len,
aux.tp_vlan_tci = skb_vlan_tag_get(skb);
aux.tp_vlan_tpid = ntohs(skb->vlan_proto);
aux.tp_status |= TP_STATUS_VLAN_VALID | TP_STATUS_VLAN_TPID_VALID;
+ } else if (eth_type_vlan(skb->protocol)) {
+ aux.tp_vlan_tci = ntohs(vlan_eth_hdr(skb)->h_vlan_TCI);
+ aux.tp_vlan_tpid = ntohs(skb->protocol);
+ aux.tp_status |= TP_STATUS_VLAN_VALID | TP_STATUS_VLAN_TPID_VALID;
} else {
aux.tp_vlan_tci = 0;
aux.tp_vlan_tpid = 0;
--
2.40.1
Include the target tree (net vs net-next) in the subject prefix:
[PATCH net v3]
Chengen Du wrote:
> The issue initially stems from libpcap [1]. In the outbound packet path,
> if hardware VLAN offloading is unavailable, the VLAN tag is inserted into
> the payload but then cleared from the sk_buff struct. Consequently, this
> can lead to a false negative when checking for the presence of a VLAN tag,
> causing the packet sniffing outcome to lack VLAN tag information (i.e.,
> TCI-TPID). As a result, the packet capturing tool may be unable to parse
> packets as expected.
>
> The TCI-TPID is missing because the prb_fill_vlan_info() function does not
> modify the tp_vlan_tci/tp_vlan_tpid values, as the information is in the
> payload and not in the sk_buff struct. The skb_vlan_tag_present() function
> only checks vlan_all in the sk_buff struct. In cooked mode, the L2 header
> is stripped, preventing the packet capturing tool from determining the
> correct TCI-TPID value. Additionally, the protocol in SLL is incorrect,
> which means the packet capturing tool cannot parse the L3 header correctly.
>
This does not add much context over v1 of the patch, but it at least
adds a pointer to context.
> [1] https://github.com/the-tcpdump-group/libpcap/issues/1105
Prefer Link: $URL
Please also add a Link to the conversation on patch 1:
Link: https://lore.kernel.org/netdev/[email protected]/T/#u
> Fixes: f6fb8f100b80 ("af-packet: TPACKET_V3 flexible buffer implementation.")
The referenced commit only introduces v3. The code changes to
tpacket_rcv and packet_recvmsg indicate that this goes back further.
Let's say to the introduction of explicitly passing VLAN information:
Fixes: 393e52e33c6c ("packet: deliver VLAN TCI to userspace")
> Cc: [email protected]
> Signed-off-by: Chengen Du <[email protected]>
> ---
> net/packet/af_packet.c | 18 ++++++++++++++++--
> 1 file changed, 16 insertions(+), 2 deletions(-)
>
> diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
> index ea3ebc160e25..82b36e90d73b 100644
> --- a/net/packet/af_packet.c
> +++ b/net/packet/af_packet.c
> @@ -1011,6 +1011,10 @@ static void prb_fill_vlan_info(struct tpacket_kbdq_core *pkc,
> ppd->hv1.tp_vlan_tci = skb_vlan_tag_get(pkc->skb);
> ppd->hv1.tp_vlan_tpid = ntohs(pkc->skb->vlan_proto);
> ppd->tp_status = TP_STATUS_VLAN_VALID | TP_STATUS_VLAN_TPID_VALID;
> + } else if (eth_type_vlan(pkc->skb->protocol)) {
> + ppd->hv1.tp_vlan_tci = ntohs(vlan_eth_hdr(pkc->skb)->h_vlan_TCI);
Careful about packet length. A malicious packet can be inserted that
is an Ethernet header with zero payload, but ETH_P_8021Q as h_proto.
See how __vlan_get_protocol carefully reads the headers.
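
To illustrate the length concern, here is a minimal userspace sketch,
not kernel code. It assumes a flat byte buffer that starts at the
Ethernet header; every demo_* name and DEMO_* constant is hypothetical
and only mirrors the well-known header sizes and EtherType values. The
point is that the TCI is read only after confirming that a complete
VLAN tag fits in the buffer.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical constants for this sketch (standard sizes and EtherTypes). */
#define DEMO_ETH_HLEN     14      /* dst MAC + src MAC + EtherType */
#define DEMO_VLAN_HLEN    4       /* TCI + encapsulated EtherType */
#define DEMO_ETH_P_8021Q  0x8100
#define DEMO_ETH_P_8021AD 0x88A8

/* Read a big-endian 16-bit value without assuming alignment. */
static uint16_t demo_read_be16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

/*
 * Extract TPID and TCI (host byte order) from a frame that starts at
 * the Ethernet header. Returns false if the frame is not VLAN tagged
 * or is too short to hold a complete tag.
 */
static bool demo_vlan_tci_from_frame(const uint8_t *frame, size_t len,
				     uint16_t *tpid, uint16_t *tci)
{
	uint16_t ethertype;

	if (len < DEMO_ETH_HLEN)
		return false;

	ethertype = demo_read_be16(frame + 12);
	if (ethertype != DEMO_ETH_P_8021Q && ethertype != DEMO_ETH_P_8021AD)
		return false;

	/* EtherType claims VLAN, but the tag itself may be missing. */
	if (len < DEMO_ETH_HLEN + DEMO_VLAN_HLEN)
		return false;

	*tpid = ethertype;
	*tci = demo_read_be16(frame + DEMO_ETH_HLEN);
	return true;
}
```

A 14-byte frame with ETH_P_8021Q as h_proto is rejected by the second
length check instead of being read past its end.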
> + ppd->hv1.tp_vlan_tpid = ntohs(pkc->skb->protocol);
> + ppd->tp_status = TP_STATUS_VLAN_VALID | TP_STATUS_VLAN_TPID_VALID;
> } else {
> ppd->hv1.tp_vlan_tci = 0;
> ppd->hv1.tp_vlan_tpid = 0;
> @@ -2428,6 +2432,10 @@ static int tpacket_rcv(struct sk_buff *skb, struct net_device *dev,
> h.h2->tp_vlan_tci = skb_vlan_tag_get(skb);
> h.h2->tp_vlan_tpid = ntohs(skb->vlan_proto);
> status |= TP_STATUS_VLAN_VALID | TP_STATUS_VLAN_TPID_VALID;
> + } else if (eth_type_vlan(skb->protocol)) {
> + h.h2->tp_vlan_tci = ntohs(vlan_eth_hdr(skb)->h_vlan_TCI);
> + h.h2->tp_vlan_tpid = ntohs(skb->protocol);
> + status |= TP_STATUS_VLAN_VALID | TP_STATUS_VLAN_TPID_VALID;
> } else {
> h.h2->tp_vlan_tci = 0;
> h.h2->tp_vlan_tpid = 0;
> @@ -2457,7 +2465,8 @@ static int tpacket_rcv(struct sk_buff *skb, struct net_device *dev,
> sll->sll_halen = dev_parse_header(skb, sll->sll_addr);
> sll->sll_family = AF_PACKET;
> sll->sll_hatype = dev->type;
> - sll->sll_protocol = skb->protocol;
> + sll->sll_protocol = (skb->protocol == htons(ETH_P_8021Q)) ?
> + vlan_eth_hdr(skb)->h_vlan_encapsulated_proto : skb->protocol;
In SOCK_RAW mode, the VLAN tag will be present, so should be returned.
I'm concerned about returning a different value between SOCK_RAW and
SOCK_DGRAM. But don't immediately see a better option. And for
SOCK_DGRAM this approach is indistinguishable from the result on a
device with hardware offload, so is acceptable.
This test for ETH_P_8021Q ignores the QinQ stacked VLAN case. When
fixing VLAN encap, both variants should be addressed at the same time.
Note that ETH_P_8021AD is included in the eth_type_vlan test you call
above.
All these extra branches also make the common case slower. Let's try
to mitigate that as much as possible.
> sll->sll_pkttype = skb->pkt_type;
> if (unlikely(packet_sock_flag(po, PACKET_SOCK_ORIGDEV)))
> sll->sll_ifindex = orig_dev->ifindex;
> @@ -3482,7 +3491,8 @@ static int packet_recvmsg(struct socket *sock, struct msghdr *msg, size_t len,
> /* Original length was stored in sockaddr_ll fields */
> origlen = PACKET_SKB_CB(skb)->sa.origlen;
> sll->sll_family = AF_PACKET;
> - sll->sll_protocol = skb->protocol;
> + sll->sll_protocol = (skb->protocol == htons(ETH_P_8021Q)) ?
> + vlan_eth_hdr(skb)->h_vlan_encapsulated_proto : skb->protocol;
> }
>
> sock_recv_cmsgs(msg, sk, skb);
> @@ -3539,6 +3549,10 @@ static int packet_recvmsg(struct socket *sock, struct msghdr *msg, size_t len,
> aux.tp_vlan_tci = skb_vlan_tag_get(skb);
> aux.tp_vlan_tpid = ntohs(skb->vlan_proto);
> aux.tp_status |= TP_STATUS_VLAN_VALID | TP_STATUS_VLAN_TPID_VALID;
> + } else if (eth_type_vlan(skb->protocol)) {
> + aux.tp_vlan_tci = ntohs(vlan_eth_hdr(skb)->h_vlan_TCI);
> + aux.tp_vlan_tpid = ntohs(skb->protocol);
> + aux.tp_status |= TP_STATUS_VLAN_VALID | TP_STATUS_VLAN_TPID_VALID;
> } else {
> aux.tp_vlan_tci = 0;
> aux.tp_vlan_tpid = 0;
> --
> 2.40.1
>
On Mon, May 27, 2024 at 11:40 PM Chengen Du <[email protected]> wrote:
>
> Hi Willem,
>
> Thank you for your suggestions on the patch.
> However, there are some parts I am not familiar with, and I would appreciate more detailed information from your side.
Please respond with plain-text email. This message did not make it to
the list. Also no top posting.
https://docs.kernel.org/process/submitting-patches.html
https://subspace.kernel.org/etiquette.html
> > > @@ -2457,7 +2465,8 @@ static int tpacket_rcv(struct sk_buff *skb, struct net_device *dev,
> > > sll->sll_halen = dev_parse_header(skb, sll->sll_addr);
> > > sll->sll_family = AF_PACKET;
> > > sll->sll_hatype = dev->type;
> > > - sll->sll_protocol = skb->protocol;
> > > + sll->sll_protocol = (skb->protocol == htons(ETH_P_8021Q)) ?
> > > + vlan_eth_hdr(skb)->h_vlan_encapsulated_proto : skb->protocol;
> >
> > In SOCK_RAW mode, the VLAN tag will be present, so should be returned.
>
> Based on libpcap's handling, the SLL may not be used in SOCK_RAW mode.
The kernel fills in the sockaddr_ll fields in tpacket_rcv for both
SOCK_RAW and SOCK_DGRAM. Libpcap can already use both SOCK_RAW and
SOCK_DGRAM, and it constructs the sll2_header pseudo header that
tcpdump sees itself, in pcap_handle_packet_mmap.
> Do you recommend evaluating the mode and maintaining the original logic in SOCK_RAW mode,
> or should we use the same logic for both SOCK_DGRAM and SOCK_RAW modes?
I suggest keeping as is for SOCK_RAW, as returning data that starts at
a VLAN header together with skb->protocol of ETH_P_IPV6 would be just
as confusing as the inverse that we do today on SOCK_DGRAM.
> >
> > I'm concerned about returning a different value between SOCK_RAW and
> > SOCK_DGRAM. But don't immediately see a better option. And for
> > SOCK_DGRAM this approach is indistinguishable from the result on a
> > device with hardware offload, so is acceptable.
> >
> > This test for ETH_P_8021Q ignores the QinQ stacked VLAN case. When
> > fixing VLAN encap, both variants should be addressed at the same time.
> > Note that ETH_P_8021AD is included in the eth_type_vlan test you call
> > above.
>
> In patch 1, the eth_type_vlan() function is used to determine if we need to set the sll_protocol to the VLAN-encapsulated protocol, which includes both ETH_P_8021Q and ETH_P_8021AD.
> You mentioned previously that we might want the true network protocol instead of the inner VLAN tag in the QinQ case (which means 802.1ad?).
> I believe I may have misunderstood your point.
I mean that if SOCK_DGRAM strips all VLAN headers to return the data
from the start of the true network header, then skb->protocol should
return that network protocol.
With VLAN stacking, your patch currently returns ETH_P_8021Q.
See the packet formats in
https://en.wikipedia.org/wiki/IEEE_802.1ad#Frame_format if you're
confused about how stacking works.
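
As a rough illustration of the stacked case, a userspace sketch (not
kernel code) that walks all VLAN tags to find the network protocol
might look like this. It assumes a flat buffer starting at the
Ethernet header; the demo_* names, DEMO_* constants, and the depth cap
are hypothetical choices for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical constants for this sketch (standard sizes and EtherTypes). */
#define DEMO_ETH_HLEN       14
#define DEMO_VLAN_HLEN      4
#define DEMO_ETH_P_8021Q    0x8100
#define DEMO_ETH_P_8021AD   0x88A8
#define DEMO_MAX_VLAN_DEPTH 8     /* arbitrary cap for this sketch */

/* Read a big-endian 16-bit value without assuming alignment. */
static uint16_t demo_be16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

static int demo_is_vlan(uint16_t ethertype)
{
	return ethertype == DEMO_ETH_P_8021Q ||
	       ethertype == DEMO_ETH_P_8021AD;
}

/*
 * Return the EtherType of the first non-VLAN header in host byte
 * order, or 0 if the frame is truncated or nests deeper than the cap.
 */
static uint16_t demo_network_proto(const uint8_t *frame, size_t len)
{
	size_t off = 12;	/* offset of the outer EtherType */
	uint16_t proto;
	int depth;

	if (len < DEMO_ETH_HLEN)
		return 0;

	proto = demo_be16(frame + off);
	for (depth = 0; demo_is_vlan(proto); depth++) {
		if (depth == DEMO_MAX_VLAN_DEPTH)
			return 0;
		/* Skip the TCI to reach the encapsulated EtherType. */
		off += DEMO_VLAN_HLEN;
		if (len < off + 2)
			return 0;
		proto = demo_be16(frame + off);
	}
	return proto;
}
```

For an 802.1ad frame carrying an 802.1Q tag and then IPv6, this walk
yields ETH_P_IPV6 (0x86DD), whereas reading only the first
encapsulated protocol yields ETH_P_8021Q.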
> Could you please confirm if both ETH_P_8021Q and ETH_P_8021AD should use the VLAN-encapsulated protocol when VLAN hardware offloading is unavailable?
> Or are there other aspects that this judgment does not handle correctly?