IPv6/TCP and GRO stacks can build big TCP packets with an added
temporary Hop By Hop header.
If GSO is not involved, then the temporary header needs to be removed in
the driver. This patch provides a generic helper for drivers that need
to modify their headers in place.
Signed-off-by: Coco Li <[email protected]>
---
include/net/ipv6.h | 33 +++++++++++++++++++++++++++++++++
1 file changed, 33 insertions(+)
diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index d383c895592a..a11d58c85c05 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -500,6 +500,39 @@ static inline int ipv6_has_hopopt_jumbo(const struct sk_buff *skb)
return jhdr->nexthdr;
}
+/* Return 0 if HBH header is successfully removed
+ * Or if HBH removal is unnecessary (packet is not big TCP)
+ * Return error to indicate dropping the packet
+ */
+static inline int ipv6_hopopt_jumbo_remove(struct sk_buff *skb)
+{
+	const int hophdr_len = sizeof(struct hop_jumbo_hdr);
+	int nexthdr = ipv6_has_hopopt_jumbo(skb);
+	struct ipv6hdr *h6;
+
+	if (!nexthdr)
+		return 0;
+
+	if (skb_cow_head(skb, 0))
+		return -1;
+
+	/* Remove the HBH header.
+	 * Layout: [Ethernet header][IPv6 header][HBH][L4 Header]
+	 */
+	memmove(skb->data + hophdr_len,
+		skb->data,
+		ETH_HLEN + sizeof(struct ipv6hdr));
+
+	skb->data += hophdr_len;
+	skb->len -= hophdr_len;
+	skb->network_header += hophdr_len;
+
+	h6 = ipv6_hdr(skb);
+	h6->nexthdr = nexthdr;
+
+	return 0;
+}
+
static inline bool ipv6_accept_ra(struct inet6_dev *idev)
{
/* If forwarding is enabled, RA are not accepted unless the special
--
2.38.1.584.g0f3c55d4c2-goog
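To make the header shuffle in the helper above easier to follow, here is a minimal userspace sketch of the same idea. It is not kernel code: `struct toy_pkt`, `toy_build()` and `toy_hbh_jumbo_remove()` are hypothetical names invented for illustration, the frame is assumed to carry a plain 14-byte Ethernet header with no VLAN tag, and the jumbo check is reduced to two byte comparisons (the real `ipv6_has_hopopt_jumbo()` checks more than that). The sketch slides the Ethernet and IPv6 headers forward over the 8-byte Hop-by-Hop header and writes the saved next-header value back into the IPv6 header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_HDR_LEN     14   /* plain Ethernet II, no VLAN tag (assumption) */
#define IPV6_HDR_LEN    40
#define HBH_JUMBO_LEN   8    /* nexthdr, hdrlen, TLV type, TLV len, 32-bit length */
#define NEXTHDR_HOP     0
#define NEXTHDR_TCP     6
#define TLV_JUMBO       194  /* IPv6 Jumbo Payload option type */
#define IP6_NEXTHDR_OFF 6    /* offset of Next Header inside the IPv6 header */
#define FAKE_L4_LEN     20

/* Hypothetical stand-in for an skb: data points at the first frame byte. */
struct toy_pkt {
	uint8_t buf[128];
	uint8_t *data;
	size_t len;
};

/* Build [Ethernet][IPv6][HBH jumbo][20 bytes standing in for the L4 header]. */
static void toy_build(struct toy_pkt *p)
{
	uint8_t *ip6, *hbh;

	memset(p->buf, 0, sizeof(p->buf));
	p->data = p->buf;
	p->len = ETH_HDR_LEN + IPV6_HDR_LEN + HBH_JUMBO_LEN + FAKE_L4_LEN;

	memset(p->data, 0xEE, ETH_HDR_LEN);      /* marker bytes for the L2 header */
	ip6 = p->data + ETH_HDR_LEN;
	ip6[0] = 0x60;                           /* IP version 6 */
	ip6[IP6_NEXTHDR_OFF] = NEXTHDR_HOP;
	hbh = ip6 + IPV6_HDR_LEN;
	hbh[0] = NEXTHDR_TCP;                    /* header that follows the HBH */
	hbh[1] = 0;                              /* hdrlen 0 means 8 bytes total */
	hbh[2] = TLV_JUMBO;
	hbh[3] = 4;                              /* TLV payload: 32-bit length */
	memset(hbh + HBH_JUMBO_LEN, 0xCC, FAKE_L4_LEN);
}

/* Returns 0 when the HBH header was removed or no removal was needed. */
static int toy_hbh_jumbo_remove(struct toy_pkt *p)
{
	uint8_t *ip6 = p->data + ETH_HDR_LEN;
	uint8_t *hbh = ip6 + IPV6_HDR_LEN;
	uint8_t nexthdr;

	if (p->len < ETH_HDR_LEN + IPV6_HDR_LEN + HBH_JUMBO_LEN)
		return 0;
	if (ip6[IP6_NEXTHDR_OFF] != NEXTHDR_HOP || hbh[2] != TLV_JUMBO)
		return 0;

	nexthdr = hbh[0];
	/* Slide the L2 and IPv6 headers forward so they overwrite the HBH. */
	memmove(p->data + HBH_JUMBO_LEN, p->data, ETH_HDR_LEN + IPV6_HDR_LEN);
	p->data += HBH_JUMBO_LEN;
	p->len -= HBH_JUMBO_LEN;
	p->data[ETH_HDR_LEN + IP6_NEXTHDR_OFF] = nexthdr;
	return 0;
}
```

Calling `toy_build()` and then `toy_hbh_jumbo_remove()` on the same packet leaves the L4 bytes where they were and shortens the frame by 8 bytes. A second call is a no-op, because the IPv6 next-header field no longer points at a Hop-by-Hop header.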
Eric Dumazet implemented Big TCP that allowed bigger TSO/GRO packet sizes
for IPv6 traffic. See patch series:
'commit 89527be8d8d6 ("net: add IFLA_TSO_{MAX_SIZE|SEGS} attributes")'
This reduces the number of packets traversing the networking stack and
should usually improve performance. However, it also inserts a
temporary Hop-by-hop IPv6 extension header.
Using the HBH header removal method in the previous patch, the extra header
can be removed in the bnxt driver to allow it to send big TCP packets (bigger
TSO packets) as well.
If bnxt folks could help with testing this patch on the driver (as I
don't have access to one) that would be wonderful. Thank you!
Tested:
Compiled locally
To further test functional correctness, update the GSO/GRO limit on the
physical NIC:
ip link set eth0 gso_max_size 181000
ip link set eth0 gro_max_size 181000
Note that if there are bonding or ipvlan devices on top of the physical
NIC, their GSO sizes need to be updated as well.
Then, IPv6/TCP packets with sizes larger than 64k can be observed.
Signed-off-by: Coco Li <[email protected]>
---
drivers/net/ethernet/broadcom/bnxt/bnxt.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 0fe164b42c5d..2bfa5e9fb179 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -389,6 +389,9 @@ static netdev_tx_t bnxt_start_xmit(struct sk_buff *skb, struct net_device *dev)
return NETDEV_TX_BUSY;
}
+ if (unlikely(ipv6_hopopt_jumbo_remove(skb)))
+ goto tx_free;
+
length = skb->len;
len = skb_headlen(skb);
last_frag = skb_shinfo(skb)->nr_frags;
@@ -13657,6 +13660,7 @@ static int bnxt_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
dev->features &= ~NETIF_F_LRO;
dev->priv_flags |= IFF_UNICAST_FLT;
+ netif_set_tso_max_size(dev, GSO_MAX_SIZE);
#ifdef CONFIG_BNXT_SRIOV
init_waitqueue_head(&bp->sriov_cfg_wait);
#endif
--
2.38.1.584.g0f3c55d4c2-goog
From: Coco Li <[email protected]>
Date: Tue, 22 Nov 2022 15:27:39 -0800
> IPv6/TCP and GRO stacks can build big TCP packets with an added
> temporary Hop By Hop header.
>
> If GSO is not involved, then the temporary header needs to be removed in
> the driver. This patch provides a generic helper for drivers that need
> to modify their headers in place.
>
> Signed-off-by: Coco Li <[email protected]>
> ---
> include/net/ipv6.h | 33 +++++++++++++++++++++++++++++++++
> 1 file changed, 33 insertions(+)
>
> diff --git a/include/net/ipv6.h b/include/net/ipv6.h
> index d383c895592a..a11d58c85c05 100644
> --- a/include/net/ipv6.h
> +++ b/include/net/ipv6.h
> @@ -500,6 +500,39 @@ static inline int ipv6_has_hopopt_jumbo(const struct sk_buff *skb)
> return jhdr->nexthdr;
> }
>
> +/* Return 0 if HBH header is successfully removed
> + * Or if HBH removal is unnecessary (packet is not big TCP)
> + * Return error to indicate dropping the packet
> + */
> +static inline int ipv6_hopopt_jumbo_remove(struct sk_buff *skb)
> +{
> + const int hophdr_len = sizeof(struct hop_jumbo_hdr);
> + int nexthdr = ipv6_has_hopopt_jumbo(skb);
> + struct ipv6hdr *h6;
> +
> + if (!nexthdr)
> + return 0;
> +
> + if (skb_cow_head(skb, 0))
> + return -1;
	err = skb_cow_head(skb, 0);
	if (err)
		return err;
Alternatively, if you want to keep it simple, make the function bool
and return false on `if (skb_cow_head(skb, 0))` and true otherwise.
> +
> + /* Remove the HBH header.
> + * Layout: [Ethernet header][IPv6 header][HBH][L4 Header]
> + */
> + memmove(skb->data + hophdr_len,
> + skb->data,
This can fit into the previous line.
> + ETH_HLEN + sizeof(struct ipv6hdr));
Not correct at this point. I assume you took the implementation from
ip6_offload.c[0], but ::gso_segment() and ::ndo_start_xmit() are two
different entry points. Here you may have not only Eth header, but
also VLAN, MPLS and whatnot.
Correct way would be:
	memmove(skb_mac_header(skb) + hophdr_len, skb_mac_header(skb),
		skb_network_header(skb) - skb_mac_header(skb) +
		sizeof(struct ipv6hdr));
> +
> + skb->data += hophdr_len;
> + skb->len -= hophdr_len;
> + skb->network_header += hophdr_len;
skb->mac_header also needs to be adjusted, the fact that it's equal
to skb->data at the entry of ::ndo_start_xmit() doesn't mean
anything.
> +
> + h6 = ipv6_hdr(skb);
> + h6->nexthdr = nexthdr;
> +
> + return 0;
> +}
Please switch all the places where the same logic is used to your
new helper.
> +
> static inline bool ipv6_accept_ra(struct inet6_dev *idev)
> {
> /* If forwarding is enabled, RA are not accepted unless the special
> --
> 2.38.1.584.g0f3c55d4c2-goog
Thanks,
Olek
From: Coco Li <[email protected]>
Date: Tue, 22 Nov 2022 15:27:40 -0800
> Eric Dumazet implemented Big TCP that allowed bigger TSO/GRO packet sizes
> for IPv6 traffic. See patch series:
> 'commit 89527be8d8d6 ("net: add IFLA_TSO_{MAX_SIZE|SEGS} attributes")'
>
> This reduces the number of packets traversing the networking stack and
> should usually improve performance. However, it also inserts a
> temporary Hop-by-hop IPv6 extension header.
>
> Using the HBH header removal method in the previous patch, the extra header
> can be removed in the bnxt driver to allow it to send big TCP packets (bigger
> TSO packets) as well.
>
> If bnxt folks could help with testing this patch on the driver (as I
> don't have access to one) that would be wonderful. Thank you!
>
> Tested:
> Compiled locally
Please mark "potential" patches with 'RFC'. Then, if/when you get a
'Tested-by:', you can spin a "true" v1.
>
> To further test functional correctness, update the GSO/GRO limit on the
> physical NIC:
>
> ip link set eth0 gso_max_size 181000
> ip link set eth0 gro_max_size 181000
>
> Note that if there are bonding or ipvlan devices on top of the physical
> NIC, their GSO sizes need to be updated as well.
>
> Then, IPv6/TCP packets with sizes larger than 64k can be observed.
>
> Signed-off-by: Coco Li <[email protected]>
> ---
> drivers/net/ethernet/broadcom/bnxt/bnxt.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> index 0fe164b42c5d..2bfa5e9fb179 100644
> --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> @@ -389,6 +389,9 @@ static netdev_tx_t bnxt_start_xmit(struct sk_buff *skb, struct net_device *dev)
> return NETDEV_TX_BUSY;
> }
>
> + if (unlikely(ipv6_hopopt_jumbo_remove(skb)))
> + goto tx_free;
> +
> length = skb->len;
> len = skb_headlen(skb);
> last_frag = skb_shinfo(skb)->nr_frags;
> @@ -13657,6 +13660,7 @@ static int bnxt_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
> dev->features &= ~NETIF_F_LRO;
> dev->priv_flags |= IFF_UNICAST_FLT;
>
> + netif_set_tso_max_size(dev, GSO_MAX_SIZE);
> #ifdef CONFIG_BNXT_SRIOV
> init_waitqueue_head(&bp->sriov_cfg_wait);
> #endif
> --
> 2.38.1.584.g0f3c55d4c2-goog
Thanks,
Olek
From: Alexander Lobakin <[email protected]>
Date: Wed, 23 Nov 2022 17:38:25 +0100
> From: Coco Li <[email protected]>
> Date: Tue, 22 Nov 2022 15:27:39 -0800
>
> > IPv6/TCP and GRO stacks can build big TCP packets with an added
> > temporary Hop By Hop header.
> >
> > If GSO is not involved, then the temporary header needs to be removed in
> > the driver. This patch provides a generic helper for drivers that need
> > to modify their headers in place.
> >
> > Signed-off-by: Coco Li <[email protected]>
> > ---
> > include/net/ipv6.h | 33 +++++++++++++++++++++++++++++++++
> > 1 file changed, 33 insertions(+)
> >
> > diff --git a/include/net/ipv6.h b/include/net/ipv6.h
> > index d383c895592a..a11d58c85c05 100644
> > --- a/include/net/ipv6.h
> > +++ b/include/net/ipv6.h
> > @@ -500,6 +500,39 @@ static inline int ipv6_has_hopopt_jumbo(const struct sk_buff *skb)
> > return jhdr->nexthdr;
> > }
> >
> > +/* Return 0 if HBH header is successfully removed
> > + * Or if HBH removal is unnecessary (packet is not big TCP)
> > + * Return error to indicate dropping the packet
> > + */
> > +static inline int ipv6_hopopt_jumbo_remove(struct sk_buff *skb)
> > +{
> > + const int hophdr_len = sizeof(struct hop_jumbo_hdr);
> > + int nexthdr = ipv6_has_hopopt_jumbo(skb);
> > + struct ipv6hdr *h6;
> > +
> > + if (!nexthdr)
> > + return 0;
> > +
> > + if (skb_cow_head(skb, 0))
> > + return -1;
>
> err = skb_cow_head(skb, 0);
> if (err)
> return err;
>
> Alternatively, if you want to keep it simple, make the function bool
> and return false on `if (skb_cow_head(skb, 0))` and true otherwise.
>
> > +
> > + /* Remove the HBH header.
> > + * Layout: [Ethernet header][IPv6 header][HBH][L4 Header]
> > + */
> > + memmove(skb->data + hophdr_len,
> > + skb->data,
>
> This can fit into the previous line.
>
> > + ETH_HLEN + sizeof(struct ipv6hdr));
>
> Not correct at this point. I assume you took the implementation from
> ip6_offload.c[0], but ::gso_segment() and ::ndo_start_xmit() are two
As usual, I forgot to paste the links at the end of the mail. Please
look at the end of this one for them (if I don't forget to paste
them again :clownface:).
> different entry points. Here you may have not only Eth header, but
> also VLAN, MPLS and whatnot.
> Correct way would be:
>
> 	memmove(skb_mac_header(skb) + hophdr_len, skb_mac_header(skb),
> 		skb_network_header(skb) - skb_mac_header(skb) +
> 		sizeof(struct ipv6hdr));
>
> > +
> > + skb->data += hophdr_len;
> > + skb->len -= hophdr_len;
> > + skb->network_header += hophdr_len;
>
> skb->mac_header also needs to be adjusted, the fact that it's equal
> to skb->data at the entry of ::ndo_start_xmit() doesn't mean
> anything.
Also, while I'm here: you should use pskb_may_pull() +
{,__}skb_pull() here instead of manual maths. ::network_header and
::mac_header still need to be adjusted manually though.
>
> > +
> > + h6 = ipv6_hdr(skb);
> > + h6->nexthdr = nexthdr;
> > +
> > + return 0;
> > +}
>
> Please switch all the places where the same logic is used to your
> new helper.
>
> > +
> > static inline bool ipv6_accept_ra(struct inet6_dev *idev)
> > {
> > /* If forwarding is enabled, RA are not accepted unless the special
> > --
> > 2.38.1.584.g0f3c55d4c2-goog
>
> Thanks,
> Olek
[0] https://elixir.bootlin.com/linux/v6.1-rc6/source/net/ipv6/ip6_offload.c#L92
Thanks,
Olek
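The review points above (copy relative to the MAC header instead of assuming ETH_HLEN, and adjust ::mac_header as well as ::network_header) can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not the kernel helper: `struct toy_skb`, `toy_build()` and `toy_hbh_jumbo_remove()` are hypothetical names, header positions are stored as offsets into a flat buffer, and the steps a real helper would need (skb_cow_head(), pskb_may_pull()) are left out. The L2 header length is a parameter, so the same code handles an untagged frame (14 bytes) and a VLAN-tagged one (18 bytes).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_HDR_LEN     14
#define VLAN_TAG_LEN    4
#define IPV6_HDR_LEN    40
#define HBH_JUMBO_LEN   8
#define NEXTHDR_HOP     0
#define NEXTHDR_TCP     6
#define TLV_JUMBO       194  /* IPv6 Jumbo Payload option type */
#define IP6_NEXTHDR_OFF 6
#define FAKE_L4_LEN     20

/* Hypothetical skb model: every position is an offset into buf. */
struct toy_skb {
	uint8_t buf[160];
	size_t data;            /* first byte of the frame */
	size_t len;             /* bytes from data to the end of the frame */
	size_t mac_header;      /* start of the L2 header */
	size_t network_header;  /* start of the IPv6 header */
};

/* Build [L2 header of l2_len bytes][IPv6][HBH jumbo][fake L4 header]. */
static void toy_build(struct toy_skb *s, size_t l2_len)
{
	uint8_t *ip6;

	memset(s->buf, 0, sizeof(s->buf));
	s->data = 0;
	s->mac_header = 0;
	s->network_header = l2_len;
	s->len = l2_len + IPV6_HDR_LEN + HBH_JUMBO_LEN + FAKE_L4_LEN;

	memset(s->buf, 0xEE, l2_len);            /* marker bytes for L2 */
	ip6 = s->buf + s->network_header;
	ip6[0] = 0x60;                           /* IP version 6 */
	ip6[IP6_NEXTHDR_OFF] = NEXTHDR_HOP;
	ip6[IPV6_HDR_LEN + 0] = NEXTHDR_TCP;     /* HBH: next header */
	ip6[IPV6_HDR_LEN + 2] = TLV_JUMBO;       /* HBH: TLV type */
	ip6[IPV6_HDR_LEN + 3] = 4;               /* HBH: TLV length */
	memset(ip6 + IPV6_HDR_LEN + HBH_JUMBO_LEN, 0xCC, FAKE_L4_LEN);
}

/* Returns 0 on success or no-op, -1 if the offsets are inconsistent. */
static int toy_hbh_jumbo_remove(struct toy_skb *s)
{
	uint8_t *mac = s->buf + s->mac_header;
	uint8_t *ip6 = s->buf + s->network_header;
	size_t hdrs;
	uint8_t nexthdr;

	if (s->network_header < s->mac_header || s->network_header < s->data)
		return -1;
	if (s->len < s->network_header - s->data + IPV6_HDR_LEN + HBH_JUMBO_LEN)
		return 0;
	if (ip6[IP6_NEXTHDR_OFF] != NEXTHDR_HOP ||
	    ip6[IPV6_HDR_LEN + 2] != TLV_JUMBO)
		return 0;

	nexthdr = ip6[IPV6_HDR_LEN];
	/* Everything from the MAC header through the IPv6 header moves. */
	hdrs = s->network_header - s->mac_header + IPV6_HDR_LEN;
	memmove(mac + HBH_JUMBO_LEN, mac, hdrs);

	s->data += HBH_JUMBO_LEN;
	s->len -= HBH_JUMBO_LEN;
	s->mac_header += HBH_JUMBO_LEN;
	s->network_header += HBH_JUMBO_LEN;
	s->buf[s->network_header + IP6_NEXTHDR_OFF] = nexthdr;
	return 0;
}
```

With `toy_build(&s, ETH_HDR_LEN + VLAN_TAG_LEN)` the copy length becomes 58 bytes instead of 54, so the IPv6 header is moved intact. A fixed `ETH_HLEN + sizeof(struct ipv6hdr)` copy would leave the last 4 bytes of the IPv6 header behind in that case.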
On Wed, Nov 23, 2022 at 05:41:59PM +0100, Alexander Lobakin wrote:
> From: Coco Li <[email protected]>
> Date: Tue, 22 Nov 2022 15:27:40 -0800
>
> > Eric Dumazet implemented Big TCP that allowed bigger TSO/GRO packet sizes
> > for IPv6 traffic. See patch series:
> > 'commit 89527be8d8d6 ("net: add IFLA_TSO_{MAX_SIZE|SEGS} attributes")'
> >
> > This reduces the number of packets traversing the networking stack and
> > should usually improve performance. However, it also inserts a
> > temporary Hop-by-hop IPv6 extension header.
> >
> > Using the HBH header removal method in the previous patch, the extra header
> > can be removed in the bnxt driver to allow it to send big TCP packets (bigger
> > TSO packets) as well.
> >
> > If bnxt folks could help with testing this patch on the driver (as I
> > don't have access to one) that would be wonderful. Thank you!
> >
> > Tested:
> > Compiled locally
>
> Please mark "potential" patches with 'RFC'. Then, if/when you get a
> 'Tested-by:', you can spin a "true" v1.
We are getting a ton of patches which are "compiled-only".
I wouldn't be so strict with them as long as they state it clearly.
Thanks