2012-11-09 02:10:33

by Joseph Gasparakis

Subject: [PATCH net-next 0/3] tunneling: Add support for hardware-offloaded encapsulation

This series adds support for NIC Rx and Tx checksum offload of encapsulated
packets.

The sk_buff needs to carry information about the inner packet. Adding three
fields for the inner mac, network and transport headers was the preferred
approach.

Without these fields, drivers would need to parse the sk_buff data in the
hot path, which would hurt performance.

Adding to sk_buff a pointer to the skbuff of the inner packet made sense, but
it would be a complicated change, because assumptions would have to be made
about helper functions such as skb_clone() and skb_copy(). The code for the
existing encapsulation protocols (such as VXLAN and IP GRE) would also have to
be reworked, so the decision was to take the simple approach of adding these
three fields.
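
As a rough illustration of the idea (not code from this series), the sketch
below models the three inner header fields in userspace. struct mock_skb and
the mock_* helpers are hypothetical names; offsets are kept relative to head,
as in the NET_SKBUFF_DATA_USES_OFFSET layout.

```c
#include <stddef.h>

/* Hypothetical userspace stand-in for the sk_buff fields discussed here. */
struct mock_skb {
	unsigned char *head;                 /* start of the allocated buffer */
	unsigned char *data;                 /* start of the current packet data */
	unsigned int inner_mac_header;       /* all three are offsets from head */
	unsigned int inner_network_header;
	unsigned int inner_transport_header;
};

/*
 * Record where the inner frame's headers sit before any outer header is
 * pushed. l2_len and l3_len are the inner link and network header lengths.
 */
static void mock_record_inner(struct mock_skb *skb, size_t l2_len, size_t l3_len)
{
	unsigned int base = (unsigned int)(skb->data - skb->head);

	skb->inner_mac_header = base;
	skb->inner_network_header = base + (unsigned int)l2_len;
	skb->inner_transport_header = base + (unsigned int)(l2_len + l3_len);
}

/* Prepend an outer header of len bytes; assumes enough headroom exists. */
static void mock_push(struct mock_skb *skb, size_t len)
{
	skb->data -= len;
}

/* What a driver could do on Tx: a field lookup instead of a packet parse. */
static unsigned char *mock_inner_transport(const struct mock_skb *skb)
{
	return skb->head + skb->inner_transport_header;
}
```

Because the recorded offsets are relative to head, pushing the outer headers
moves data but leaves the inner header positions valid.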


2012-11-09 02:10:35

by Joseph Gasparakis

Subject: [PATCH 2/3] vxlan: capture inner headers during encapsulation

Populate the inner header pointers of the skb for VXLAN.

Signed-off-by: Joseph Gasparakis <[email protected]>
Signed-off-by: Peter P Waskiewicz Jr <[email protected]>
---
drivers/net/vxlan.c | 12 ++++++++++++
1 file changed, 12 insertions(+)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 030559d..14e6c8f 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -694,11 +694,23 @@ static netdev_tx_t vxlan_xmit(struct sk_buff *skb, struct net_device *dev)
__be16 df = 0;
__u8 tos, ttl;
int err;
+ unsigned int offset;

dst = vxlan_find_dst(vxlan, skb);
if (!dst)
goto drop;

+ offset = skb->data - skb->head;
+
+ skb_reset_inner_mac_header(skb);
+
+ if (skb->network_header)
+ skb_set_inner_network_header(skb, skb->network_header - offset);
+
+ if (skb->transport_header)
+ skb_set_inner_transport_header(skb, skb->transport_header -
+ offset);
+
/* Need space for new headers (invalidates iph ptr) */
if (skb_cow_head(skb, VXLAN_HEADROOM))
goto drop;
--
1.7.11.7
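
The offset arithmetic in the vxlan_xmit() hunk can be checked with a small
userspace model. This is only a sketch of the offset-based
(NET_SKBUFF_DATA_USES_OFFSET) case; struct skb_model and the model_* helpers
are hypothetical names that mirror the helpers from patch 1/3.

```c
/* Hypothetical model of the offset-based sk_buff header fields. */
struct skb_model {
	unsigned char *head;
	unsigned char *data;
	unsigned int network_header;        /* offset from head */
	unsigned int inner_network_header;  /* offset from head */
};

/*
 * Mirrors the offset variant of skb_set_inner_network_header():
 * reset to the current data offset, then add the caller's offset.
 */
static void model_set_inner_network_header(struct skb_model *skb, int offset)
{
	skb->inner_network_header = (unsigned int)(skb->data - skb->head);
	skb->inner_network_header += offset;
}

/* Mirrors the network header lines added to vxlan_xmit(). */
static void model_capture_inner(struct skb_model *skb)
{
	unsigned int offset = (unsigned int)(skb->data - skb->head);

	if (skb->network_header)
		model_set_inner_network_header(skb,
					       skb->network_header - offset);
}
```

In this model the (data - head) term added by the setter cancels the
subtracted offset, so the net effect is inner_network_header ==
network_header.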

2012-11-09 02:10:59

by Joseph Gasparakis

Subject: [PATCH 1/3] net: Add support for hardware-offloaded encapsulation

This patch adds kernel support for offloading Tx and Rx checksumming of
encapsulated packets (such as VXLAN and IP GRE) to the NIC.

Signed-off-by: Joseph Gasparakis <[email protected]>
Signed-off-by: Peter P Waskiewicz Jr <[email protected]>
---
Documentation/networking/netdev-features.txt | 10 +++
include/linux/if_ether.h | 5 ++
include/linux/ip.h | 5 ++
include/linux/netdev_features.h | 3 +
include/linux/skbuff.h | 114 +++++++++++++++++++++++++++
include/linux/udp.h | 5 ++
net/core/ethtool.c | 2 +
net/core/skbuff.c | 17 ++++
8 files changed, 161 insertions(+)

diff --git a/Documentation/networking/netdev-features.txt b/Documentation/networking/netdev-features.txt
index 4164f5c..82695c0 100644
--- a/Documentation/networking/netdev-features.txt
+++ b/Documentation/networking/netdev-features.txt
@@ -165,3 +165,13 @@ This requests that the NIC receive all possible frames, including errored
frames (such as bad FCS, etc). This can be helpful when sniffing a link with
bad packets on it. Some NICs may receive more packets if also put into normal
PROMISC mdoe.
+
+* tx-enc-checksum-offload
+
+This feature implies that the NIC will be able to calculate the Tx checksums
+for both inner and outer packets in the case of vxlan and ipgre encapsulation.
+
+* rx-enc-checksum-offload
+
+This feature implies that the NIC will be able to verify the Rx checksums
+for both inner and outer packets in the case of vxlan and ipgre encapsulation.
diff --git a/include/linux/if_ether.h b/include/linux/if_ether.h
index 12b4d55..195376b 100644
--- a/include/linux/if_ether.h
+++ b/include/linux/if_ether.h
@@ -28,6 +28,11 @@ static inline struct ethhdr *eth_hdr(const struct sk_buff *skb)
return (struct ethhdr *)skb_mac_header(skb);
}

+static inline struct ethhdr *eth_inner_hdr(const struct sk_buff *skb)
+{
+ return (struct ethhdr *)skb_inner_mac_header(skb);
+}
+
int eth_header_parse(const struct sk_buff *skb, unsigned char *haddr);

int mac_pton(const char *s, u8 *mac);
diff --git a/include/linux/ip.h b/include/linux/ip.h
index 58b82a2..e084de7 100644
--- a/include/linux/ip.h
+++ b/include/linux/ip.h
@@ -25,6 +25,11 @@ static inline struct iphdr *ip_hdr(const struct sk_buff *skb)
return (struct iphdr *)skb_network_header(skb);
}

+static inline struct iphdr *ip_inner_hdr(const struct sk_buff *skb)
+{
+ return (struct iphdr *)skb_inner_network_header(skb);
+}
+
static inline struct iphdr *ipip_hdr(const struct sk_buff *skb)
{
return (struct iphdr *)skb_transport_header(skb);
diff --git a/include/linux/netdev_features.h b/include/linux/netdev_features.h
index 5ac3212..6dd59a5 100644
--- a/include/linux/netdev_features.h
+++ b/include/linux/netdev_features.h
@@ -19,6 +19,7 @@ enum {
NETIF_F_IP_CSUM_BIT, /* Can checksum TCP/UDP over IPv4. */
__UNUSED_NETIF_F_1,
NETIF_F_HW_CSUM_BIT, /* Can checksum all the packets. */
+ NETIF_F_HW_CSUM_ENC_BIT, /* Can checksum all inner headers */
NETIF_F_IPV6_CSUM_BIT, /* Can checksum TCP/UDP over IPV6 */
NETIF_F_HIGHDMA_BIT, /* Can DMA to high memory. */
NETIF_F_FRAGLIST_BIT, /* Scatter/gather IO. */
@@ -52,6 +53,8 @@ enum {
NETIF_F_NTUPLE_BIT, /* N-tuple filters supported */
NETIF_F_RXHASH_BIT, /* Receive hashing offload */
NETIF_F_RXCSUM_BIT, /* Receive checksumming offload */
+ NETIF_F_RXCSUM_ENC_BIT, /* Receive checksumming offload */
+ /* for encapsulation */
NETIF_F_NOCACHE_COPY_BIT, /* Use no-cache copyfromuser */
NETIF_F_LOOPBACK_BIT, /* Enable loopback */
NETIF_F_RXFCS_BIT, /* Append FCS to skb pkt data */
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index f2af494..4b9b50b 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -379,6 +379,9 @@ typedef unsigned char *sk_buff_data_t;
* @transport_header: Transport layer header
* @network_header: Network layer header
* @mac_header: Link layer header
+ * @inner_transport_header: Inner transport layer header (encapsulation)
+ * @inner_network_header: Inner network layer header (encapsulation)
+ * @inner_mac_header: Inner link layer header (encapsulation)
* @tail: Tail pointer
* @end: End pointer
* @head: Head of buffer
@@ -489,6 +492,9 @@ struct sk_buff {
sk_buff_data_t transport_header;
sk_buff_data_t network_header;
sk_buff_data_t mac_header;
+ sk_buff_data_t inner_transport_header;
+ sk_buff_data_t inner_network_header;
+ sk_buff_data_t inner_mac_header;
/* These elements must be at the end, see alloc_skb() for details. */
sk_buff_data_t tail;
sk_buff_data_t end;
@@ -1441,6 +1447,63 @@ static inline void skb_reset_mac_len(struct sk_buff *skb)
}

#ifdef NET_SKBUFF_DATA_USES_OFFSET
+static inline unsigned char *skb_inner_transport_header(const struct sk_buff
+ *skb)
+{
+ return skb->head + skb->inner_transport_header;
+}
+
+static inline void skb_reset_inner_transport_header(struct sk_buff *skb)
+{
+ skb->inner_transport_header = skb->data - skb->head;
+}
+
+static inline void skb_set_inner_transport_header(struct sk_buff *skb,
+ const int offset)
+{
+ skb_reset_inner_transport_header(skb);
+ skb->inner_transport_header += offset;
+}
+
+static inline unsigned char *skb_inner_network_header(const struct sk_buff *skb)
+{
+ return skb->head + skb->inner_network_header;
+}
+
+static inline void skb_reset_inner_network_header(struct sk_buff *skb)
+{
+ skb->inner_network_header = skb->data - skb->head;
+}
+
+static inline void skb_set_inner_network_header(struct sk_buff *skb,
+ const int offset)
+{
+ skb_reset_inner_network_header(skb);
+ skb->inner_network_header += offset;
+}
+
+static inline unsigned char *skb_inner_mac_header(const struct sk_buff *skb)
+{
+ return skb->head + skb->inner_mac_header;
+}
+
+static inline int skb_inner_mac_header_was_set(const struct sk_buff *skb)
+{
+ return skb->inner_mac_header != ~0U;
+}
+
+static inline void skb_reset_inner_mac_header(struct sk_buff *skb)
+{
+ skb->inner_mac_header = skb->data - skb->head;
+}
+
+static inline void skb_set_inner_mac_header(struct sk_buff *skb,
+ const int offset)
+{
+ skb_reset_inner_mac_header(skb);
+ skb->inner_mac_header += offset;
+}
+
static inline unsigned char *skb_transport_header(const struct sk_buff *skb)
{
return skb->head + skb->transport_header;
@@ -1496,7 +1559,58 @@ static inline void skb_set_mac_header(struct sk_buff *skb, const int offset)
}

#else /* NET_SKBUFF_DATA_USES_OFFSET */
+static inline unsigned char *skb_inner_transport_header(const struct sk_buff
+ *skb)
+{
+ return skb->inner_transport_header;
+}
+
+static inline void skb_reset_inner_transport_header(struct sk_buff *skb)
+{
+ skb->inner_transport_header = skb->data;
+}
+
+static inline void skb_set_inner_transport_header(struct sk_buff *skb,
+ const int offset)
+{
+ skb->inner_transport_header = skb->data + offset;
+}
+
+static inline unsigned char *skb_inner_network_header(const struct sk_buff *skb)
+{
+ return skb->inner_network_header;
+}
+
+static inline void skb_reset_inner_network_header(struct sk_buff *skb)
+{
+ skb->inner_network_header = skb->data;
+}
+
+static inline void skb_set_inner_network_header(struct sk_buff *skb,
+ const int offset)
+{
+ skb->inner_network_header = skb->data + offset;
+}
+
+static inline unsigned char *skb_inner_mac_header(const struct sk_buff *skb)
+{
+ return skb->inner_mac_header;
+}
+
+static inline int skb_inner_mac_header_was_set(const struct sk_buff *skb)
+{
+ return skb->inner_mac_header != NULL;
+}

+static inline void skb_reset_inner_mac_header(struct sk_buff *skb)
+{
+ skb->inner_mac_header = skb->data;
+}
+
+static inline void skb_set_inner_mac_header(struct sk_buff *skb, const int offset)
+{
+ skb->inner_mac_header = skb->data + offset;
+}
static inline unsigned char *skb_transport_header(const struct sk_buff *skb)
{
return skb->transport_header;
diff --git a/include/linux/udp.h b/include/linux/udp.h
index 0b67d77..bd49c56 100644
--- a/include/linux/udp.h
+++ b/include/linux/udp.h
@@ -27,6 +27,11 @@ static inline struct udphdr *udp_hdr(const struct sk_buff *skb)
return (struct udphdr *)skb_transport_header(skb);
}

+static inline struct udphdr *udp_inner_hdr(const struct sk_buff *skb)
+{
+ return (struct udphdr *)skb_inner_transport_header(skb);
+}
+
#define UDP_HTABLE_SIZE_MIN (CONFIG_BASE_SMALL ? 128 : 256)

static inline int udp_hashfn(struct net *net, unsigned num, unsigned mask)
diff --git a/net/core/ethtool.c b/net/core/ethtool.c
index 4d64cc2..11f928d 100644
--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -58,6 +58,7 @@ static const char netdev_features_strings[NETDEV_FEATURE_COUNT][ETH_GSTRING_LEN]
[NETIF_F_IP_CSUM_BIT] = "tx-checksum-ipv4",
[NETIF_F_HW_CSUM_BIT] = "tx-checksum-ip-generic",
[NETIF_F_IPV6_CSUM_BIT] = "tx-checksum-ipv6",
+ [NETIF_F_HW_CSUM_ENC_BIT] = "tx-enc-checksum-offload",
[NETIF_F_HIGHDMA_BIT] = "highdma",
[NETIF_F_FRAGLIST_BIT] = "tx-scatter-gather-fraglist",
[NETIF_F_HW_VLAN_TX_BIT] = "tx-vlan-hw-insert",
@@ -84,6 +85,7 @@ static const char netdev_features_strings[NETDEV_FEATURE_COUNT][ETH_GSTRING_LEN]
[NETIF_F_NTUPLE_BIT] = "rx-ntuple-filter",
[NETIF_F_RXHASH_BIT] = "rx-hashing",
[NETIF_F_RXCSUM_BIT] = "rx-checksum",
+ [NETIF_F_RXCSUM_ENC_BIT] = "rx-enc-checksum-offload",
[NETIF_F_NOCACHE_COPY_BIT] = "tx-nocache-copy",
[NETIF_F_LOOPBACK_BIT] = "loopback",
[NETIF_F_RXFCS_BIT] = "rx-fcs",
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index d9addea..4be312b 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -259,6 +259,7 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
skb->end = skb->tail + size;
#ifdef NET_SKBUFF_DATA_USES_OFFSET
skb->mac_header = ~0U;
+ skb->inner_mac_header = ~0U;
#endif

/* make sure we initialize shinfo sequentially */
@@ -327,6 +328,7 @@ struct sk_buff *build_skb(void *data, unsigned int frag_size)
skb->end = skb->tail + size;
#ifdef NET_SKBUFF_DATA_USES_OFFSET
skb->mac_header = ~0U;
+ skb->inner_mac_header = ~0U;
#endif

/* make sure we initialize shinfo sequentially */
@@ -682,6 +684,9 @@ static void __copy_skb_header(struct sk_buff *new, const struct sk_buff *old)
new->transport_header = old->transport_header;
new->network_header = old->network_header;
new->mac_header = old->mac_header;
+ new->inner_transport_header = old->inner_transport_header;
+ new->inner_network_header = old->inner_network_header;
+ new->inner_mac_header = old->inner_mac_header;
skb_dst_copy(new, old);
new->rxhash = old->rxhash;
new->ooo_okay = old->ooo_okay;
@@ -892,6 +897,10 @@ static void copy_skb_header(struct sk_buff *new, const struct sk_buff *old)
new->network_header += offset;
if (skb_mac_header_was_set(new))
new->mac_header += offset;
+ new->inner_transport_header += offset;
+ new->inner_network_header += offset;
+ if (skb_inner_mac_header_was_set(new))
+ new->inner_mac_header += offset;
#endif
skb_shinfo(new)->gso_size = skb_shinfo(old)->gso_size;
skb_shinfo(new)->gso_segs = skb_shinfo(old)->gso_segs;
@@ -1089,6 +1098,10 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
skb->network_header += off;
if (skb_mac_header_was_set(skb))
skb->mac_header += off;
+ skb->inner_transport_header += off;
+ skb->inner_network_header += off;
+ if (skb_inner_mac_header_was_set(skb))
+ skb->inner_mac_header += off;
/* Only adjust this if it actually is csum_start rather than csum */
if (skb->ip_summed == CHECKSUM_PARTIAL)
skb->csum_start += nhead;
@@ -1188,6 +1201,10 @@ struct sk_buff *skb_copy_expand(const struct sk_buff *skb,
n->network_header += off;
if (skb_mac_header_was_set(skb))
n->mac_header += off;
+ n->inner_transport_header += off;
+ n->inner_network_header += off;
+ if (skb_inner_mac_header_was_set(skb))
+ n->inner_mac_header += off;
#endif

return n;
--
1.7.11.7
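
To see why the skbuff.c hunks shift the inner header fields when a buffer is
reallocated, here is a minimal userspace sketch. It assumes offset-based
fields; struct buf_model and model_expand_head() are hypothetical and only
imitate the headroom-growing part of pskb_expand_head().

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical buffer model with head-relative offsets. */
struct buf_model {
	unsigned char *head;
	size_t size;                       /* bytes allocated at head */
	unsigned int data;                 /* offset of packet data from head */
	unsigned int inner_network_header; /* offset from head */
};

/*
 * Grow the headroom by nhead bytes. The old contents are copied nhead
 * bytes further into the new buffer, so every head-relative offset must
 * be shifted by the same amount. Returns 0 on success, -1 on failure.
 */
static int model_expand_head(struct buf_model *b, size_t nhead)
{
	unsigned char *n = malloc(b->size + nhead);

	if (!n)
		return -1;

	memset(n, 0, nhead);
	memcpy(n + nhead, b->head, b->size);
	free(b->head);

	b->head = n;
	b->size += nhead;
	b->data += (unsigned int)nhead;
	b->inner_network_header += (unsigned int)nhead;
	return 0;
}
```

If the inner offset were left unchanged after the copy, it would point nhead
bytes before the inner header.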

2012-11-09 02:11:36

by Joseph Gasparakis

Subject: [PATCH 3/3] ipgre: capture inner headers during encapsulation

Populate the inner header pointers of the skb for ipgre.
This patch has been compile-tested only.

Signed-off-by: Joseph Gasparakis <[email protected]>
Signed-off-by: Peter P Waskiewicz Jr <[email protected]>
---
net/ipv4/ip_gre.c | 12 ++++++++++++
1 file changed, 12 insertions(+)

diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
index 7240f8e..ec3ebb1 100644
--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -766,6 +766,7 @@ static netdev_tx_t ipgre_tunnel_xmit(struct sk_buff *skb, struct net_device *dev
int gre_hlen;
__be32 dst;
int mtu;
+ unsigned int offset;

if (skb->ip_summed == CHECKSUM_PARTIAL &&
skb_checksum_help(skb))
@@ -902,6 +903,17 @@ static netdev_tx_t ipgre_tunnel_xmit(struct sk_buff *skb, struct net_device *dev
tunnel->err_count = 0;
}

+ offset = skb->data - skb->head;
+
+ skb_reset_inner_mac_header(skb);
+
+ if (skb->network_header)
+ skb_set_inner_network_header(skb, skb->network_header - offset);
+
+ if (skb->transport_header)
+ skb_set_inner_transport_header(skb, skb->transport_header -
+ offset);
+
max_headroom = LL_RESERVED_SPACE(tdev) + gre_hlen + rt->dst.header_len;

if (skb_headroom(skb) < max_headroom || skb_shared(skb)||
--
1.7.11.7

2012-11-11 15:18:44

by Dmitry Kravkov

Subject: Re: [PATCH 3/3] ipgre: capture inner headers during encapsulation

On Thu, 2012-11-08 at 18:18 -0800, Joseph Gasparakis wrote:
>
> if (skb->ip_summed == CHECKSUM_PARTIAL &&
> skb_checksum_help(skb))
> @@ -902,6 +903,17 @@ static netdev_tx_t ipgre_tunnel_xmit(struct sk_buff *skb, struct net_device *dev
> tunnel->err_count = 0;
> }
>
> + offset = skb->data - skb->head;
> +
> + skb_reset_inner_mac_header(skb);
> +
> + if (skb->network_header)
> + skb_set_inner_network_header(skb, skb->network_header - offset);
> +
> + if (skb->transport_header)
> + skb_set_inner_transport_header(skb, skb->transport_header -
> + offset);
> +
> max_headroom = LL_RESERVED_SPACE(tdev) + gre_hlen + rt->dst.header_len;
>
> if (skb_headroom(skb) < max_headroom || skb_shared(skb)||

How will this be useful if skb_checksum_help(skb) has already calculated the
checksum? That leaves nothing to offload.


2012-11-12 01:40:10

by Joseph Gasparakis

Subject: Re: [PATCH 3/3] ipgre: capture inner headers during encapsulation



On Sun, 11 Nov 2012, Dmitry Kravkov wrote:

> On Thu, 2012-11-08 at 18:18 -0800, Joseph Gasparakis wrote:
> >
> > if (skb->ip_summed == CHECKSUM_PARTIAL &&
> > skb_checksum_help(skb))
> > @@ -902,6 +903,17 @@ static netdev_tx_t ipgre_tunnel_xmit(struct sk_buff *skb, struct net_device *dev
> > tunnel->err_count = 0;
> > }
> >
> > + offset = skb->data - skb->head;
> > +
> > + skb_reset_inner_mac_header(skb);
> > +
> > + if (skb->network_header)
> > + skb_set_inner_network_header(skb, skb->network_header - offset);
> > +
> > + if (skb->transport_header)
> > + skb_set_inner_transport_header(skb, skb->transport_header -
> > + offset);
> > +
> > max_headroom = LL_RESERVED_SPACE(tdev) + gre_hlen + rt->dst.header_len;
> >
> > if (skb_headroom(skb) < max_headroom || skb_shared(skb)||
>
> How will this be useful if skb_checksum_help(skb) has already calculated the
> checksum? That leaves nothing to offload.
>
Thanks for catching this, Dmitry. Will fix it in v2.
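
The thread does not show what the v2 fix looks like. Purely as an
illustration of the problem Dmitry describes, the sketch below models one
possible shape: fall back to software checksumming only when the device
cannot checksum the inner packet. All names here are hypothetical stand-ins,
not kernel API.

```c
/* Stand-ins for CHECKSUM_NONE and CHECKSUM_PARTIAL. */
enum csum_state { CSUM_NONE, CSUM_PARTIAL };

struct pkt_model {
	enum csum_state ip_summed;
	int sw_checksummed;	/* set once the model "computes" the checksum */
};

/*
 * Stand-in for skb_checksum_help(): finish the checksum in software and
 * clear the partial state, so nothing is left for the NIC to do.
 * Returns 0 on success.
 */
static int model_checksum_help(struct pkt_model *p)
{
	p->sw_checksummed = 1;
	p->ip_summed = CSUM_NONE;
	return 0;
}

/*
 * Hypothetical Tx preparation: keep CSUM_PARTIAL when the device
 * advertises inner checksum offload, otherwise fall back to software.
 */
static int model_prepare_xmit(struct pkt_model *p, int dev_has_enc_csum)
{
	if (p->ip_summed == CSUM_PARTIAL && !dev_has_enc_csum)
		return model_checksum_help(p);
	return 0;
}
```

With the unconditional call in the current ipgre_tunnel_xmit(), every packet
would take the software path, which is the situation the review points out.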

2012-11-14 12:24:29

by Dmitry Kravkov

Subject: RE: [PATCH 1/3] net: Add support for hardware-offloaded encapsulation

> -----Original Message-----
> From: [email protected] [mailto:[email protected]]
> On Behalf Of Joseph Gasparakis
> Sent: Friday, November 09, 2012 4:18 AM
> To: [email protected]; [email protected]; [email protected]
> Cc: Joseph Gasparakis; [email protected]; [email protected];
> Peter P Waskiewicz Jr
> Subject: [PATCH 1/3] net: Add support for hardware-offloaded encapsulation
>
> This patch adds support in the kernel for offloading in the NIC Tx and Rx
> checksumming for encapsulated packets (such as VXLAN and IP GRE)
>
> Signed-off-by: Joseph Gasparakis <[email protected]>
> Signed-off-by: Peter P Waskiewicz Jr <[email protected]>
[D.K.]

> NETIF_F_HW_CSUM_BIT, /* Can checksum all the packets. */
> + NETIF_F_HW_CSUM_ENC_BIT, /* Can checksum all inner headers */
> NETIF_F_IPV6_CSUM_BIT, /* Can checksum TCP/UDP over IPV6 */

Also, a #define for NETIF_F_HW_CSUM_ENC should be added.
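
A sketch of what such a define could look like, assuming it follows the
__NETIF_F() pattern that netdev_features.h uses for the other feature flags.
The enum below is trimmed and its bit positions are illustrative only; the
matching NETIF_F_RXCSUM_ENC define and the dev_can_csum_inner_tx() helper are
assumptions added for the example.

```c
#include <stdint.h>

typedef uint64_t netdev_features_t;

/* Trimmed, illustrative enum; the real bit positions differ. */
enum {
	NETIF_F_HW_CSUM_BIT,
	NETIF_F_HW_CSUM_ENC_BIT,
	NETIF_F_RXCSUM_BIT,
	NETIF_F_RXCSUM_ENC_BIT,
};

/* Same helper pattern the header uses to turn a bit number into a mask. */
#define __NETIF_F_BIT(bit)	((netdev_features_t)1 << (bit))
#define __NETIF_F(name)		__NETIF_F_BIT(NETIF_F_##name##_BIT)

/* The define requested in the review, plus its assumed Rx counterpart. */
#define NETIF_F_HW_CSUM_ENC	__NETIF_F(HW_CSUM_ENC)
#define NETIF_F_RXCSUM_ENC	__NETIF_F(RXCSUM_ENC)

/* Hypothetical helper: does this feature set include inner Tx checksum? */
static int dev_can_csum_inner_tx(netdev_features_t features)
{
	return (features & NETIF_F_HW_CSUM_ENC) != 0;
}
```

Without the mask define, drivers and the stack would have no symbol to test
against dev->features, only the bit number.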