Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1161546AbaDQDf6 (ORCPT ); Wed, 16 Apr 2014 23:35:58 -0400
Received: from mail-qc0-f175.google.com ([209.85.216.175]:65294 "EHLO
	mail-qc0-f175.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1756358AbaDQDfz (ORCPT ); Wed, 16 Apr 2014 23:35:55 -0400
Message-ID: <534F4C1E.1000006@gmail.com>
Date: Thu, 17 Apr 2014 11:35:58 +0800
From: zhuyj
User-Agent: Mozilla/5.0 (X11; Linux i686; rv:24.0) Gecko/20100101 Thunderbird/24.4.0
MIME-Version: 1.0
To: "David S. Miller" , netdev@vger.kernel.org, joe@perches.com,
	julia.lawall@lip6.fr, dingtianhong@huawei.com,
	linux-kernel@vger.kernel.org, jasowang@redhat.com, mst@redhat.com,
	Willy Tarreau , "Yang, Zhangle (Eric)" , "Wu, Kuaikuai" ,
	"Tao, Yue" , zhuyj
Subject: in kernel 2.6.x, tun/tap nic supports vlan packets
Content-Type: multipart/mixed; boundary="------------050007040902000302070508"
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

This is a multi-part message in MIME format.
--------------050007040902000302070508
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

Hi, all

In kernel 2.6.x, Linux depends on NIC VLAN hardware acceleration to
insert and extract VLAN tags. Consider this scenario in kernel 2.6.x:

                _____        ________
            A  |     |  B   |        |  C
vlan packets-->| tap |----->|vlan nic|--->
               |_____|      |________|

We want VLAN packets to pass through the tap device and the VLAN NIC
from A to C. But in kernel 2.6.x the kernel itself cannot extract the
VLAN tag; it relies on NIC VLAN hardware acceleration. A tap NIC has no
VLAN acceleration, so in the scenario above it cannot handle VLAN
packets. They are discarded at B and never arrive at C.

In kernel 3.x, Linux handles VLAN packets in software and does not
depend on NIC VLAN hardware acceleration, so the scenario above works
in kernel 3.x.

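For readers who want to see what "extracting the VLAN tag" amounts to, here is a minimal user-space sketch in C. It is not taken from the patches: `untag_frame`, `read_be16` and the enum constants are hypothetical names chosen for illustration, and the sketch assumes a plain Ethernet frame carrying at most one 802.1Q tag in a flat byte buffer. It closes the gap by shifting the payload forward, whereas kernel code working on an skb would more likely move the two MAC addresses instead.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum {
	MAC_ADDR_LEN = 6,       /* bytes in one MAC address */
	TAG_LEN      = 4,       /* 802.1Q tag: 2-byte TPID + 2-byte TCI */
	TPID_8021Q   = 0x8100,  /* EtherType value that marks a tagged frame */
	VID_MASK     = 0x0fff   /* low 12 bits of the TCI hold the VLAN ID */
};

/* Read a big-endian 16-bit value without assuming alignment. */
static uint16_t read_be16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

/*
 * Hypothetical helper (not a kernel API): strip one 802.1Q tag in place.
 * On success, stores the TCI in *tci and returns the new frame length.
 * Returns 0 and leaves the frame untouched if it is too short or untagged.
 */
size_t untag_frame(uint8_t *frame, size_t len, uint16_t *tci)
{
	const size_t tag_off = 2 * MAC_ADDR_LEN;  /* tag follows dst + src MAC */

	/* Need both MACs, the tag, and the encapsulated EtherType. */
	if (len < tag_off + TAG_LEN + 2)
		return 0;
	if (read_be16(frame + tag_off) != TPID_8021Q)
		return 0;

	*tci = read_be16(frame + tag_off + 2);

	/* Close the 4-byte gap so the inner EtherType follows the MACs. */
	memmove(frame + tag_off, frame + tag_off + TAG_LEN,
		len - tag_off - TAG_LEN);
	return len - TAG_LEN;
}
```

Given a 20-byte frame whose bytes 12..15 are 81 00 20 64, the helper returns 16 and a TCI of 0x2064, which is VLAN ID 100 (TCI & VID_MASK) with priority 1 (TCI >> 13). Remembering the TCI separately from the frame is roughly what a NIC with rx VLAN acceleration does for the kernel.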
To resolve this in kernel 2.6.x, we simulated vlan hardware acceleration
in the tun/tap driver, then followed the logic of commit 4fba4ca4
[vlan: Centralize handling of hardware acceleration] to modify vlan
packet processing in kernel 2.6.x. With both patches applied, the
scenario above works in kernel 2.6.x.

Please comment on it. Any reply is appreciated.

Hi, Willy

These 2 patches are for linux 2.6.x. They work well here. Please help to
merge them into linux 2.6.32.x.

Thanks a lot.

Best Regards!
Zhu Yanjun

--------------050007040902000302070508
Content-Type: text/x-patch;
 name="0001-tun-tap-add-the-feature-of-vlan-rx-extraction.patch"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
 filename*0="0001-tun-tap-add-the-feature-of-vlan-rx-extraction.patch"

>From 66db0748fc0f932496100789eb319ca5884c0694 Mon Sep 17 00:00:00 2001
From: Zhu Yanjun
Date: Wed, 16 Apr 2014 18:19:42 +0800
Subject: [PATCH 1/2] tun/tap: add the feature of vlan rx extraction

Tap is a virtual net device that has no vlan rx untag feature, so it
cannot send or receive vlan packets in kernel 2.6.x. To make this
device support sending and receiving vlan packets in kernel 2.6.x, a
vlan rx extraction feature is simulated in its driver.

Signed-off-by: Zhu Yanjun
---
 drivers/net/tun.c         |  118 ++++++++++++++++++++++++++++++++++++++++++++-
 include/linux/netdevice.h |    1 +
 net/core/dev.c            |   13 +++++
 3 files changed, 131 insertions(+), 1 deletion(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 894ad84..029e6cf 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -69,6 +69,8 @@
 #include
 #include
 
+#include
+
 /* Uncomment to enable debugging */
 /* #define TUN_DEBUG 1 */
 
@@ -426,6 +428,8 @@ static const struct net_device_ops tun_netdev_ops = {
 	.ndo_change_mtu = tun_net_change_mtu,
 };
 
+static void tap_vlan_rx_register(struct net_device *dev, struct vlan_group *grp);
+
 static const struct net_device_ops tap_netdev_ops = {
 	.ndo_uninit = tun_net_uninit,
 	.ndo_open = tun_net_open,
@@ -435,6 +439,7 @@ static const struct net_device_ops tap_netdev_ops = {
 	.ndo_set_multicast_list = tun_net_mclist,
 	.ndo_set_mac_address = eth_mac_addr,
 	.ndo_validate_addr = eth_validate_addr,
+	.ndo_vlan_rx_register = tap_vlan_rx_register,
 };
 
 /* Initialize net device. */
@@ -464,6 +469,8 @@ static void tun_net_init(struct net_device *dev)
 
 		random_ether_addr(dev->dev_addr);
 
+		dev->features |= NETIF_F_HW_VLAN_RX;
+
 		dev->tx_queue_len = TUN_READQ_SIZE; /* We prefer our own queue length */
 		break;
 	}
@@ -530,6 +537,105 @@ static inline struct sk_buff *tun_alloc_skb(struct tun_struct *tun,
 	return skb;
 }
 
+static struct sk_buff *vlan_reorder_header(struct sk_buff *skb)
+{
+	if (skb_cow(skb, skb_headroom(skb)) < 0)
+		return NULL;
+	memmove(skb->data - ETH_HLEN, skb->data - VLAN_ETH_HLEN, 2 * ETH_ALEN);
+	skb->mac_header += VLAN_HLEN;
+	return skb;
+}
+
+static void vlan_set_encap_proto(struct sk_buff *skb, struct vlan_hdr *vhdr)
+{
+	__be16 proto;
+	unsigned char *rawp;
+
+	/*
+	 * * Was a VLAN packet, grab the encapsulated protocol, which the layer
+	 * * three protocols care about.
+	 * */
+
+	proto = vhdr->h_vlan_encapsulated_proto;
+	if (ntohs(proto) >= 1536) {
+		skb->protocol = proto;
+		return;
+	}
+
+	rawp = skb->data;
+	if (*(unsigned short *) rawp == 0xFFFF)
+		/*
+		 * * This is a magic hack to spot IPX packets. Older Novell
+		 * * breaks the protocol design and runs IPX over 802.3 without
+		 * * an 802.2 LLC layer. We look for FFFF which isn't a used
+		 * * 802.2 SSAP/DSAP. This won't work for fault tolerant netware
+		 * * but does for the rest.
+		 * */
+		skb->protocol = htons(ETH_P_802_3);
+	else
+		/*
+		 * * Real 802.2 LLC
+		 * */
+		skb->protocol = htons(ETH_P_802_2);
+}
+
+static void skb_reset_mac_len(struct sk_buff *skb)
+{
+	skb->mac_len = skb->network_header - skb->mac_header;
+}
+
+static struct sk_buff *vlan_untag(struct sk_buff *skb)
+{
+	struct vlan_hdr *vhdr;
+	u16 vlan_tci;
+
+	if (unlikely(vlan_tx_tag_present(skb))) {
+		/* vlan_tci is already set-up so leave this for another time */
+		return skb;
+	}
+
+	skb = skb_share_check(skb, GFP_ATOMIC);
+	if (unlikely(!skb))
+		goto err_free;
+
+	if (unlikely(!pskb_may_pull(skb, VLAN_HLEN)))
+		goto err_free;
+
+	vhdr = (struct vlan_hdr *) skb->data;
+	vlan_tci = ntohs(vhdr->h_vlan_TCI);
+	__vlan_hwaccel_put_tag(skb, vlan_tci);
+
+	skb_pull_rcsum(skb, VLAN_HLEN);
+	vlan_set_encap_proto(skb, vhdr);
+
+	skb = vlan_reorder_header(skb);
+	if (unlikely(!skb))
+		goto err_free;
+
+	skb_reset_network_header(skb);
+	skb_reset_transport_header(skb);
+	skb_reset_mac_len(skb);
+
+	return skb;
+
+err_free:
+	kfree_skb(skb);
+	return NULL;
+}
+
+static struct vlan_group *g_vlgrp = NULL;
+static void tap_vlan_rx_register(struct net_device *dev,
+				 struct vlan_group *grp)
+{
+	unsigned long flags;
+	local_irq_save(flags);
+
+	printk(KERN_DEBUG "zhuyj func:%s,line:%d\n", __FUNCTION__, __LINE__);
+	g_vlgrp = grp;
+	tun_net_change_mtu(dev, dev->mtu);
+	local_irq_restore(flags);
+}
+
 /* Get packet from user space buffer */
 static __inline__ ssize_t tun_get_user(struct tun_struct *tun,
 				       const struct iovec *iv, size_t count,
@@ -655,7 +761,17 @@ static __inline__ ssize_t tun_get_user(struct tun_struct *tun,
 		skb_shinfo(skb)->gso_segs = 0;
 	}
 
-	netif_rx_ni(skb);
+	if (g_vlgrp && (skb->protocol == cpu_to_be16(ETH_P_8021Q))){
+		struct vlan_hdr *vhdr;
+		u16 vlan_tci;
+		int ret;
+		vhdr = (struct vlan_hdr *) skb->data;
+		vlan_tci = ntohs(vhdr->h_vlan_TCI);
+		skb = vlan_untag(skb);
+		ret = vlan_netif_rx(skb, g_vlgrp, vlan_tci);
+	} else {
+		netif_rx_ni(skb);
+	}
 
 	tun->dev->stats.rx_packets++;
 	tun->dev->stats.rx_bytes += len;
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 9d7e8f7..04c659b 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1467,6 +1467,7 @@ extern void dev_kfree_skb_any(struct sk_buff *skb);
 #define HAVE_NETIF_RX 1
 extern int netif_rx(struct sk_buff *skb);
 extern int netif_rx_ni(struct sk_buff *skb);
+extern int vlan_netif_rx(struct sk_buff *skb, struct vlan_group *grp, u16 vlan_tci);
 #define HAVE_NETIF_RECEIVE_SKB 1
 extern int netif_receive_skb(struct sk_buff *skb);
 extern void napi_gro_flush(struct napi_struct *napi);
diff --git a/net/core/dev.c b/net/core/dev.c
index d775563..a3802ca 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2067,6 +2067,19 @@ int netif_rx_ni(struct sk_buff *skb)
 }
 EXPORT_SYMBOL(netif_rx_ni);
 
+int vlan_netif_rx(struct sk_buff *skb, struct vlan_group *grp, u16 vlan_tci)
+{
+	int ret;
+
+	preempt_disable();
+	ret = __vlan_hwaccel_rx(skb, grp, vlan_tci, 0);
+	if (local_softirq_pending())
+		do_softirq();
+	preempt_enable();
+	return ret;
+}
+EXPORT_SYMBOL(vlan_netif_rx);
+
 static void net_tx_action(struct softirq_action *h)
 {
 	struct softnet_data *sd = &__get_cpu_var(softnet_data);
-- 
1.7.9.5


--------------050007040902000302070508
Content-Type: text/x-patch;
 name="0002-vlan-Centralize-handling-of-hardware-acceleration.patch"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
 filename*0="0002-vlan-Centralize-handling-of-hardware-acceleration.patch"

>From 86fff983b7f36750301aab537dfd4e5744d929e7 Mon Sep 17 00:00:00 2001
From: Zhu Yanjun
Date: Wed, 16 Apr 2014 18:57:23 +0800
Subject: [PATCH 2/2] vlan: Centralize handling of hardware acceleration

2.6.x kernels require a logic change similar to the one that commit
4fba4ca4 [vlan: Centralize handling of hardware acceleration]
introduces for newer kernels.

Sending and receiving vlan packets over tun/tap does not work in
kernel 2.6.x. In kernel 3.0+, vlan tag handling is centralized in the
kernel. But in kernel 2.6.x, inserting and extracting the vlan tag
still depends on nic hardware, so the tun/tap nic driver cannot
support vlan packets. It is necessary to centralize handling of
hardware acceleration and simulate vlan rx extraction in the tun/tap
nic driver so that tun/tap can send and receive vlan packets in
kernel 2.6.x.

Signed-off-by: Zhu Yanjun
---
 include/linux/if_vlan.h   |    4 +-
 include/linux/netdevice.h |    1 -
 net/8021q/vlan.c          |   49 ++++++++++++++++++++
 net/8021q/vlan_core.c     |  110 +++++----------------------------------------
 net/core/dev.c            |   42 +++++++----------
 5 files changed, 76 insertions(+), 130 deletions(-)

diff --git a/include/linux/if_vlan.h b/include/linux/if_vlan.h
index 7ff9af1..5538dda 100644
--- a/include/linux/if_vlan.h
+++ b/include/linux/if_vlan.h
@@ -114,7 +114,7 @@ extern u16 vlan_dev_vlan_id(const struct net_device *dev);
 
 extern int __vlan_hwaccel_rx(struct sk_buff *skb, struct vlan_group *grp,
 			     u16 vlan_tci, int polling);
-extern int vlan_hwaccel_do_receive(struct sk_buff *skb);
+extern bool vlan_hwaccel_do_receive(struct sk_buff **skb);
 extern int vlan_gro_receive(struct napi_struct *napi, struct vlan_group *grp,
 			    unsigned int vlan_tci, struct sk_buff *skb);
 extern int vlan_gro_frags(struct napi_struct *napi, struct vlan_group *grp,
@@ -140,7 +140,7 @@ static inline int __vlan_hwaccel_rx(struct sk_buff *skb, struct vlan_group *grp,
 	return NET_XMIT_SUCCESS;
 }
 
-static inline int vlan_hwaccel_do_receive(struct sk_buff *skb)
+static inline bool vlan_hwaccel_do_receive(struct sk_buff **skb)
 {
 	return 0;
 }
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 04c659b..bdb6b82 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1490,7 +1490,6 @@ static inline void napi_free_frags(struct napi_struct *napi)
 	napi->skb = NULL;
 }
 
-extern void netif_nit_deliver(struct sk_buff *skb);
 extern int dev_valid_name(const char *name);
 extern int dev_ioctl(struct net *net, unsigned int cmd, void __user *);
 extern int dev_ethtool(struct net *net, struct ifreq *);
diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c
index a29c5ab..385c6e4 100644
--- a/net/8021q/vlan.c
+++ b/net/8021q/vlan.c
@@ -92,6 +92,54 @@ struct net_device *__find_vlan_dev(struct net_device *real_dev, u16 vlan_id)
 	return NULL;
 }
 
+bool vlan_hwaccel_do_receive(struct sk_buff **skbp)
+{
+	struct sk_buff *skb = *skbp;
+	u16 vlan_id = skb->vlan_tci & VLAN_VID_MASK;
+	struct net_device *vlan_dev;
+	struct vlan_rx_stats *rx_stats;
+
+	vlan_dev = __find_vlan_dev(skb->dev, vlan_id);
+	if (!vlan_dev) {
+		if (vlan_id)
+			skb->pkt_type = PACKET_OTHERHOST;
+		return false;
+	}
+
+	skb = *skbp = skb_share_check(skb, GFP_ATOMIC);
+	if (unlikely(!skb))
+		return false;
+
+	skb->dev = vlan_dev;
+	skb->priority = vlan_get_ingress_priority(vlan_dev, skb->vlan_tci);
+	skb->vlan_tci = 0;
+
+	rx_stats = per_cpu_ptr(vlan_dev_info(vlan_dev)->vlan_rx_stats,
+			       smp_processor_id());
+
+	rx_stats->rx_packets++;
+	rx_stats->rx_bytes += skb->len;
+
+	switch (skb->pkt_type) {
+	case PACKET_BROADCAST:
+		break;
+	case PACKET_MULTICAST:
+		rx_stats->multicast++;
+		break;
+	case PACKET_OTHERHOST:
+		/* Our lower layer thinks this is not local, let's make sure.
+		 * This allows the VLAN to have a different MAC than the
+		 * underlying device, and still route correctly. */
+		if (!compare_ether_addr(eth_hdr(skb)->h_dest,
+					vlan_dev->dev_addr))
+			skb->pkt_type = PACKET_HOST;
+		break;
+	};
+
+	return true;
+}
+extern bool (*__vlan_do_receive)(struct sk_buff **skbp);
+
 static void vlan_group_free(struct vlan_group *grp)
 {
 	int i;
@@ -744,6 +792,7 @@ static int __init vlan_proto_init(void)
 
 	dev_add_pack(&vlan_packet_type);
 	vlan_ioctl_set(vlan_ioctl_handler);
+	__vlan_do_receive = vlan_hwaccel_do_receive;
 	return 0;
 
 err4:
diff --git a/net/8021q/vlan_core.c b/net/8021q/vlan_core.c
index 7f7de1a..c679535 100644
--- a/net/8021q/vlan_core.c
+++ b/net/8021q/vlan_core.c
@@ -4,64 +4,6 @@
 #include
 #include "vlan.h"
 
-/* VLAN rx hw acceleration helper. This acts like netif_{rx,receive_skb}(). */
-int __vlan_hwaccel_rx(struct sk_buff *skb, struct vlan_group *grp,
-		      u16 vlan_tci, int polling)
-{
-	if (netpoll_rx(skb))
-		return NET_RX_DROP;
-
-	if (skb_bond_should_drop(skb))
-		goto drop;
-
-	skb->vlan_tci = vlan_tci;
-	skb->dev = vlan_group_get_device(grp, vlan_tci & VLAN_VID_MASK);
-
-	if (!skb->dev)
-		goto drop;
-
-	return (polling ? netif_receive_skb(skb) : netif_rx(skb));
-
-drop:
-	dev_kfree_skb_any(skb);
-	return NET_RX_DROP;
-}
-EXPORT_SYMBOL(__vlan_hwaccel_rx);
-
-int vlan_hwaccel_do_receive(struct sk_buff *skb)
-{
-	struct net_device *dev = skb->dev;
-	struct net_device_stats *stats;
-
-	skb->dev = vlan_dev_info(dev)->real_dev;
-	netif_nit_deliver(skb);
-
-	skb->dev = dev;
-	skb->priority = vlan_get_ingress_priority(dev, skb->vlan_tci);
-	skb->vlan_tci = 0;
-
-	stats = &dev->stats;
-	stats->rx_packets++;
-	stats->rx_bytes += skb->len;
-
-	switch (skb->pkt_type) {
-	case PACKET_BROADCAST:
-		break;
-	case PACKET_MULTICAST:
-		stats->multicast++;
-		break;
-	case PACKET_OTHERHOST:
-		/* Our lower layer thinks this is not local, let's make sure.
-		 * This allows the VLAN to have a different MAC than the
-		 * underlying device, and still route correctly. */
-		if (!compare_ether_addr(eth_hdr(skb)->h_dest,
-					dev->dev_addr))
-			skb->pkt_type = PACKET_HOST;
-		break;
-	};
-	return 0;
-}
-
 struct net_device *vlan_dev_real_dev(const struct net_device *dev)
 {
 	return vlan_dev_info(dev)->real_dev;
@@ -74,59 +16,27 @@ u16 vlan_dev_vlan_id(const struct net_device *dev)
 }
 EXPORT_SYMBOL(vlan_dev_vlan_id);
 
-static int vlan_gro_common(struct napi_struct *napi, struct vlan_group *grp,
-			   unsigned int vlan_tci, struct sk_buff *skb)
+/* VLAN rx hw acceleration helper. This acts like netif_{rx,receive_skb}(). */
+int __vlan_hwaccel_rx(struct sk_buff *skb, struct vlan_group *grp,
+		      u16 vlan_tci, int polling)
 {
-	struct sk_buff *p;
-
-	if (skb_bond_should_drop(skb))
-		goto drop;
-
-	skb->vlan_tci = vlan_tci;
-	skb->dev = vlan_group_get_device(grp, vlan_tci & VLAN_VID_MASK);
-
-	if (!skb->dev)
-		goto drop;
-
-	for (p = napi->gro_list; p; p = p->next) {
-		NAPI_GRO_CB(p)->same_flow =
-			p->dev == skb->dev && !compare_ether_header(
-				skb_mac_header(p), skb_gro_mac_header(skb));
-		NAPI_GRO_CB(p)->flush = 0;
-	}
-
-	return dev_gro_receive(napi, skb);
-
-drop:
-	return GRO_DROP;
+	__vlan_hwaccel_put_tag(skb, vlan_tci);
+	return polling ? netif_receive_skb(skb) : netif_rx(skb);
 }
+EXPORT_SYMBOL(__vlan_hwaccel_rx);
 
 int vlan_gro_receive(struct napi_struct *napi, struct vlan_group *grp,
 		     unsigned int vlan_tci, struct sk_buff *skb)
 {
-	if (netpoll_rx_on(skb))
-		return vlan_hwaccel_receive_skb(skb, grp, vlan_tci);
-
-	skb_gro_reset_offset(skb);
-
-	return napi_skb_finish(vlan_gro_common(napi, grp, vlan_tci, skb), skb);
+	__vlan_hwaccel_put_tag(skb, vlan_tci);
+	return napi_gro_receive(napi, skb);
 }
 EXPORT_SYMBOL(vlan_gro_receive);
 
 int vlan_gro_frags(struct napi_struct *napi, struct vlan_group *grp,
 		   unsigned int vlan_tci)
 {
-	struct sk_buff *skb = napi_frags_skb(napi);
-
-	if (!skb)
-		return NET_RX_DROP;
-
-	if (netpoll_rx_on(skb)) {
-		skb->protocol = eth_type_trans(skb, skb->dev);
-		return vlan_hwaccel_receive_skb(skb, grp, vlan_tci);
-	}
-
-	return napi_frags_finish(napi, skb,
-				 vlan_gro_common(napi, grp, vlan_tci, skb));
+	__vlan_hwaccel_put_tag(napi->skb, vlan_tci);
+	return napi_gro_frags(napi);
 }
 EXPORT_SYMBOL(vlan_gro_frags);
diff --git a/net/core/dev.c b/net/core/dev.c
index a3802ca..b69487e 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2272,33 +2272,8 @@ out:
 }
 #endif
 
-/*
- *	netif_nit_deliver - deliver received packets to network taps
- *	@skb: buffer
- *
- *	This function is used to deliver incoming packets to network
- *	taps. It should be used when the normal netif_receive_skb path
- *	is bypassed, for example because of VLAN acceleration.
- */
-void netif_nit_deliver(struct sk_buff *skb)
-{
-	struct packet_type *ptype;
-
-	if (list_empty(&ptype_all))
-		return;
-
-	skb_reset_network_header(skb);
-	skb_reset_transport_header(skb);
-	skb->mac_len = skb->network_header - skb->mac_header;
-
-	rcu_read_lock();
-	list_for_each_entry_rcu(ptype, &ptype_all, list) {
-		if (!ptype->dev || ptype->dev == skb->dev)
-			deliver_skb(skb, ptype, skb->dev);
-	}
-	rcu_read_unlock();
-}
-
+bool (*__vlan_do_receive)(struct sk_buff **skbp) = NULL;
+EXPORT_SYMBOL(__vlan_do_receive);
 /**
  *	netif_receive_skb - process receive buffer from network
  *	@skb: buffer to process
@@ -2354,6 +2329,8 @@ int netif_receive_skb(struct sk_buff *skb)
 
 	rcu_read_lock();
 
+another_round:
+
 #ifdef CONFIG_NET_CLS_ACT
 	if (skb->tc_verd & TC_NCLS) {
 		skb->tc_verd = CLR_TC_NCLS(skb->tc_verd);
@@ -2377,6 +2354,17 @@ int netif_receive_skb(struct sk_buff *skb)
 ncls:
 #endif
 
+	if (vlan_tx_tag_present(skb)) {
+		if (pt_prev) {
+			ret = deliver_skb(skb, pt_prev, orig_dev);
+			pt_prev = NULL;
+		}
+		if (__vlan_do_receive && __vlan_do_receive(&skb)) {
+			goto another_round;
+		} else if (unlikely(!skb))
+			goto out;
+	}
+
 	skb = handle_bridge(skb, &pt_prev, &ret, orig_dev);
 	if (!skb)
 		goto out;
-- 
1.7.9.5


--------------050007040902000302070508--