2019-10-28 23:04:08

by Haiyang Zhang

[permalink] [raw]
Subject: [PATCH net-next, 0/4] hv_netvsc: Add XDP support and some error handling fixes

This patch set fixes some error handling issues in netvsc driver,
and add XDP support.

Haiyang Zhang (4):
hv_netvsc: Fix error handling in netvsc_set_features()
hv_netvsc: Fix error handling in netvsc_attach()
hv_netvsc: Add XDP support
hv_netvsc: Update document for XDP support

.../networking/device_drivers/microsoft/netvsc.txt | 14 ++
drivers/net/hyperv/Makefile | 2 +-
drivers/net/hyperv/hyperv_net.h | 15 ++
drivers/net/hyperv/netvsc.c | 8 +-
drivers/net/hyperv/netvsc_bpf.c | 211 +++++++++++++++++++++
drivers/net/hyperv/netvsc_drv.c | 150 ++++++++++++---
6 files changed, 374 insertions(+), 26 deletions(-)
create mode 100644 drivers/net/hyperv/netvsc_bpf.c

--
1.8.3.1


2019-10-28 23:05:33

by Haiyang Zhang

[permalink] [raw]
Subject: [PATCH net-next, 2/4] hv_netvsc: Fix error handling in netvsc_attach()

If rndis_filter_open() fails, we need to remove the rndis device created
in earlier steps, before returning an error code. Otherwise, the retry of
netvsc_attach() from its callers will fail and hang.

Fixes: 7b2ee50c0cd5 ("hv_netvsc: common detach logic")
Signed-off-by: Haiyang Zhang <[email protected]>
---
drivers/net/hyperv/netvsc_drv.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index 734e411..a14fc8e 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -982,7 +982,7 @@ static int netvsc_attach(struct net_device *ndev,
if (netif_running(ndev)) {
ret = rndis_filter_open(nvdev);
if (ret)
- return ret;
+ goto err;

rdev = nvdev->extension;
if (!rdev->link_state)
@@ -990,6 +990,13 @@ static int netvsc_attach(struct net_device *ndev,
}

return 0;
+
+err:
+ netif_device_detach(ndev);
+
+ rndis_filter_device_remove(hdev, nvdev);
+
+ return ret;
}

static int netvsc_set_channels(struct net_device *net,
--
1.8.3.1

2019-10-28 23:10:14

by Haiyang Zhang

[permalink] [raw]
Subject: [PATCH net-next, 1/4] hv_netvsc: Fix error handling in netvsc_set_features()

When an error is returned by rndis_filter_set_offload_params(), we should
still assign the unaffected features to ndev->features. Otherwise, these
features will be missing.

Fixes: d6792a5a0747 ("hv_netvsc: Add handler for LRO setting change")
Signed-off-by: Haiyang Zhang <[email protected]>
---
drivers/net/hyperv/netvsc_drv.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index 39dddcd..734e411 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -1807,8 +1807,10 @@ static int netvsc_set_features(struct net_device *ndev,

ret = rndis_filter_set_offload_params(ndev, nvdev, &offloads);

- if (ret)
+ if (ret) {
features ^= NETIF_F_LRO;
+ ndev->features = features;
+ }

syncvf:
if (!vf_netdev)
--
1.8.3.1

2019-10-29 06:52:08

by Haiyang Zhang

[permalink] [raw]
Subject: [PATCH net-next, 4/4] hv_netvsc: Update document for XDP support

Added the new section in the document regarding XDP support
by hv_netvsc driver.

Signed-off-by: Haiyang Zhang <[email protected]>
---
.../networking/device_drivers/microsoft/netvsc.txt | 14 ++++++++++++++
1 file changed, 14 insertions(+)

diff --git a/Documentation/networking/device_drivers/microsoft/netvsc.txt b/Documentation/networking/device_drivers/microsoft/netvsc.txt
index 3bfa635..69ccfca 100644
--- a/Documentation/networking/device_drivers/microsoft/netvsc.txt
+++ b/Documentation/networking/device_drivers/microsoft/netvsc.txt
@@ -82,3 +82,17 @@ Features
contain one or more packets. The send buffer is an optimization, the driver
will use slower method to handle very large packets or if the send buffer
area is exhausted.
+
+ XDP support
+ -----------
+ XDP (eXpress Data Path) is a feature that runs eBPF bytecode at the early
+ stage when packets arrive at a NIC card. The goal is to increase performance
+ for packet processing, reducing the overhead of SKB allocation and other
+ upper network layers.
+
+ hv_netvsc supports XDP in native mode, and transparently sets the XDP
+ program on the associated VF NIC as well.
+
+ XDP program cannot run with LRO (RSC) enabled, so you need to disable LRO
+ before running XDP:
+ ethtool -K eth0 lro off
--
1.8.3.1

2019-10-29 06:53:38

by Haiyang Zhang

[permalink] [raw]
Subject: [PATCH net-next, 3/4] hv_netvsc: Add XDP support

This patch adds support of XDP in native mode for hv_netvsc driver, and
transparently sets the XDP program on the associated VF NIC as well.

XDP program cannot run with LRO (RSC) enabled, so you need to disable LRO
before running XDP:
ethtool -K eth0 lro off

XDP actions not yet supported:
XDP_TX, XDP_REDIRECT

Signed-off-by: Haiyang Zhang <[email protected]>
---
drivers/net/hyperv/Makefile | 2 +-
drivers/net/hyperv/hyperv_net.h | 15 +++
drivers/net/hyperv/netvsc.c | 8 +-
drivers/net/hyperv/netvsc_bpf.c | 211 ++++++++++++++++++++++++++++++++++++++++
drivers/net/hyperv/netvsc_drv.c | 141 ++++++++++++++++++++++-----
5 files changed, 351 insertions(+), 26 deletions(-)
create mode 100644 drivers/net/hyperv/netvsc_bpf.c

diff --git a/drivers/net/hyperv/Makefile b/drivers/net/hyperv/Makefile
index 3a2aa07..0db7cca 100644
--- a/drivers/net/hyperv/Makefile
+++ b/drivers/net/hyperv/Makefile
@@ -1,4 +1,4 @@
# SPDX-License-Identifier: GPL-2.0-only
obj-$(CONFIG_HYPERV_NET) += hv_netvsc.o

-hv_netvsc-y := netvsc_drv.o netvsc.o rndis_filter.o netvsc_trace.o
+hv_netvsc-y := netvsc_drv.o netvsc.o rndis_filter.o netvsc_trace.o netvsc_bpf.o
diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index 670ef68..e5aa256 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -142,6 +142,8 @@ struct netvsc_device_info {
u32 send_section_size;
u32 recv_section_size;

+ struct bpf_prog *bprog;
+
u8 rss_key[NETVSC_HASH_KEYLEN];
};

@@ -199,6 +201,15 @@ int netvsc_recv_callback(struct net_device *net,
void netvsc_channel_cb(void *context);
int netvsc_poll(struct napi_struct *napi, int budget);

+u32 netvsc_run_xdp(struct net_device *ndev, struct netvsc_channel *nvchan,
+ void **p_pbuf);
+unsigned int netvsc_xdp_fraglen(unsigned int len);
+struct bpf_prog *netvsc_xdp_get(struct netvsc_device *nvdev);
+int netvsc_xdp_set(struct net_device *dev, struct bpf_prog *prog,
+ struct netvsc_device *nvdev);
+int netvsc_vf_setxdp(struct net_device *vf_netdev, struct bpf_prog *prog);
+int netvsc_bpf(struct net_device *dev, struct netdev_bpf *bpf);
+
int rndis_set_subchannel(struct net_device *ndev,
struct netvsc_device *nvdev,
struct netvsc_device_info *dev_info);
@@ -865,6 +876,7 @@ struct netvsc_stats {
u64 bytes;
u64 broadcast;
u64 multicast;
+ u64 xdp_drop;
struct u64_stats_sync syncp;
};

@@ -965,6 +977,9 @@ struct netvsc_channel {
atomic_t queue_sends;
struct nvsc_rsc rsc;

+ struct bpf_prog __rcu *bpf_prog;
+ struct xdp_rxq_info xdp_rxq;
+
struct netvsc_stats tx_stats;
struct netvsc_stats rx_stats;
};
diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index d22a36f..688487b 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -122,8 +122,10 @@ static void free_netvsc_device(struct rcu_head *head)
vfree(nvdev->send_buf);
kfree(nvdev->send_section_map);

- for (i = 0; i < VRSS_CHANNEL_MAX; i++)
+ for (i = 0; i < VRSS_CHANNEL_MAX; i++) {
+ xdp_rxq_info_unreg(&nvdev->chan_table[i].xdp_rxq);
vfree(nvdev->chan_table[i].mrc.slots);
+ }

kfree(nvdev);
}
@@ -1370,6 +1372,10 @@ struct netvsc_device *netvsc_device_add(struct hv_device *device,
nvchan->net_device = net_device;
u64_stats_init(&nvchan->tx_stats.syncp);
u64_stats_init(&nvchan->rx_stats.syncp);
+
+ xdp_rxq_info_reg(&nvchan->xdp_rxq, ndev, i);
+ xdp_rxq_info_reg_mem_model(&nvchan->xdp_rxq,
+ MEM_TYPE_PAGE_SHARED, NULL);
}

/* Enable NAPI handler before init callbacks */
diff --git a/drivers/net/hyperv/netvsc_bpf.c b/drivers/net/hyperv/netvsc_bpf.c
new file mode 100644
index 0000000..4d235ac
--- /dev/null
+++ b/drivers/net/hyperv/netvsc_bpf.c
@@ -0,0 +1,211 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright (c) 2019, Microsoft Corporation.
+ *
+ * Author:
+ * Haiyang Zhang <[email protected]>
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/netdevice.h>
+#include <linux/etherdevice.h>
+#include <linux/ethtool.h>
+#include <linux/bpf.h>
+#include <linux/bpf_trace.h>
+#include <linux/kernel.h>
+#include <net/xdp.h>
+
+#include <linux/mutex.h>
+#include <linux/rtnetlink.h>
+
+#include "hyperv_net.h"
+
+u32 netvsc_run_xdp(struct net_device *ndev, struct netvsc_channel *nvchan,
+ void **p_pbuf)
+{
+ struct page *page = NULL;
+ void *data = nvchan->rsc.data[0];
+ u32 len = nvchan->rsc.len[0];
+ void *pbuf = data;
+ struct bpf_prog *prog;
+ struct xdp_buff xdp;
+ u32 act = XDP_PASS;
+
+ *p_pbuf = NULL;
+
+ rcu_read_lock();
+ prog = rcu_dereference(nvchan->bpf_prog);
+
+ if (!prog || nvchan->rsc.cnt > 1)
+ goto out;
+
+ /* copy to a new page buffer if data are not within a page */
+ if (virt_to_page(data) != virt_to_page(data + len - 1)) {
+ page = alloc_page(GFP_ATOMIC);
+ if (!page)
+ goto out;
+
+ pbuf = page_address(page);
+ memcpy(pbuf, nvchan->rsc.data[0], len);
+
+ *p_pbuf = pbuf;
+ }
+
+ xdp.data_hard_start = pbuf;
+ xdp.data = xdp.data_hard_start;
+ xdp_set_data_meta_invalid(&xdp);
+ xdp.data_end = xdp.data + len;
+ xdp.rxq = &nvchan->xdp_rxq;
+ xdp.handle = 0;
+
+ act = bpf_prog_run_xdp(prog, &xdp);
+
+ switch (act) {
+ case XDP_PASS:
+ /* Pass to upper layers */
+ break;
+
+ case XDP_ABORTED:
+ trace_xdp_exception(ndev, prog, act);
+ break;
+
+ case XDP_DROP:
+ break;
+
+ default:
+ bpf_warn_invalid_xdp_action(act);
+ }
+
+out:
+ rcu_read_unlock();
+
+ if (page && act != XDP_PASS) {
+ *p_pbuf = NULL;
+ __free_page(page);
+ }
+
+ return act;
+}
+
+unsigned int netvsc_xdp_fraglen(unsigned int len)
+{
+ return SKB_DATA_ALIGN(len) +
+ SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
+}
+
+struct bpf_prog *netvsc_xdp_get(struct netvsc_device *nvdev)
+{
+ return rtnl_dereference(nvdev->chan_table[0].bpf_prog);
+}
+
+int netvsc_xdp_set(struct net_device *dev, struct bpf_prog *prog,
+ struct netvsc_device *nvdev)
+{
+ struct bpf_prog *old_prog;
+ int frag_max, i;
+
+ old_prog = netvsc_xdp_get(nvdev);
+
+ if (!old_prog && !prog)
+ return 0;
+
+ frag_max = netvsc_xdp_fraglen(dev->mtu + ETH_HLEN);
+ if (prog && frag_max > PAGE_SIZE) {
+ netdev_err(dev, "XDP: mtu:%u too large, frag:%u\n",
+ dev->mtu, frag_max);
+ return -EOPNOTSUPP;
+ }
+
+ if (prog && (dev->features & NETIF_F_LRO)) {
+ netdev_err(dev, "XDP: not support LRO\n");
+ return -EOPNOTSUPP;
+ }
+
+ if (prog) {
+ prog = bpf_prog_add(prog, nvdev->num_chn);
+ if (IS_ERR(prog))
+ return PTR_ERR(prog);
+ }
+
+ for (i = 0; i < nvdev->num_chn; i++)
+ rcu_assign_pointer(nvdev->chan_table[i].bpf_prog, prog);
+
+ if (old_prog)
+ for (i = 0; i < nvdev->num_chn; i++)
+ bpf_prog_put(old_prog);
+
+ return 0;
+}
+
+int netvsc_vf_setxdp(struct net_device *vf_netdev, struct bpf_prog *prog)
+{
+ struct netdev_bpf xdp;
+ bpf_op_t ndo_bpf;
+
+ ASSERT_RTNL();
+
+ if (!vf_netdev)
+ return 0;
+
+ ndo_bpf = vf_netdev->netdev_ops->ndo_bpf;
+ if (!ndo_bpf)
+ return 0;
+
+ memset(&xdp, 0, sizeof(xdp));
+
+ xdp.command = XDP_SETUP_PROG;
+ xdp.prog = prog;
+
+ return ndo_bpf(vf_netdev, &xdp);
+}
+
+static u32 netvsc_xdp_query(struct netvsc_device *nvdev)
+{
+ struct bpf_prog *prog = netvsc_xdp_get(nvdev);
+
+ if (prog)
+ return prog->aux->id;
+
+ return 0;
+}
+
+int netvsc_bpf(struct net_device *dev, struct netdev_bpf *bpf)
+{
+ struct net_device_context *ndevctx = netdev_priv(dev);
+ struct netvsc_device *nvdev = rtnl_dereference(ndevctx->nvdev);
+ struct net_device *vf_netdev = rtnl_dereference(ndevctx->vf_netdev);
+ int ret;
+
+ if (!nvdev || nvdev->destroy) {
+ if (bpf->command == XDP_QUERY_PROG) {
+ bpf->prog_id = 0;
+ return 0; /* Query must always succeed */
+ } else {
+ return -ENODEV;
+ }
+ }
+
+ switch (bpf->command) {
+ case XDP_SETUP_PROG:
+ ret = netvsc_xdp_set(dev, bpf->prog, nvdev);
+
+ if (ret)
+ return ret;
+
+ ret = netvsc_vf_setxdp(vf_netdev, bpf->prog);
+
+ if (ret) {
+ netdev_err(dev, "vf_setxdp failed:%d\n", ret);
+ netvsc_xdp_set(dev, NULL, nvdev);
+ }
+
+ return ret;
+
+ case XDP_QUERY_PROG:
+ bpf->prog_id = netvsc_xdp_query(nvdev);
+ return 0;
+
+ default:
+ return -EINVAL;
+ }
+}
diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index a14fc8e..415f8db 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -25,6 +25,7 @@
#include <linux/slab.h>
#include <linux/rtnetlink.h>
#include <linux/netpoll.h>
+#include <linux/bpf.h>

#include <net/arp.h>
#include <net/route.h>
@@ -760,7 +761,8 @@ static void netvsc_comp_ipcsum(struct sk_buff *skb)
}

static struct sk_buff *netvsc_alloc_recv_skb(struct net_device *net,
- struct netvsc_channel *nvchan)
+ struct netvsc_channel *nvchan,
+ void *pbuf)
{
struct napi_struct *napi = &nvchan->napi;
const struct ndis_pkt_8021q_info *vlan = nvchan->rsc.vlan;
@@ -769,16 +771,32 @@ static struct sk_buff *netvsc_alloc_recv_skb(struct net_device *net,
struct sk_buff *skb;
int i;

- skb = napi_alloc_skb(napi, nvchan->rsc.pktlen);
- if (!skb)
- return skb;
+ if (pbuf) {
+ unsigned int len = nvchan->rsc.pktlen;
+ unsigned int frag_size = netvsc_xdp_fraglen(len);

- /*
- * Copy to skb. This copy is needed here since the memory pointed by
- * hv_netvsc_packet cannot be deallocated
- */
- for (i = 0; i < nvchan->rsc.cnt; i++)
- skb_put_data(skb, nvchan->rsc.data[i], nvchan->rsc.len[i]);
+ skb = build_skb(pbuf, frag_size);
+
+ if (!skb) {
+ __free_page(virt_to_page(pbuf));
+ return NULL;
+ }
+
+ skb_put(skb, len);
+ skb->dev = napi->dev;
+ } else {
+ skb = napi_alloc_skb(napi, nvchan->rsc.pktlen);
+
+ if (!skb)
+ return NULL;
+
+ /* Copy to skb. This copy is needed here since the memory
+ * pointed by hv_netvsc_packet cannot be deallocated.
+ */
+ for (i = 0; i < nvchan->rsc.cnt; i++)
+ skb_put_data(skb, nvchan->rsc.data[i],
+ nvchan->rsc.len[i]);
+ }

skb->protocol = eth_type_trans(skb, net);

@@ -826,13 +844,25 @@ int netvsc_recv_callback(struct net_device *net,
struct vmbus_channel *channel = nvchan->channel;
u16 q_idx = channel->offermsg.offer.sub_channel_index;
struct sk_buff *skb;
- struct netvsc_stats *rx_stats;
+ struct netvsc_stats *rx_stats = &nvchan->rx_stats;
+ void *pbuf = NULL; /* page buffer */
+ u32 act;

if (net->reg_state != NETREG_REGISTERED)
return NVSP_STAT_FAIL;

+ act = netvsc_run_xdp(net, nvchan, &pbuf);
+
+ if (act != XDP_PASS) {
+ u64_stats_update_begin(&rx_stats->syncp);
+ rx_stats->xdp_drop++;
+ u64_stats_update_end(&rx_stats->syncp);
+
+ return NVSP_STAT_SUCCESS; /* consumed by XDP */
+ }
+
/* Allocate a skb - TODO direct I/O to pages? */
- skb = netvsc_alloc_recv_skb(net, nvchan);
+ skb = netvsc_alloc_recv_skb(net, nvchan, pbuf);

if (unlikely(!skb)) {
++net_device_ctx->eth_stats.rx_no_memory;
@@ -846,7 +876,6 @@ int netvsc_recv_callback(struct net_device *net,
* on the synthetic device because modifying the VF device
* statistics will not work correctly.
*/
- rx_stats = &nvchan->rx_stats;
u64_stats_update_begin(&rx_stats->syncp);
rx_stats->packets++;
rx_stats->bytes += nvchan->rsc.pktlen;
@@ -887,6 +916,7 @@ static void netvsc_get_channels(struct net_device *net,
(struct netvsc_device *nvdev)
{
struct netvsc_device_info *dev_info;
+ struct bpf_prog *prog;

dev_info = kzalloc(sizeof(*dev_info), GFP_ATOMIC);

@@ -894,6 +924,8 @@ static void netvsc_get_channels(struct net_device *net,
return NULL;

if (nvdev) {
+ ASSERT_RTNL();
+
dev_info->num_chn = nvdev->num_chn;
dev_info->send_sections = nvdev->send_section_cnt;
dev_info->send_section_size = nvdev->send_section_size;
@@ -902,6 +934,13 @@ static void netvsc_get_channels(struct net_device *net,

memcpy(dev_info->rss_key, nvdev->extension->rss_key,
NETVSC_HASH_KEYLEN);
+
+ prog = netvsc_xdp_get(nvdev);
+ if (prog) {
+ prog = bpf_prog_add(prog, 1);
+ if (!IS_ERR(prog))
+ dev_info->bprog = prog;
+ }
} else {
dev_info->num_chn = VRSS_CHANNEL_DEFAULT;
dev_info->send_sections = NETVSC_DEFAULT_TX;
@@ -913,6 +952,17 @@ static void netvsc_get_channels(struct net_device *net,
return dev_info;
}

+/* Free struct netvsc_device_info */
+static void netvsc_devinfo_put(struct netvsc_device_info *dev_info)
+{
+ if (dev_info->bprog) {
+ ASSERT_RTNL();
+ bpf_prog_put(dev_info->bprog);
+ }
+
+ kfree(dev_info);
+}
+
static int netvsc_detach(struct net_device *ndev,
struct netvsc_device *nvdev)
{
@@ -924,6 +974,8 @@ static int netvsc_detach(struct net_device *ndev,
if (cancel_work_sync(&nvdev->subchan_work))
nvdev->num_chn = 1;

+ netvsc_xdp_set(ndev, NULL, nvdev);
+
/* If device was up (receiving) then shutdown */
if (netif_running(ndev)) {
netvsc_tx_disable(nvdev, ndev);
@@ -957,7 +1009,8 @@ static int netvsc_attach(struct net_device *ndev,
struct hv_device *hdev = ndev_ctx->device_ctx;
struct netvsc_device *nvdev;
struct rndis_device *rdev;
- int ret;
+ struct bpf_prog *prog;
+ int ret = 0;

nvdev = rndis_filter_device_add(hdev, dev_info);
if (IS_ERR(nvdev))
@@ -973,6 +1026,13 @@ static int netvsc_attach(struct net_device *ndev,
}
}

+ prog = dev_info->bprog;
+ if (prog) {
+ ret = netvsc_xdp_set(ndev, prog, nvdev);
+ if (ret)
+ goto err1;
+ }
+
/* In any case device is now ready */
netif_device_attach(ndev);

@@ -982,7 +1042,7 @@ static int netvsc_attach(struct net_device *ndev,
if (netif_running(ndev)) {
ret = rndis_filter_open(nvdev);
if (ret)
- goto err;
+ goto err2;

rdev = nvdev->extension;
if (!rdev->link_state)
@@ -991,9 +1051,10 @@ static int netvsc_attach(struct net_device *ndev,

return 0;

-err:
+err2:
netif_device_detach(ndev);

+err1:
rndis_filter_device_remove(hdev, nvdev);

return ret;
@@ -1043,7 +1104,7 @@ static int netvsc_set_channels(struct net_device *net,
}

out:
- kfree(device_info);
+ netvsc_devinfo_put(device_info);
return ret;
}

@@ -1150,7 +1211,7 @@ static int netvsc_change_mtu(struct net_device *ndev, int mtu)
dev_set_mtu(vf_netdev, orig_mtu);

out:
- kfree(device_info);
+ netvsc_devinfo_put(device_info);
return ret;
}

@@ -1375,8 +1436,8 @@ static int netvsc_set_mac_addr(struct net_device *ndev, void *p)
/* statistics per queue (rx/tx packets/bytes) */
#define NETVSC_PCPU_STATS_LEN (num_present_cpus() * ARRAY_SIZE(pcpu_stats))

-/* 4 statistics per queue (rx/tx packets/bytes) */
-#define NETVSC_QUEUE_STATS_LEN(dev) ((dev)->num_chn * 4)
+/* 5 statistics per queue (rx/tx packets/bytes, rx xdp_drop) */
+#define NETVSC_QUEUE_STATS_LEN(dev) ((dev)->num_chn * 5)

static int netvsc_get_sset_count(struct net_device *dev, int string_set)
{
@@ -1408,6 +1469,7 @@ static void netvsc_get_ethtool_stats(struct net_device *dev,
struct netvsc_ethtool_pcpu_stats *pcpu_sum;
unsigned int start;
u64 packets, bytes;
+ u64 xdp_drop;
int i, j, cpu;

if (!nvdev)
@@ -1436,9 +1498,11 @@ static void netvsc_get_ethtool_stats(struct net_device *dev,
start = u64_stats_fetch_begin_irq(&qstats->syncp);
packets = qstats->packets;
bytes = qstats->bytes;
+ xdp_drop = qstats->xdp_drop;
} while (u64_stats_fetch_retry_irq(&qstats->syncp, start));
data[i++] = packets;
data[i++] = bytes;
+ data[i++] = xdp_drop;
}

pcpu_sum = kvmalloc_array(num_possible_cpus(),
@@ -1486,6 +1550,8 @@ static void netvsc_get_strings(struct net_device *dev, u32 stringset, u8 *data)
p += ETH_GSTRING_LEN;
sprintf(p, "rx_queue_%u_bytes", i);
p += ETH_GSTRING_LEN;
+ sprintf(p, "rx_queue_%u_xdp_drop", i);
+ p += ETH_GSTRING_LEN;
}

for_each_present_cpu(cpu) {
@@ -1782,10 +1848,27 @@ static int netvsc_set_ringparam(struct net_device *ndev,
}

out:
- kfree(device_info);
+ netvsc_devinfo_put(device_info);
return ret;
}

+static netdev_features_t netvsc_fix_features(struct net_device *ndev,
+ netdev_features_t features)
+{
+ struct net_device_context *ndevctx = netdev_priv(ndev);
+ struct netvsc_device *nvdev = rtnl_dereference(ndevctx->nvdev);
+
+ if (!nvdev || nvdev->destroy)
+ return features;
+
+ if ((features & NETIF_F_LRO) && netvsc_xdp_get(nvdev)) {
+ features ^= NETIF_F_LRO;
+ netdev_info(ndev, "Skip LRO - unsupported with XDP\n");
+ }
+
+ return features;
+}
+
static int netvsc_set_features(struct net_device *ndev,
netdev_features_t features)
{
@@ -1872,12 +1955,14 @@ static void netvsc_set_msglevel(struct net_device *ndev, u32 val)
.ndo_start_xmit = netvsc_start_xmit,
.ndo_change_rx_flags = netvsc_change_rx_flags,
.ndo_set_rx_mode = netvsc_set_rx_mode,
+ .ndo_fix_features = netvsc_fix_features,
.ndo_set_features = netvsc_set_features,
.ndo_change_mtu = netvsc_change_mtu,
.ndo_validate_addr = eth_validate_addr,
.ndo_set_mac_address = netvsc_set_mac_addr,
.ndo_select_queue = netvsc_select_queue,
.ndo_get_stats64 = netvsc_get_stats64,
+ .ndo_bpf = netvsc_bpf,
};

/*
@@ -2164,6 +2249,7 @@ static int netvsc_register_vf(struct net_device *vf_netdev)
{
struct net_device_context *net_device_ctx;
struct netvsc_device *netvsc_dev;
+ struct bpf_prog *prog;
struct net_device *ndev;
int ret;

@@ -2208,6 +2294,9 @@ static int netvsc_register_vf(struct net_device *vf_netdev)
vf_netdev->wanted_features = ndev->features;
netdev_update_features(vf_netdev);

+ prog = netvsc_xdp_get(netvsc_dev);
+ netvsc_vf_setxdp(vf_netdev, prog);
+
return NOTIFY_OK;
}

@@ -2249,6 +2338,8 @@ static int netvsc_unregister_vf(struct net_device *vf_netdev)

netdev_info(ndev, "VF unregistering: %s\n", vf_netdev->name);

+ netvsc_vf_setxdp(vf_netdev, NULL);
+
netdev_rx_handler_unregister(vf_netdev);
netdev_upper_dev_unlink(vf_netdev, ndev);
RCU_INIT_POINTER(net_device_ctx->vf_netdev, NULL);
@@ -2362,14 +2453,14 @@ static int netvsc_probe(struct hv_device *dev,
list_add(&net_device_ctx->list, &netvsc_dev_list);
rtnl_unlock();

- kfree(device_info);
+ netvsc_devinfo_put(device_info);
return 0;

register_failed:
rtnl_unlock();
rndis_filter_device_remove(dev, nvdev);
rndis_failed:
- kfree(device_info);
+ netvsc_devinfo_put(device_info);
devinfo_failed:
free_percpu(net_device_ctx->vf_stats);
no_stats:
@@ -2397,8 +2488,10 @@ static int netvsc_remove(struct hv_device *dev)

rtnl_lock();
nvdev = rtnl_dereference(ndev_ctx->nvdev);
- if (nvdev)
+ if (nvdev) {
cancel_work_sync(&nvdev->subchan_work);
+ netvsc_xdp_set(net, NULL, nvdev);
+ }

/*
* Call to the vsc driver to let it know that the device is being
--
1.8.3.1

2019-10-29 06:54:07

by Jakub Kicinski

[permalink] [raw]
Subject: Re: [PATCH net-next, 3/4] hv_netvsc: Add XDP support

On Mon, 28 Oct 2019 21:07:04 +0000, Haiyang Zhang wrote:
> This patch adds support of XDP in native mode for hv_netvsc driver, and
> transparently sets the XDP program on the associated VF NIC as well.
>
> XDP program cannot run with LRO (RSC) enabled, so you need to disable LRO
> before running XDP:
> ethtool -K eth0 lro off
>
> XDP actions not yet supported:
> XDP_TX, XDP_REDIRECT

I don't think we want to merge support without at least XDP_TX these
days..

And without the ability to prepend headers this may be the least
complete initial XDP implementation we've seen :(

> Signed-off-by: Haiyang Zhang <[email protected]>

> diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
> index d22a36f..688487b 100644
> --- a/drivers/net/hyperv/netvsc.c
> +++ b/drivers/net/hyperv/netvsc.c
> @@ -122,8 +122,10 @@ static void free_netvsc_device(struct rcu_head *head)
> vfree(nvdev->send_buf);
> kfree(nvdev->send_section_map);
>
> - for (i = 0; i < VRSS_CHANNEL_MAX; i++)
> + for (i = 0; i < VRSS_CHANNEL_MAX; i++) {
> + xdp_rxq_info_unreg(&nvdev->chan_table[i].xdp_rxq);
> vfree(nvdev->chan_table[i].mrc.slots);
> + }
>
> kfree(nvdev);
> }
> @@ -1370,6 +1372,10 @@ struct netvsc_device *netvsc_device_add(struct hv_device *device,
> nvchan->net_device = net_device;
> u64_stats_init(&nvchan->tx_stats.syncp);
> u64_stats_init(&nvchan->rx_stats.syncp);
> +
> + xdp_rxq_info_reg(&nvchan->xdp_rxq, ndev, i);
> + xdp_rxq_info_reg_mem_model(&nvchan->xdp_rxq,
> + MEM_TYPE_PAGE_SHARED, NULL);

These can fail.

> }
>
> /* Enable NAPI handler before init callbacks */
> diff --git a/drivers/net/hyperv/netvsc_bpf.c b/drivers/net/hyperv/netvsc_bpf.c
> new file mode 100644
> index 0000000..4d235ac
> --- /dev/null
> +++ b/drivers/net/hyperv/netvsc_bpf.c
> @@ -0,0 +1,211 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/* Copyright (c) 2019, Microsoft Corporation.
> + *
> + * Author:
> + * Haiyang Zhang <[email protected]>
> + */
> +
> +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> +
> +#include <linux/netdevice.h>
> +#include <linux/etherdevice.h>
> +#include <linux/ethtool.h>
> +#include <linux/bpf.h>
> +#include <linux/bpf_trace.h>
> +#include <linux/kernel.h>
> +#include <net/xdp.h>
> +
> +#include <linux/mutex.h>
> +#include <linux/rtnetlink.h>
> +
> +#include "hyperv_net.h"
> +
> +u32 netvsc_run_xdp(struct net_device *ndev, struct netvsc_channel *nvchan,
> + void **p_pbuf)
> +{
> + struct page *page = NULL;
> + void *data = nvchan->rsc.data[0];
> + u32 len = nvchan->rsc.len[0];
> + void *pbuf = data;
> + struct bpf_prog *prog;
> + struct xdp_buff xdp;
> + u32 act = XDP_PASS;
> +
> + *p_pbuf = NULL;
> +
> + rcu_read_lock();
> + prog = rcu_dereference(nvchan->bpf_prog);
> +
> + if (!prog || nvchan->rsc.cnt > 1)

Can rsc.cnt == 1 not be ensured at setup time? This looks quite
limiting if random frames could be forced to bypass the filter.

> + goto out;
> +
> + /* copy to a new page buffer if data are not within a page */
> + if (virt_to_page(data) != virt_to_page(data + len - 1)) {
> + page = alloc_page(GFP_ATOMIC);
> + if (!page)
> + goto out;

Returning XDP_PASS on allocation failure seems highly questionable.

> + pbuf = page_address(page);
> + memcpy(pbuf, nvchan->rsc.data[0], len);
> +
> + *p_pbuf = pbuf;
> + }
> +
> + xdp.data_hard_start = pbuf;
> + xdp.data = xdp.data_hard_start;

This patch also doesn't add any headroom for XDP to prepend data :(

> + xdp_set_data_meta_invalid(&xdp);
> + xdp.data_end = xdp.data + len;
> + xdp.rxq = &nvchan->xdp_rxq;
> + xdp.handle = 0;
> +
> + act = bpf_prog_run_xdp(prog, &xdp);
> +
> + switch (act) {
> + case XDP_PASS:
> + /* Pass to upper layers */
> + break;
> +
> + case XDP_ABORTED:
> + trace_xdp_exception(ndev, prog, act);
> + break;
> +
> + case XDP_DROP:
> + break;
> +
> + default:
> + bpf_warn_invalid_xdp_action(act);
> + }
> +
> +out:
> + rcu_read_unlock();
> +
> + if (page && act != XDP_PASS) {
> + *p_pbuf = NULL;
> + __free_page(page);
> + }
> +
> + return act;
> +}
> +
> +unsigned int netvsc_xdp_fraglen(unsigned int len)
> +{
> + return SKB_DATA_ALIGN(len) +
> + SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
> +}
> +
> +struct bpf_prog *netvsc_xdp_get(struct netvsc_device *nvdev)
> +{
> + return rtnl_dereference(nvdev->chan_table[0].bpf_prog);
> +}
> +
> +int netvsc_xdp_set(struct net_device *dev, struct bpf_prog *prog,
> + struct netvsc_device *nvdev)
> +{
> + struct bpf_prog *old_prog;
> + int frag_max, i;
> +
> + old_prog = netvsc_xdp_get(nvdev);
> +
> + if (!old_prog && !prog)
> + return 0;

I think this case is now handled by the core.

> + frag_max = netvsc_xdp_fraglen(dev->mtu + ETH_HLEN);
> + if (prog && frag_max > PAGE_SIZE) {
> + netdev_err(dev, "XDP: mtu:%u too large, frag:%u\n",
> + dev->mtu, frag_max);
> + return -EOPNOTSUPP;
> + }
> +
> + if (prog && (dev->features & NETIF_F_LRO)) {
> + netdev_err(dev, "XDP: not support LRO\n");

Please report this via extack, that way users will see it in the console
in which they're installing the program.

> + return -EOPNOTSUPP;
> + }
> +
> + if (prog) {
> + prog = bpf_prog_add(prog, nvdev->num_chn);
> + if (IS_ERR(prog))
> + return PTR_ERR(prog);
> + }
> +
> + for (i = 0; i < nvdev->num_chn; i++)
> + rcu_assign_pointer(nvdev->chan_table[i].bpf_prog, prog);
> +
> + if (old_prog)
> + for (i = 0; i < nvdev->num_chn; i++)
> + bpf_prog_put(old_prog);
> +
> + return 0;
> +}
> +
> +int netvsc_vf_setxdp(struct net_device *vf_netdev, struct bpf_prog *prog)
> +{
> + struct netdev_bpf xdp;
> + bpf_op_t ndo_bpf;
> +
> + ASSERT_RTNL();
> +
> + if (!vf_netdev)
> + return 0;
> +
> + ndo_bpf = vf_netdev->netdev_ops->ndo_bpf;
> + if (!ndo_bpf)
> + return 0;
> +
> + memset(&xdp, 0, sizeof(xdp));
> +
> + xdp.command = XDP_SETUP_PROG;
> + xdp.prog = prog;
> +
> + return ndo_bpf(vf_netdev, &xdp);

IMHO the automatic propagation is not a good idea. Especially if the
propagation doesn't make the entire installation fail if VF doesn't
have ndo_bpf.

> +}

2019-10-29 20:04:14

by Haiyang Zhang

[permalink] [raw]
Subject: RE: [PATCH net-next, 3/4] hv_netvsc: Add XDP support

> -----Original Message-----
> From: Jakub Kicinski <[email protected]>
> Sent: Monday, October 28, 2019 5:33 PM
> To: Haiyang Zhang <[email protected]>
> Cc: [email protected]; [email protected];
> [email protected]; KY Srinivasan <[email protected]>; Stephen
> Hemminger <[email protected]>; [email protected]; vkuznets
> <[email protected]>; [email protected]; linux-
> [email protected]
> Subject: Re: [PATCH net-next, 3/4] hv_netvsc: Add XDP support
>
> On Mon, 28 Oct 2019 21:07:04 +0000, Haiyang Zhang wrote:
> > This patch adds support of XDP in native mode for hv_netvsc driver, and
> > transparently sets the XDP program on the associated VF NIC as well.
> >
> > XDP program cannot run with LRO (RSC) enabled, so you need to disable
> LRO
> > before running XDP:
> > ethtool -K eth0 lro off
> >
> > XDP actions not yet supported:
> > XDP_TX, XDP_REDIRECT
>
> I don't think we want to merge support without at least XDP_TX these
> days..
Thanks for your detailed comments --
I'm working on the XDP_TX...

>
> And without the ability to prepend headers this may be the least
> complete initial XDP implementation we've seen :(
The RNDIS packet buffer received by netvsc doesn't have a head room, but I'm
considering copy the packets to the page buffer, with a head room space
reserved for XDP.

>
> > Signed-off-by: Haiyang Zhang <[email protected]>
>
> > diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
> > index d22a36f..688487b 100644
> > --- a/drivers/net/hyperv/netvsc.c
> > +++ b/drivers/net/hyperv/netvsc.c
> > @@ -122,8 +122,10 @@ static void free_netvsc_device(struct rcu_head
> *head)
> > vfree(nvdev->send_buf);
> > kfree(nvdev->send_section_map);
> >
> > - for (i = 0; i < VRSS_CHANNEL_MAX; i++)
> > + for (i = 0; i < VRSS_CHANNEL_MAX; i++) {
> > + xdp_rxq_info_unreg(&nvdev->chan_table[i].xdp_rxq);
> > vfree(nvdev->chan_table[i].mrc.slots);
> > + }
> >
> > kfree(nvdev);
> > }
> > @@ -1370,6 +1372,10 @@ struct netvsc_device *netvsc_device_add(struct
> hv_device *device,
> > nvchan->net_device = net_device;
> > u64_stats_init(&nvchan->tx_stats.syncp);
> > u64_stats_init(&nvchan->rx_stats.syncp);
> > +
> > + xdp_rxq_info_reg(&nvchan->xdp_rxq, ndev, i);
> > + xdp_rxq_info_reg_mem_model(&nvchan->xdp_rxq,
> > + MEM_TYPE_PAGE_SHARED, NULL);
>
> These can fail.
I will add error handling.

>
> > }
> >
> > /* Enable NAPI handler before init callbacks */
> > diff --git a/drivers/net/hyperv/netvsc_bpf.c
> b/drivers/net/hyperv/netvsc_bpf.c
> > new file mode 100644
> > index 0000000..4d235ac
> > --- /dev/null
> > +++ b/drivers/net/hyperv/netvsc_bpf.c
> > @@ -0,0 +1,211 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +/* Copyright (c) 2019, Microsoft Corporation.
> > + *
> > + * Author:
> > + * Haiyang Zhang <[email protected]>
> > + */
> > +
> > +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> > +
> > +#include <linux/netdevice.h>
> > +#include <linux/etherdevice.h>
> > +#include <linux/ethtool.h>
> > +#include <linux/bpf.h>
> > +#include <linux/bpf_trace.h>
> > +#include <linux/kernel.h>
> > +#include <net/xdp.h>
> > +
> > +#include <linux/mutex.h>
> > +#include <linux/rtnetlink.h>
> > +
> > +#include "hyperv_net.h"
> > +
> > +u32 netvsc_run_xdp(struct net_device *ndev, struct netvsc_channel
> *nvchan,
> > + void **p_pbuf)
> > +{
> > + struct page *page = NULL;
> > + void *data = nvchan->rsc.data[0];
> > + u32 len = nvchan->rsc.len[0];
> > + void *pbuf = data;
> > + struct bpf_prog *prog;
> > + struct xdp_buff xdp;
> > + u32 act = XDP_PASS;
> > +
> > + *p_pbuf = NULL;
> > +
> > + rcu_read_lock();
> > + prog = rcu_dereference(nvchan->bpf_prog);
> > +
> > + if (!prog || nvchan->rsc.cnt > 1)
>
> Can rsc.cnt == 1 not be ensured at setup time? This looks quite
> limiting if random frames could be forced to bypass the filter.
Yes, the setup code already check/ensure LRO is disabled. So rsc.cnt > 1
is NOT expected here. Just an error check. I will change the return value
to XDP_ABORTED for this.

>
> > + goto out;
> > +
> > + /* copy to a new page buffer if data are not within a page */
> > + if (virt_to_page(data) != virt_to_page(data + len - 1)) {
> > + page = alloc_page(GFP_ATOMIC);
> > + if (!page)
> > + goto out;
>
> Returning XDP_PASS on allocation failure seems highly questionable.
I will change the return value to XDP_ABORTED for this too.

>
> > + pbuf = page_address(page);
> > + memcpy(pbuf, nvchan->rsc.data[0], len);
> > +
> > + *p_pbuf = pbuf;
> > + }
> > +
> > + xdp.data_hard_start = pbuf;
> > + xdp.data = xdp.data_hard_start;
>
> This patch also doesn't add any headroom for XDP to prepend data :(
I'm considering to add the headroom to the start of the page.

>
> > + xdp_set_data_meta_invalid(&xdp);
> > + xdp.data_end = xdp.data + len;
> > + xdp.rxq = &nvchan->xdp_rxq;
> > + xdp.handle = 0;
> > +
> > + act = bpf_prog_run_xdp(prog, &xdp);
> > +
> > + switch (act) {
> > + case XDP_PASS:
> > + /* Pass to upper layers */
> > + break;
> > +
> > + case XDP_ABORTED:
> > + trace_xdp_exception(ndev, prog, act);
> > + break;
> > +
> > + case XDP_DROP:
> > + break;
> > +
> > + default:
> > + bpf_warn_invalid_xdp_action(act);
> > + }
> > +
> > +out:
> > + rcu_read_unlock();
> > +
> > + if (page && act != XDP_PASS) {
> > + *p_pbuf = NULL;
> > + __free_page(page);
> > + }
> > +
> > + return act;
> > +}
> > +
> > +unsigned int netvsc_xdp_fraglen(unsigned int len)
> > +{
> > + return SKB_DATA_ALIGN(len) +
> > + SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
> > +}
> > +
> > +struct bpf_prog *netvsc_xdp_get(struct netvsc_device *nvdev)
> > +{
> > + return rtnl_dereference(nvdev->chan_table[0].bpf_prog);
> > +}
> > +
> > +int netvsc_xdp_set(struct net_device *dev, struct bpf_prog *prog,
> > + struct netvsc_device *nvdev)
> > +{
> > + struct bpf_prog *old_prog;
> > + int frag_max, i;
> > +
> > + old_prog = netvsc_xdp_get(nvdev);
> > +
> > + if (!old_prog && !prog)
> > + return 0;
>
> I think this case is now handled by the core.
Thanks for the reminder. I saw the code in dev_change_xdp_fd(), so the upper layer
doesn't call XDP_SETUP_PROG with old/new prog both NULL.
But this function is also called by other functions in our driver, like netvsc_detach(),
netvsc_remove(), etc. Instead of checking for NULL in each place, I still keep the check inside
netvsc_xdp_set().

>
> > + frag_max = netvsc_xdp_fraglen(dev->mtu + ETH_HLEN);
> > + if (prog && frag_max > PAGE_SIZE) {
> > + netdev_err(dev, "XDP: mtu:%u too large, frag:%u\n",
> > + dev->mtu, frag_max);
> > + return -EOPNOTSUPP;
> > + }
> > +
> > + if (prog && (dev->features & NETIF_F_LRO)) {
> > + netdev_err(dev, "XDP: not support LRO\n");
>
> Please report this via extack, that way users will see it in the console
> in which they're installing the program.
I will.

>
> > + return -EOPNOTSUPP;
> > + }
> > +
> > + if (prog) {
> > + prog = bpf_prog_add(prog, nvdev->num_chn);
> > + if (IS_ERR(prog))
> > + return PTR_ERR(prog);
> > + }
> > +
> > + for (i = 0; i < nvdev->num_chn; i++)
> > + rcu_assign_pointer(nvdev->chan_table[i].bpf_prog, prog);
> > +
> > + if (old_prog)
> > + for (i = 0; i < nvdev->num_chn; i++)
> > + bpf_prog_put(old_prog);
> > +
> > + return 0;
> > +}
> > +
> > +int netvsc_vf_setxdp(struct net_device *vf_netdev, struct bpf_prog
> *prog)
> > +{
> > + struct netdev_bpf xdp;
> > + bpf_op_t ndo_bpf;
> > +
> > + ASSERT_RTNL();
> > +
> > + if (!vf_netdev)
> > + return 0;
> > +
> > + ndo_bpf = vf_netdev->netdev_ops->ndo_bpf;
> > + if (!ndo_bpf)
> > + return 0;
> > +
> > + memset(&xdp, 0, sizeof(xdp));
> > +
> > + xdp.command = XDP_SETUP_PROG;
> > + xdp.prog = prog;
> > +
> > + return ndo_bpf(vf_netdev, &xdp);
>
> IMHO the automatic propagation is not a good idea. Especially if the
> propagation doesn't make the entire installation fail if VF doesn't
> have ndo_bpf.

On Hyperv and Azure hosts, VF is always acting as a slave below netvsc.
And they are both active -- most data packets go to VF, but broadcast,
multicast, and TCP SYN packets go to netvsc synthetic data path. The synthetic
NIC (netvsc) is also a failover NIC when VF is not available.
We ask customers to only use the synthetic NIC directly. So propagation
of XDP setting to VF NIC is desired.
But, I will change the return code to error, so the entire installation fails if VF is
present but unable to set XDP prog.

Thanks,
- Haiyang

2019-10-29 20:11:28

by Jakub Kicinski

[permalink] [raw]
Subject: Re: [PATCH net-next, 3/4] hv_netvsc: Add XDP support

On Tue, 29 Oct 2019 19:17:25 +0000, Haiyang Zhang wrote:
> > > +int netvsc_xdp_set(struct net_device *dev, struct bpf_prog *prog,
> > > + struct netvsc_device *nvdev)
> > > +{
> > > + struct bpf_prog *old_prog;
> > > + int frag_max, i;
> > > +
> > > + old_prog = netvsc_xdp_get(nvdev);
> > > +
> > > + if (!old_prog && !prog)
> > > + return 0;
> >
> > I think this case is now handled by the core.
> Thanks for the reminder. I saw the code in dev_change_xdp_fd(), so the upper layer
> doesn't call XDP_SETUP_PROG with old/new prog both NULL.
> But this function is also called by other functions in our driver, like netvsc_detach(),
> netvsc_remove(), etc. Instead of checking for NULL in each place, I still keep the check inside
> netvsc_xdp_set().

I see. Makes sense on a closer look.

BTW would you do me a favour and reformat this line:

static struct netvsc_device_info *netvsc_devinfo_get
(struct netvsc_device *nvdev)

to look like this:

static
struct netvsc_device_info *netvsc_devinfo_get(struct netvsc_device *nvdev)

or

static struct netvsc_device_info *
netvsc_devinfo_get(struct netvsc_device *nvdev)

Otherwise git diff gets confused about which function given chunk
belongs to. (Incorrectly thinking your patch is touching
netvsc_get_channels()). I spent few minutes trying to figure out what's
going on there :)

> >
> > > + return -EOPNOTSUPP;
> > > + }
> > > +
> > > + if (prog) {
> > > + prog = bpf_prog_add(prog, nvdev->num_chn);
> > > + if (IS_ERR(prog))
> > > + return PTR_ERR(prog);
> > > + }
> > > +
> > > + for (i = 0; i < nvdev->num_chn; i++)
> > > + rcu_assign_pointer(nvdev->chan_table[i].bpf_prog, prog);
> > > +
> > > + if (old_prog)
> > > + for (i = 0; i < nvdev->num_chn; i++)
> > > + bpf_prog_put(old_prog);
> > > +
> > > + return 0;
> > > +}
> > > +
> > > +int netvsc_vf_setxdp(struct net_device *vf_netdev, struct bpf_prog *prog)
> > > +{
> > > + struct netdev_bpf xdp;
> > > + bpf_op_t ndo_bpf;
> > > +
> > > + ASSERT_RTNL();
> > > +
> > > + if (!vf_netdev)
> > > + return 0;
> > > +
> > > + ndo_bpf = vf_netdev->netdev_ops->ndo_bpf;
> > > + if (!ndo_bpf)
> > > + return 0;
> > > +
> > > + memset(&xdp, 0, sizeof(xdp));
> > > +
> > > + xdp.command = XDP_SETUP_PROG;
> > > + xdp.prog = prog;
> > > +
> > > + return ndo_bpf(vf_netdev, &xdp);
> >
> > IMHO the automatic propagation is not a good idea. Especially if the
> > propagation doesn't make the entire installation fail if VF doesn't
> > have ndo_bpf.
>
> On Hyperv and Azure hosts, VF is always acting as a slave below netvsc.
> And they are both active -- most data packets go to VF, but broadcast,
> multicast, and TCP SYN packets go to netvsc synthetic data path. The synthetic
> NIC (netvsc) is also a failover NIC when VF is not available.
> We ask customers to only use the synthetic NIC directly. So propagation
> of XDP setting to VF NIC is desired.
> But, I will change the return code to error, so the entire installation fails if VF is
> present but unable to set XDP prog.

Okay, if I read the rest of the code correctly you also fail attach
if xdp propagation failed? If that's the case and we return an error
here on missing NDO, then the propagation could be okay.

So the semantics are these:

(a) install on virt - potentially overwrites the existing VF prog;
(b) install on VF is not noticed by virt;
(c) uninstall on virt - clears both virt and VF, regardless what
program was installed on virt;
(d) uninstall on VF does not propagate;

Since you're adding documentation it would perhaps be worth stating
there that touching the program on the VF is not supported/may lead
to breakage, and users should only touch/configure the program on the
virt.

2019-10-29 22:10:15

by Haiyang Zhang

[permalink] [raw]
Subject: RE: [PATCH net-next, 3/4] hv_netvsc: Add XDP support



> -----Original Message-----
> From: Jakub Kicinski <[email protected]>
> Sent: Tuesday, October 29, 2019 3:53 PM
> To: Haiyang Zhang <[email protected]>
> Cc: [email protected]; [email protected];
> [email protected]; KY Srinivasan <[email protected]>; Stephen
> Hemminger <[email protected]>; [email protected]; vkuznets
> <[email protected]>; [email protected]; linux-
> [email protected]
> Subject: Re: [PATCH net-next, 3/4] hv_netvsc: Add XDP support
>
> On Tue, 29 Oct 2019 19:17:25 +0000, Haiyang Zhang wrote:
> > > > +int netvsc_xdp_set(struct net_device *dev, struct bpf_prog *prog,
> > > > + struct netvsc_device *nvdev)
> > > > +{
> > > > + struct bpf_prog *old_prog;
> > > > + int frag_max, i;
> > > > +
> > > > + old_prog = netvsc_xdp_get(nvdev);
> > > > +
> > > > + if (!old_prog && !prog)
> > > > + return 0;
> > >
> > > I think this case is now handled by the core.
> > Thanks for the reminder. I saw the code in dev_change_xdp_fd(), so the
> upper layer
> > doesn't call XDP_SETUP_PROG with old/new prog both NULL.
> > But this function is also called by other functions in our driver, like
> netvsc_detach(),
> > netvsc_remove(), etc. Instead of checking for NULL in each place, I still
> keep the check inside
> > netvsc_xdp_set().
>
> I see. Makes sense on a closer look.
>
> BTW would you do me a favour and reformat this line:
>
> static struct netvsc_device_info *netvsc_devinfo_get
> (struct netvsc_device *nvdev)
>
> to look like this:
>
> static
> struct netvsc_device_info *netvsc_devinfo_get(struct netvsc_device
> *nvdev)
>
> or
>
> static struct netvsc_device_info *
> netvsc_devinfo_get(struct netvsc_device *nvdev)
>
> Otherwise git diff gets confused about which function given chunk
> belongs to. (Incorrectly thinking your patch is touching
> netvsc_get_channels()). I spent few minutes trying to figure out what's
> going on there :)
I will.

>
> > >
> > > > + return -EOPNOTSUPP;
> > > > + }
> > > > +
> > > > + if (prog) {
> > > > + prog = bpf_prog_add(prog, nvdev->num_chn);
> > > > + if (IS_ERR(prog))
> > > > + return PTR_ERR(prog);
> > > > + }
> > > > +
> > > > + for (i = 0; i < nvdev->num_chn; i++)
> > > > + rcu_assign_pointer(nvdev->chan_table[i].bpf_prog, prog);
> > > > +
> > > > + if (old_prog)
> > > > + for (i = 0; i < nvdev->num_chn; i++)
> > > > + bpf_prog_put(old_prog);
> > > > +
> > > > + return 0;
> > > > +}
> > > > +
> > > > +int netvsc_vf_setxdp(struct net_device *vf_netdev, struct bpf_prog
> *prog)
> > > > +{
> > > > + struct netdev_bpf xdp;
> > > > + bpf_op_t ndo_bpf;
> > > > +
> > > > + ASSERT_RTNL();
> > > > +
> > > > + if (!vf_netdev)
> > > > + return 0;
> > > > +
> > > > + ndo_bpf = vf_netdev->netdev_ops->ndo_bpf;
> > > > + if (!ndo_bpf)
> > > > + return 0;
> > > > +
> > > > + memset(&xdp, 0, sizeof(xdp));
> > > > +
> > > > + xdp.command = XDP_SETUP_PROG;
> > > > + xdp.prog = prog;
> > > > +
> > > > + return ndo_bpf(vf_netdev, &xdp);
> > >
> > > IMHO the automatic propagation is not a good idea. Especially if the
> > > propagation doesn't make the entire installation fail if VF doesn't
> > > have ndo_bpf.
> >
> > On Hyperv and Azure hosts, VF is always acting as a slave below netvsc.
> > And they are both active -- most data packets go to VF, but broadcast,
> > multicast, and TCP SYN packets go to netvsc synthetic data path. The
> synthetic
> > NIC (netvsc) is also a failover NIC when VF is not available.
> > We ask customers to only use the synthetic NIC directly. So propagation
> > of XDP setting to VF NIC is desired.
> > But, I will change the return code to error, so the entire installation fails if
> VF is
> > present but unable to set XDP prog.
>
> Okay, if I read the rest of the code correctly you also fail attach
> if xdp propagation failed? If that's the case and we return an error
> here on missing NDO, then the propagation could be okay.
>
> So the semantics are these:
>
> (a) install on virt - potentially overwrites the existing VF prog;
> (b) install on VF is not noticed by virt;
> (c) uninstall on virt - clears both virt and VF, regardless what
> program was installed on virt;
> (d) uninstall on VF does not propagate;
>
> Since you're adding documentation it would perhaps be worth stating
> there that touching the program on the VF is not supported/may lead
> to breakage, and users should only touch/configure the program on the
> virt.

Sure I will document the recommended way of install xdp prog.

Thanks,
- Haiyang

2019-10-29 22:11:18

by Stephen Hemminger

[permalink] [raw]
Subject: Re: [PATCH net-next, 3/4] hv_netvsc: Add XDP support

On Tue, 29 Oct 2019 19:17:25 +0000
Haiyang Zhang <[email protected]> wrote:

> > -----Original Message-----
> > From: Jakub Kicinski <[email protected]>
> > Sent: Monday, October 28, 2019 5:33 PM
> > To: Haiyang Zhang <[email protected]>
> > Cc: [email protected]; [email protected];
> > [email protected]; KY Srinivasan <[email protected]>; Stephen
> > Hemminger <[email protected]>; [email protected]; vkuznets
> > <[email protected]>; [email protected]; linux-
> > [email protected]
> > Subject: Re: [PATCH net-next, 3/4] hv_netvsc: Add XDP support
> >
> > On Mon, 28 Oct 2019 21:07:04 +0000, Haiyang Zhang wrote:
> > > This patch adds support of XDP in native mode for hv_netvsc driver, and
> > > transparently sets the XDP program on the associated VF NIC as well.
> > >
> > > XDP program cannot run with LRO (RSC) enabled, so you need to disable
> > LRO
> > > before running XDP:
> > > ethtool -K eth0 lro off
> > >
> > > XDP actions not yet supported:
> > > XDP_TX, XDP_REDIRECT
> >
> > I don't think we want to merge support without at least XDP_TX these
> > days..
> Thanks for your detailed comments --
> I'm working on the XDP_TX...
>
> >
> > And without the ability to prepend headers this may be the least
> > complete initial XDP implementation we've seen :(
> The RNDIS packet buffer received by netvsc doesn't have a head room, but I'm
> considering copy the packets to the page buffer, with a head room space
> reserved for XDP.


There is a small amount of headroom available by reusing the RNDIS
header and packet space. Looks like 40 bytes or so.

2019-10-29 22:14:22

by Haiyang Zhang

[permalink] [raw]
Subject: RE: [PATCH net-next, 3/4] hv_netvsc: Add XDP support



> -----Original Message-----
> From: Stephen Hemminger <[email protected]>
> Sent: Tuesday, October 29, 2019 5:59 PM
> To: Haiyang Zhang <[email protected]>
> Cc: Jakub Kicinski <[email protected]>; [email protected];
> [email protected]; [email protected]; KY Srinivasan
> <[email protected]>; Stephen Hemminger <[email protected]>;
> [email protected]; vkuznets <[email protected]>; [email protected];
> [email protected]
> Subject: Re: [PATCH net-next, 3/4] hv_netvsc: Add XDP support
>
> On Tue, 29 Oct 2019 19:17:25 +0000
> Haiyang Zhang <[email protected]> wrote:
>
> > > -----Original Message-----
> > > From: Jakub Kicinski <[email protected]>
> > > Sent: Monday, October 28, 2019 5:33 PM
> > > To: Haiyang Zhang <[email protected]>
> > > Cc: [email protected]; [email protected];
> > > [email protected]; KY Srinivasan <[email protected]>; Stephen
> > > Hemminger <[email protected]>; [email protected]; vkuznets
> > > <[email protected]>; [email protected]; linux-
> > > [email protected]
> > > Subject: Re: [PATCH net-next, 3/4] hv_netvsc: Add XDP support
> > >
> > > On Mon, 28 Oct 2019 21:07:04 +0000, Haiyang Zhang wrote:
> > > > This patch adds support of XDP in native mode for hv_netvsc driver,
> and
> > > > transparently sets the XDP program on the associated VF NIC as well.
> > > >
> > > > XDP program cannot run with LRO (RSC) enabled, so you need to
> disable
> > > LRO
> > > > before running XDP:
> > > > ethtool -K eth0 lro off
> > > >
> > > > XDP actions not yet supported:
> > > > XDP_TX, XDP_REDIRECT
> > >
> > > I don't think we want to merge support without at least XDP_TX these
> > > days..
> > Thanks for your detailed comments --
> > I'm working on the XDP_TX...
> >
> > >
> > > And without the ability to prepend headers this may be the least
> > > complete initial XDP implementation we've seen :(
> > The RNDIS packet buffer received by netvsc doesn't have a head room, but
> I'm
> > considering copy the packets to the page buffer, with a head room space
> > reserved for XDP.
>
>
> There is a small amount of headroom available by reusing the RNDIS
> header and packet space. Looks like 40 bytes or so.
Yes, I thought about the RNDIS header. But is this space sufficient?
I saw some drivers, like virtio_net gives bigger headroom: 256 bytes.

Thanks,
- Haiyang

2019-11-01 20:44:50

by Markus Elfring

[permalink] [raw]
Subject: Re: [PATCH net-next, 2/4] hv_netvsc: Fix error handling in netvsc_attach()

> If rndis_filter_open() fails, we need to remove the rndis device created
> in earlier steps, before returning an error code. Otherwise, the retry of
> netvsc_attach() from its callers will fail and hang.

How do you think about to choose a more “imperative mood” for your
change description?
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/submitting-patches.rst?id=0dbe6cb8f7e05bc9611602ef45980a6c57b245a3#n151



> +++ b/drivers/net/hyperv/netvsc_drv.c
> @@ -982,7 +982,7 @@ static int netvsc_attach(struct net_device *ndev,
> if (netif_running(ndev)) {
> ret = rndis_filter_open(nvdev);
> if (ret)
> - return ret;
> + goto err;
>
> rdev = nvdev->extension;
> if (!rdev->link_state)


I would prefer to specify the completed exception handling
(addition of two function calls) by a compound statement in
the shown if branch directly.

If you would insist to use a goto statement, I find an other label
more appropriate according to the Linux coding style.

Regards,
Markus

2019-11-04 15:11:57

by Haiyang Zhang

[permalink] [raw]
Subject: RE: [PATCH net-next, 2/4] hv_netvsc: Fix error handling in netvsc_attach()



> -----Original Message-----
> From: Markus Elfring <[email protected]>
> Sent: Friday, November 1, 2019 4:43 PM
> To: Haiyang Zhang <[email protected]>; linux-
> [email protected]; [email protected]
> Cc: [email protected]; [email protected]; David S.
> Miller <[email protected]>; KY Srinivasan <[email protected]>; Olaf
> Hering <[email protected]>; Sasha Levin <[email protected]>; Stephen
> Hemminger <[email protected]>; vkuznets <[email protected]>
> Subject: Re: [PATCH net-next, 2/4] hv_netvsc: Fix error handling in
> netvsc_attach()
>
> > If rndis_filter_open() fails, we need to remove the rndis device
> > created in earlier steps, before returning an error code. Otherwise,
> > the retry of
> > netvsc_attach() from its callers will fail and hang.
>
> How do you think about to choose a more “imperative mood” for your
> change description?
> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgit.k
> ernel.org%2Fpub%2Fscm%2Flinux%2Fkernel%2Fgit%2Ftorvalds%2Flinux.git%
> 2Ftree%2FDocumentation%2Fprocess%2Fsubmitting-
> patches.rst%3Fid%3D0dbe6cb8f7e05bc9611602ef45980a6c57b245a3%23n151
> &amp;data=02%7C01%7Chaiyangz%40microsoft.com%7C162aa016f45e4293c
> abb08d75f0c0fee%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637
> 082377796295159&amp;sdata=ytjxGYTPI2D4BoNbslKPvBbsfGUEM7hXj1YAiG
> hn8Ik%3D&amp;reserved=0
Agreed. Thanks.


>
>
> …
> > +++ b/drivers/net/hyperv/netvsc_drv.c
> > @@ -982,7 +982,7 @@ static int netvsc_attach(struct net_device *ndev,
> > if (netif_running(ndev)) {
> > ret = rndis_filter_open(nvdev);
> > if (ret)
> > - return ret;
> > + goto err;
> >
> > rdev = nvdev->extension;
> > if (!rdev->link_state)
> …
>
> I would prefer to specify the completed exception handling (addition of two
> function calls) by a compound statement in the shown if branch directly.
>
> If you would insist to use a goto statement, I find an other label more
> appropriate according to the Linux coding style.

I will have more patches that make multiple entry points of error clean up
steps -- so I used goto instead of putting the functions in each if-branch.

I will name the labels more meaningfully.

Thanks,
- Haiyang