2020-03-26 14:06:59

by Jason Wang

Subject: [PATCH V9 0/9] vDPA support

Hi all:

This is an updated version of vDPA support in the kernel.

A vDPA device is a device whose datapath complies with the virtio
specification but whose control path is vendor specific. vDPA devices
can be either physically located on the hardware or emulated by
software. vDPA hardware devices are usually implemented through PCIe
with the following types:

- PF (Physical Function) - A single Physical Function
- VF (Virtual Function) - A device that supports single root I/O
virtualization (SR-IOV); its Virtual Function (VF) represents a
virtualized instance of the device that can be assigned to different
partitions
- ADI (Assignable Device Interface) and its equivalents - With
technologies such as Intel Scalable IOV, a virtual device (VDEV) is
composed by the host OS utilizing one or more ADIs; an equivalent is
the SF (Sub Function) from Mellanox.

From a driver's perspective, depending on how and where the DMA
translation is done, vDPA devices are split into two types:

- Platform specific DMA translation - From the driver's perspective,
the device can be used on a platform where device access to data in
memory is limited and/or translated. An example is a PCIe vDPA device
whose DMA requests are tagged in a bus specific (e.g. PCIe) way; DMA
translation and protection are done at the PCIe bus IOMMU level.
- Device specific DMA translation - The device implements DMA
isolation and protection through its own logic. An example is a vDPA
device which uses on-chip IOMMU.

To hide the differences and complexity of the above device/IOMMU
options, and to present a generic virtio device to the upper layer, a
device agnostic framework is required.

This series introduces a software vDPA bus which abstracts the common
attributes of vDPA devices, vDPA bus drivers and the communication
method between them: the bus operations (vdpa_config_ops). This allows
multiple types of drivers, such as virtio_vdpa and vhost_vdpa, to
operate on the bus, so that a vDPA device can be used by either the
kernel virtio driver or userspace vhost drivers:

    virtio drivers        vhost drivers
          |                     |
     [virtio bus]          [vhost uAPI]
          |                     |
    virtio device          vhost device
   virtio_vdpa drv        vhost_vdpa drv
           \                   /
              [vDPA bus]
                   |
              vDPA device
              hardware drv
                   |
             [hardware bus]
                   |
              vDPA hardware
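
On the device side of this diagram, a hardware driver embeds a struct
vdpa_device in its own state and registers it on the vDPA bus together
with its vdpa_config_ops. A minimal sketch of that pattern (a
hypothetical "foo" driver; the IFCVF probe in patch 9 does the same,
including placing the vdpa member first as vdpa_alloc_device() expects):

    struct foo_adapter {
            struct vdpa_device vdpa;  /* first member, per vdpa_alloc_device() */
            void __iomem *regs;       /* hypothetical device state */
    };

    static const struct vdpa_config_ops foo_vdpa_ops = {
            .get_features = foo_get_features,
            .set_features = foo_set_features,
            .set_status   = foo_set_status,
            /* ... the remaining vdpa_config_ops ... */
    };

    adapter = vdpa_alloc_device(struct foo_adapter, vdpa,
                                parent_dev, &foo_vdpa_ops);
    adapter->vdpa.dma_dev = dma_dev;  /* the device that does DMA */
    err = vdpa_register_device(&adapter->vdpa);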

The virtio_vdpa driver is a transport implementation for kernel virtio
drivers on top of the vDPA bus operations. An alternative would be to
refactor the virtio bus, but that is sub-optimal: since the bus and its
drivers are designed to be used by kernel subsystems, a non-trivial
major refactoring would be needed, which could impact a bunch of driver
and device implementations inside the kernel. Using a new transport
greatly simplifies both the design and the changes.

The vhost_vdpa driver is a new type of vhost device which allows
userspace vhost drivers to use vDPA devices via the vhost uAPI (with
minor extensions). This helps to minimize the changes needed in
existing vhost drivers to use vDPA devices.

With the abstraction of the vDPA bus and the vDPA bus operations, the
differences and complexity of the underlying hardware are hidden from
the upper layer. The vDPA bus drivers on top can use a unified set of
vdpa_config_ops to control different types of vDPA devices, as the
sketch below shows.
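
A rough sketch of the bus driver side (hypothetical names; vhost_vdpa
in this series registers itself on the bus the same way and drives the
device purely through vdpa->config):

    static int foo_probe(struct vdpa_device *vdpa)
    {
            const struct vdpa_config_ops *ops = vdpa->config;
            u64 features = ops->get_features(vdpa);

            /* negotiate features, set up virtqueues via ops, ... */
            return 0;
    }

    static struct vdpa_driver foo_driver = {
            .driver = {
                    .name = "foo",
            },
            .probe  = foo_probe,
            .remove = foo_remove,
    };
    module_vdpa_driver(foo_driver);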

Two drivers were implemented with the framework introduced in this
series:

- Intel IFC VF driver which depends on the platform IOMMU for DMA
translation
- VDPA simulator which is a software test device with an emulated
on-chip IOMMU

Future work:

- direct doorbell mapping support
- control virtqueue support
- dirty page tracking support
- direct interrupt support
- management API (devlink)

Please review.

Thanks

Changes from V8:

- switch to use devres for PCI resources of IFCVF (Jason)
- require the caller of vdpa_alloc_device() to use "struct foo"
instead of foo (Jason)
- some tweaks on the IFCVF driver

Changes from V7:

- refine kconfig to solve the dependency issues on archs that lack
CONFIG_VIRTUALIZATION (kbuild)

Changes from V6:

- vdpa_alloc_device() will allocate the parent structure (Jason)
- remove the vdpa core dev info in the IFCVF patch (Jason)
- provide a free method in the vdpa bus operations for drivers to free
private data
- switch to use the new layout in vdpasim and IFCVF
- make IFCVF depends on PCI_MSI (kbuild)
- some tweaks on the IFCVF driver

Changes from V5:

- include Intel IFCVF driver and vhost-vdpa drivers
- add the platform IOMMU support for vhost-vdpa
- check the return value of dev_set_name() (Jason)
- various tweaks and fixes

Changes from V4:

- use put_device() instead of kfree() when failing to register the
virtio device (Jason)
- simplify the error handling when allocating vdpasim device (Jason)
- don't use device_for_each_child() during module exit (Jason)
- correct the error checking for vdpa_alloc_device() (Harpreet, Lingshan)

Changes from V3:

- various Kconfig fixes (Randy)

Changes from V2:

- release idr in the release function for put_device() unwind (Jason)
- don't panic when failing to register the vdpa bus (Jason)
- use unsigned int instead of int for ida (Jason)
- fix the wrong commit log in the virtio_vdpa patches (Jason)
- make vdpa_sim depend on RUNTIME_TESTING_MENU (Michael)
- provide a bus release function for vDPA devices (Jason)
- fix the wrong unwind when creating devices for the vDPA simulator (Jason)
- move the vDPA simulator to a dedicated directory (Lingshan)
- cancel the work before releasing the vDPA simulator

Changes from V1:

- drop sysfs API, leave the management interface to future development
(Michael)
- introduce incremental DMA ops (dma_map/dma_unmap) (Michael)
- introduce dma_device and use it instead of parent device for doing
IOMMU or DMA from bus driver (Michael, Jason, Ling Shan, Tiwei)
- accept parent device and dma device when register vdpa device
- coding style and compile fixes (Randy)
- using vdpa_xxx instead of xxx_vdpa (Jason)
- move vDPA accessors to the header and make them static inline (Jason)
- split vdpa_register_device() into two helpers vdpa_init_device() and
vdpa_register_device(), which allows an intermediate step to be done (Jason)
- warn on invalid queue state when failing to create a virtqueue (Jason)
- make to_virtio_vdpa_device() static (Jason)
- use kmalloc/kfree instead of devres for the virtio vdpa device (Jason)
- avoid using casts in the vdpa bus functions (Jason)
- introduce module_vdpa_driver and fix module refcnt (Jason)
- fix returning a freed address in vdpasim coherent DMA addr allocation (Dan)
- various other fixes and tweaks

V8: https://lkml.org/lkml/2020/3/25/125
V7: https://lkml.org/lkml/2020/3/24/21
V6: https://lkml.org/lkml/2020/3/18/88
V5: https://lkml.org/lkml/2020/2/26/58
V4: https://lkml.org/lkml/2020/2/20/59
V3: https://lkml.org/lkml/2020/2/19/1347
V2: https://lkml.org/lkml/2020/2/9/275
V1: https://lkml.org/lkml/2020/1/16/353

Jason Wang (7):
vhost: refine vhost and vringh kconfig
vhost: allow per device message handler
vhost: factor out IOTLB
vringh: IOTLB support
vDPA: introduce vDPA bus
virtio: introduce a vDPA based transport
vdpasim: vDPA device simulator

Tiwei Bie (1):
vhost: introduce vDPA-based backend

Zhu Lingshan (1):
virtio: Intel IFC VF driver for VDPA

MAINTAINERS | 2 +
arch/arm/kvm/Kconfig | 2 -
arch/arm64/kvm/Kconfig | 2 -
arch/mips/kvm/Kconfig | 2 -
arch/powerpc/kvm/Kconfig | 2 -
arch/s390/kvm/Kconfig | 4 -
arch/x86/kvm/Kconfig | 4 -
drivers/Kconfig | 2 +
drivers/misc/mic/Kconfig | 4 -
drivers/net/caif/Kconfig | 4 -
drivers/vhost/Kconfig | 42 +-
drivers/vhost/Kconfig.vringh | 6 -
drivers/vhost/Makefile | 6 +
drivers/vhost/iotlb.c | 177 +++++
drivers/vhost/net.c | 5 +-
drivers/vhost/scsi.c | 2 +-
drivers/vhost/vdpa.c | 883 ++++++++++++++++++++++++
drivers/vhost/vhost.c | 233 +++----
drivers/vhost/vhost.h | 45 +-
drivers/vhost/vringh.c | 421 ++++++++++-
drivers/vhost/vsock.c | 2 +-
drivers/virtio/Kconfig | 15 +
drivers/virtio/Makefile | 2 +
drivers/virtio/vdpa/Kconfig | 37 +
drivers/virtio/vdpa/Makefile | 4 +
drivers/virtio/vdpa/ifcvf/Makefile | 3 +
drivers/virtio/vdpa/ifcvf/ifcvf_base.c | 389 +++++++++++
drivers/virtio/vdpa/ifcvf/ifcvf_base.h | 118 ++++
drivers/virtio/vdpa/ifcvf/ifcvf_main.c | 435 ++++++++++++
drivers/virtio/vdpa/vdpa.c | 180 +++++
drivers/virtio/vdpa/vdpa_sim/Makefile | 2 +
drivers/virtio/vdpa/vdpa_sim/vdpa_sim.c | 629 +++++++++++++++++
drivers/virtio/virtio_vdpa.c | 396 +++++++++++
include/linux/vdpa.h | 253 +++++++
include/linux/vhost_iotlb.h | 47 ++
include/linux/vringh.h | 36 +
include/uapi/linux/vhost.h | 24 +
include/uapi/linux/vhost_types.h | 8 +
38 files changed, 4180 insertions(+), 248 deletions(-)
delete mode 100644 drivers/vhost/Kconfig.vringh
create mode 100644 drivers/vhost/iotlb.c
create mode 100644 drivers/vhost/vdpa.c
create mode 100644 drivers/virtio/vdpa/Kconfig
create mode 100644 drivers/virtio/vdpa/Makefile
create mode 100644 drivers/virtio/vdpa/ifcvf/Makefile
create mode 100644 drivers/virtio/vdpa/ifcvf/ifcvf_base.c
create mode 100644 drivers/virtio/vdpa/ifcvf/ifcvf_base.h
create mode 100644 drivers/virtio/vdpa/ifcvf/ifcvf_main.c
create mode 100644 drivers/virtio/vdpa/vdpa.c
create mode 100644 drivers/virtio/vdpa/vdpa_sim/Makefile
create mode 100644 drivers/virtio/vdpa/vdpa_sim/vdpa_sim.c
create mode 100644 drivers/virtio/virtio_vdpa.c
create mode 100644 include/linux/vdpa.h
create mode 100644 include/linux/vhost_iotlb.h

--
2.20.1


2020-03-26 14:08:30

by Jason Wang

Subject: [PATCH V9 9/9] virtio: Intel IFC VF driver for VDPA

From: Zhu Lingshan <[email protected]>

This commit introduces two layers to drive the IFC VF:

(1) the ifcvf_base layer, which handles IFC VF NIC hardware operations
and configuration.

(2) the ifcvf_main layer, which complies with the vDPA bus framework,
implements the device operations for the vDPA bus, and handles device
probing, bus attaching, vring operations, etc.

Signed-off-by: Zhu Lingshan <[email protected]>
Signed-off-by: Bie Tiwei <[email protected]>
Signed-off-by: Wang Xiao <[email protected]>
Signed-off-by: Jason Wang <[email protected]>
---
drivers/virtio/vdpa/Kconfig | 11 +
drivers/virtio/vdpa/Makefile | 1 +
drivers/virtio/vdpa/ifcvf/Makefile | 3 +
drivers/virtio/vdpa/ifcvf/ifcvf_base.c | 389 ++++++++++++++++++++++
drivers/virtio/vdpa/ifcvf/ifcvf_base.h | 118 +++++++
drivers/virtio/vdpa/ifcvf/ifcvf_main.c | 435 +++++++++++++++++++++++++
6 files changed, 957 insertions(+)
create mode 100644 drivers/virtio/vdpa/ifcvf/Makefile
create mode 100644 drivers/virtio/vdpa/ifcvf/ifcvf_base.c
create mode 100644 drivers/virtio/vdpa/ifcvf/ifcvf_base.h
create mode 100644 drivers/virtio/vdpa/ifcvf/ifcvf_main.c

diff --git a/drivers/virtio/vdpa/Kconfig b/drivers/virtio/vdpa/Kconfig
index ae204465964e..7db1460104b7 100644
--- a/drivers/virtio/vdpa/Kconfig
+++ b/drivers/virtio/vdpa/Kconfig
@@ -23,4 +23,15 @@ config VDPA_SIM
to RX. This device is used for testing, prototyping and
development of vDPA.

+config IFCVF
+ tristate "Intel IFC VF VDPA driver"
+ depends on PCI_MSI
+ select VDPA
+ default n
+ help
+ This kernel module can drive Intel IFC VF NIC to offload
+ virtio dataplane traffic to hardware.
+ To compile this driver as a module, choose M here: the module will
+ be called ifcvf.
+
endif # VDPA_MENU
diff --git a/drivers/virtio/vdpa/Makefile b/drivers/virtio/vdpa/Makefile
index 3814af8e097b..8bbb686ca7a2 100644
--- a/drivers/virtio/vdpa/Makefile
+++ b/drivers/virtio/vdpa/Makefile
@@ -1,3 +1,4 @@
# SPDX-License-Identifier: GPL-2.0
obj-$(CONFIG_VDPA) += vdpa.o
obj-$(CONFIG_VDPA_SIM) += vdpa_sim/
+obj-$(CONFIG_IFCVF) += ifcvf/
diff --git a/drivers/virtio/vdpa/ifcvf/Makefile b/drivers/virtio/vdpa/ifcvf/Makefile
new file mode 100644
index 000000000000..d709915995ab
--- /dev/null
+++ b/drivers/virtio/vdpa/ifcvf/Makefile
@@ -0,0 +1,3 @@
+# SPDX-License-Identifier: GPL-2.0
+obj-$(CONFIG_IFCVF) += ifcvf.o
+ifcvf-$(CONFIG_IFCVF) += ifcvf_main.o ifcvf_base.o
diff --git a/drivers/virtio/vdpa/ifcvf/ifcvf_base.c b/drivers/virtio/vdpa/ifcvf/ifcvf_base.c
new file mode 100644
index 000000000000..b61b06ea26d3
--- /dev/null
+++ b/drivers/virtio/vdpa/ifcvf/ifcvf_base.c
@@ -0,0 +1,389 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Intel IFC VF NIC driver for virtio dataplane offloading
+ *
+ * Copyright (C) 2020 Intel Corporation.
+ *
+ * Author: Zhu Lingshan <[email protected]>
+ *
+ */
+
+#include "ifcvf_base.h"
+
+static inline u8 ifc_ioread8(u8 __iomem *addr)
+{
+ return ioread8(addr);
+}
+static inline u16 ifc_ioread16(__le16 __iomem *addr)
+{
+ return ioread16(addr);
+}
+
+static inline u32 ifc_ioread32(__le32 __iomem *addr)
+{
+ return ioread32(addr);
+}
+
+static inline void ifc_iowrite8(u8 value, u8 __iomem *addr)
+{
+ iowrite8(value, addr);
+}
+
+static inline void ifc_iowrite16(u16 value, __le16 __iomem *addr)
+{
+ iowrite16(value, addr);
+}
+
+static inline void ifc_iowrite32(u32 value, __le32 __iomem *addr)
+{
+ iowrite32(value, addr);
+}
+
+static void ifc_iowrite64_twopart(u64 val,
+ __le32 __iomem *lo, __le32 __iomem *hi)
+{
+ ifc_iowrite32((u32)val, lo);
+ ifc_iowrite32(val >> 32, hi);
+}
+
+struct ifcvf_adapter *vf_to_adapter(struct ifcvf_hw *hw)
+{
+ return container_of(hw, struct ifcvf_adapter, vf);
+}
+
+static void __iomem *get_cap_addr(struct ifcvf_hw *hw,
+ struct virtio_pci_cap *cap)
+{
+ struct ifcvf_adapter *ifcvf;
+ struct pci_dev *pdev;
+ u32 length, offset;
+ u8 bar;
+
+ length = le32_to_cpu(cap->length);
+ offset = le32_to_cpu(cap->offset);
+ bar = cap->bar;
+
+ ifcvf = vf_to_adapter(hw);
+ pdev = ifcvf->pdev;
+
+ if (bar >= IFCVF_PCI_MAX_RESOURCE) {
+ IFCVF_DBG(pdev,
+ "Invalid bar number %u to get capabilities\n", bar);
+ return NULL;
+ }
+
+ if (offset + length > pci_resource_len(pdev, bar)) {
+ IFCVF_DBG(pdev,
+ "offset(%u) + len(%u) overflows bar%u's capability\n",
+ offset, length, bar);
+ return NULL;
+ }
+
+ return hw->base[bar] + offset;
+}
+
+static int ifcvf_read_config_range(struct pci_dev *dev,
+ uint32_t *val, int size, int where)
+{
+ int ret, i;
+
+ for (i = 0; i < size; i += 4) {
+ ret = pci_read_config_dword(dev, where + i, val + i / 4);
+ if (ret < 0)
+ return ret;
+ }
+
+ return 0;
+}
+
+int ifcvf_init_hw(struct ifcvf_hw *hw, struct pci_dev *pdev)
+{
+ struct virtio_pci_cap cap;
+ u16 notify_off;
+ int ret;
+ u8 pos;
+ u32 i;
+
+ ret = pci_read_config_byte(pdev, PCI_CAPABILITY_LIST, &pos);
+ if (ret < 0) {
+ IFCVF_ERR(pdev, "Failed to read PCI capability list\n");
+ return -EIO;
+ }
+
+ while (pos) {
+ ret = ifcvf_read_config_range(pdev, (u32 *)&cap,
+ sizeof(cap), pos);
+ if (ret < 0) {
+ IFCVF_ERR(pdev,
+ "Failed to get PCI capability at %x\n", pos);
+ break;
+ }
+
+ if (cap.cap_vndr != PCI_CAP_ID_VNDR)
+ goto next;
+
+ switch (cap.cfg_type) {
+ case VIRTIO_PCI_CAP_COMMON_CFG:
+ hw->common_cfg = get_cap_addr(hw, &cap);
+ IFCVF_DBG(pdev, "hw->common_cfg = %p\n",
+ hw->common_cfg);
+ break;
+ case VIRTIO_PCI_CAP_NOTIFY_CFG:
+ pci_read_config_dword(pdev, pos + sizeof(cap),
+ &hw->notify_off_multiplier);
+ hw->notify_bar = cap.bar;
+ hw->notify_base = get_cap_addr(hw, &cap);
+ IFCVF_DBG(pdev, "hw->notify_base = %p\n",
+ hw->notify_base);
+ break;
+ case VIRTIO_PCI_CAP_ISR_CFG:
+ hw->isr = get_cap_addr(hw, &cap);
+ IFCVF_DBG(pdev, "hw->isr = %p\n", hw->isr);
+ break;
+ case VIRTIO_PCI_CAP_DEVICE_CFG:
+ hw->net_cfg = get_cap_addr(hw, &cap);
+ IFCVF_DBG(pdev, "hw->net_cfg = %p\n", hw->net_cfg);
+ break;
+ }
+
+next:
+ pos = cap.cap_next;
+ }
+
+ if (hw->common_cfg == NULL || hw->notify_base == NULL ||
+ hw->isr == NULL || hw->net_cfg == NULL) {
+ IFCVF_ERR(pdev, "Incomplete PCI capabilities\n");
+ return -EIO;
+ }
+
+ for (i = 0; i < IFCVF_MAX_QUEUE_PAIRS * 2; i++) {
+ ifc_iowrite16(i, &hw->common_cfg->queue_select);
+ notify_off = ifc_ioread16(&hw->common_cfg->queue_notify_off);
+ hw->vring[i].notify_addr = hw->notify_base +
+ notify_off * hw->notify_off_multiplier;
+ }
+
+ hw->lm_cfg = hw->base[IFCVF_LM_BAR];
+
+ IFCVF_DBG(pdev,
+ "PCI capability mapping: common cfg: %p, notify base: %p\n, isr cfg: %p, device cfg: %p, multiplier: %u\n",
+ hw->common_cfg, hw->notify_base, hw->isr,
+ hw->net_cfg, hw->notify_off_multiplier);
+
+ return 0;
+}
+
+u8 ifcvf_get_status(struct ifcvf_hw *hw)
+{
+ return ifc_ioread8(&hw->common_cfg->device_status);
+}
+
+void ifcvf_set_status(struct ifcvf_hw *hw, u8 status)
+{
+ ifc_iowrite8(status, &hw->common_cfg->device_status);
+}
+
+void ifcvf_reset(struct ifcvf_hw *hw)
+{
+ ifcvf_set_status(hw, 0);
+ /* flush set_status, make sure VF is stopped, reset */
+ ifcvf_get_status(hw);
+}
+
+static void ifcvf_add_status(struct ifcvf_hw *hw, u8 status)
+{
+ if (status != 0)
+ status |= ifcvf_get_status(hw);
+
+ ifcvf_set_status(hw, status);
+ ifcvf_get_status(hw);
+}
+
+u64 ifcvf_get_features(struct ifcvf_hw *hw)
+{
+ struct virtio_pci_common_cfg __iomem *cfg = hw->common_cfg;
+ u32 features_lo, features_hi;
+
+ ifc_iowrite32(0, &cfg->device_feature_select);
+ features_lo = ifc_ioread32(&cfg->device_feature);
+
+ ifc_iowrite32(1, &cfg->device_feature_select);
+ features_hi = ifc_ioread32(&cfg->device_feature);
+
+ return ((u64)features_hi << 32) | features_lo;
+}
+
+void ifcvf_read_net_config(struct ifcvf_hw *hw, u64 offset,
+ void *dst, int length)
+{
+ u8 old_gen, new_gen, *p;
+ int i;
+
+ WARN_ON(offset + length > sizeof(struct virtio_net_config));
+ do {
+ old_gen = ifc_ioread8(&hw->common_cfg->config_generation);
+ p = dst;
+ for (i = 0; i < length; i++)
+ *p++ = ifc_ioread8(hw->net_cfg + offset + i);
+
+ new_gen = ifc_ioread8(&hw->common_cfg->config_generation);
+ } while (old_gen != new_gen);
+}
+
+void ifcvf_write_net_config(struct ifcvf_hw *hw, u64 offset,
+ const void *src, int length)
+{
+ const u8 *p;
+ int i;
+
+ p = src;
+ WARN_ON(offset + length > sizeof(struct virtio_net_config));
+ for (i = 0; i < length; i++)
+ ifc_iowrite8(*p++, hw->net_cfg + offset + i);
+}
+
+static void ifcvf_set_features(struct ifcvf_hw *hw, u64 features)
+{
+ struct virtio_pci_common_cfg __iomem *cfg = hw->common_cfg;
+
+ ifc_iowrite32(0, &cfg->guest_feature_select);
+ ifc_iowrite32((u32)features, &cfg->guest_feature);
+
+ ifc_iowrite32(1, &cfg->guest_feature_select);
+ ifc_iowrite32(features >> 32, &cfg->guest_feature);
+}
+
+static int ifcvf_config_features(struct ifcvf_hw *hw)
+{
+ struct ifcvf_adapter *ifcvf;
+
+ ifcvf = vf_to_adapter(hw);
+ ifcvf_set_features(hw, hw->req_features);
+ ifcvf_add_status(hw, VIRTIO_CONFIG_S_FEATURES_OK);
+
+ if (!(ifcvf_get_status(hw) & VIRTIO_CONFIG_S_FEATURES_OK)) {
+ IFCVF_ERR(ifcvf->pdev, "Failed to set FEATURES_OK status\n");
+ return -EIO;
+ }
+
+ return 0;
+}
+
+u64 ifcvf_get_vq_state(struct ifcvf_hw *hw, u16 qid)
+{
+ struct ifcvf_lm_cfg __iomem *ifcvf_lm;
+ void __iomem *avail_idx_addr;
+ u16 last_avail_idx;
+ u32 q_pair_id;
+
+ ifcvf_lm = (struct ifcvf_lm_cfg __iomem *)hw->lm_cfg;
+ q_pair_id = qid / (IFCVF_MAX_QUEUE_PAIRS * 2);
+ avail_idx_addr = &ifcvf_lm->vring_lm_cfg[q_pair_id].idx_addr[qid % 2];
+ last_avail_idx = ifc_ioread16(avail_idx_addr);
+
+ return last_avail_idx;
+}
+
+int ifcvf_set_vq_state(struct ifcvf_hw *hw, u16 qid, u64 num)
+{
+ struct ifcvf_lm_cfg __iomem *ifcvf_lm;
+ void __iomem *avail_idx_addr;
+ u32 q_pair_id;
+
+ ifcvf_lm = (struct ifcvf_lm_cfg __iomem *)hw->lm_cfg;
+ q_pair_id = qid / (IFCVF_MAX_QUEUE_PAIRS * 2);
+ avail_idx_addr = &ifcvf_lm->vring_lm_cfg[q_pair_id].idx_addr[qid % 2];
+ hw->vring[qid].last_avail_idx = num;
+ ifc_iowrite16(num, avail_idx_addr);
+
+ return 0;
+}
+
+static int ifcvf_hw_enable(struct ifcvf_hw *hw)
+{
+ struct ifcvf_lm_cfg __iomem *ifcvf_lm;
+ struct virtio_pci_common_cfg __iomem *cfg;
+ struct ifcvf_adapter *ifcvf;
+ u32 i;
+
+ ifcvf_lm = (struct ifcvf_lm_cfg __iomem *)hw->lm_cfg;
+ ifcvf = vf_to_adapter(hw);
+ cfg = hw->common_cfg;
+ ifc_iowrite16(IFCVF_MSI_CONFIG_OFF, &cfg->msix_config);
+
+ if (ifc_ioread16(&cfg->msix_config) == VIRTIO_MSI_NO_VECTOR) {
+ IFCVF_ERR(ifcvf->pdev, "No msix vector for device config\n");
+ return -EINVAL;
+ }
+
+ for (i = 0; i < hw->nr_vring; i++) {
+ if (!hw->vring[i].ready)
+ break;
+
+ ifc_iowrite16(i, &cfg->queue_select);
+ ifc_iowrite64_twopart(hw->vring[i].desc, &cfg->queue_desc_lo,
+ &cfg->queue_desc_hi);
+ ifc_iowrite64_twopart(hw->vring[i].avail, &cfg->queue_avail_lo,
+ &cfg->queue_avail_hi);
+ ifc_iowrite64_twopart(hw->vring[i].used, &cfg->queue_used_lo,
+ &cfg->queue_used_hi);
+ ifc_iowrite16(hw->vring[i].size, &cfg->queue_size);
+ ifc_iowrite16(i + IFCVF_MSI_QUEUE_OFF, &cfg->queue_msix_vector);
+
+ if (ifc_ioread16(&cfg->queue_msix_vector) ==
+ VIRTIO_MSI_NO_VECTOR) {
+ IFCVF_ERR(ifcvf->pdev,
+ "No msix vector for queue %u\n", i);
+ return -EINVAL;
+ }
+
+ ifcvf_set_vq_state(hw, i, hw->vring[i].last_avail_idx);
+ ifc_iowrite16(1, &cfg->queue_enable);
+ }
+
+ return 0;
+}
+
+static void ifcvf_hw_disable(struct ifcvf_hw *hw)
+{
+ struct virtio_pci_common_cfg __iomem *cfg;
+ u32 i;
+
+ cfg = hw->common_cfg;
+ ifc_iowrite16(VIRTIO_MSI_NO_VECTOR, &cfg->msix_config);
+
+ for (i = 0; i < hw->nr_vring; i++) {
+ ifc_iowrite16(i, &cfg->queue_select);
+ ifc_iowrite16(VIRTIO_MSI_NO_VECTOR, &cfg->queue_msix_vector);
+ }
+
+ ifc_ioread16(&cfg->queue_msix_vector);
+}
+
+int ifcvf_start_hw(struct ifcvf_hw *hw)
+{
+ ifcvf_reset(hw);
+ ifcvf_add_status(hw, VIRTIO_CONFIG_S_ACKNOWLEDGE);
+ ifcvf_add_status(hw, VIRTIO_CONFIG_S_DRIVER);
+
+ if (ifcvf_config_features(hw) < 0)
+ return -EINVAL;
+
+ if (ifcvf_hw_enable(hw) < 0)
+ return -EINVAL;
+
+ ifcvf_add_status(hw, VIRTIO_CONFIG_S_DRIVER_OK);
+
+ return 0;
+}
+
+void ifcvf_stop_hw(struct ifcvf_hw *hw)
+{
+ ifcvf_hw_disable(hw);
+ ifcvf_reset(hw);
+}
+
+void ifcvf_notify_queue(struct ifcvf_hw *hw, u16 qid)
+{
+ ifc_iowrite16(qid, hw->vring[qid].notify_addr);
+}
diff --git a/drivers/virtio/vdpa/ifcvf/ifcvf_base.h b/drivers/virtio/vdpa/ifcvf/ifcvf_base.h
new file mode 100644
index 000000000000..e80307092351
--- /dev/null
+++ b/drivers/virtio/vdpa/ifcvf/ifcvf_base.h
@@ -0,0 +1,118 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Intel IFC VF NIC driver for virtio dataplane offloading
+ *
+ * Copyright (C) 2020 Intel Corporation.
+ *
+ * Author: Zhu Lingshan <[email protected]>
+ *
+ */
+
+#ifndef _IFCVF_H_
+#define _IFCVF_H_
+
+#include <linux/pci.h>
+#include <linux/pci_regs.h>
+#include <linux/vdpa.h>
+#include <uapi/linux/virtio_net.h>
+#include <uapi/linux/virtio_config.h>
+#include <uapi/linux/virtio_pci.h>
+
+#define IFCVF_VENDOR_ID 0x1AF4
+#define IFCVF_DEVICE_ID 0x1041
+#define IFCVF_SUBSYS_VENDOR_ID 0x8086
+#define IFCVF_SUBSYS_DEVICE_ID 0x001A
+
+#define IFCVF_SUPPORTED_FEATURES \
+ ((1ULL << VIRTIO_NET_F_MAC) | \
+ (1ULL << VIRTIO_F_ANY_LAYOUT) | \
+ (1ULL << VIRTIO_F_VERSION_1) | \
+ (1ULL << VIRTIO_F_ORDER_PLATFORM) | \
+ (1ULL << VIRTIO_F_IOMMU_PLATFORM) | \
+ (1ULL << VIRTIO_NET_F_MRG_RXBUF))
+
+/* Only one queue pair for now. */
+#define IFCVF_MAX_QUEUE_PAIRS 1
+
+#define IFCVF_QUEUE_ALIGNMENT PAGE_SIZE
+#define IFCVF_QUEUE_MAX 32768
+#define IFCVF_MSI_CONFIG_OFF 0
+#define IFCVF_MSI_QUEUE_OFF 1
+#define IFCVF_PCI_MAX_RESOURCE 6
+
+#define IFCVF_LM_CFG_SIZE 0x40
+#define IFCVF_LM_RING_STATE_OFFSET 0x20
+#define IFCVF_LM_BAR 4
+
+#define IFCVF_ERR(pdev, fmt, ...) dev_err(&pdev->dev, fmt, ##__VA_ARGS__)
+#define IFCVF_DBG(pdev, fmt, ...) dev_dbg(&pdev->dev, fmt, ##__VA_ARGS__)
+#define IFCVF_INFO(pdev, fmt, ...) dev_info(&pdev->dev, fmt, ##__VA_ARGS__)
+
+#define ifcvf_private_to_vf(adapter) \
+ (&((struct ifcvf_adapter *)adapter)->vf)
+
+#define IFCVF_MAX_INTR (IFCVF_MAX_QUEUE_PAIRS * 2 + 1)
+
+struct vring_info {
+ u64 desc;
+ u64 avail;
+ u64 used;
+ u16 size;
+ u16 last_avail_idx;
+ bool ready;
+ void __iomem *notify_addr;
+ u32 irq;
+ struct vdpa_callback cb;
+ char msix_name[256];
+};
+
+struct ifcvf_hw {
+ u8 __iomem *isr;
+ /* Live migration */
+ u8 __iomem *lm_cfg;
+ u16 nr_vring;
+ /* Notification bar number */
+ u8 notify_bar;
+ /* Notification bar address */
+ void __iomem *notify_base;
+ u32 notify_off_multiplier;
+ u64 req_features;
+ struct virtio_pci_common_cfg __iomem *common_cfg;
+ void __iomem *net_cfg;
+ struct vring_info vring[IFCVF_MAX_QUEUE_PAIRS * 2];
+ void __iomem * const *base;
+};
+
+struct ifcvf_adapter {
+ struct vdpa_device vdpa;
+ struct pci_dev *pdev;
+ struct ifcvf_hw vf;
+};
+
+struct ifcvf_vring_lm_cfg {
+ u32 idx_addr[2];
+ u8 reserved[IFCVF_LM_CFG_SIZE - 8];
+};
+
+struct ifcvf_lm_cfg {
+ u8 reserved[IFCVF_LM_RING_STATE_OFFSET];
+ struct ifcvf_vring_lm_cfg vring_lm_cfg[IFCVF_MAX_QUEUE_PAIRS];
+};
+
+int ifcvf_init_hw(struct ifcvf_hw *hw, struct pci_dev *dev);
+int ifcvf_start_hw(struct ifcvf_hw *hw);
+void ifcvf_stop_hw(struct ifcvf_hw *hw);
+void ifcvf_notify_queue(struct ifcvf_hw *hw, u16 qid);
+void ifcvf_read_net_config(struct ifcvf_hw *hw, u64 offset,
+ void *dst, int length);
+void ifcvf_write_net_config(struct ifcvf_hw *hw, u64 offset,
+ const void *src, int length);
+u8 ifcvf_get_status(struct ifcvf_hw *hw);
+void ifcvf_set_status(struct ifcvf_hw *hw, u8 status);
+void io_write64_twopart(u64 val, u32 *lo, u32 *hi);
+void ifcvf_reset(struct ifcvf_hw *hw);
+u64 ifcvf_get_features(struct ifcvf_hw *hw);
+u64 ifcvf_get_vq_state(struct ifcvf_hw *hw, u16 qid);
+int ifcvf_set_vq_state(struct ifcvf_hw *hw, u16 qid, u64 num);
+struct ifcvf_adapter *vf_to_adapter(struct ifcvf_hw *hw);
+#endif /* _IFCVF_H_ */
diff --git a/drivers/virtio/vdpa/ifcvf/ifcvf_main.c b/drivers/virtio/vdpa/ifcvf/ifcvf_main.c
new file mode 100644
index 000000000000..8d54dc5b08d2
--- /dev/null
+++ b/drivers/virtio/vdpa/ifcvf/ifcvf_main.c
@@ -0,0 +1,435 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Intel IFC VF NIC driver for virtio dataplane offloading
+ *
+ * Copyright (C) 2020 Intel Corporation.
+ *
+ * Author: Zhu Lingshan <[email protected]>
+ *
+ */
+
+#include <linux/interrupt.h>
+#include <linux/module.h>
+#include <linux/pci.h>
+#include <linux/sysfs.h>
+#include "ifcvf_base.h"
+
+#define VERSION_STRING "0.1"
+#define DRIVER_AUTHOR "Intel Corporation"
+#define IFCVF_DRIVER_NAME "ifcvf"
+
+static irqreturn_t ifcvf_intr_handler(int irq, void *arg)
+{
+ struct vring_info *vring = arg;
+
+ if (vring->cb.callback)
+ return vring->cb.callback(vring->cb.private);
+
+ return IRQ_HANDLED;
+}
+
+static int ifcvf_start_datapath(void *private)
+{
+ struct ifcvf_hw *vf = ifcvf_private_to_vf(private);
+ struct ifcvf_adapter *ifcvf;
+ u8 status;
+ int ret;
+
+ ifcvf = vf_to_adapter(vf);
+ vf->nr_vring = IFCVF_MAX_QUEUE_PAIRS * 2;
+ ret = ifcvf_start_hw(vf);
+ if (ret < 0) {
+ status = ifcvf_get_status(vf);
+ status |= VIRTIO_CONFIG_S_FAILED;
+ ifcvf_set_status(vf, status);
+ }
+
+ return ret;
+}
+
+static int ifcvf_stop_datapath(void *private)
+{
+ struct ifcvf_hw *vf = ifcvf_private_to_vf(private);
+ int i;
+
+ for (i = 0; i < IFCVF_MAX_QUEUE_PAIRS * 2; i++)
+ vf->vring[i].cb.callback = NULL;
+
+ ifcvf_stop_hw(vf);
+
+ return 0;
+}
+
+static void ifcvf_reset_vring(struct ifcvf_adapter *adapter)
+{
+ struct ifcvf_hw *vf = ifcvf_private_to_vf(adapter);
+ int i;
+
+ for (i = 0; i < IFCVF_MAX_QUEUE_PAIRS * 2; i++) {
+ vf->vring[i].last_avail_idx = 0;
+ vf->vring[i].desc = 0;
+ vf->vring[i].avail = 0;
+ vf->vring[i].used = 0;
+ vf->vring[i].ready = 0;
+ vf->vring[i].cb.callback = NULL;
+ vf->vring[i].cb.private = NULL;
+ }
+
+ ifcvf_reset(vf);
+}
+
+static struct ifcvf_adapter *vdpa_to_adapter(struct vdpa_device *vdpa_dev)
+{
+ return container_of(vdpa_dev, struct ifcvf_adapter, vdpa);
+}
+
+static struct ifcvf_hw *vdpa_to_vf(struct vdpa_device *vdpa_dev)
+{
+ struct ifcvf_adapter *adapter = vdpa_to_adapter(vdpa_dev);
+
+ return &adapter->vf;
+}
+
+static u64 ifcvf_vdpa_get_features(struct vdpa_device *vdpa_dev)
+{
+ struct ifcvf_hw *vf = vdpa_to_vf(vdpa_dev);
+ u64 features;
+
+ features = ifcvf_get_features(vf) & IFCVF_SUPPORTED_FEATURES;
+
+ return features;
+}
+
+static int ifcvf_vdpa_set_features(struct vdpa_device *vdpa_dev, u64 features)
+{
+ struct ifcvf_hw *vf = vdpa_to_vf(vdpa_dev);
+
+ vf->req_features = features;
+
+ return 0;
+}
+
+static u8 ifcvf_vdpa_get_status(struct vdpa_device *vdpa_dev)
+{
+ struct ifcvf_hw *vf = vdpa_to_vf(vdpa_dev);
+
+ return ifcvf_get_status(vf);
+}
+
+static void ifcvf_vdpa_set_status(struct vdpa_device *vdpa_dev, u8 status)
+{
+ struct ifcvf_adapter *adapter;
+ struct ifcvf_hw *vf;
+
+ vf = vdpa_to_vf(vdpa_dev);
+ adapter = dev_get_drvdata(vdpa_dev->dev.parent);
+
+ if (status == 0) {
+ ifcvf_stop_datapath(adapter);
+ ifcvf_reset_vring(adapter);
+ return;
+ }
+
+ if (status & VIRTIO_CONFIG_S_DRIVER_OK) {
+ if (ifcvf_start_datapath(adapter) < 0)
+ IFCVF_ERR(adapter->pdev,
+ "Failed to set ifcvf vdpa status %u\n",
+ status);
+ }
+
+ ifcvf_set_status(vf, status);
+}
+
+static u16 ifcvf_vdpa_get_vq_num_max(struct vdpa_device *vdpa_dev)
+{
+ return IFCVF_QUEUE_MAX;
+}
+
+static u64 ifcvf_vdpa_get_vq_state(struct vdpa_device *vdpa_dev, u16 qid)
+{
+ struct ifcvf_hw *vf = vdpa_to_vf(vdpa_dev);
+
+ return ifcvf_get_vq_state(vf, qid);
+}
+
+static int ifcvf_vdpa_set_vq_state(struct vdpa_device *vdpa_dev, u16 qid,
+ u64 num)
+{
+ struct ifcvf_hw *vf = vdpa_to_vf(vdpa_dev);
+
+ return ifcvf_set_vq_state(vf, qid, num);
+}
+
+static void ifcvf_vdpa_set_vq_cb(struct vdpa_device *vdpa_dev, u16 qid,
+ struct vdpa_callback *cb)
+{
+ struct ifcvf_hw *vf = vdpa_to_vf(vdpa_dev);
+
+ vf->vring[qid].cb = *cb;
+}
+
+static void ifcvf_vdpa_set_vq_ready(struct vdpa_device *vdpa_dev,
+ u16 qid, bool ready)
+{
+ struct ifcvf_hw *vf = vdpa_to_vf(vdpa_dev);
+
+ vf->vring[qid].ready = ready;
+}
+
+static bool ifcvf_vdpa_get_vq_ready(struct vdpa_device *vdpa_dev, u16 qid)
+{
+ struct ifcvf_hw *vf = vdpa_to_vf(vdpa_dev);
+
+ return vf->vring[qid].ready;
+}
+
+static void ifcvf_vdpa_set_vq_num(struct vdpa_device *vdpa_dev, u16 qid,
+ u32 num)
+{
+ struct ifcvf_hw *vf = vdpa_to_vf(vdpa_dev);
+
+ vf->vring[qid].size = num;
+}
+
+static int ifcvf_vdpa_set_vq_address(struct vdpa_device *vdpa_dev, u16 qid,
+ u64 desc_area, u64 driver_area,
+ u64 device_area)
+{
+ struct ifcvf_hw *vf = vdpa_to_vf(vdpa_dev);
+
+ vf->vring[qid].desc = desc_area;
+ vf->vring[qid].avail = driver_area;
+ vf->vring[qid].used = device_area;
+
+ return 0;
+}
+
+static void ifcvf_vdpa_kick_vq(struct vdpa_device *vdpa_dev, u16 qid)
+{
+ struct ifcvf_hw *vf = vdpa_to_vf(vdpa_dev);
+
+ ifcvf_notify_queue(vf, qid);
+}
+
+static u32 ifcvf_vdpa_get_generation(struct vdpa_device *vdpa_dev)
+{
+ struct ifcvf_hw *vf = vdpa_to_vf(vdpa_dev);
+
+ return ioread8(&vf->common_cfg->config_generation);
+}
+
+static u32 ifcvf_vdpa_get_device_id(struct vdpa_device *vdpa_dev)
+{
+ return VIRTIO_ID_NET;
+}
+
+static u32 ifcvf_vdpa_get_vendor_id(struct vdpa_device *vdpa_dev)
+{
+ return IFCVF_SUBSYS_VENDOR_ID;
+}
+
+static u16 ifcvf_vdpa_get_vq_align(struct vdpa_device *vdpa_dev)
+{
+ return IFCVF_QUEUE_ALIGNMENT;
+}
+
+static void ifcvf_vdpa_get_config(struct vdpa_device *vdpa_dev,
+ unsigned int offset,
+ void *buf, unsigned int len)
+{
+ struct ifcvf_hw *vf = vdpa_to_vf(vdpa_dev);
+
+ WARN_ON(offset + len > sizeof(struct virtio_net_config));
+ ifcvf_read_net_config(vf, offset, buf, len);
+}
+
+static void ifcvf_vdpa_set_config(struct vdpa_device *vdpa_dev,
+ unsigned int offset, const void *buf,
+ unsigned int len)
+{
+ struct ifcvf_hw *vf = vdpa_to_vf(vdpa_dev);
+
+ WARN_ON(offset + len > sizeof(struct virtio_net_config));
+ ifcvf_write_net_config(vf, offset, buf, len);
+}
+
+static void ifcvf_vdpa_set_config_cb(struct vdpa_device *vdpa_dev,
+ struct vdpa_callback *cb)
+{
+ /* We don't support config interrupt */
+}
+
+/*
+ * IFCVF currently doesn't have an on-chip IOMMU, so it does not
+ * implement set_map()/dma_map()/dma_unmap()
+ */
+static const struct vdpa_config_ops ifc_vdpa_ops = {
+ .get_features = ifcvf_vdpa_get_features,
+ .set_features = ifcvf_vdpa_set_features,
+ .get_status = ifcvf_vdpa_get_status,
+ .set_status = ifcvf_vdpa_set_status,
+ .get_vq_num_max = ifcvf_vdpa_get_vq_num_max,
+ .get_vq_state = ifcvf_vdpa_get_vq_state,
+ .set_vq_state = ifcvf_vdpa_set_vq_state,
+ .set_vq_cb = ifcvf_vdpa_set_vq_cb,
+ .set_vq_ready = ifcvf_vdpa_set_vq_ready,
+ .get_vq_ready = ifcvf_vdpa_get_vq_ready,
+ .set_vq_num = ifcvf_vdpa_set_vq_num,
+ .set_vq_address = ifcvf_vdpa_set_vq_address,
+ .kick_vq = ifcvf_vdpa_kick_vq,
+ .get_generation = ifcvf_vdpa_get_generation,
+ .get_device_id = ifcvf_vdpa_get_device_id,
+ .get_vendor_id = ifcvf_vdpa_get_vendor_id,
+ .get_vq_align = ifcvf_vdpa_get_vq_align,
+ .get_config = ifcvf_vdpa_get_config,
+ .set_config = ifcvf_vdpa_set_config,
+ .set_config_cb = ifcvf_vdpa_set_config_cb,
+};
+
+static int ifcvf_request_irq(struct ifcvf_adapter *adapter)
+{
+ struct pci_dev *pdev = adapter->pdev;
+ struct ifcvf_hw *vf = &adapter->vf;
+ int vector, i, ret, irq;
+
+
+ for (i = 0; i < IFCVF_MAX_QUEUE_PAIRS * 2; i++) {
+ snprintf(vf->vring[i].msix_name, 256, "ifcvf[%s]-%d",
+ pci_name(pdev), i);
+ vector = i + IFCVF_MSI_QUEUE_OFF;
+ irq = pci_irq_vector(pdev, vector);
+ ret = devm_request_irq(&pdev->dev, irq,
+ ifcvf_intr_handler, 0,
+ vf->vring[i].msix_name,
+ &vf->vring[i]);
+ if (ret) {
+ IFCVF_ERR(pdev,
+ "Failed to request irq for vq %d\n", i);
+ return ret;
+ }
+ vf->vring[i].irq = irq;
+ }
+
+ return 0;
+}
+
+static void ifcvf_free_irq_vectors(void *data)
+{
+ pci_free_irq_vectors(data);
+}
+
+static int ifcvf_probe(struct pci_dev *pdev, const struct pci_device_id *id)
+{
+ struct device *dev = &pdev->dev;
+ struct ifcvf_adapter *adapter;
+ struct ifcvf_hw *vf;
+ int ret;
+
+ ret = pcim_enable_device(pdev);
+ if (ret) {
+ IFCVF_ERR(pdev, "Failed to enable device\n");
+ return ret;
+ }
+
+ ret = pcim_iomap_regions(pdev, BIT(0) | BIT(2) | BIT(4),
+ IFCVF_DRIVER_NAME);
+ if (ret) {
+ IFCVF_ERR(pdev, "Failed to request MMIO region\n");
+ return ret;
+ }
+
+ ret = pci_set_dma_mask(pdev, DMA_BIT_MASK(64));
+ if (ret) {
+ IFCVF_ERR(pdev, "No usable DMA confiugration\n");
+ return ret;
+ }
+
+ ret = pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(64));
+ if (ret) {
+ IFCVF_ERR(pdev,
+ "No usable coherent DMA confiugration\n");
+ return ret;
+ }
+
+ ret = pci_alloc_irq_vectors(pdev, IFCVF_MAX_INTR,
+ IFCVF_MAX_INTR, PCI_IRQ_MSIX);
+ if (ret < 0) {
+ IFCVF_ERR(pdev, "Failed to alloc irq vectors\n");
+ return ret;
+ }
+
+ ret = devm_add_action_or_reset(dev, ifcvf_free_irq_vectors, pdev);
+ if (ret) {
+ IFCVF_ERR(pdev,
+ "Failed for adding devres for freeing irq vectors\n");
+ return ret;
+ }
+
+ adapter = vdpa_alloc_device(struct ifcvf_adapter, vdpa,
+ dev, &ifc_vdpa_ops);
+ if (adapter == NULL) {
+ IFCVF_ERR(pdev, "Failed to allocate vDPA structure");
+ return -ENOMEM;
+ }
+
+ pci_set_master(pdev);
+ pci_set_drvdata(pdev, adapter);
+
+ vf = &adapter->vf;
+ vf->base = pcim_iomap_table(pdev);
+
+ adapter->pdev = pdev;
+ adapter->vdpa.dma_dev = &pdev->dev;
+
+ ret = ifcvf_request_irq(adapter);
+ if (ret) {
+ IFCVF_ERR(pdev, "Failed to request MSI-X irq\n");
+ goto err;
+ }
+
+ ret = ifcvf_init_hw(vf, pdev);
+ if (ret) {
+ IFCVF_ERR(pdev, "Failed to init IFCVF hw\n");
+ goto err;
+ }
+
+ ret = vdpa_register_device(&adapter->vdpa);
+ if (ret) {
+ IFCVF_ERR(pdev, "Failed to register ifcvf to vdpa bus");
+ goto err;
+ }
+
+ return 0;
+
+err:
+ put_device(&adapter->vdpa.dev);
+ return ret;
+}
+
+static void ifcvf_remove(struct pci_dev *pdev)
+{
+ struct ifcvf_adapter *adapter = pci_get_drvdata(pdev);
+
+ vdpa_unregister_device(&adapter->vdpa);
+}
+
+static struct pci_device_id ifcvf_pci_ids[] = {
+ { PCI_DEVICE_SUB(IFCVF_VENDOR_ID,
+ IFCVF_DEVICE_ID,
+ IFCVF_SUBSYS_VENDOR_ID,
+ IFCVF_SUBSYS_DEVICE_ID) },
+ { 0 },
+};
+MODULE_DEVICE_TABLE(pci, ifcvf_pci_ids);
+
+static struct pci_driver ifcvf_driver = {
+ .name = IFCVF_DRIVER_NAME,
+ .id_table = ifcvf_pci_ids,
+ .probe = ifcvf_probe,
+ .remove = ifcvf_remove,
+};
+
+module_pci_driver(ifcvf_driver);
+
+MODULE_LICENSE("GPL v2");
+MODULE_VERSION(VERSION_STRING);
--
2.20.1

2020-03-26 14:08:37

by Jason Wang

Subject: [PATCH V9 7/9] vhost: introduce vDPA-based backend

From: Tiwei Bie <[email protected]>

This patch introduces a vDPA-based vhost backend. This backend is
built on top of the same interface defined in virtio-vDPA and provides
a generic vhost interface for userspace to accelerate virtio devices
in the guest.

This backend is implemented as a vDPA device driver on top of the same
ops used in virtio-vDPA. It creates a char device entry named
vhost-vdpa-$index for userspace to use. Userspace can use vhost ioctls
on top of this char device to set up the backend.

The vhost ioctls are extended to make them type agnostic and to
behave like a virtio device; this helps to eliminate type specific
APIs like those of vhost_net/scsi/vsock:

- VHOST_VDPA_GET_DEVICE_ID: get the virtio device ID, which is
defined by the virtio specification to distinguish between different
types of devices
- VHOST_VDPA_GET_VRING_NUM: get the maximum size of the virtqueues
supported by the vDPA device
- VHOST_VDPA_SET/GET_STATUS: set and get the virtio status of the
vDPA device
- VHOST_VDPA_SET/GET_CONFIG: access the virtio config space
- VHOST_VDPA_SET_VRING_ENABLE: enable a specific virtqueue
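
As an illustration only (not part of this patch; error handling is
omitted, and the exact sequence depends on the userspace driver), a
backend setup could look roughly like:

    int fd = open("/dev/vhost-vdpa-0", O_RDWR);
    __u32 dev_id;
    __u16 vq_num;
    __u64 features;
    __u8 status;

    ioctl(fd, VHOST_SET_OWNER, NULL);
    ioctl(fd, VHOST_VDPA_GET_DEVICE_ID, &dev_id);  /* e.g. VIRTIO_ID_NET */
    ioctl(fd, VHOST_VDPA_GET_VRING_NUM, &vq_num);  /* max virtqueue size */
    ioctl(fd, VHOST_GET_FEATURES, &features);
    /* mask 'features' down to what the userspace driver supports */
    ioctl(fd, VHOST_SET_FEATURES, &features);

    /* ... the usual vhost vring/IOTLB setup, then: */
    struct vhost_vring_state s = { .index = 0, .num = 1 };
    ioctl(fd, VHOST_VDPA_SET_VRING_ENABLE, &s);
    status = VIRTIO_CONFIG_S_ACKNOWLEDGE | VIRTIO_CONFIG_S_DRIVER |
             VIRTIO_CONFIG_S_FEATURES_OK | VIRTIO_CONFIG_S_DRIVER_OK;
    ioctl(fd, VHOST_VDPA_SET_STATUS, &status);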

For memory mapping, the IOTLB API is mandated for vhost-vDPA, which
means userspace drivers are required to use
VHOST_IOTLB_UPDATE/VHOST_IOTLB_INVALIDATE to add or remove mappings
for a specific userspace memory region.
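
For instance, a mapping update is a write() of a vhost IOTLB message
to the same fd; a sketch using the existing struct vhost_msg uAPI
(iova, size and buf are placeholders):

    struct vhost_msg msg = {
            .type = VHOST_IOTLB_MSG,
            .iotlb = {
                    .type  = VHOST_IOTLB_UPDATE,
                    .iova  = iova,
                    .size  = size,
                    .uaddr = (__u64)(uintptr_t)buf,
                    .perm  = VHOST_ACCESS_RW,
            },
    };

    if (write(fd, &msg, sizeof(msg)) != sizeof(msg))
            /* the mapping was rejected */;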

The vhost-vDPA API is designed to be type agnostic, but it only
allows net devices at the current stage. Due to the lack of control
virtqueue support, some features are filtered out by vhost-vdpa.

We will enable more features and devices in the near future.

Signed-off-by: Tiwei Bie <[email protected]>
Signed-off-by: Eugenio Pérez <[email protected]>
Signed-off-by: Jason Wang <[email protected]>
---
drivers/vhost/Kconfig | 11 +
drivers/vhost/Makefile | 3 +
drivers/vhost/vdpa.c | 883 +++++++++++++++++++++++++++++++
include/uapi/linux/vhost.h | 24 +
include/uapi/linux/vhost_types.h | 8 +
5 files changed, 929 insertions(+)
create mode 100644 drivers/vhost/vdpa.c

diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig
index b03872886901..2a4efe39d79b 100644
--- a/drivers/vhost/Kconfig
+++ b/drivers/vhost/Kconfig
@@ -52,6 +52,17 @@ config VHOST_VSOCK
To compile this driver as a module, choose M here: the module will be called
vhost_vsock.

+config VHOST_VDPA
+ tristate "Vhost driver for vDPA-based backend"
+ depends on EVENTFD
+ select VDPA
+ help
+ This kernel module can be loaded in host kernel to accelerate
+ guest virtio devices with the vDPA-based backends.
+
+ To compile this driver as a module, choose M here: the module
+ will be called vhost_vdpa.
+
config VHOST_CROSS_ENDIAN_LEGACY
bool "Cross-endian support for vhost"
default n
diff --git a/drivers/vhost/Makefile b/drivers/vhost/Makefile
index fb831002bcf0..f3e1897cce85 100644
--- a/drivers/vhost/Makefile
+++ b/drivers/vhost/Makefile
@@ -10,6 +10,9 @@ vhost_vsock-y := vsock.o

obj-$(CONFIG_VHOST_RING) += vringh.o

+obj-$(CONFIG_VHOST_VDPA) += vhost_vdpa.o
+vhost_vdpa-y := vdpa.o
+
obj-$(CONFIG_VHOST) += vhost.o

obj-$(CONFIG_VHOST_IOTLB) += vhost_iotlb.o
diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
new file mode 100644
index 000000000000..421f02a8530a
--- /dev/null
+++ b/drivers/vhost/vdpa.c
@@ -0,0 +1,883 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2018-2020 Intel Corporation.
+ * Copyright (C) 2020 Red Hat, Inc.
+ *
+ * Author: Tiwei Bie <[email protected]>
+ * Jason Wang <[email protected]>
+ *
+ * Thanks Michael S. Tsirkin for the valuable comments and
+ * suggestions. And thanks to Cunming Liang and Zhihong Wang for all
+ * their support.
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/cdev.h>
+#include <linux/device.h>
+#include <linux/iommu.h>
+#include <linux/uuid.h>
+#include <linux/vdpa.h>
+#include <linux/nospec.h>
+#include <linux/vhost.h>
+#include <linux/virtio_net.h>
+
+#include "vhost.h"
+
+enum {
+ VHOST_VDPA_FEATURES =
+ (1ULL << VIRTIO_F_NOTIFY_ON_EMPTY) |
+ (1ULL << VIRTIO_F_ANY_LAYOUT) |
+ (1ULL << VIRTIO_F_VERSION_1) |
+ (1ULL << VIRTIO_F_IOMMU_PLATFORM) |
+ (1ULL << VIRTIO_F_RING_PACKED) |
+ (1ULL << VIRTIO_F_ORDER_PLATFORM) |
+ (1ULL << VIRTIO_RING_F_INDIRECT_DESC) |
+ (1ULL << VIRTIO_RING_F_EVENT_IDX),
+
+ VHOST_VDPA_NET_FEATURES = VHOST_VDPA_FEATURES |
+ (1ULL << VIRTIO_NET_F_CSUM) |
+ (1ULL << VIRTIO_NET_F_GUEST_CSUM) |
+ (1ULL << VIRTIO_NET_F_MTU) |
+ (1ULL << VIRTIO_NET_F_MAC) |
+ (1ULL << VIRTIO_NET_F_GUEST_TSO4) |
+ (1ULL << VIRTIO_NET_F_GUEST_TSO6) |
+ (1ULL << VIRTIO_NET_F_GUEST_ECN) |
+ (1ULL << VIRTIO_NET_F_GUEST_UFO) |
+ (1ULL << VIRTIO_NET_F_HOST_TSO4) |
+ (1ULL << VIRTIO_NET_F_HOST_TSO6) |
+ (1ULL << VIRTIO_NET_F_HOST_ECN) |
+ (1ULL << VIRTIO_NET_F_HOST_UFO) |
+ (1ULL << VIRTIO_NET_F_MRG_RXBUF) |
+ (1ULL << VIRTIO_NET_F_STATUS) |
+ (1ULL << VIRTIO_NET_F_SPEED_DUPLEX),
+};
+
+/* Currently, only network backend w/o multiqueue is supported. */
+#define VHOST_VDPA_VQ_MAX 2
+
+#define VHOST_VDPA_DEV_MAX (1U << MINORBITS)
+
+struct vhost_vdpa {
+ struct vhost_dev vdev;
+ struct iommu_domain *domain;
+ struct vhost_virtqueue *vqs;
+ struct completion completion;
+ struct vdpa_device *vdpa;
+ struct device dev;
+ struct cdev cdev;
+ atomic_t opened;
+ int nvqs;
+ int virtio_id;
+ int minor;
+};
+
+static DEFINE_IDA(vhost_vdpa_ida);
+
+static dev_t vhost_vdpa_major;
+
+static const u64 vhost_vdpa_features[] = {
+ [VIRTIO_ID_NET] = VHOST_VDPA_NET_FEATURES,
+};
+
+static void handle_vq_kick(struct vhost_work *work)
+{
+ struct vhost_virtqueue *vq = container_of(work, struct vhost_virtqueue,
+ poll.work);
+ struct vhost_vdpa *v = container_of(vq->dev, struct vhost_vdpa, vdev);
+ const struct vdpa_config_ops *ops = v->vdpa->config;
+
+ ops->kick_vq(v->vdpa, vq - v->vqs);
+}
+
+static irqreturn_t vhost_vdpa_virtqueue_cb(void *private)
+{
+ struct vhost_virtqueue *vq = private;
+ struct eventfd_ctx *call_ctx = vq->call_ctx;
+
+ if (call_ctx)
+ eventfd_signal(call_ctx, 1);
+
+ return IRQ_HANDLED;
+}
+
+static void vhost_vdpa_reset(struct vhost_vdpa *v)
+{
+ struct vdpa_device *vdpa = v->vdpa;
+ const struct vdpa_config_ops *ops = vdpa->config;
+
+ ops->set_status(vdpa, 0);
+}
+
+static long vhost_vdpa_get_device_id(struct vhost_vdpa *v, u8 __user *argp)
+{
+ struct vdpa_device *vdpa = v->vdpa;
+ const struct vdpa_config_ops *ops = vdpa->config;
+ u32 device_id;
+
+ device_id = ops->get_device_id(vdpa);
+
+ if (copy_to_user(argp, &device_id, sizeof(device_id)))
+ return -EFAULT;
+
+ return 0;
+}
+
+static long vhost_vdpa_get_status(struct vhost_vdpa *v, u8 __user *statusp)
+{
+ struct vdpa_device *vdpa = v->vdpa;
+ const struct vdpa_config_ops *ops = vdpa->config;
+ u8 status;
+
+ status = ops->get_status(vdpa);
+
+ if (copy_to_user(statusp, &status, sizeof(status)))
+ return -EFAULT;
+
+ return 0;
+}
+
+static long vhost_vdpa_set_status(struct vhost_vdpa *v, u8 __user *statusp)
+{
+ struct vdpa_device *vdpa = v->vdpa;
+ const struct vdpa_config_ops *ops = vdpa->config;
+ u8 status;
+
+ if (copy_from_user(&status, statusp, sizeof(status)))
+ return -EFAULT;
+
+ /*
+ * Userspace shouldn't remove status bits unless it resets the
+ * status to 0.
+ */
+ if (status != 0 && (ops->get_status(vdpa) & ~status) != 0)
+ return -EINVAL;
+
+ ops->set_status(vdpa, status);
+
+ return 0;
+}
+
+static int vhost_vdpa_config_validate(struct vhost_vdpa *v,
+ struct vhost_vdpa_config *c)
+{
+ long size = 0;
+
+ switch (v->virtio_id) {
+ case VIRTIO_ID_NET:
+ size = sizeof(struct virtio_net_config);
+ break;
+ }
+
+ if (c->len == 0)
+ return -EINVAL;
+
+ if (c->len > size - c->off)
+ return -E2BIG;
+
+ return 0;
+}
+
+static long vhost_vdpa_get_config(struct vhost_vdpa *v,
+ struct vhost_vdpa_config __user *c)
+{
+ struct vdpa_device *vdpa = v->vdpa;
+ const struct vdpa_config_ops *ops = vdpa->config;
+ struct vhost_vdpa_config config;
+ unsigned long size = offsetof(struct vhost_vdpa_config, buf);
+ u8 *buf;
+
+ if (copy_from_user(&config, c, size))
+ return -EFAULT;
+ if (vhost_vdpa_config_validate(v, &config))
+ return -EINVAL;
+ buf = kvzalloc(config.len, GFP_KERNEL);
+ if (!buf)
+ return -ENOMEM;
+
+ ops->get_config(vdpa, config.off, buf, config.len);
+
+ if (copy_to_user(c->buf, buf, config.len)) {
+ kvfree(buf);
+ return -EFAULT;
+ }
+
+ kvfree(buf);
+ return 0;
+}
+
+static long vhost_vdpa_set_config(struct vhost_vdpa *v,
+ struct vhost_vdpa_config __user *c)
+{
+ struct vdpa_device *vdpa = v->vdpa;
+ const struct vdpa_config_ops *ops = vdpa->config;
+ struct vhost_vdpa_config config;
+ unsigned long size = offsetof(struct vhost_vdpa_config, buf);
+ u8 *buf;
+
+ if (copy_from_user(&config, c, size))
+ return -EFAULT;
+ if (vhost_vdpa_config_validate(v, &config))
+ return -EINVAL;
+ buf = kvzalloc(config.len, GFP_KERNEL);
+ if (!buf)
+ return -ENOMEM;
+
+ if (copy_from_user(buf, c->buf, config.len)) {
+ kvfree(buf);
+ return -EFAULT;
+ }
+
+ ops->set_config(vdpa, config.off, buf, config.len);
+
+ kvfree(buf);
+ return 0;
+}
+
+static long vhost_vdpa_get_features(struct vhost_vdpa *v, u64 __user *featurep)
+{
+ struct vdpa_device *vdpa = v->vdpa;
+ const struct vdpa_config_ops *ops = vdpa->config;
+ u64 features;
+
+ features = ops->get_features(vdpa);
+ features &= vhost_vdpa_features[v->virtio_id];
+
+ if (copy_to_user(featurep, &features, sizeof(features)))
+ return -EFAULT;
+
+ return 0;
+}
+
+static long vhost_vdpa_set_features(struct vhost_vdpa *v, u64 __user *featurep)
+{
+ struct vdpa_device *vdpa = v->vdpa;
+ const struct vdpa_config_ops *ops = vdpa->config;
+ u64 features;
+
+ /*
+ * It's not allowed to change the features after they have
+ * been negotiated.
+ */
+ if (ops->get_status(vdpa) & VIRTIO_CONFIG_S_FEATURES_OK)
+ return -EBUSY;
+
+ if (copy_from_user(&features, featurep, sizeof(features)))
+ return -EFAULT;
+
+ if (features & ~vhost_vdpa_features[v->virtio_id])
+ return -EINVAL;
+
+ if (ops->set_features(vdpa, features))
+ return -EINVAL;
+
+ return 0;
+}
+
+static long vhost_vdpa_get_vring_num(struct vhost_vdpa *v, u16 __user *argp)
+{
+ struct vdpa_device *vdpa = v->vdpa;
+ const struct vdpa_config_ops *ops = vdpa->config;
+ u16 num;
+
+ num = ops->get_vq_num_max(vdpa);
+
+ if (copy_to_user(argp, &num, sizeof(num)))
+ return -EFAULT;
+
+ return 0;
+}
+
+static long vhost_vdpa_vring_ioctl(struct vhost_vdpa *v, unsigned int cmd,
+ void __user *argp)
+{
+ struct vdpa_device *vdpa = v->vdpa;
+ const struct vdpa_config_ops *ops = vdpa->config;
+ struct vdpa_callback cb;
+ struct vhost_virtqueue *vq;
+ struct vhost_vring_state s;
+ u8 status;
+ u32 idx;
+ long r;
+
+ r = get_user(idx, (u32 __user *)argp);
+ if (r < 0)
+ return r;
+
+ if (idx >= v->nvqs)
+ return -ENOBUFS;
+
+ idx = array_index_nospec(idx, v->nvqs);
+ vq = &v->vqs[idx];
+
+ status = ops->get_status(vdpa);
+
+ if (cmd == VHOST_VDPA_SET_VRING_ENABLE) {
+ if (copy_from_user(&s, argp, sizeof(s)))
+ return -EFAULT;
+ ops->set_vq_ready(vdpa, idx, s.num);
+ return 0;
+ }
+
+ if (cmd == VHOST_GET_VRING_BASE)
+ vq->last_avail_idx = ops->get_vq_state(v->vdpa, idx);
+
+ r = vhost_vring_ioctl(&v->vdev, cmd, argp);
+ if (r)
+ return r;
+
+ switch (cmd) {
+ case VHOST_SET_VRING_ADDR:
+ if (ops->set_vq_address(vdpa, idx,
+ (u64)(uintptr_t)vq->desc,
+ (u64)(uintptr_t)vq->avail,
+ (u64)(uintptr_t)vq->used))
+ r = -EINVAL;
+ break;
+
+ case VHOST_SET_VRING_BASE:
+ if (ops->set_vq_state(vdpa, idx, vq->last_avail_idx))
+ r = -EINVAL;
+ break;
+
+ case VHOST_SET_VRING_CALL:
+ if (vq->call_ctx) {
+ cb.callback = vhost_vdpa_virtqueue_cb;
+ cb.private = vq;
+ } else {
+ cb.callback = NULL;
+ cb.private = NULL;
+ }
+ ops->set_vq_cb(vdpa, idx, &cb);
+ break;
+
+ case VHOST_SET_VRING_NUM:
+ ops->set_vq_num(vdpa, idx, vq->num);
+ break;
+ }
+
+ return r;
+}
+
+static long vhost_vdpa_unlocked_ioctl(struct file *filep,
+ unsigned int cmd, unsigned long arg)
+{
+ struct vhost_vdpa *v = filep->private_data;
+ struct vhost_dev *d = &v->vdev;
+ void __user *argp = (void __user *)arg;
+ long r;
+
+ mutex_lock(&d->mutex);
+
+ switch (cmd) {
+ case VHOST_VDPA_GET_DEVICE_ID:
+ r = vhost_vdpa_get_device_id(v, argp);
+ break;
+ case VHOST_VDPA_GET_STATUS:
+ r = vhost_vdpa_get_status(v, argp);
+ break;
+ case VHOST_VDPA_SET_STATUS:
+ r = vhost_vdpa_set_status(v, argp);
+ break;
+ case VHOST_VDPA_GET_CONFIG:
+ r = vhost_vdpa_get_config(v, argp);
+ break;
+ case VHOST_VDPA_SET_CONFIG:
+ r = vhost_vdpa_set_config(v, argp);
+ break;
+ case VHOST_GET_FEATURES:
+ r = vhost_vdpa_get_features(v, argp);
+ break;
+ case VHOST_SET_FEATURES:
+ r = vhost_vdpa_set_features(v, argp);
+ break;
+ case VHOST_VDPA_GET_VRING_NUM:
+ r = vhost_vdpa_get_vring_num(v, argp);
+ break;
+ case VHOST_SET_LOG_BASE:
+ case VHOST_SET_LOG_FD:
+ r = -ENOIOCTLCMD;
+ break;
+ default:
+ r = vhost_dev_ioctl(&v->vdev, cmd, argp);
+ if (r == -ENOIOCTLCMD)
+ r = vhost_vdpa_vring_ioctl(v, cmd, argp);
+ break;
+ }
+
+ mutex_unlock(&d->mutex);
+ return r;
+}
+
+static void vhost_vdpa_iotlb_unmap(struct vhost_vdpa *v, u64 start, u64 last)
+{
+ struct vhost_dev *dev = &v->vdev;
+ struct vhost_iotlb *iotlb = dev->iotlb;
+ struct vhost_iotlb_map *map;
+ struct page *page;
+ unsigned long pfn, pinned;
+
+ while ((map = vhost_iotlb_itree_first(iotlb, start, last)) != NULL) {
+ pinned = map->size >> PAGE_SHIFT;
+ for (pfn = map->addr >> PAGE_SHIFT;
+ pinned > 0; pfn++, pinned--) {
+ page = pfn_to_page(pfn);
+ if (map->perm & VHOST_ACCESS_WO)
+ set_page_dirty_lock(page);
+ unpin_user_page(page);
+ }
+ atomic64_sub(map->size >> PAGE_SHIFT, &dev->mm->pinned_vm);
+ vhost_iotlb_map_free(iotlb, map);
+ }
+}
+
+static void vhost_vdpa_iotlb_free(struct vhost_vdpa *v)
+{
+ struct vhost_dev *dev = &v->vdev;
+
+ vhost_vdpa_iotlb_unmap(v, 0ULL, 0ULL - 1);
+ kfree(dev->iotlb);
+ dev->iotlb = NULL;
+}
+
+static int perm_to_iommu_flags(u32 perm)
+{
+ int flags = 0;
+
+ switch (perm) {
+ case VHOST_ACCESS_WO:
+ flags |= IOMMU_WRITE;
+ break;
+ case VHOST_ACCESS_RO:
+ flags |= IOMMU_READ;
+ break;
+ case VHOST_ACCESS_RW:
+ flags |= (IOMMU_WRITE | IOMMU_READ);
+ break;
+ default:
+ WARN(1, "invalidate vhost IOTLB permission\n");
+ break;
+ }
+
+ return flags | IOMMU_CACHE;
+}
+
+static int vhost_vdpa_map(struct vhost_vdpa *v,
+ u64 iova, u64 size, u64 pa, u32 perm)
+{
+ struct vhost_dev *dev = &v->vdev;
+ struct vdpa_device *vdpa = v->vdpa;
+ const struct vdpa_config_ops *ops = vdpa->config;
+ int r = 0;
+
+ r = vhost_iotlb_add_range(dev->iotlb, iova, iova + size - 1,
+ pa, perm);
+ if (r)
+ return r;
+
+ if (ops->dma_map)
+ r = ops->dma_map(vdpa, iova, size, pa, perm);
+ else if (ops->set_map)
+ r = ops->set_map(vdpa, dev->iotlb);
+ else
+ r = iommu_map(v->domain, iova, pa, size,
+ perm_to_iommu_flags(perm));
+
+ return r;
+}
+
+static void vhost_vdpa_unmap(struct vhost_vdpa *v, u64 iova, u64 size)
+{
+ struct vhost_dev *dev = &v->vdev;
+ struct vdpa_device *vdpa = v->vdpa;
+ const struct vdpa_config_ops *ops = vdpa->config;
+
+ vhost_vdpa_iotlb_unmap(v, iova, iova + size - 1);
+
+ if (ops->dma_map)
+ ops->dma_unmap(vdpa, iova, size);
+ else if (ops->set_map)
+ ops->set_map(vdpa, dev->iotlb);
+ else
+ iommu_unmap(v->domain, iova, size);
+}
+
+static int vhost_vdpa_process_iotlb_update(struct vhost_vdpa *v,
+ struct vhost_iotlb_msg *msg)
+{
+ struct vhost_dev *dev = &v->vdev;
+ struct vhost_iotlb *iotlb = dev->iotlb;
+ struct page **page_list;
+ unsigned long list_size = PAGE_SIZE / sizeof(struct page *);
+ unsigned int gup_flags = FOLL_LONGTERM;
+ unsigned long npages, cur_base, map_pfn, last_pfn = 0;
+ unsigned long locked, lock_limit, pinned, i;
+ u64 iova = msg->iova;
+ int ret = 0;
+
+ if (vhost_iotlb_itree_first(iotlb, msg->iova,
+ msg->iova + msg->size - 1))
+ return -EEXIST;
+
+ page_list = (struct page **) __get_free_page(GFP_KERNEL);
+ if (!page_list)
+ return -ENOMEM;
+
+ if (msg->perm & VHOST_ACCESS_WO)
+ gup_flags |= FOLL_WRITE;
+
+ npages = PAGE_ALIGN(msg->size + (iova & ~PAGE_MASK)) >> PAGE_SHIFT;
+ if (!npages)
+ return -EINVAL;
+
+ down_read(&dev->mm->mmap_sem);
+
+ locked = atomic64_add_return(npages, &dev->mm->pinned_vm);
+ lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
+
+ if (locked > lock_limit) {
+ ret = -ENOMEM;
+ goto out;
+ }
+
+ cur_base = msg->uaddr & PAGE_MASK;
+ iova &= PAGE_MASK;
+
+ while (npages) {
+ pinned = min_t(unsigned long, npages, list_size);
+ ret = pin_user_pages(cur_base, pinned,
+ gup_flags, page_list, NULL);
+ if (ret != pinned)
+ goto out;
+
+ if (!last_pfn)
+ map_pfn = page_to_pfn(page_list[0]);
+
+ for (i = 0; i < ret; i++) {
+ unsigned long this_pfn = page_to_pfn(page_list[i]);
+ u64 csize;
+
+ if (last_pfn && (this_pfn != last_pfn + 1)) {
+ /* Pin a contiguous chunk of memory */
+ csize = (last_pfn - map_pfn + 1) << PAGE_SHIFT;
+ if (vhost_vdpa_map(v, iova, csize,
+ map_pfn << PAGE_SHIFT,
+ msg->perm))
+ goto out;
+ map_pfn = this_pfn;
+ iova += csize;
+ }
+
+ last_pfn = this_pfn;
+ }
+
+ cur_base += ret << PAGE_SHIFT;
+ npages -= ret;
+ }
+
+ /* Map the remaining chunk */
+ ret = vhost_vdpa_map(v, iova, (last_pfn - map_pfn + 1) << PAGE_SHIFT,
+ map_pfn << PAGE_SHIFT, msg->perm);
+out:
+ if (ret) {
+ vhost_vdpa_unmap(v, msg->iova, msg->size);
+ atomic64_sub(npages, &dev->mm->pinned_vm);
+ }
+ up_read(&dev->mm->mmap_sem);
+ free_page((unsigned long)page_list);
+ return ret;
+}
+
+static int vhost_vdpa_process_iotlb_msg(struct vhost_dev *dev,
+ struct vhost_iotlb_msg *msg)
+{
+ struct vhost_vdpa *v = container_of(dev, struct vhost_vdpa, vdev);
+ int r = 0;
+
+ r = vhost_dev_check_owner(dev);
+ if (r)
+ return r;
+
+ switch (msg->type) {
+ case VHOST_IOTLB_UPDATE:
+ r = vhost_vdpa_process_iotlb_update(v, msg);
+ break;
+ case VHOST_IOTLB_INVALIDATE:
+ vhost_vdpa_unmap(v, msg->iova, msg->size);
+ break;
+ default:
+ r = -EINVAL;
+ break;
+ }
+
+ return r;
+}
+
+static ssize_t vhost_vdpa_chr_write_iter(struct kiocb *iocb,
+ struct iov_iter *from)
+{
+ struct file *file = iocb->ki_filp;
+ struct vhost_vdpa *v = file->private_data;
+ struct vhost_dev *dev = &v->vdev;
+
+ return vhost_chr_write_iter(dev, from);
+}
+
+static int vhost_vdpa_alloc_domain(struct vhost_vdpa *v)
+{
+ struct vdpa_device *vdpa = v->vdpa;
+ const struct vdpa_config_ops *ops = vdpa->config;
+ struct device *dma_dev = vdpa_get_dma_dev(vdpa);
+ struct bus_type *bus;
+ int ret;
+
+ /* Device wants to do DMA by itself */
+ if (ops->set_map || ops->dma_map)
+ return 0;
+
+ bus = dma_dev->bus;
+ if (!bus)
+ return -EFAULT;
+
+ if (!iommu_capable(bus, IOMMU_CAP_CACHE_COHERENCY))
+ return -ENOTSUPP;
+
+ v->domain = iommu_domain_alloc(bus);
+ if (!v->domain)
+ return -EIO;
+
+ ret = iommu_attach_device(v->domain, dma_dev);
+ if (ret)
+ goto err_attach;
+
+ return 0;
+
+err_attach:
+ iommu_domain_free(v->domain);
+ return ret;
+}
+
+static void vhost_vdpa_free_domain(struct vhost_vdpa *v)
+{
+ struct vdpa_device *vdpa = v->vdpa;
+ struct device *dma_dev = vdpa_get_dma_dev(vdpa);
+
+ if (v->domain) {
+ iommu_detach_device(v->domain, dma_dev);
+ iommu_domain_free(v->domain);
+ }
+
+ v->domain = NULL;
+}
+
+static int vhost_vdpa_open(struct inode *inode, struct file *filep)
+{
+ struct vhost_vdpa *v;
+ struct vhost_dev *dev;
+ struct vhost_virtqueue **vqs;
+ int nvqs, i, r, opened;
+
+ v = container_of(inode->i_cdev, struct vhost_vdpa, cdev);
+ if (!v)
+ return -ENODEV;
+
+ opened = atomic_cmpxchg(&v->opened, 0, 1);
+ if (opened)
+ return -EBUSY;
+
+ nvqs = v->nvqs;
+ vhost_vdpa_reset(v);
+
+ vqs = kmalloc_array(nvqs, sizeof(*vqs), GFP_KERNEL);
+ if (!vqs) {
+ r = -ENOMEM;
+ goto err;
+ }
+
+ dev = &v->vdev;
+ for (i = 0; i < nvqs; i++) {
+ vqs[i] = &v->vqs[i];
+ vqs[i]->handle_kick = handle_vq_kick;
+ }
+ vhost_dev_init(dev, vqs, nvqs, 0, 0, 0,
+ vhost_vdpa_process_iotlb_msg);
+
+ dev->iotlb = vhost_iotlb_alloc(0, 0);
+ if (!dev->iotlb) {
+ r = -ENOMEM;
+ goto err_init_iotlb;
+ }
+
+ r = vhost_vdpa_alloc_domain(v);
+ if (r)
+ goto err_init_iotlb;
+
+ filep->private_data = v;
+
+ return 0;
+
+err_init_iotlb:
+ vhost_dev_cleanup(&v->vdev);
+err:
+ atomic_dec(&v->opened);
+ return r;
+}
+
+static int vhost_vdpa_release(struct inode *inode, struct file *filep)
+{
+ struct vhost_vdpa *v = filep->private_data;
+ struct vhost_dev *d = &v->vdev;
+
+ mutex_lock(&d->mutex);
+ filep->private_data = NULL;
+ vhost_vdpa_reset(v);
+ vhost_dev_stop(&v->vdev);
+ vhost_vdpa_iotlb_free(v);
+ vhost_vdpa_free_domain(v);
+ vhost_dev_cleanup(&v->vdev);
+ kfree(v->vdev.vqs);
+ mutex_unlock(&d->mutex);
+
+ atomic_dec(&v->opened);
+ complete(&v->completion);
+
+ return 0;
+}
+
+static const struct file_operations vhost_vdpa_fops = {
+ .owner = THIS_MODULE,
+ .open = vhost_vdpa_open,
+ .release = vhost_vdpa_release,
+ .write_iter = vhost_vdpa_chr_write_iter,
+ .unlocked_ioctl = vhost_vdpa_unlocked_ioctl,
+ .compat_ioctl = compat_ptr_ioctl,
+};
+
+static void vhost_vdpa_release_dev(struct device *device)
+{
+ struct vhost_vdpa *v =
+ container_of(device, struct vhost_vdpa, dev);
+
+ ida_simple_remove(&vhost_vdpa_ida, v->minor);
+ kfree(v->vqs);
+ kfree(v);
+}
+
+static int vhost_vdpa_probe(struct vdpa_device *vdpa)
+{
+ const struct vdpa_config_ops *ops = vdpa->config;
+ struct vhost_vdpa *v;
+ int minor, nvqs = VHOST_VDPA_VQ_MAX;
+ int r;
+
+ /* Currently, we only accept network devices. */
+ if (ops->get_device_id(vdpa) != VIRTIO_ID_NET)
+ return -ENOTSUPP;
+
+ v = kzalloc(sizeof(*v), GFP_KERNEL | __GFP_RETRY_MAYFAIL);
+ if (!v)
+ return -ENOMEM;
+
+ minor = ida_simple_get(&vhost_vdpa_ida, 0,
+ VHOST_VDPA_DEV_MAX, GFP_KERNEL);
+ if (minor < 0) {
+ kfree(v);
+ return minor;
+ }
+
+ atomic_set(&v->opened, 0);
+ v->minor = minor;
+ v->vdpa = vdpa;
+ v->nvqs = nvqs;
+ v->virtio_id = ops->get_device_id(vdpa);
+
+ device_initialize(&v->dev);
+ v->dev.release = vhost_vdpa_release_dev;
+ v->dev.parent = &vdpa->dev;
+ v->dev.devt = MKDEV(MAJOR(vhost_vdpa_major), minor);
+ v->vqs = kmalloc_array(nvqs, sizeof(struct vhost_virtqueue),
+ GFP_KERNEL);
+ if (!v->vqs) {
+ r = -ENOMEM;
+ goto err;
+ }
+
+ r = dev_set_name(&v->dev, "vhost-vdpa-%u", minor);
+ if (r)
+ goto err;
+
+ cdev_init(&v->cdev, &vhost_vdpa_fops);
+ v->cdev.owner = THIS_MODULE;
+
+ r = cdev_device_add(&v->cdev, &v->dev);
+ if (r)
+ goto err;
+
+ init_completion(&v->completion);
+ vdpa_set_drvdata(vdpa, v);
+
+ return 0;
+
+err:
+ put_device(&v->dev);
+ return r;
+}
+
+static void vhost_vdpa_remove(struct vdpa_device *vdpa)
+{
+ struct vhost_vdpa *v = vdpa_get_drvdata(vdpa);
+ int opened;
+
+ cdev_device_del(&v->cdev, &v->dev);
+
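+ /* Fail any new opens, then wait for an existing user to release the fd */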
+ do {
+ opened = atomic_cmpxchg(&v->opened, 0, 1);
+ if (!opened)
+ break;
+ wait_for_completion(&v->completion);
+ } while (1);
+
+ put_device(&v->dev);
+}
+
+static struct vdpa_driver vhost_vdpa_driver = {
+ .driver = {
+ .name = "vhost_vdpa",
+ },
+ .probe = vhost_vdpa_probe,
+ .remove = vhost_vdpa_remove,
+};
+
+static int __init vhost_vdpa_init(void)
+{
+ int r;
+
+ r = alloc_chrdev_region(&vhost_vdpa_major, 0, VHOST_VDPA_DEV_MAX,
+ "vhost-vdpa");
+ if (r)
+ goto err_alloc_chrdev;
+
+ r = vdpa_register_driver(&vhost_vdpa_driver);
+ if (r)
+ goto err_vdpa_register_driver;
+
+ return 0;
+
+err_vdpa_register_driver:
+ unregister_chrdev_region(vhost_vdpa_major, VHOST_VDPA_DEV_MAX);
+err_alloc_chrdev:
+ return r;
+}
+module_init(vhost_vdpa_init);
+
+static void __exit vhost_vdpa_exit(void)
+{
+ vdpa_unregister_driver(&vhost_vdpa_driver);
+ unregister_chrdev_region(vhost_vdpa_major, VHOST_VDPA_DEV_MAX);
+}
+module_exit(vhost_vdpa_exit);
+
+MODULE_VERSION("0.0.1");
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR("Intel Corporation");
+MODULE_DESCRIPTION("vDPA-based vhost backend for virtio");
diff --git a/include/uapi/linux/vhost.h b/include/uapi/linux/vhost.h
index 40d028eed645..9fe72e4b1373 100644
--- a/include/uapi/linux/vhost.h
+++ b/include/uapi/linux/vhost.h
@@ -116,4 +116,28 @@
#define VHOST_VSOCK_SET_GUEST_CID _IOW(VHOST_VIRTIO, 0x60, __u64)
#define VHOST_VSOCK_SET_RUNNING _IOW(VHOST_VIRTIO, 0x61, int)

+/* VHOST_VDPA specific defines */
+
+/* Get the device id. The device ids follow the definition of the
+ * device id in the virtio spec.
+ */
+#define VHOST_VDPA_GET_DEVICE_ID _IOR(VHOST_VIRTIO, 0x70, __u32)
+/* Get and set the status. The status bits follow the definition of
+ * the device status in the virtio spec.
+ */
+#define VHOST_VDPA_GET_STATUS _IOR(VHOST_VIRTIO, 0x71, __u8)
+#define VHOST_VDPA_SET_STATUS _IOW(VHOST_VIRTIO, 0x72, __u8)
+/* Get and set the device config. The device config follows the
+ * definition of the device config in the virtio spec.
+ */
+#define VHOST_VDPA_GET_CONFIG _IOR(VHOST_VIRTIO, 0x73, \
+ struct vhost_vdpa_config)
+#define VHOST_VDPA_SET_CONFIG _IOW(VHOST_VIRTIO, 0x74, \
+ struct vhost_vdpa_config)
+/* Enable/disable the ring. */
+#define VHOST_VDPA_SET_VRING_ENABLE _IOW(VHOST_VIRTIO, 0x75, \
+ struct vhost_vring_state)
+/* Get the max ring size. */
+#define VHOST_VDPA_GET_VRING_NUM _IOR(VHOST_VIRTIO, 0x76, __u16)
+
#endif
diff --git a/include/uapi/linux/vhost_types.h b/include/uapi/linux/vhost_types.h
index c907290ff065..669457ce5c48 100644
--- a/include/uapi/linux/vhost_types.h
+++ b/include/uapi/linux/vhost_types.h
@@ -119,6 +119,14 @@ struct vhost_scsi_target {
unsigned short reserved;
};

+/* VHOST_VDPA specific definitions */
+
+struct vhost_vdpa_config {
+ __u32 off;
+ __u32 len;
+ __u8 buf[0];
+};
+
/* Feature bits */
/* Log all write descriptors. Can be changed while device is active. */
#define VHOST_F_LOG_ALL 26
--
2.20.1
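
To illustrate the uAPI added above, here is a minimal, hypothetical
userspace sketch (the device node name follows the vhost-vdpa-%u
pattern from the patch; the config offset and length are arbitrary
assumptions, and error handling is omitted):

    #include <stdlib.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/ioctl.h>
    #include <linux/vhost.h>
    #include <linux/vhost_types.h>

    int main(void)
    {
            __u32 device_id;
            __u16 vq_num;
            struct vhost_vdpa_config *cfg;
            int fd = open("/dev/vhost-vdpa-0", O_RDWR);

            if (fd < 0)
                    return 1;
            /* Virtio device type as defined by the virtio spec (1 == net) */
            ioctl(fd, VHOST_VDPA_GET_DEVICE_ID, &device_id);
            /* Maximum ring size supported by the device */
            ioctl(fd, VHOST_VDPA_GET_VRING_NUM, &vq_num);
            /* Read 6 bytes (e.g. a MAC address) from config space offset 0;
             * the flexible buf[] follows the header.
             */
            cfg = malloc(sizeof(*cfg) + 6);
            cfg->off = 0;
            cfg->len = 6;
            ioctl(fd, VHOST_VDPA_GET_CONFIG, cfg);
            free(cfg);
            close(fd);
            return 0;
    }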

2020-03-26 14:08:51

by Jason Wang

[permalink] [raw]
Subject: [PATCH V9 6/9] virtio: introduce a vDPA based transport

This patch introduces a vDPA transport for virtio. It allows a
kernel virtio driver to drive a vDPA device that is capable of
populating the virtqueue directly.

A new virtio-vdpa driver is registered on the vDPA bus. When a
vDPA device is probed, the driver registers a virtio device whose
config ops are implemented on top of the vDPA config ops, making
it a software transport between the kernel virtio driver and the
vDPA device. The transport is implemented through the bus
operations of the vDPA parent.
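
As a sketch of the resulting call path (the function names are taken
from the code below; the vDPA driver column is illustrative):

    /*
     * kernel virtio driver       virtio_vdpa transport        vDPA device driver
     *
     * virtqueue_notify(vq)  -> virtio_vdpa_notify()      -> ops->kick_vq()
     * virtio_cread()        -> virtio_vdpa_get()         -> ops->get_config()
     * virtio_device_ready() -> virtio_vdpa_set_status()  -> ops->set_status()
     */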

Signed-off-by: Jason Wang <[email protected]>
---
drivers/virtio/Kconfig | 13 ++
drivers/virtio/Makefile | 1 +
drivers/virtio/virtio_vdpa.c | 396 +++++++++++++++++++++++++++++++++++
3 files changed, 410 insertions(+)
create mode 100644 drivers/virtio/virtio_vdpa.c

diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
index 9c4fdb64d9ac..99e424570644 100644
--- a/drivers/virtio/Kconfig
+++ b/drivers/virtio/Kconfig
@@ -43,6 +43,19 @@ config VIRTIO_PCI_LEGACY

If unsure, say Y.

+config VIRTIO_VDPA
+ tristate "vDPA driver for virtio devices"
+ select VDPA
+ select VIRTIO
+ help
+ This driver provides support for virtio-based paravirtual
+ device drivers over the vDPA bus. For this to be useful, you need
+ an appropriate vDPA device implementation that operates on a
+ physical device to allow the datapath of virtio to be
+ offloaded to hardware.
+
+ If unsure, say M.
+
config VIRTIO_PMEM
tristate "Support for virtio pmem driver"
depends on VIRTIO
diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile
index fdf5eacd0d0a..3407ac03fe60 100644
--- a/drivers/virtio/Makefile
+++ b/drivers/virtio/Makefile
@@ -6,4 +6,5 @@ virtio_pci-y := virtio_pci_modern.o virtio_pci_common.o
virtio_pci-$(CONFIG_VIRTIO_PCI_LEGACY) += virtio_pci_legacy.o
obj-$(CONFIG_VIRTIO_BALLOON) += virtio_balloon.o
obj-$(CONFIG_VIRTIO_INPUT) += virtio_input.o
+obj-$(CONFIG_VIRTIO_VDPA) += virtio_vdpa.o
obj-$(CONFIG_VDPA) += vdpa/
diff --git a/drivers/virtio/virtio_vdpa.c b/drivers/virtio/virtio_vdpa.c
new file mode 100644
index 000000000000..c30eb55030be
--- /dev/null
+++ b/drivers/virtio/virtio_vdpa.c
@@ -0,0 +1,396 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * VIRTIO based driver for vDPA device
+ *
+ * Copyright (c) 2020, Red Hat. All rights reserved.
+ * Author: Jason Wang <[email protected]>
+ *
+ */
+
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/device.h>
+#include <linux/kernel.h>
+#include <linux/slab.h>
+#include <linux/uuid.h>
+#include <linux/virtio.h>
+#include <linux/vdpa.h>
+#include <linux/virtio_config.h>
+#include <linux/virtio_ring.h>
+
+#define MOD_VERSION "0.1"
+#define MOD_AUTHOR "Jason Wang <[email protected]>"
+#define MOD_DESC "vDPA bus driver for virtio devices"
+#define MOD_LICENSE "GPL v2"
+
+struct virtio_vdpa_device {
+ struct virtio_device vdev;
+ struct vdpa_device *vdpa;
+ u64 features;
+
+ /* The lock to protect virtqueue list */
+ spinlock_t lock;
+ /* List of virtio_vdpa_vq_info */
+ struct list_head virtqueues;
+};
+
+struct virtio_vdpa_vq_info {
+ /* the actual virtqueue */
+ struct virtqueue *vq;
+
+ /* the list node for the virtqueues list */
+ struct list_head node;
+};
+
+static inline struct virtio_vdpa_device *
+to_virtio_vdpa_device(struct virtio_device *dev)
+{
+ return container_of(dev, struct virtio_vdpa_device, vdev);
+}
+
+static struct vdpa_device *vd_get_vdpa(struct virtio_device *vdev)
+{
+ return to_virtio_vdpa_device(vdev)->vdpa;
+}
+
+static void virtio_vdpa_get(struct virtio_device *vdev, unsigned offset,
+ void *buf, unsigned len)
+{
+ struct vdpa_device *vdpa = vd_get_vdpa(vdev);
+ const struct vdpa_config_ops *ops = vdpa->config;
+
+ ops->get_config(vdpa, offset, buf, len);
+}
+
+static void virtio_vdpa_set(struct virtio_device *vdev, unsigned offset,
+ const void *buf, unsigned len)
+{
+ struct vdpa_device *vdpa = vd_get_vdpa(vdev);
+ const struct vdpa_config_ops *ops = vdpa->config;
+
+ ops->set_config(vdpa, offset, buf, len);
+}
+
+static u32 virtio_vdpa_generation(struct virtio_device *vdev)
+{
+ struct vdpa_device *vdpa = vd_get_vdpa(vdev);
+ const struct vdpa_config_ops *ops = vdpa->config;
+
+ if (ops->get_generation)
+ return ops->get_generation(vdpa);
+
+ return 0;
+}
+
+static u8 virtio_vdpa_get_status(struct virtio_device *vdev)
+{
+ struct vdpa_device *vdpa = vd_get_vdpa(vdev);
+ const struct vdpa_config_ops *ops = vdpa->config;
+
+ return ops->get_status(vdpa);
+}
+
+static void virtio_vdpa_set_status(struct virtio_device *vdev, u8 status)
+{
+ struct vdpa_device *vdpa = vd_get_vdpa(vdev);
+ const struct vdpa_config_ops *ops = vdpa->config;
+
+ return ops->set_status(vdpa, status);
+}
+
+static void virtio_vdpa_reset(struct virtio_device *vdev)
+{
+ struct vdpa_device *vdpa = vd_get_vdpa(vdev);
+ const struct vdpa_config_ops *ops = vdpa->config;
+
+ return ops->set_status(vdpa, 0);
+}
+
+static bool virtio_vdpa_notify(struct virtqueue *vq)
+{
+ struct vdpa_device *vdpa = vd_get_vdpa(vq->vdev);
+ const struct vdpa_config_ops *ops = vdpa->config;
+
+ ops->kick_vq(vdpa, vq->index);
+
+ return true;
+}
+
+static irqreturn_t virtio_vdpa_config_cb(void *private)
+{
+ struct virtio_vdpa_device *vd_dev = private;
+
+ virtio_config_changed(&vd_dev->vdev);
+
+ return IRQ_HANDLED;
+}
+
+static irqreturn_t virtio_vdpa_virtqueue_cb(void *private)
+{
+ struct virtio_vdpa_vq_info *info = private;
+
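+ /* vring_interrupt() ignores its irq argument, so 0 is fine here */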
+ return vring_interrupt(0, info->vq);
+}
+
+static struct virtqueue *
+virtio_vdpa_setup_vq(struct virtio_device *vdev, unsigned int index,
+ void (*callback)(struct virtqueue *vq),
+ const char *name, bool ctx)
+{
+ struct virtio_vdpa_device *vd_dev = to_virtio_vdpa_device(vdev);
+ struct vdpa_device *vdpa = vd_get_vdpa(vdev);
+ const struct vdpa_config_ops *ops = vdpa->config;
+ struct virtio_vdpa_vq_info *info;
+ struct vdpa_callback cb;
+ struct virtqueue *vq;
+ u64 desc_addr, driver_addr, device_addr;
+ unsigned long flags;
+ u32 align, num;
+ int err;
+
+ if (!name)
+ return NULL;
+
+ /* Queue shouldn't already be set up. */
+ if (ops->get_vq_ready(vdpa, index))
+ return ERR_PTR(-ENOENT);
+
+ /* Allocate and fill out our active queue description */
+ info = kmalloc(sizeof(*info), GFP_KERNEL);
+ if (!info)
+ return ERR_PTR(-ENOMEM);
+
+ num = ops->get_vq_num_max(vdpa);
+ if (num == 0) {
+ err = -ENOENT;
+ goto error_new_virtqueue;
+ }
+
+ /* Create the vring */
+ align = ops->get_vq_align(vdpa);
+ vq = vring_create_virtqueue(index, num, align, vdev,
+ true, true, ctx,
+ virtio_vdpa_notify, callback, name);
+ if (!vq) {
+ err = -ENOMEM;
+ goto error_new_virtqueue;
+ }
+
+ /* Setup virtqueue callback */
+ cb.callback = virtio_vdpa_virtqueue_cb;
+ cb.private = info;
+ ops->set_vq_cb(vdpa, index, &cb);
+ ops->set_vq_num(vdpa, index, virtqueue_get_vring_size(vq));
+
+ desc_addr = virtqueue_get_desc_addr(vq);
+ driver_addr = virtqueue_get_avail_addr(vq);
+ device_addr = virtqueue_get_used_addr(vq);
+
+ if (ops->set_vq_address(vdpa, index,
+ desc_addr, driver_addr,
+ device_addr)) {
+ err = -EINVAL;
+ goto err_vq;
+ }
+
+ ops->set_vq_ready(vdpa, index, 1);
+
+ vq->priv = info;
+ info->vq = vq;
+
+ spin_lock_irqsave(&vd_dev->lock, flags);
+ list_add(&info->node, &vd_dev->virtqueues);
+ spin_unlock_irqrestore(&vd_dev->lock, flags);
+
+ return vq;
+
+err_vq:
+ vring_del_virtqueue(vq);
+error_new_virtqueue:
+ ops->set_vq_ready(vdpa, index, 0);
+ /* The vDPA driver should make sure the vq is stopped here */
+ WARN_ON(ops->get_vq_ready(vdpa, index));
+ kfree(info);
+ return ERR_PTR(err);
+}
+
+static void virtio_vdpa_del_vq(struct virtqueue *vq)
+{
+ struct virtio_vdpa_device *vd_dev = to_virtio_vdpa_device(vq->vdev);
+ struct vdpa_device *vdpa = vd_dev->vdpa;
+ const struct vdpa_config_ops *ops = vdpa->config;
+ struct virtio_vdpa_vq_info *info = vq->priv;
+ unsigned int index = vq->index;
+ unsigned long flags;
+
+ spin_lock_irqsave(&vd_dev->lock, flags);
+ list_del(&info->node);
+ spin_unlock_irqrestore(&vd_dev->lock, flags);
+
+ /* Select and deactivate the queue */
+ ops->set_vq_ready(vdpa, index, 0);
+ WARN_ON(ops->get_vq_ready(vdpa, index));
+
+ vring_del_virtqueue(vq);
+
+ kfree(info);
+}
+
+static void virtio_vdpa_del_vqs(struct virtio_device *vdev)
+{
+ struct virtqueue *vq, *n;
+
+ list_for_each_entry_safe(vq, n, &vdev->vqs, list)
+ virtio_vdpa_del_vq(vq);
+}
+
+static int virtio_vdpa_find_vqs(struct virtio_device *vdev, unsigned nvqs,
+ struct virtqueue *vqs[],
+ vq_callback_t *callbacks[],
+ const char * const names[],
+ const bool *ctx,
+ struct irq_affinity *desc)
+{
+ struct virtio_vdpa_device *vd_dev = to_virtio_vdpa_device(vdev);
+ struct vdpa_device *vdpa = vd_get_vdpa(vdev);
+ const struct vdpa_config_ops *ops = vdpa->config;
+ struct vdpa_callback cb;
+ int i, err, queue_idx = 0;
+
+ for (i = 0; i < nvqs; ++i) {
+ if (!names[i]) {
+ vqs[i] = NULL;
+ continue;
+ }
+
+ vqs[i] = virtio_vdpa_setup_vq(vdev, queue_idx++,
+ callbacks[i], names[i], ctx ?
+ ctx[i] : false);
+ if (IS_ERR(vqs[i])) {
+ err = PTR_ERR(vqs[i]);
+ goto err_setup_vq;
+ }
+ }
+
+ cb.callback = virtio_vdpa_config_cb;
+ cb.private = vd_dev;
+ ops->set_config_cb(vdpa, &cb);
+
+ return 0;
+
+err_setup_vq:
+ virtio_vdpa_del_vqs(vdev);
+ return err;
+}
+
+static u64 virtio_vdpa_get_features(struct virtio_device *vdev)
+{
+ struct vdpa_device *vdpa = vd_get_vdpa(vdev);
+ const struct vdpa_config_ops *ops = vdpa->config;
+
+ return ops->get_features(vdpa);
+}
+
+static int virtio_vdpa_finalize_features(struct virtio_device *vdev)
+{
+ struct vdpa_device *vdpa = vd_get_vdpa(vdev);
+ const struct vdpa_config_ops *ops = vdpa->config;
+
+ /* Give virtio_ring a chance to accept features. */
+ vring_transport_features(vdev);
+
+ return ops->set_features(vdpa, vdev->features);
+}
+
+static const char *virtio_vdpa_bus_name(struct virtio_device *vdev)
+{
+ struct virtio_vdpa_device *vd_dev = to_virtio_vdpa_device(vdev);
+ struct vdpa_device *vdpa = vd_dev->vdpa;
+
+ return dev_name(&vdpa->dev);
+}
+
+static const struct virtio_config_ops virtio_vdpa_config_ops = {
+ .get = virtio_vdpa_get,
+ .set = virtio_vdpa_set,
+ .generation = virtio_vdpa_generation,
+ .get_status = virtio_vdpa_get_status,
+ .set_status = virtio_vdpa_set_status,
+ .reset = virtio_vdpa_reset,
+ .find_vqs = virtio_vdpa_find_vqs,
+ .del_vqs = virtio_vdpa_del_vqs,
+ .get_features = virtio_vdpa_get_features,
+ .finalize_features = virtio_vdpa_finalize_features,
+ .bus_name = virtio_vdpa_bus_name,
+};
+
+static void virtio_vdpa_release_dev(struct device *_d)
+{
+ struct virtio_device *vdev =
+ container_of(_d, struct virtio_device, dev);
+ struct virtio_vdpa_device *vd_dev =
+ container_of(vdev, struct virtio_vdpa_device, vdev);
+
+ kfree(vd_dev);
+}
+
+static int virtio_vdpa_probe(struct vdpa_device *vdpa)
+{
+ const struct vdpa_config_ops *ops = vdpa->config;
+ struct virtio_vdpa_device *vd_dev, *reg_dev = NULL;
+ int ret = -EINVAL;
+
+ vd_dev = kzalloc(sizeof(*vd_dev), GFP_KERNEL);
+ if (!vd_dev)
+ return -ENOMEM;
+
+ vd_dev->vdev.dev.parent = vdpa_get_dma_dev(vdpa);
+ vd_dev->vdev.dev.release = virtio_vdpa_release_dev;
+ vd_dev->vdev.config = &virtio_vdpa_config_ops;
+ vd_dev->vdpa = vdpa;
+ INIT_LIST_HEAD(&vd_dev->virtqueues);
+ spin_lock_init(&vd_dev->lock);
+
+ vd_dev->vdev.id.device = ops->get_device_id(vdpa);
+ if (vd_dev->vdev.id.device == 0)
+ goto err;
+
+ vd_dev->vdev.id.vendor = ops->get_vendor_id(vdpa);
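+ /* Once register_virtio_device() has run, unwind with put_device(), not kfree() */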
+ ret = register_virtio_device(&vd_dev->vdev);
+ reg_dev = vd_dev;
+ if (ret)
+ goto err;
+
+ vdpa_set_drvdata(vdpa, vd_dev);
+
+ return 0;
+
+err:
+ if (reg_dev)
+ put_device(&vd_dev->vdev.dev);
+ else
+ kfree(vd_dev);
+ return ret;
+}
+
+static void virtio_vdpa_remove(struct vdpa_device *vdpa)
+{
+ struct virtio_vdpa_device *vd_dev = vdpa_get_drvdata(vdpa);
+
+ unregister_virtio_device(&vd_dev->vdev);
+}
+
+static struct vdpa_driver virtio_vdpa_driver = {
+ .driver = {
+ .name = "virtio_vdpa",
+ },
+ .probe = virtio_vdpa_probe,
+ .remove = virtio_vdpa_remove,
+};
+
+module_vdpa_driver(virtio_vdpa_driver);
+
+MODULE_VERSION(MOD_VERSION);
+MODULE_LICENSE(MOD_LICENSE);
+MODULE_AUTHOR(MOD_AUTHOR);
+MODULE_DESCRIPTION(MOD_DESC);
--
2.20.1

2020-03-29 11:08:59

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH V9 0/9] vDPA support

On Thu, Mar 26, 2020 at 10:01:16PM +0800, Jason Wang wrote:
> Hi all:
>
> This is an update version of vDPA support in kernel.

OK I put these in my next tree, with an eye towards merging them in 5.7.
Many thanks to everyone who reviewed the patches, especially to Jason
Gunthorpe for reviewing multiple versions of the patches, helping locate
and address the issues.


> vDPA device is a device that uses a datapath which complies with the
> virtio specifications with vendor specific control path. vDPA devices
> can be both physically located on the hardware or emulated by
> software. vDPA hardware devices are usually implemented through PCIE
> with the following types:
>
> - PF (Physical Function) - A single Physical Function
> - VF (Virtual Function) - Device that supports single root I/O
> virtualization (SR-IOV). Its Virtual Function (VF) represents a
> virtualized instance of the device that can be assigned to different
> partitions
> - ADI (Assignable Device Interface) and its equivalents - With
> technologies such as Intel Scalable IOV, a virtual device (VDEV)
> composed by host OS utilizing one or more ADIs. Or its equivalent
> like SF (Sub function) from Mellanox.
>
> From a driver's perspective, depends on how and where the DMA
> translation is done, vDPA devices are split into two types:
>
> - Platform specific DMA translation - From the driver's perspective,
> the device can be used on a platform where device access to data in
> memory is limited and/or translated. An example is a PCIE vDPA whose
> DMA request was tagged via a bus (e.g PCIE) specific way. DMA
> translation and protection are done at PCIE bus IOMMU level.
> - Device specific DMA translation - The device implements DMA
> isolation and protection through its own logic. An example is a vDPA
> device which uses on-chip IOMMU.
>
> To hide the differences and complexity of the above types for a vDPA
> device/IOMMU options and in order to present a generic virtio device
> to the upper layer, a device agnostic framework is required.
>
> This series introduces a software vDPA bus which abstracts the
> common attributes of vDPA device, vDPA bus driver and the
> communication method, the bus operations (vdpa_config_ops) between the
> vDPA device abstraction and the vDPA bus driver. This allows multiple
> types of drivers to be used for vDPA device like the virtio_vdpa and
> vhost_vdpa driver to operate on the bus and allow vDPA device could be
> used by either kernel virtio driver or userspace vhost drivers as:
>
> virtio drivers vhost drivers
> | |
> [virtio bus] [vhost uAPI]
> | |
> virtio device vhost device
> virtio_vdpa drv vhost_vdpa drv
> \ /
> [vDPA bus]
> |
> vDPA device
> hardware drv
> |
> [hardware bus]
> |
> vDPA hardware
>
> virtio_vdpa driver is a transport implementation for kernel virtio
> drivers on top of vDPA bus operations. An alternative is to refactor
> virtio bus which is sub-optimal since the bus and drivers are designed
> to be use by kernel subsystem, a non-trivial major refactoring is
> needed which may impact a brunches of drivers and devices
> implementation inside the kernel. Using a new transport may grealy
> simply both the design and changes.
>
> vhost_vdpa driver is a new type of vhost device which allows userspace
> vhost drivers to use vDPA devices via vhost uAPI (with minor
> extension). This help to minimize the changes of existed vhost drivers
> for using vDPA devices.
>
> With the abstraction of vDPA bus and vDPA bus operations, the
> difference and complexity of the under layer hardware is hidden from
> upper layer. The vDPA bus drivers on top can use a unified
> vdpa_config_ops to control different types of vDPA device.
>
> Two drivers were implemented with the framework introduced in this
> series:
>
> - Intel IFC VF driver which depends on the platform IOMMU for DMA
> translation
> - VDPA simulator which is a software test device with an emulated
> onchip IOMMU
>
> Future work:
>
> - direct doorbell mapping support
> - control virtqueue support
> - dirty page tracking support
> - direct interrupt support
> - management API (devlink)
>
> Please review.
>
> Thanks
>
> Changes from V8:
>
> - switch to use devres for PCI resources of IFCVF (Jason)
> - require the caller of vdpa_alloc_device() to use "struct foo"
> instead of foo (Jason)
> - some tweaks on the IFCVF driver
>
> Changes from V7:
>
> - refine kconfig to solve the dependency issues on archs that lack
> CONFIG_VIRTUALIZATION (kbuild)
>
> Changes from V6:
>
> - vdpa_alloc_device() will allocate parent structure (Jason)
> - remove the vdpa core dev info in IFCVF patch (Jason)
> - provide a free method in the vdpa bus operations for the driver to free
> private data
> - switch to use new layout in vdpasim and IFCVF
> - make IFCVF depends on PCI_MSI (kbuild)
> - some tweaks on the IFCVF driver
>
> Changes from V5:
>
> - include Intel IFCVF driver and vhost-vdpa drivers
> - add the platform IOMMU support for vhost-vdpa
> - check the return value of dev_set_name() (Jason)
> - various tweaks and fixes
>
> Changes from V4:
>
> - use put_device() instead of kfree when fail to register virtio
> device (Jason)
> - simplify the error handling when allocating vdpasim device (Jason)
> - don't use device_for_each_child() during module exit (Jason)
> - correct the error checking for vdpa_alloc_device() (Harpreet, Lingshan)
>
> Changes from V3:
>
> - various Kconfig fixes (Randy)
>
> Changes from V2:
>
> - release idr in the release function for put_device() unwind (Jason)
> - don't panic when fail to register vdpa bus (Jason)
> - use unsigned int instead of int for ida (Jason)
> - fix the wrong commit log in virtio_vdpa patches (Jason)
> - make vdpa_sim depends on RUNTIME_TESTING_MENU (Michael)
> - provide a bus release function for vDPA device (Jason)
> - fix the wrong unwind when creating devices for vDPA simulator (Jason)
> - move vDPA simulator to a dedicated directory (Lingshan)
> - cancel the work before releasing the vDPA simulator
>
> Changes from V1:
>
> - drop sysfs API, leave the management interface to future development
> (Michael)
> - introduce incremental DMA ops (dma_map/dma_unmap) (Michael)
> - introduce dma_device and use it instead of parent device for doing
> IOMMU or DMA from bus driver (Michael, Jason, Ling Shan, Tiwei)
> - accept parent device and dma device when register vdpa device
> - coding style and compile fixes (Randy)
> - using vdpa_xxx instead of xxx_vdpa (Jason)
> - move vDPA accessors to header and make them static inline (Jason)
> - split vdpa_register_device() into two helpers vdpa_init_device() and
> vdpa_register_device() which allows an intermediate step to be done (Jason)
> - warn on invalid queue state when failing to create a virtqueue (Jason)
> - make to_virtio_vdpa_device() static (Jason)
> - use kmalloc/kfree instead of devres for virtio vdpa device (Jason)
> - avoid using cast in vdpa bus function (Jason)
> - introduce module_vdpa_driver and fix module refcnt (Jason)
> - fix returning freed address in vdpasim coherent DMA addr allocation (Dan)
> - various other fixes and tweaks
>
> V8: https://lkml.org/lkml/2020/3/25/125
> V7: https://lkml.org/lkml/2020/3/24/21
> V6: https://lkml.org/lkml/2020/3/18/88
> V5: https://lkml.org/lkml/2020/2/26/58
> V4: https://lkml.org/lkml/2020/2/20/59
> V3: https://lkml.org/lkml/2020/2/19/1347
> V2: https://lkml.org/lkml/2020/2/9/275
> V1: https://lkml.org/lkml/2020/1/16/353
>
> Jason Wang (7):
> vhost: refine vhost and vringh kconfig
> vhost: allow per device message handler
> vhost: factor out IOTLB
> vringh: IOTLB support
> vDPA: introduce vDPA bus
> virtio: introduce a vDPA based transport
> vdpasim: vDPA device simulator
>
> Tiwei Bie (1):
> vhost: introduce vDPA-based backend
>
> Zhu Lingshan (1):
> virtio: Intel IFC VF driver for VDPA
>
> MAINTAINERS | 2 +
> arch/arm/kvm/Kconfig | 2 -
> arch/arm64/kvm/Kconfig | 2 -
> arch/mips/kvm/Kconfig | 2 -
> arch/powerpc/kvm/Kconfig | 2 -
> arch/s390/kvm/Kconfig | 4 -
> arch/x86/kvm/Kconfig | 4 -
> drivers/Kconfig | 2 +
> drivers/misc/mic/Kconfig | 4 -
> drivers/net/caif/Kconfig | 4 -
> drivers/vhost/Kconfig | 42 +-
> drivers/vhost/Kconfig.vringh | 6 -
> drivers/vhost/Makefile | 6 +
> drivers/vhost/iotlb.c | 177 +++++
> drivers/vhost/net.c | 5 +-
> drivers/vhost/scsi.c | 2 +-
> drivers/vhost/vdpa.c | 883 ++++++++++++++++++++++++
> drivers/vhost/vhost.c | 233 +++----
> drivers/vhost/vhost.h | 45 +-
> drivers/vhost/vringh.c | 421 ++++++++++-
> drivers/vhost/vsock.c | 2 +-
> drivers/virtio/Kconfig | 15 +
> drivers/virtio/Makefile | 2 +
> drivers/virtio/vdpa/Kconfig | 37 +
> drivers/virtio/vdpa/Makefile | 4 +
> drivers/virtio/vdpa/ifcvf/Makefile | 3 +
> drivers/virtio/vdpa/ifcvf/ifcvf_base.c | 389 +++++++++++
> drivers/virtio/vdpa/ifcvf/ifcvf_base.h | 118 ++++
> drivers/virtio/vdpa/ifcvf/ifcvf_main.c | 435 ++++++++++++
> drivers/virtio/vdpa/vdpa.c | 180 +++++
> drivers/virtio/vdpa/vdpa_sim/Makefile | 2 +
> drivers/virtio/vdpa/vdpa_sim/vdpa_sim.c | 629 +++++++++++++++++
> drivers/virtio/virtio_vdpa.c | 396 +++++++++++
> include/linux/vdpa.h | 253 +++++++
> include/linux/vhost_iotlb.h | 47 ++
> include/linux/vringh.h | 36 +
> include/uapi/linux/vhost.h | 24 +
> include/uapi/linux/vhost_types.h | 8 +
> 38 files changed, 4180 insertions(+), 248 deletions(-)
> delete mode 100644 drivers/vhost/Kconfig.vringh
> create mode 100644 drivers/vhost/iotlb.c
> create mode 100644 drivers/vhost/vdpa.c
> create mode 100644 drivers/virtio/vdpa/Kconfig
> create mode 100644 drivers/virtio/vdpa/Makefile
> create mode 100644 drivers/virtio/vdpa/ifcvf/Makefile
> create mode 100644 drivers/virtio/vdpa/ifcvf/ifcvf_base.c
> create mode 100644 drivers/virtio/vdpa/ifcvf/ifcvf_base.h
> create mode 100644 drivers/virtio/vdpa/ifcvf/ifcvf_main.c
> create mode 100644 drivers/virtio/vdpa/vdpa.c
> create mode 100644 drivers/virtio/vdpa/vdpa_sim/Makefile
> create mode 100644 drivers/virtio/vdpa/vdpa_sim/vdpa_sim.c
> create mode 100644 drivers/virtio/virtio_vdpa.c
> create mode 100644 include/linux/vdpa.h
> create mode 100644 include/linux/vhost_iotlb.h
>
> --
> 2.20.1

2020-04-09 10:42:30

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH V9 9/9] virtio: Intel IFC VF driver for VDPA

On Thu, Mar 26, 2020 at 3:08 PM Jason Wang <[email protected]> wrote:
>
> From: Zhu Lingshan <[email protected]>
>
> This commit introduced two layers to drive IFC VF:
>
> (1) ifcvf_base layer, which handles IFC VF NIC hardware operations and
> configurations.
>
> (2) ifcvf_main layer, which complies to VDPA bus framework,
> implemented device operations for VDPA bus, handles device probe,
> bus attaching, vring operations, etc.
>
> Signed-off-by: Zhu Lingshan <[email protected]>
> Signed-off-by: Bie Tiwei <[email protected]>
> Signed-off-by: Wang Xiao <[email protected]>
> Signed-off-by: Jason Wang <[email protected]>

> +
> +#define IFCVF_QUEUE_ALIGNMENT PAGE_SIZE
> +#define IFCVF_QUEUE_MAX 32768
> +static u16 ifcvf_vdpa_get_vq_align(struct vdpa_device *vdpa_dev)
> +{
> + return IFCVF_QUEUE_ALIGNMENT;
> +}

This fails to build on arm64 with 64KB page size (found in linux-next):

/drivers/vdpa/ifcvf/ifcvf_main.c: In function 'ifcvf_vdpa_get_vq_align':
arch/arm64/include/asm/page-def.h:17:20: error: conversion from 'long
unsigned int' to 'u16' {aka 'short unsigned int'} changes value from
'65536' to '0' [-Werror=overflow]
17 | #define PAGE_SIZE (_AC(1, UL) << PAGE_SHIFT)
| ^
drivers/vdpa/ifcvf/ifcvf_base.h:37:31: note: in expansion of macro 'PAGE_SIZE'
37 | #define IFCVF_QUEUE_ALIGNMENT PAGE_SIZE
| ^~~~~~~~~
drivers/vdpa/ifcvf/ifcvf_main.c:231:9: note: in expansion of macro
'IFCVF_QUEUE_ALIGNMENT'
231 | return IFCVF_QUEUE_ALIGNMENT;
| ^~~~~~~~~~~~~~~~~~~~~

It's probably good enough to just not allow the driver to be built in that
configuration as it's fairly rare but unfortunately there is no simple Kconfig
symbol for it.

In a similar driver, we did

config VMXNET3
	tristate "VMware VMXNET3 ethernet driver"
	depends on PCI && INET
	depends on !(PAGE_SIZE_64KB || ARM64_64K_PAGES || \
		     IA64_PAGE_SIZE_64KB || MICROBLAZE_64K_PAGES || \
		     PARISC_PAGE_SIZE_64KB || PPC_64K_PAGES)

I think we should probably make PAGE_SIZE_64KB a global symbol
in arch/Kconfig and have it selected by the other symbols so drivers
like yours can add a dependency for it.

Arnd
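
Something along these lines, presumably (a sketch only; the symbol
name and the per-arch select shown here are assumptions):

    # arch/Kconfig
    config PAGE_SIZE_64KB
            bool

    # e.g. arch/arm64/Kconfig
    config ARM64_64K_PAGES
            bool "64KB"
            select PAGE_SIZE_64KB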

2020-04-09 12:45:08

by Jason Wang

[permalink] [raw]
Subject: Re: [PATCH V9 9/9] virtio: Intel IFC VF driver for VDPA


On 2020/4/9 6:41 PM, Arnd Bergmann wrote:
> On Thu, Mar 26, 2020 at 3:08 PM Jason Wang <[email protected]> wrote:
>> From: Zhu Lingshan <[email protected]>
>>
>> This commit introduced two layers to drive IFC VF:
>>
>> (1) ifcvf_base layer, which handles IFC VF NIC hardware operations and
>> configurations.
>>
>> (2) ifcvf_main layer, which complies to VDPA bus framework,
>> implemented device operations for VDPA bus, handles device probe,
>> bus attaching, vring operations, etc.
>>
>> Signed-off-by: Zhu Lingshan <[email protected]>
>> Signed-off-by: Bie Tiwei <[email protected]>
>> Signed-off-by: Wang Xiao <[email protected]>
>> Signed-off-by: Jason Wang <[email protected]>
>> +
>> +#define IFCVF_QUEUE_ALIGNMENT PAGE_SIZE
>> +#define IFCVF_QUEUE_MAX 32768
>> +static u16 ifcvf_vdpa_get_vq_align(struct vdpa_device *vdpa_dev)
>> +{
>> + return IFCVF_QUEUE_ALIGNMENT;
>> +}
> This fails to build on arm64 with 64kb page size (found in linux-next):
>
> /drivers/vdpa/ifcvf/ifcvf_main.c: In function 'ifcvf_vdpa_get_vq_align':
> arch/arm64/include/asm/page-def.h:17:20: error: conversion from 'long
> unsigned int' to 'u16' {aka 'short unsigned int'} changes value from
> '65536' to '0' [-Werror=overflow]
> 17 | #define PAGE_SIZE (_AC(1, UL) << PAGE_SHIFT)
> | ^
> drivers/vdpa/ifcvf/ifcvf_base.h:37:31: note: in expansion of macro 'PAGE_SIZE'
> 37 | #define IFCVF_QUEUE_ALIGNMENT PAGE_SIZE
> | ^~~~~~~~~
> drivers/vdpa/ifcvf/ifcvf_main.c:231:9: note: in expansion of macro
> 'IFCVF_QUEUE_ALIGNMENT'
> 231 | return IFCVF_QUEUE_ALIGNMENT;
> | ^~~~~~~~~~~~~~~~~~~~~
>
> It's probably good enough to just not allow the driver to be built in that
> configuration as it's fairly rare but unfortunately there is no simple Kconfig
> symbol for it.


Or I think a 64KB alignment is probably more than enough.

Ling Shan, can we use a smaller value here?

Thanks


>
> In a similar driver, we did
>
> config VMXNET3
> tristate "VMware VMXNET3 ethernet driver"
> depends on PCI && INET
> depends on !(PAGE_SIZE_64KB || ARM64_64K_PAGES || \
> IA64_PAGE_SIZE_64KB || MICROBLAZE_64K_PAGES || \
> PARISC_PAGE_SIZE_64KB || PPC_64K_PAGES)
>
> I think we should probably make PAGE_SIZE_64KB a global symbol
> in arch/Kconfig and have it selected by the other symbols so drivers
> like yours can add a dependency for it.
>
> Arnd
>

2020-04-09 13:31:58

by Zhu, Lingshan

[permalink] [raw]
Subject: Re: [PATCH V9 9/9] virtio: Intel IFC VF driver for VDPA


On 4/9/2020 8:43 PM, Jason Wang wrote:
>
On 2020/4/9 6:41 PM, Arnd Bergmann wrote:
>> On Thu, Mar 26, 2020 at 3:08 PM Jason Wang <[email protected]> wrote:
>>> From: Zhu Lingshan <[email protected]>
>>>
>>> This commit introduced two layers to drive IFC VF:
>>>
>>> (1) ifcvf_base layer, which handles IFC VF NIC hardware operations and
>>>      configurations.
>>>
>>> (2) ifcvf_main layer, which complies to VDPA bus framework,
>>>      implemented device operations for VDPA bus, handles device probe,
>>>      bus attaching, vring operations, etc.
>>>
>>> Signed-off-by: Zhu Lingshan <[email protected]>
>>> Signed-off-by: Bie Tiwei <[email protected]>
>>> Signed-off-by: Wang Xiao <[email protected]>
>>> Signed-off-by: Jason Wang <[email protected]>
>>> +
>>> +#define IFCVF_QUEUE_ALIGNMENT  PAGE_SIZE
>>> +#define IFCVF_QUEUE_MAX                32768
>>> +static u16 ifcvf_vdpa_get_vq_align(struct vdpa_device *vdpa_dev)
>>> +{
>>> +       return IFCVF_QUEUE_ALIGNMENT;
>>> +}
>> This fails to build on arm64 with 64kb page size (found in linux-next):
>>
>> /drivers/vdpa/ifcvf/ifcvf_main.c: In function 'ifcvf_vdpa_get_vq_align':
>> arch/arm64/include/asm/page-def.h:17:20: error: conversion from 'long
>> unsigned int' to 'u16' {aka 'short unsigned int'} changes value from
>> '65536' to '0' [-Werror=overflow]
>>     17 | #define PAGE_SIZE  (_AC(1, UL) << PAGE_SHIFT)
>>        |                    ^
>> drivers/vdpa/ifcvf/ifcvf_base.h:37:31: note: in expansion of macro
>> 'PAGE_SIZE'
>>     37 | #define IFCVF_QUEUE_ALIGNMENT PAGE_SIZE
>>        |                               ^~~~~~~~~
>> drivers/vdpa/ifcvf/ifcvf_main.c:231:9: note: in expansion of macro
>> 'IFCVF_QUEUE_ALIGNMENT'
>>    231 |  return IFCVF_QUEUE_ALIGNMENT;
>>        |         ^~~~~~~~~~~~~~~~~~~~~
>>
>> It's probably good enough to just not allow the driver to be built in
>> that
>> configuration as it's fairly rare but unfortunately there is no
>> simple Kconfig
>> symbol for it.
>
>
> Or I think the 64KB alignment is probably more than enough.
>
> Ling Shan, can we use smaller value here?
>
> Thanks

Sure, we just need it page aligned; this value is only used in
get_vq_align(). I will try to find an arm64 machine and debug this issue.

Thanks!

>
>
>>
>> In a similar driver, we did
>>
>> config VMXNET3
>>          tristate "VMware VMXNET3 ethernet driver"
>>          depends on PCI && INET
>>          depends on !(PAGE_SIZE_64KB || ARM64_64K_PAGES || \
>>                       IA64_PAGE_SIZE_64KB || MICROBLAZE_64K_PAGES || \
>>                       PARISC_PAGE_SIZE_64KB || PPC_64K_PAGES)
>>
>> I think we should probably make PAGE_SIZE_64KB a global symbol
>> in arch/Kconfig and have it selected by the other symbols so drivers
>> like yours can add a dependency for it.
>>
>>           Arnd
>>
>

2020-04-09 21:31:05

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH V9 9/9] virtio: Intel IFC VF driver for VDPA

On Thu, Apr 09, 2020 at 12:41:13PM +0200, Arnd Bergmann wrote:
> On Thu, Mar 26, 2020 at 3:08 PM Jason Wang <[email protected]> wrote:
> >
> > From: Zhu Lingshan <[email protected]>
> >
> > This commit introduced two layers to drive IFC VF:
> >
> > (1) ifcvf_base layer, which handles IFC VF NIC hardware operations and
> > configurations.
> >
> > (2) ifcvf_main layer, which complies to VDPA bus framework,
> > implemented device operations for VDPA bus, handles device probe,
> > bus attaching, vring operations, etc.
> >
> > Signed-off-by: Zhu Lingshan <[email protected]>
> > Signed-off-by: Bie Tiwei <[email protected]>
> > Signed-off-by: Wang Xiao <[email protected]>
> > Signed-off-by: Jason Wang <[email protected]>
>
> > +
> > +#define IFCVF_QUEUE_ALIGNMENT PAGE_SIZE
> > +#define IFCVF_QUEUE_MAX 32768
> > +static u16 ifcvf_vdpa_get_vq_align(struct vdpa_device *vdpa_dev)
> > +{
> > + return IFCVF_QUEUE_ALIGNMENT;
> > +}
>
> This fails to build on arm64 with 64kb page size (found in linux-next):
>
> /drivers/vdpa/ifcvf/ifcvf_main.c: In function 'ifcvf_vdpa_get_vq_align':
> arch/arm64/include/asm/page-def.h:17:20: error: conversion from 'long
> unsigned int' to 'u16' {aka 'short unsigned int'} changes value from
> '65536' to '0' [-Werror=overflow]
> 17 | #define PAGE_SIZE (_AC(1, UL) << PAGE_SHIFT)
> | ^
> drivers/vdpa/ifcvf/ifcvf_base.h:37:31: note: in expansion of macro 'PAGE_SIZE'
> 37 | #define IFCVF_QUEUE_ALIGNMENT PAGE_SIZE
> | ^~~~~~~~~
> drivers/vdpa/ifcvf/ifcvf_main.c:231:9: note: in expansion of macro
> 'IFCVF_QUEUE_ALIGNMENT'
> 231 | return IFCVF_QUEUE_ALIGNMENT;
> | ^~~~~~~~~~~~~~~~~~~~~
>
> It's probably good enough to just not allow the driver to be built in that
> configuration as it's fairly rare but unfortunately there is no simple Kconfig
> symbol for it.
>
> In a similar driver, we did
>
> config VMXNET3
> tristate "VMware VMXNET3 ethernet driver"
> depends on PCI && INET
> depends on !(PAGE_SIZE_64KB || ARM64_64K_PAGES || \
> IA64_PAGE_SIZE_64KB || MICROBLAZE_64K_PAGES || \
> PARISC_PAGE_SIZE_64KB || PPC_64K_PAGES)
>
> I think we should probably make PAGE_SIZE_64KB a global symbol
> in arch/Kconfig and have it selected by the other symbols so drivers
> like yours can add a dependency for it.
>
> Arnd

It's probably easier to make the alignment u32 - I don't really know why it's
u16; all callers seem to assign the result to a u32 value.

2020-04-10 03:16:54

by Zhu, Lingshan

[permalink] [raw]
Subject: Re: [PATCH V9 9/9] virtio: Intel IFC VF driver for VDPA


On 4/10/2020 4:25 AM, Michael S. Tsirkin wrote:
> On Thu, Apr 09, 2020 at 12:41:13PM +0200, Arnd Bergmann wrote:
>> On Thu, Mar 26, 2020 at 3:08 PM Jason Wang <[email protected]> wrote:
>>> From: Zhu Lingshan <[email protected]>
>>>
>>> This commit introduced two layers to drive IFC VF:
>>>
>>> (1) ifcvf_base layer, which handles IFC VF NIC hardware operations and
>>> configurations.
>>>
>>> (2) ifcvf_main layer, which complies to VDPA bus framework,
>>> implemented device operations for VDPA bus, handles device probe,
>>> bus attaching, vring operations, etc.
>>>
>>> Signed-off-by: Zhu Lingshan <[email protected]>
>>> Signed-off-by: Bie Tiwei <[email protected]>
>>> Signed-off-by: Wang Xiao <[email protected]>
>>> Signed-off-by: Jason Wang <[email protected]>
>>> +
>>> +#define IFCVF_QUEUE_ALIGNMENT PAGE_SIZE
>>> +#define IFCVF_QUEUE_MAX 32768
>>> +static u16 ifcvf_vdpa_get_vq_align(struct vdpa_device *vdpa_dev)
>>> +{
>>> + return IFCVF_QUEUE_ALIGNMENT;
>>> +}
>> This fails to build on arm64 with 64kb page size (found in linux-next):
>>
>> /drivers/vdpa/ifcvf/ifcvf_main.c: In function 'ifcvf_vdpa_get_vq_align':
>> arch/arm64/include/asm/page-def.h:17:20: error: conversion from 'long
>> unsigned int' to 'u16' {aka 'short unsigned int'} changes value from
>> '65536' to '0' [-Werror=overflow]
>> 17 | #define PAGE_SIZE (_AC(1, UL) << PAGE_SHIFT)
>> | ^
>> drivers/vdpa/ifcvf/ifcvf_base.h:37:31: note: in expansion of macro 'PAGE_SIZE'
>> 37 | #define IFCVF_QUEUE_ALIGNMENT PAGE_SIZE
>> | ^~~~~~~~~
>> drivers/vdpa/ifcvf/ifcvf_main.c:231:9: note: in expansion of macro
>> 'IFCVF_QUEUE_ALIGNMENT'
>> 231 | return IFCVF_QUEUE_ALIGNMENT;
>> | ^~~~~~~~~~~~~~~~~~~~~
>>
>> It's probably good enough to just not allow the driver to be built in that
>> configuration as it's fairly rare but unfortunately there is no simple Kconfig
>> symbol for it.
>>
>> In a similar driver, we did
>>
>> config VMXNET3
>> tristate "VMware VMXNET3 ethernet driver"
>> depends on PCI && INET
>> depends on !(PAGE_SIZE_64KB || ARM64_64K_PAGES || \
>> IA64_PAGE_SIZE_64KB || MICROBLAZE_64K_PAGES || \
>> PARISC_PAGE_SIZE_64KB || PPC_64K_PAGES)
>>
>> I think we should probably make PAGE_SIZE_64KB a global symbol
>> in arch/Kconfig and have it selected by the other symbols so drivers
>> like yours can add a dependency for it.
>>
>> Arnd
> It's probably easier to make the alignment u32 - I don't really know why it's
> u16, all callers seem to assign the result to a u32 value.

Thanks Michael!

@Arnd, would you please kindly give the attached patch a try? I have
failed to find an armv8 or later platform so far.

Thanks
BR
Zhu Lingshan


Attachments:
0001-vdpa-change-get_vq_align-to-u32-alignment.patch (2.03 kB)
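
(The attachment itself is not reproduced here; judging by its file
name, the change is presumably along these lines - a sketch, not the
actual patch:)

    --- a/include/linux/vdpa.h
    +++ b/include/linux/vdpa.h
    -	u16 (*get_vq_align)(struct vdpa_device *vdev);
    +	u32 (*get_vq_align)(struct vdpa_device *vdev);

    --- a/drivers/vdpa/ifcvf/ifcvf_main.c
    +++ b/drivers/vdpa/ifcvf/ifcvf_main.c
    -static u16 ifcvf_vdpa_get_vq_align(struct vdpa_device *vdpa_dev)
    +static u32 ifcvf_vdpa_get_vq_align(struct vdpa_device *vdpa_dev)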

2021-11-01 14:14:17

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH V9 7/9] vhost: introduce vDPA-based backend

On Thu, Mar 26, 2020 at 10:01:23PM +0800, Jason Wang wrote:
> From: Tiwei Bie <[email protected]>
>
> This patch introduces a vDPA-based vhost backend. This backend is
> built on top of the same interface defined in virtio-vDPA and provides
> a generic vhost interface for userspace to accelerate the virtio
> devices in guest.
>
> This backend is implemented as a vDPA device driver on top of the same
> ops used in virtio-vDPA. It will create char device entry named
> vhost-vdpa-$index for userspace to use. Userspace can use vhost ioctls
> on top of this char device to setup the backend.
>
> Vhost ioctls are extended to make it type agnostic and behave like a
> virtio device, this help to eliminate type specific API like what
> vhost_net/scsi/vsock did:
>
> - VHOST_VDPA_GET_DEVICE_ID: get the virtio device ID which is defined
> by virtio specification to differ from different type of devices
> - VHOST_VDPA_GET_VRING_NUM: get the maximum size of virtqueue
> supported by the vDPA device
> - VHOST_VDPA_SET/GET_STATUS: set and get virtio status of vDPA device
> - VHOST_VDPA_SET/GET_CONFIG: access virtio config space
> - VHOST_VDPA_SET_VRING_ENABLE: enable a specific virtqueue
>
> For memory mapping, IOTLB API is mandated for vhost-vDPA which means
> userspace drivers are required to use
> VHOST_IOTLB_UPDATE/VHOST_IOTLB_INVALIDATE to add or remove mapping for
> a specific userspace memory region.
>
> The vhost-vDPA API is designed to be type agnostic, but it allows net
> device only in current stage. Due to the lacking of control virtqueue
> support, some features were filter out by vhost-vdpa.
>
> We will enable more features and devices in the near future.

[..]

> +static int vhost_vdpa_alloc_domain(struct vhost_vdpa *v)
> +{
> + struct vdpa_device *vdpa = v->vdpa;
> + const struct vdpa_config_ops *ops = vdpa->config;
> + struct device *dma_dev = vdpa_get_dma_dev(vdpa);
> + struct bus_type *bus;
> + int ret;
> +
> + /* Device want to do DMA by itself */
> + if (ops->set_map || ops->dma_map)
> + return 0;
> +
> + bus = dma_dev->bus;
> + if (!bus)
> + return -EFAULT;
> +
> + if (!iommu_capable(bus, IOMMU_CAP_CACHE_COHERENCY))
> + return -ENOTSUPP;
> +
> + v->domain = iommu_domain_alloc(bus);
> + if (!v->domain)
> + return -EIO;
> +
> + ret = iommu_attach_device(v->domain, dma_dev);
> + if (ret)
> + goto err_attach;
>

I've been looking at the security of iommu_attach_device() users, and
I wonder if this is safe?

The security question is whether userspace is able to control the DMA
addresses the device uses, e.g. if any of the CPU-to-device rings are in
userspace memory.

For instance, if userspace can tell the device to send a packet from an
arbitrary user-controlled address.

Thanks,
Jason

2021-11-02 03:55:37

by Jason Wang

[permalink] [raw]
Subject: Re: [PATCH V9 7/9] vhost: introduce vDPA-based backend

On Mon, Nov 1, 2021 at 10:11 PM Jason Gunthorpe <[email protected]> wrote:
>
> On Thu, Mar 26, 2020 at 10:01:23PM +0800, Jason Wang wrote:
> > From: Tiwei Bie <[email protected]>
> >
> > This patch introduces a vDPA-based vhost backend. This backend is
> > built on top of the same interface defined in virtio-vDPA and provides
> > a generic vhost interface for userspace to accelerate the virtio
> > devices in guest.
> >
> > This backend is implemented as a vDPA device driver on top of the same
> > ops used in virtio-vDPA. It will create char device entry named
> > vhost-vdpa-$index for userspace to use. Userspace can use vhost ioctls
> > on top of this char device to setup the backend.
> >
> > Vhost ioctls are extended to make it type agnostic and behave like a
> > virtio device, this help to eliminate type specific API like what
> > vhost_net/scsi/vsock did:
> >
> > - VHOST_VDPA_GET_DEVICE_ID: get the virtio device ID which is defined
> > by virtio specification to differ from different type of devices
> > - VHOST_VDPA_GET_VRING_NUM: get the maximum size of virtqueue
> > supported by the vDPA device
> > - VHOST_VDPA_SET/GET_STATUS: set and get virtio status of vDPA device
> > - VHOST_VDPA_SET/GET_CONFIG: access virtio config space
> > - VHOST_VDPA_SET_VRING_ENABLE: enable a specific virtqueue
> >
> > For memory mapping, IOTLB API is mandated for vhost-vDPA which means
> > userspace drivers are required to use
> > VHOST_IOTLB_UPDATE/VHOST_IOTLB_INVALIDATE to add or remove mapping for
> > a specific userspace memory region.
> >
> > The vhost-vDPA API is designed to be type agnostic, but it allows net
> > device only in current stage. Due to the lacking of control virtqueue
> > support, some features were filter out by vhost-vdpa.
> >
> > We will enable more features and devices in the near future.
>
> [..]
>
> > +static int vhost_vdpa_alloc_domain(struct vhost_vdpa *v)
> > +{
> > + struct vdpa_device *vdpa = v->vdpa;
> > + const struct vdpa_config_ops *ops = vdpa->config;
> > + struct device *dma_dev = vdpa_get_dma_dev(vdpa);
> > + struct bus_type *bus;
> > + int ret;
> > +
> > + /* Device want to do DMA by itself */
> > + if (ops->set_map || ops->dma_map)
> > + return 0;
> > +
> > + bus = dma_dev->bus;
> > + if (!bus)
> > + return -EFAULT;
> > +
> > + if (!iommu_capable(bus, IOMMU_CAP_CACHE_COHERENCY))
> > + return -ENOTSUPP;
> > +
> > + v->domain = iommu_domain_alloc(bus);
> > + if (!v->domain)
> > + return -EIO;
> > +
> > + ret = iommu_attach_device(v->domain, dma_dev);
> > + if (ret)
> > + goto err_attach;
> >
>
> I've been looking at the security of iommu_attach_device() users, and
> I wonder if this is safe?
>
> The security question is if userspace is able to control the DMA
> address the devices uses? Eg if any of the cpu to device ring's are in
> userspace memory?
>
> For instance if userspace can tell the device to send a packet from an
> arbitrary user controlled address.

The map is validated via pin_user_pages(), which guarantees that the
address is not arbitrary and must belong to userspace?

Thanks

>
> Thanks,
> Jason
>
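
For reference, the pin-then-map scheme being discussed looks roughly
like this (a simplified sketch, not the actual code from the patch;
the helper name is made up, and the real logic lives in
vhost_vdpa_process_iotlb_msg(), which is not shown in this excerpt):

    static int vhost_vdpa_map_one(struct vhost_vdpa *v, u64 iova,
                                  unsigned long uaddr)
    {
            struct page *page;
            int pinned;

            /* The userspace address is validated and pinned first... */
            pinned = pin_user_pages_fast(uaddr, 1,
                                         FOLL_WRITE | FOLL_LONGTERM, &page);
            if (pinned != 1)
                    return -EFAULT;

            /* ...and only then installed into the IOMMU domain, so the
             * device can only reach memory that userspace actually owns.
             */
            return iommu_map(v->domain, iova, page_to_phys(page),
                             PAGE_SIZE, IOMMU_READ | IOMMU_WRITE);
    }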

2021-11-02 15:57:48

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH V9 7/9] vhost: introduce vDPA-based backend

On Tue, Nov 02, 2021 at 11:52:20AM +0800, Jason Wang wrote:
> On Mon, Nov 1, 2021 at 10:11 PM Jason Gunthorpe <[email protected]> wrote:
> >
> > On Thu, Mar 26, 2020 at 10:01:23PM +0800, Jason Wang wrote:
> > > From: Tiwei Bie <[email protected]>
> > >
> > > This patch introduces a vDPA-based vhost backend. This backend is
> > > built on top of the same interface defined in virtio-vDPA and provides
> > > a generic vhost interface for userspace to accelerate the virtio
> > > devices in guest.
> > >
> > > This backend is implemented as a vDPA device driver on top of the same
> > > ops used in virtio-vDPA. It will create char device entry named
> > > vhost-vdpa-$index for userspace to use. Userspace can use vhost ioctls
> > > on top of this char device to setup the backend.
> > >
> > > Vhost ioctls are extended to make it type agnostic and behave like a
> > > virtio device, this help to eliminate type specific API like what
> > > vhost_net/scsi/vsock did:
> > >
> > > - VHOST_VDPA_GET_DEVICE_ID: get the virtio device ID which is defined
> > > by virtio specification to differ from different type of devices
> > > - VHOST_VDPA_GET_VRING_NUM: get the maximum size of virtqueue
> > > supported by the vDPA device
> > > - VHOST_VDPA_SET/GET_STATUS: set and get virtio status of vDPA device
> > > - VHOST_VDPA_SET/GET_CONFIG: access virtio config space
> > > - VHOST_VDPA_SET_VRING_ENABLE: enable a specific virtqueue
> > >
> > > For memory mapping, IOTLB API is mandated for vhost-vDPA which means
> > > userspace drivers are required to use
> > > VHOST_IOTLB_UPDATE/VHOST_IOTLB_INVALIDATE to add or remove mapping for
> > > a specific userspace memory region.
> > >
> > > The vhost-vDPA API is designed to be type agnostic, but it allows net
> > > device only in current stage. Due to the lacking of control virtqueue
> > > support, some features were filter out by vhost-vdpa.
> > >
> > > We will enable more features and devices in the near future.
> >
> > [..]
> >
> > > +static int vhost_vdpa_alloc_domain(struct vhost_vdpa *v)
> > > +{
> > > + struct vdpa_device *vdpa = v->vdpa;
> > > + const struct vdpa_config_ops *ops = vdpa->config;
> > > + struct device *dma_dev = vdpa_get_dma_dev(vdpa);
> > > + struct bus_type *bus;
> > > + int ret;
> > > +
> > > + /* Device want to do DMA by itself */
> > > + if (ops->set_map || ops->dma_map)
> > > + return 0;
> > > +
> > > + bus = dma_dev->bus;
> > > + if (!bus)
> > > + return -EFAULT;
> > > +
> > > + if (!iommu_capable(bus, IOMMU_CAP_CACHE_COHERENCY))
> > > + return -ENOTSUPP;
> > > +
> > > + v->domain = iommu_domain_alloc(bus);
> > > + if (!v->domain)
> > > + return -EIO;
> > > +
> > > + ret = iommu_attach_device(v->domain, dma_dev);
> > > + if (ret)
> > > + goto err_attach;
> > >
> >
> > I've been looking at the security of iommu_attach_device() users, and
> > I wonder if this is safe?
> >
> > The security question is if userspace is able to control the DMA
> > address the devices uses? Eg if any of the cpu to device ring's are in
> > userspace memory?
> >
> > For instance if userspace can tell the device to send a packet from an
> > arbitrary user controlled address.
>
> The map is validated via pin_user_pages() which guarantees that the
> address is not arbitrary and must belong to userspace?

That controls what gets put into the IOMMU; it doesn't restrict what
DMA the device itself can issue.

Upon investigating more, it seems the answer is that
iommu_attach_device() requires devices to be in singleton groups, so
there is no leakage from rogue DMA.

Jason

2021-11-03 07:36:36

by Jason Wang

[permalink] [raw]
Subject: Re: [PATCH V9 7/9] vhost: introduce vDPA-based backend

On Tue, Nov 2, 2021 at 11:56 PM Jason Gunthorpe <[email protected]> wrote:
>
> On Tue, Nov 02, 2021 at 11:52:20AM +0800, Jason Wang wrote:
> > On Mon, Nov 1, 2021 at 10:11 PM Jason Gunthorpe <[email protected]> wrote:
> > >
> > > On Thu, Mar 26, 2020 at 10:01:23PM +0800, Jason Wang wrote:
> > > > From: Tiwei Bie <[email protected]>
> > > >
> > > > This patch introduces a vDPA-based vhost backend. This backend is
> > > > built on top of the same interface defined in virtio-vDPA and provides
> > > > a generic vhost interface for userspace to accelerate the virtio
> > > > devices in guest.
> > > >
> > > > This backend is implemented as a vDPA device driver on top of the same
> > > > ops used in virtio-vDPA. It will create char device entry named
> > > > vhost-vdpa-$index for userspace to use. Userspace can use vhost ioctls
> > > > on top of this char device to setup the backend.
> > > >
> > > > Vhost ioctls are extended to make it type agnostic and behave like a
> > > > virtio device, this help to eliminate type specific API like what
> > > > vhost_net/scsi/vsock did:
> > > >
> > > > - VHOST_VDPA_GET_DEVICE_ID: get the virtio device ID which is defined
> > > > by virtio specification to differ from different type of devices
> > > > - VHOST_VDPA_GET_VRING_NUM: get the maximum size of virtqueue
> > > > supported by the vDPA device
> > > > - VHOST_VDPA_SET/GET_STATUS: set and get virtio status of vDPA device
> > > > - VHOST_VDPA_SET/GET_CONFIG: access virtio config space
> > > > - VHOST_VDPA_SET_VRING_ENABLE: enable a specific virtqueue
> > > >
> > > > For memory mapping, IOTLB API is mandated for vhost-vDPA which means
> > > > userspace drivers are required to use
> > > > VHOST_IOTLB_UPDATE/VHOST_IOTLB_INVALIDATE to add or remove mapping for
> > > > a specific userspace memory region.
> > > >
> > > > The vhost-vDPA API is designed to be type agnostic, but it allows net
> > > > device only in current stage. Due to the lacking of control virtqueue
> > > > support, some features were filter out by vhost-vdpa.
> > > >
> > > > We will enable more features and devices in the near future.
> > >
> > > [..]
> > >
> > > > +static int vhost_vdpa_alloc_domain(struct vhost_vdpa *v)
> > > > +{
> > > > + struct vdpa_device *vdpa = v->vdpa;
> > > > + const struct vdpa_config_ops *ops = vdpa->config;
> > > > + struct device *dma_dev = vdpa_get_dma_dev(vdpa);
> > > > + struct bus_type *bus;
> > > > + int ret;
> > > > +
> > > > + /* Device want to do DMA by itself */
> > > > + if (ops->set_map || ops->dma_map)
> > > > + return 0;
> > > > +
> > > > + bus = dma_dev->bus;
> > > > + if (!bus)
> > > > + return -EFAULT;
> > > > +
> > > > + if (!iommu_capable(bus, IOMMU_CAP_CACHE_COHERENCY))
> > > > + return -ENOTSUPP;
> > > > +
> > > > + v->domain = iommu_domain_alloc(bus);
> > > > + if (!v->domain)
> > > > + return -EIO;
> > > > +
> > > > + ret = iommu_attach_device(v->domain, dma_dev);
> > > > + if (ret)
> > > > + goto err_attach;
> > > >
> > >
> > > I've been looking at the security of iommu_attach_device() users, and
> > > I wonder if this is safe?
> > >
> > > The security question is if userspace is able to control the DMA
> > > address the devices uses? Eg if any of the cpu to device ring's are in
> > > userspace memory?
> > >
> > > For instance if userspace can tell the device to send a packet from an
> > > arbitrary user controlled address.
> >
> > The map is validated via pin_user_pages() which guarantees that the
> > address is not arbitrary and must belong to userspace?
>
> That controls what gets put into the IOMMU, it doesn't restrict what
> DMA the device itself can issue.
>
> Upon investigating more it seems the answer is that
> iommu_attach_device() requires devices to be in singleton groups, so
> there is no leakage from rogue DMA

Yes, I think so.

Thanks

>
> Jason
>