2018-08-01 10:22:14

by Kenneth Lee

Subject: [RFC PATCH 0/7] A General Accelerator Framework, WarpDrive

From: Kenneth Lee <[email protected]>

WarpDrive is an accelerator framework to expose the hardware capabilities
directly to the user space. It makes use of the existing vfio and vfio-mdev
facilities, so the user application can send requests and DMA to the
hardware without interacting with the kernel. This removes the latency of
syscalls and context switches.

The patchset contains documentation with the details. Please refer to it for
more information.

This patchset is intended to be used with Jean Philippe Brucker's SVA
patches [1] (which are also in the RFC stage), but that is not mandatory.
This patchset is tested on the latest mainline kernel without the SVA
patches, so it supports only one process per accelerator.

With SVA support, WarpDrive can support multiple processes on the same
accelerator device. We tested it on our SoC-integrated accelerator (board
ID: D06, chip ID: HIP08). A reference work tree can be found here: [2].

We have noticed the IOMMU-aware mdev RFC announced recently [3].

The IOMMU-aware mdev has a similar idea but a different intention from
WarpDrive. It intends to dedicate part of the hardware resources to a VM,
and the design is supposed to be used with Scalable I/O Virtualization.
Spimdev, in contrast, is intended to share the hardware resources with a
large number of processes. It only requires the hardware to support address
translation per process (PCIe's PASID or the ARM SMMU's substream ID).

But we don't see a serious conflict between the two designs. We believe they
can be unified into one.

Patch 1 is the documentation. Patches 2 and 3 add spimdev support. Patches
4, 5 and 6 are drivers for Hisilicon's ZIP accelerator, which is registered
to both crypto and WarpDrive (spimdev) and can be used from kernel and user
space at the same time. Patch 7 is a user space sample demonstrating how
WarpDrive works.

References:
[1] https://www.spinics.net/lists/kernel/msg2651481.html
[2] https://github.com/Kenneth-Lee/linux-kernel-warpdrive/tree/warpdrive-sva-v0.5
[3] https://lkml.org/lkml/2018/7/22/34

Best Regards
Kenneth Lee

Kenneth Lee (7):
vfio/spimdev: Add documents for WarpDrive framework
iommu: Add share domain interface in iommu for spimdev
vfio: add spimdev support
crypto: add hisilicon Queue Manager driver
crypto: Add Hisilicon Zip driver
crypto: add spimdev support to Hisilicon QM
vfio/spimdev: add user sample for spimdev

Documentation/00-INDEX | 2 +
Documentation/warpdrive/warpdrive.rst | 153 ++++
Documentation/warpdrive/wd-arch.svg | 732 +++++++++++++++
Documentation/warpdrive/wd.svg | 526 +++++++++++
drivers/crypto/Kconfig | 2 +
drivers/crypto/Makefile | 1 +
drivers/crypto/hisilicon/Kconfig | 15 +
drivers/crypto/hisilicon/Makefile | 2 +
drivers/crypto/hisilicon/qm.c | 1005 +++++++++++++++++++++
drivers/crypto/hisilicon/qm.h | 123 +++
drivers/crypto/hisilicon/zip/Makefile | 2 +
drivers/crypto/hisilicon/zip/zip.h | 55 ++
drivers/crypto/hisilicon/zip/zip_crypto.c | 358 ++++++++
drivers/crypto/hisilicon/zip/zip_crypto.h | 18 +
drivers/crypto/hisilicon/zip/zip_main.c | 182 ++++
drivers/iommu/iommu.c | 28 +-
drivers/vfio/Kconfig | 1 +
drivers/vfio/Makefile | 1 +
drivers/vfio/spimdev/Kconfig | 10 +
drivers/vfio/spimdev/Makefile | 3 +
drivers/vfio/spimdev/vfio_spimdev.c | 421 +++++++++
drivers/vfio/vfio_iommu_type1.c | 136 ++-
include/linux/iommu.h | 2 +
include/linux/vfio_spimdev.h | 95 ++
include/uapi/linux/vfio_spimdev.h | 28 +
samples/warpdrive/AUTHORS | 2 +
samples/warpdrive/ChangeLog | 1 +
samples/warpdrive/Makefile.am | 9 +
samples/warpdrive/NEWS | 1 +
samples/warpdrive/README | 32 +
samples/warpdrive/autogen.sh | 3 +
samples/warpdrive/cleanup.sh | 13 +
samples/warpdrive/configure.ac | 52 ++
samples/warpdrive/drv/hisi_qm_udrv.c | 223 +++++
samples/warpdrive/drv/hisi_qm_udrv.h | 53 ++
samples/warpdrive/test/Makefile.am | 7 +
samples/warpdrive/test/comp_hw.h | 23 +
samples/warpdrive/test/test_hisi_zip.c | 204 +++++
samples/warpdrive/wd.c | 325 +++++++
samples/warpdrive/wd.h | 153 ++++
samples/warpdrive/wd_adapter.c | 74 ++
samples/warpdrive/wd_adapter.h | 43 +
42 files changed, 5112 insertions(+), 7 deletions(-)
create mode 100644 Documentation/warpdrive/warpdrive.rst
create mode 100644 Documentation/warpdrive/wd-arch.svg
create mode 100644 Documentation/warpdrive/wd.svg
create mode 100644 drivers/crypto/hisilicon/Kconfig
create mode 100644 drivers/crypto/hisilicon/Makefile
create mode 100644 drivers/crypto/hisilicon/qm.c
create mode 100644 drivers/crypto/hisilicon/qm.h
create mode 100644 drivers/crypto/hisilicon/zip/Makefile
create mode 100644 drivers/crypto/hisilicon/zip/zip.h
create mode 100644 drivers/crypto/hisilicon/zip/zip_crypto.c
create mode 100644 drivers/crypto/hisilicon/zip/zip_crypto.h
create mode 100644 drivers/crypto/hisilicon/zip/zip_main.c
create mode 100644 drivers/vfio/spimdev/Kconfig
create mode 100644 drivers/vfio/spimdev/Makefile
create mode 100644 drivers/vfio/spimdev/vfio_spimdev.c
create mode 100644 include/linux/vfio_spimdev.h
create mode 100644 include/uapi/linux/vfio_spimdev.h
create mode 100644 samples/warpdrive/AUTHORS
create mode 100644 samples/warpdrive/ChangeLog
create mode 100644 samples/warpdrive/Makefile.am
create mode 100644 samples/warpdrive/NEWS
create mode 100644 samples/warpdrive/README
create mode 100755 samples/warpdrive/autogen.sh
create mode 100755 samples/warpdrive/cleanup.sh
create mode 100644 samples/warpdrive/configure.ac
create mode 100644 samples/warpdrive/drv/hisi_qm_udrv.c
create mode 100644 samples/warpdrive/drv/hisi_qm_udrv.h
create mode 100644 samples/warpdrive/test/Makefile.am
create mode 100644 samples/warpdrive/test/comp_hw.h
create mode 100644 samples/warpdrive/test/test_hisi_zip.c
create mode 100644 samples/warpdrive/wd.c
create mode 100644 samples/warpdrive/wd.h
create mode 100644 samples/warpdrive/wd_adapter.c
create mode 100644 samples/warpdrive/wd_adapter.h

--
2.17.1


2018-08-01 10:22:15

by Kenneth Lee

Subject: [RFC PATCH 1/7] vfio/spimdev: Add documents for WarpDrive framework

From: Kenneth Lee <[email protected]>

WarpDrive is a common user space accelerator framework. Its main component
in the kernel is called spimdev, Share Parent IOMMU Mediated Device. It
exposes the hardware capabilities to user space via vfio-mdev, so processes
in user land can obtain a "queue" by opening the device and directly access
the hardware MMIO space or do DMA operations via the VFIO interface.
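
In concrete terms, obtaining such a queue follows the standard VFIO user
flow. The sketch below is illustrative only and not part of the patchset:
the IOMMU group number and the mdev UUID are placeholders, and error
handling is omitted:

  #include <fcntl.h>
  #include <sys/ioctl.h>
  #include <sys/mman.h>
  #include <linux/vfio.h>

  void *map_queue_mmio(void)
  {
          int container = open("/dev/vfio/vfio", O_RDWR);
          int group = open("/dev/vfio/42", O_RDWR);
          struct vfio_region_info reg = { .argsz = sizeof(reg), .index = 0 };
          int device;

          ioctl(group, VFIO_GROUP_SET_CONTAINER, &container);
          ioctl(container, VFIO_SET_IOMMU, VFIO_TYPE1_IOMMU);

          /* the mdev is addressed by the UUID it was created with */
          device = ioctl(group, VFIO_GROUP_GET_DEVICE_FD,
                         "11111111-2222-3333-4444-555555555555");

          /* map the queue's MMIO region into the process address space */
          ioctl(device, VFIO_DEVICE_GET_REGION_INFO, &reg);
          return mmap(NULL, reg.size, PROT_READ | PROT_WRITE, MAP_SHARED,
                      device, reg.offset);
  }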

WarpDrive is intended to be used with Jean Philippe Brucker's SVA patchset
(which is still in the RFC stage) to support multiple processes. But this is
not a must. Without the SVA patches, WarpDrive can still work, with one
process per hardware device.

This patch adds detailed documentation for the framework.

Signed-off-by: Kenneth Lee <[email protected]>
---
Documentation/00-INDEX | 2 +
Documentation/warpdrive/warpdrive.rst | 153 ++++++
Documentation/warpdrive/wd-arch.svg | 732 ++++++++++++++++++++++++++
Documentation/warpdrive/wd.svg | 526 ++++++++++++++++++
4 files changed, 1413 insertions(+)
create mode 100644 Documentation/warpdrive/warpdrive.rst
create mode 100644 Documentation/warpdrive/wd-arch.svg
create mode 100644 Documentation/warpdrive/wd.svg

diff --git a/Documentation/00-INDEX b/Documentation/00-INDEX
index 2754fe83f0d4..9959affab599 100644
--- a/Documentation/00-INDEX
+++ b/Documentation/00-INDEX
@@ -410,6 +410,8 @@ vm/
- directory with info on the Linux vm code.
w1/
- directory with documents regarding the 1-wire (w1) subsystem.
+warpdrive/
+ - directory with documents about WarpDrive accelerator framework.
watchdog/
- how to auto-reboot Linux if it has "fallen and can't get up". ;-)
wimax/
diff --git a/Documentation/warpdrive/warpdrive.rst b/Documentation/warpdrive/warpdrive.rst
new file mode 100644
index 000000000000..3792b2780ea6
--- /dev/null
+++ b/Documentation/warpdrive/warpdrive.rst
@@ -0,0 +1,153 @@
+Introduction of WarpDrive
+=========================
+
+*WarpDrive* is a general accelerator framework built on top of vfio.
+It can be taken as a lightweight virtual function, which you can use without
+an *SR-IOV*-like facility and which can be shared among multiple processes.
+
+It can be used as a quick channel for accelerators, network adapters or
+other hardware in user space. It can make some implementations simpler. E.g.
+you can reuse most of the *netdev* driver and just share some ring buffers with
+the user space driver for *DPDK* or *ODP*. Or you can combine the RSA
+accelerator with the *netdev* in user space as a Web reverse proxy, etc.
+
+The name *WarpDrive* is simply a cool and general name, meaning the framework
+makes the application faster. In the kernel, the framework is called SPIMDEV,
+namely "Share Parent IOMMU Mediated Device".
+
+
+How does it work
+================
+
+*WarpDrive* takes the hardware accelerator as a heterogeneous processor which
+can take over some of the load from the CPU:
+
+.. image:: wd.svg
+ :alt: This is a .svg image, if your browser cannot show it,
+ try to download and view it locally
+
+So it provides the capability for the user application to:
+
+1. Send requests to the hardware
+2. Share memory with the application and other accelerators
+
+These requirements can be fulfilled by VFIO if the accelerator can serve each
+application with a separate Virtual Function. But an *SR-IOV*-like VF (we will
+call it *HVF* hereinafter) design is too heavy for an accelerator which
+serves thousands of processes.
+
+And the *HVF* is not good for the scenario in which a device keeps most of its
+resources but shares part of its function with user space. E.g. a *NIC*
+works as a *netdev* but shares some hardware queues with the user application
+to send packets directly to the hardware.
+
+*VFIO-mdev* can solve some of the problems here. But *VFIO-mdev* has two
+problems:
+
+1. it cannot make use of its parent device's IOMMU.
+2. it is assumed to be opened only once.
+
+So it will need some add-ons for better resource control and to let the VFIO
+driver be aware of this.
+
+
+Architecture
+------------
+
+The full *WarpDrive* architecture is represented in the following class
+diagram:
+
+.. image:: wd-arch.svg
+ :alt: This is a .svg image, if your browser cannot show it,
+ try to download and view it locally
+
+The idea is: when a device is probed, it can be registered to a general
+framework, e.g. *netdev* or *crypto*, and to *SPIMDEV* at the same time.
+
+If *SPIMDEV* is registered, a *mdev* creation interface is created. Then the
+system administrator can create a *mdev* in user space and set its
+parameters via its sysfs interface. But unlike other mdev
+implementations, hardware resources will not be allocated until the mdev is
+opened by an application.
+
+With this strategy, the hardware resource can be easily scheduled among
+multiple processes.
+
+
+The user API
+------------
+
+We adopt a polling style interface in the user space: ::
+
+ int wd_request_queue(int container, struct wd_queue *q,
+ const char *mdev)
+ void wd_release_queue(struct wd_queue *q);
+
+ int wd_send(struct wd_queue *q, void *req);
+ int wd_recv(struct wd_queue *q, void **req);
+ int wd_recv_sync(struct wd_queue *q, void **req);
+
+The ..._sync() interfaces are wrappers of the non-sync versions. They wait on
+the device until the queue becomes available.
+
+Memory sharing can be done via the VFIO DMA API, or the following helper
+functions can be adopted: ::
+
+ int wd_mem_share(struct wd_queue *q, const void *addr,
+ size_t size, int flags);
+ void wd_mem_unshare(struct wd_queue *q, const void *addr, size_t size);
+
+Todo: if the IOMMU supports *ATS* or the *SMMU* stall mode, mem share is not
+necessary. This can be checked with the SPIMDEV sysfs interface.
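+
+As an illustrative sketch only (error handling is omitted and the *mdev* name
+passed to wd_request_queue() is a placeholder), a request could be issued and
+completed like this: ::
+
+        int container = open("/dev/vfio/vfio", O_RDWR);
+        struct wd_queue q;
+        char buf[4096];                 /* data the device needs to see */
+        void *resp;
+
+        wd_request_queue(container, &q, "mdev-uuid-here");
+        wd_mem_share(&q, buf, sizeof(buf), 0);
+
+        wd_send(&q, buf);               /* hand the request to the hardware */
+        wd_recv_sync(&q, &resp);        /* or poll with wd_recv() */
+
+        wd_mem_unshare(&q, buf, sizeof(buf));
+        wd_release_queue(&q);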
+
+The user API is not mandatory. It is simply a suggestion and a hint of what
+the kernel interface is supposed to support.
+
+
+The user driver
+---------------
+
+*WarpDrive* exposes the hardware IO space to the user process (via *mmap*), so
+it will require a user driver implementing the user API. The following API
+is suggested for a user driver: ::
+
+ int open(struct wd_queue *q);
+ int close(struct wd_queue *q);
+ int send(struct wd_queue *q, void *req);
+ int recv(struct wd_queue *q, void **req);
+
+These callbacks enable the communication between the user application and the
+device. You will still need the hardware-dependent algorithm driver to access
+the algorithm functionality of the accelerator itself.
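+
+As a sketch of how such a driver could be plugged into the framework (the
+structure and function names below are illustrative only, not the actual
+definitions used by the sample code): ::
+
+        struct wd_drv_ops {
+                int (*open)(struct wd_queue *q);
+                int (*close)(struct wd_queue *q);
+                int (*send)(struct wd_queue *q, void *req);
+                int (*recv)(struct wd_queue *q, void **req);
+        };
+
+        /* e.g. a Hisilicon QM based user driver would fill this in with
+           its own implementations */
+        static struct wd_drv_ops hisi_qm_udrv_ops = {
+                .open  = hisi_qm_open,
+                .close = hisi_qm_close,
+                .send  = hisi_qm_send,
+                .recv  = hisi_qm_recv,
+        };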
+
+
+Multiple processes support
+==========================
+
+In the latest mainline kernel (4.18, at the time this document is written),
+multi-process is not supported in VFIO yet.
+
+*JPB* has a patchset to enable this [2]_. We have tested it with our hardware
+(which is known as *D06*) and it works well. *WarpDrive* relies on it to
+support multiple processes. If it is not enabled, *WarpDrive* can still work,
+but it supports only one process, which shares the same io map table with the
+kernel (the user application cannot access kernel addresses, so this is not
+a security problem).
+
+
+Legacy Mode Support
+===================
+
+For hardware on which the IOMMU is not supported, WarpDrive can run in
+*NOIOMMU* mode.
+
+
+References
+==========
+.. [1] According to the comment in mm/gup.c, *gup* is only safe within
+ a syscall, because it can only keep the physical memory in place
+ without making sure the VMA will always point to it. Maybe we should
+ raise the VM_PINNED patchset (see
+ https://lists.gt.net/linux/kernel/1931993) again to solve this problem.
+.. [2] https://patchwork.kernel.org/patch/10394851/
+.. [3] https://zhuanlan.zhihu.com/p/35489035
+
+.. vim: tw=78
diff --git a/Documentation/warpdrive/wd-arch.svg b/Documentation/warpdrive/wd-arch.svg
new file mode 100644
index 000000000000..1b3d1817c4ba
--- /dev/null
+++ b/Documentation/warpdrive/wd-arch.svg
@@ -0,0 +1,732 @@
+<?xml version="1.0" encoding="UTF-8" standalone="no"?>
+<!-- Created with Inkscape (http://www.inkscape.org/) -->
+
+<svg
+ xmlns:dc="http://purl.org/dc/elements/1.1/"
+ xmlns:cc="http://creativecommons.org/ns#"
+ xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
+ xmlns:svg="http://www.w3.org/2000/svg"
+ xmlns="http://www.w3.org/2000/svg"
+ xmlns:xlink="http://www.w3.org/1999/xlink"
+ xmlns:sodipodi="http://sodipodi.sourceforge.net/DTD/sodipodi-0.dtd"
+ xmlns:inkscape="http://www.inkscape.org/namespaces/inkscape"
+ width="210mm"
+ height="193mm"
+ viewBox="0 0 744.09449 683.85823"
+ id="svg2"
+ version="1.1"
+ inkscape:version="0.92.3 (2405546, 2018-03-11)"
+ sodipodi:docname="wd-arch.svg">
+ <defs
+ id="defs4">
+ <linearGradient
+ inkscape:collect="always"
+ id="linearGradient6830">
+ <stop
+ style="stop-color:#000000;stop-opacity:1;"
+ offset="0"
+ id="stop6832" />
+ <stop
+ style="stop-color:#000000;stop-opacity:0;"
+ offset="1"
+ id="stop6834" />
+ </linearGradient>
+ <linearGradient
+ inkscape:collect="always"
+ xlink:href="#linearGradient5026"
+ id="linearGradient5032"
+ x1="353"
+ y1="211.3622"
+ x2="565.5"
+ y2="174.8622"
+ gradientUnits="userSpaceOnUse"
+ gradientTransform="translate(-89.949614,405.94594)" />
+ <linearGradient
+ inkscape:collect="always"
+ id="linearGradient5026">
+ <stop
+ style="stop-color:#f2f2f2;stop-opacity:1;"
+ offset="0"
+ id="stop5028" />
+ <stop
+ style="stop-color:#f2f2f2;stop-opacity:0;"
+ offset="1"
+ id="stop5030" />
+ </linearGradient>
+ <filter
+ inkscape:collect="always"
+ style="color-interpolation-filters:sRGB"
+ id="filter4169-3"
+ x="-0.031597666"
+ width="1.0631953"
+ y="-0.099812768"
+ height="1.1996255">
+ <feGaussianBlur
+ inkscape:collect="always"
+ stdDeviation="1.3307599"
+ id="feGaussianBlur4171-6" />
+ </filter>
+ <linearGradient
+ inkscape:collect="always"
+ xlink:href="#linearGradient5026"
+ id="linearGradient5032-1"
+ x1="353"
+ y1="211.3622"
+ x2="565.5"
+ y2="174.8622"
+ gradientUnits="userSpaceOnUse"
+ gradientTransform="translate(175.77842,400.29111)" />
+ <filter
+ inkscape:collect="always"
+ style="color-interpolation-filters:sRGB"
+ id="filter4169-3-0"
+ x="-0.031597666"
+ width="1.0631953"
+ y="-0.099812768"
+ height="1.1996255">
+ <feGaussianBlur
+ inkscape:collect="always"
+ stdDeviation="1.3307599"
+ id="feGaussianBlur4171-6-9" />
+ </filter>
+ <marker
+ markerWidth="18.960653"
+ markerHeight="11.194658"
+ refX="9.4803267"
+ refY="5.5973287"
+ orient="auto"
+ id="marker4613">
+ <rect
+ y="-5.1589785"
+ x="5.8504119"
+ height="10.317957"
+ width="10.317957"
+ id="rect4212"
+ style="fill:#ffffff;stroke:#000000;stroke-width:0.69143367;stroke-miterlimit:4;stroke-dasharray:none"
+ transform="matrix(0.86111274,0.50841405,-0.86111274,0.50841405,0,0)">
+ <title
+ id="title4262">generation</title>
+ </rect>
+ </marker>
+ <marker
+ markerWidth="11.227358"
+ markerHeight="12.355258"
+ refX="10"
+ refY="6.177629"
+ orient="auto"
+ id="marker4825">
+ <path
+ inkscape:connector-curvature="0"
+ id="path4757"
+ d="M 0.42024733,0.42806444 10.231357,6.3500844 0.24347733,11.918544"
+ style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
+ </marker>
+ <marker
+ markerWidth="11.227358"
+ markerHeight="12.355258"
+ refX="10"
+ refY="6.177629"
+ orient="auto"
+ id="marker4825-6">
+ <path
+ inkscape:connector-curvature="0"
+ id="path4757-1"
+ d="M 0.42024733,0.42806444 10.231357,6.3500844 0.24347733,11.918544"
+ style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
+ </marker>
+ <linearGradient
+ inkscape:collect="always"
+ xlink:href="#linearGradient5026"
+ id="linearGradient5032-3-9"
+ x1="353"
+ y1="211.3622"
+ x2="565.5"
+ y2="174.8622"
+ gradientUnits="userSpaceOnUse"
+ gradientTransform="matrix(1.2452511,0,0,0.98513016,-190.95632,540.33156)" />
+ <filter
+ inkscape:collect="always"
+ style="color-interpolation-filters:sRGB"
+ id="filter4169-3-5-8"
+ x="-0.031597666"
+ width="1.0631953"
+ y="-0.099812768"
+ height="1.1996255">
+ <feGaussianBlur
+ inkscape:collect="always"
+ stdDeviation="1.3307599"
+ id="feGaussianBlur4171-6-3-9" />
+ </filter>
+ <marker
+ markerWidth="11.227358"
+ markerHeight="12.355258"
+ refX="10"
+ refY="6.177629"
+ orient="auto"
+ id="marker4825-6-2">
+ <path
+ inkscape:connector-curvature="0"
+ id="path4757-1-9"
+ d="M 0.42024733,0.42806444 10.231357,6.3500844 0.24347733,11.918544"
+ style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
+ </marker>
+ <marker
+ markerWidth="11.227358"
+ markerHeight="12.355258"
+ refX="10"
+ refY="6.177629"
+ orient="auto"
+ id="marker4825-6-2-1">
+ <path
+ inkscape:connector-curvature="0"
+ id="path4757-1-9-9"
+ d="M 0.42024733,0.42806444 10.231357,6.3500844 0.24347733,11.918544"
+ style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
+ </marker>
+ <linearGradient
+ inkscape:collect="always"
+ xlink:href="#linearGradient5026"
+ id="linearGradient5032-3-9-7"
+ x1="353"
+ y1="211.3622"
+ x2="565.5"
+ y2="174.8622"
+ gradientUnits="userSpaceOnUse"
+ gradientTransform="matrix(1.3742742,0,0,0.97786398,-234.52617,654.63367)" />
+ <filter
+ inkscape:collect="always"
+ style="color-interpolation-filters:sRGB"
+ id="filter4169-3-5-8-5"
+ x="-0.031597666"
+ width="1.0631953"
+ y="-0.099812768"
+ height="1.1996255">
+ <feGaussianBlur
+ inkscape:collect="always"
+ stdDeviation="1.3307599"
+ id="feGaussianBlur4171-6-3-9-0" />
+ </filter>
+ <marker
+ markerWidth="11.227358"
+ markerHeight="12.355258"
+ refX="10"
+ refY="6.177629"
+ orient="auto"
+ id="marker4825-6-2-6">
+ <path
+ inkscape:connector-curvature="0"
+ id="path4757-1-9-1"
+ d="M 0.42024733,0.42806444 10.231357,6.3500844 0.24347733,11.918544"
+ style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
+ </marker>
+ <linearGradient
+ inkscape:collect="always"
+ xlink:href="#linearGradient5026"
+ id="linearGradient5032-3-9-4"
+ x1="353"
+ y1="211.3622"
+ x2="565.5"
+ y2="174.8622"
+ gradientUnits="userSpaceOnUse"
+ gradientTransform="matrix(1.3742912,0,0,2.0035845,-468.34428,342.56603)" />
+ <filter
+ inkscape:collect="always"
+ style="color-interpolation-filters:sRGB"
+ id="filter4169-3-5-8-54"
+ x="-0.031597666"
+ width="1.0631953"
+ y="-0.099812768"
+ height="1.1996255">
+ <feGaussianBlur
+ inkscape:collect="always"
+ stdDeviation="1.3307599"
+ id="feGaussianBlur4171-6-3-9-7" />
+ </filter>
+ <marker
+ markerWidth="11.227358"
+ markerHeight="12.355258"
+ refX="10"
+ refY="6.177629"
+ orient="auto"
+ id="marker4825-6-2-1-8">
+ <path
+ inkscape:connector-curvature="0"
+ id="path4757-1-9-9-6"
+ d="M 0.42024733,0.42806444 10.231357,6.3500844 0.24347733,11.918544"
+ style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
+ </marker>
+ <marker
+ markerWidth="11.227358"
+ markerHeight="12.355258"
+ refX="10"
+ refY="6.177629"
+ orient="auto"
+ id="marker4825-6-2-1-8-8">
+ <path
+ inkscape:connector-curvature="0"
+ id="path4757-1-9-9-6-9"
+ d="M 0.42024733,0.42806444 10.231357,6.3500844 0.24347733,11.918544"
+ style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
+ </marker>
+ <marker
+ markerWidth="11.227358"
+ markerHeight="12.355258"
+ refX="10"
+ refY="6.177629"
+ orient="auto"
+ id="marker4825-6-0">
+ <path
+ inkscape:connector-curvature="0"
+ id="path4757-1-93"
+ d="M 0.42024733,0.42806444 10.231357,6.3500844 0.24347733,11.918544"
+ style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
+ </marker>
+ <marker
+ markerWidth="11.227358"
+ markerHeight="12.355258"
+ refX="10"
+ refY="6.177629"
+ orient="auto"
+ id="marker4825-6-0-2">
+ <path
+ inkscape:connector-curvature="0"
+ id="path4757-1-93-6"
+ d="M 0.42024733,0.42806444 10.231357,6.3500844 0.24347733,11.918544"
+ style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
+ </marker>
+ <filter
+ inkscape:collect="always"
+ style="color-interpolation-filters:sRGB"
+ id="filter5382"
+ x="-0.089695387"
+ width="1.1793908"
+ y="-0.10052069"
+ height="1.2010413">
+ <feGaussianBlur
+ inkscape:collect="always"
+ stdDeviation="0.86758925"
+ id="feGaussianBlur5384" />
+ </filter>
+ <linearGradient
+ inkscape:collect="always"
+ xlink:href="#linearGradient6830"
+ id="linearGradient6836"
+ x1="362.73923"
+ y1="700.04059"
+ x2="340.4751"
+ y2="678.25488"
+ gradientUnits="userSpaceOnUse"
+ gradientTransform="translate(-23.771026,-135.76835)" />
+ <marker
+ markerWidth="11.227358"
+ markerHeight="12.355258"
+ refX="10"
+ refY="6.177629"
+ orient="auto"
+ id="marker4825-6-2-6-2">
+ <path
+ inkscape:connector-curvature="0"
+ id="path4757-1-9-1-9"
+ d="M 0.42024733,0.42806444 10.231357,6.3500844 0.24347733,11.918544"
+ style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
+ </marker>
+ </defs>
+ <sodipodi:namedview
+ id="base"
+ pagecolor="#ffffff"
+ bordercolor="#666666"
+ borderopacity="1.0"
+ inkscape:pageopacity="0.0"
+ inkscape:pageshadow="2"
+ inkscape:zoom="0.98994949"
+ inkscape:cx="222.32868"
+ inkscape:cy="370.44492"
+ inkscape:document-units="px"
+ inkscape:current-layer="layer1"
+ showgrid="false"
+ inkscape:window-width="1916"
+ inkscape:window-height="1033"
+ inkscape:window-x="0"
+ inkscape:window-y="22"
+ inkscape:window-maximized="0"
+ fit-margin-right="0.3"
+ inkscape:snap-global="false" />
+ <metadata
+ id="metadata7">
+ <rdf:RDF>
+ <cc:Work
+ rdf:about="">
+ <dc:format>image/svg+xml</dc:format>
+ <dc:type
+ rdf:resource="http://purl.org/dc/dcmitype/StillImage" />
+ <dc:title />
+ </cc:Work>
+ </rdf:RDF>
+ </metadata>
+ <g
+ inkscape:label="Layer 1"
+ inkscape:groupmode="layer"
+ id="layer1"
+ transform="translate(0,-368.50374)">
+ <rect
+ style="fill:#000000;stroke:#000000;stroke-width:0.6465112;filter:url(#filter4169-3)"
+ id="rect4136-3-6"
+ width="101.07784"
+ height="31.998148"
+ x="283.01144"
+ y="588.80896" />
+ <rect
+ style="fill:url(#linearGradient5032);fill-opacity:1;stroke:#000000;stroke-width:0.6465112"
+ id="rect4136-2"
+ width="101.07784"
+ height="31.998148"
+ x="281.63498"
+ y="586.75739" />
+ <text
+ xml:space="preserve"
+ style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+ x="294.21747"
+ y="612.50073"
+ id="text4138-6"><tspan
+ sodipodi:role="line"
+ id="tspan4140-1"
+ x="294.21747"
+ y="612.50073"
+ style="font-size:15px;line-height:1.25">WarpDrive</tspan></text>
+ <rect
+ style="fill:#000000;stroke:#000000;stroke-width:0.6465112;filter:url(#filter4169-3-0)"
+ id="rect4136-3-6-3"
+ width="101.07784"
+ height="31.998148"
+ x="548.7395"
+ y="583.15417" />
+ <rect
+ style="fill:url(#linearGradient5032-1);fill-opacity:1;stroke:#000000;stroke-width:0.6465112"
+ id="rect4136-2-60"
+ width="101.07784"
+ height="31.998148"
+ x="547.36304"
+ y="581.1026" />
+ <text
+ xml:space="preserve"
+ style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+ x="557.83484"
+ y="602.32745"
+ id="text4138-6-6"><tspan
+ sodipodi:role="line"
+ id="tspan4140-1-2"
+ x="557.83484"
+ y="602.32745"
+ style="font-size:15px;line-height:1.25">user_driver</tspan></text>
+ <path
+ style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;marker-end:url(#marker4613)"
+ d="m 547.36304,600.78954 -156.58203,0.0691"
+ id="path4855"
+ inkscape:connector-curvature="0"
+ sodipodi:nodetypes="cc" />
+ <rect
+ style="fill:#000000;stroke:#000000;stroke-width:0.6465112;filter:url(#filter4169-3-5-8)"
+ id="rect4136-3-6-5-7"
+ width="101.07784"
+ height="31.998148"
+ x="128.74678"
+ y="80.648842"
+ transform="matrix(1.2452511,0,0,0.98513016,113.15182,641.02594)" />
+ <rect
+ style="fill:url(#linearGradient5032-3-9);fill-opacity:1;stroke:#000000;stroke-width:0.71606314"
+ id="rect4136-2-6-3"
+ width="125.86729"
+ height="31.522341"
+ x="271.75983"
+ y="718.45435" />
+ <text
+ xml:space="preserve"
+ style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+ x="306.29599"
+ y="746.50073"
+ id="text4138-6-2-6"><tspan
+ sodipodi:role="line"
+ id="tspan4140-1-9-1"
+ x="306.29599"
+ y="746.50073"
+ style="font-size:15px;line-height:1.25">spimdev</tspan></text>
+ <path
+ style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;marker-end:url(#marker4825-6-2)"
+ d="m 329.57309,619.72453 5.0373,97.14447"
+ id="path4661-3"
+ inkscape:connector-curvature="0"
+ sodipodi:nodetypes="cc" />
+ <path
+ style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;marker-end:url(#marker4825-6-2-1)"
+ d="m 342.57219,830.63108 -5.67699,-79.2841"
+ id="path4661-3-4"
+ inkscape:connector-curvature="0"
+ sodipodi:nodetypes="cc" />
+ <rect
+ style="fill:#000000;stroke:#000000;stroke-width:0.6465112;filter:url(#filter4169-3-5-8-5)"
+ id="rect4136-3-6-5-7-3"
+ width="101.07784"
+ height="31.998148"
+ x="128.74678"
+ y="80.648842"
+ transform="matrix(1.3742742,0,0,0.97786398,101.09126,754.58534)" />
+ <rect
+ style="fill:url(#linearGradient5032-3-9-7);fill-opacity:1;stroke:#000000;stroke-width:0.74946606"
+ id="rect4136-2-6-3-6"
+ width="138.90866"
+ height="31.289837"
+ x="276.13297"
+ y="831.44263" />
+ <text
+ xml:space="preserve"
+ style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+ x="295.67819"
+ y="852.98224"
+ id="text4138-6-2-6-1"><tspan
+ sodipodi:role="line"
+ id="tspan4140-1-9-1-0"
+ x="295.67819"
+ y="852.98224"
+ style="font-size:15px;line-height:1.25">Device Driver</tspan></text>
+ <text
+ xml:space="preserve"
+ style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+ x="349.31198"
+ y="829.46118"
+ id="text4138-6-2-6-1-6"><tspan
+ sodipodi:role="line"
+ id="tspan4140-1-9-1-0-3"
+ x="349.31198"
+ y="829.46118"
+ style="font-size:15px;line-height:1.25">*</tspan></text>
+ <text
+ xml:space="preserve"
+ style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+ x="349.98282"
+ y="768.698"
+ id="text4138-6-2-6-1-6-2"><tspan
+ sodipodi:role="line"
+ id="tspan4140-1-9-1-0-3-0"
+ x="349.98282"
+ y="768.698"
+ style="font-size:15px;line-height:1.25">1</tspan></text>
+ <path
+ style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;marker-end:url(#marker4825-6-2-6)"
+ d="m 568.1238,614.05402 0.51369,333.80219"
+ id="path4661-3-5"
+ inkscape:connector-curvature="0"
+ sodipodi:nodetypes="cc" />
+ <text
+ xml:space="preserve"
+ style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+ x="371.8013"
+ y="664.62476"
+ id="text4138-6-2-6-1-6-2-5"><tspan
+ sodipodi:role="line"
+ x="371.8013"
+ y="664.62476"
+ id="tspan4274"
+ style="font-size:15px;line-height:1.25">&lt;&lt;vfio&gt;&gt;</tspan><tspan
+ sodipodi:role="line"
+ x="371.8013"
+ y="683.37476"
+ id="tspan4305"
+ style="font-size:15px;line-height:1.25">resource management</tspan></text>
+ <text
+ xml:space="preserve"
+ style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+ x="389.92969"
+ y="587.44836"
+ id="text4138-6-2-6-1-6-2-56"><tspan
+ sodipodi:role="line"
+ id="tspan4140-1-9-1-0-3-0-9"
+ x="389.92969"
+ y="587.44836"
+ style="font-size:15px;line-height:1.25">1</tspan></text>
+ <text
+ xml:space="preserve"
+ style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+ x="528.64813"
+ y="600.08429"
+ id="text4138-6-2-6-1-6-3"><tspan
+ sodipodi:role="line"
+ id="tspan4140-1-9-1-0-3-7"
+ x="528.64813"
+ y="600.08429"
+ style="font-size:15px;line-height:1.25">*</tspan></text>
+ <rect
+ style="fill:#000000;stroke:#000000;stroke-width:0.6465112;filter:url(#filter4169-3-5-8-54)"
+ id="rect4136-3-6-5-7-4"
+ width="101.07784"
+ height="31.998148"
+ x="128.74678"
+ y="80.648842"
+ transform="matrix(1.3745874,0,0,1.8929066,-132.7754,556.04505)" />
+ <rect
+ style="fill:url(#linearGradient5032-3-9-4);fill-opacity:1;stroke:#000000;stroke-width:1.07280123"
+ id="rect4136-2-6-3-4"
+ width="138.91039"
+ height="64.111"
+ x="42.321312"
+ y="704.8371" />
+ <text
+ xml:space="preserve"
+ style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+ x="110.30745"
+ y="722.94025"
+ id="text4138-6-2-6-3"><tspan
+ sodipodi:role="line"
+ x="111.99202"
+ y="722.94025"
+ id="tspan4366"
+ style="font-size:15px;line-height:1.25;text-align:center;text-anchor:middle">other standard </tspan><tspan
+ sodipodi:role="line"
+ x="110.30745"
+ y="741.69025"
+ id="tspan4368"
+ style="font-size:15px;line-height:1.25;text-align:center;text-anchor:middle">framework</tspan><tspan
+ sodipodi:role="line"
+ x="110.30745"
+ y="760.44025"
+ style="font-size:15px;line-height:1.25;text-align:center;text-anchor:middle"
+ id="tspan6840">(crypto/nic/others)</tspan></text>
+ <path
+ style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;marker-end:url(#marker4825-6-2-1-8)"
+ d="M 276.29661,849.04109 134.04449,771.90853"
+ id="path4661-3-4-8"
+ inkscape:connector-curvature="0"
+ sodipodi:nodetypes="cc" />
+ <text
+ xml:space="preserve"
+ style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+ x="313.70813"
+ y="730.06366"
+ id="text4138-6-2-6-36"><tspan
+ sodipodi:role="line"
+ id="tspan4140-1-9-1-7"
+ x="313.70813"
+ y="730.06366"
+ style="font-size:10px;line-height:1.25">&lt;&lt;lkm&gt;&gt;</tspan></text>
+ <text
+ xml:space="preserve"
+ style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;text-align:start;letter-spacing:0px;word-spacing:0px;text-anchor:start;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+ x="343.81625"
+ y="786.44141"
+ id="text4138-6-2-6-1-6-2-5-7-5"><tspan
+ sodipodi:role="line"
+ x="343.81625"
+ y="786.44141"
+ style="font-size:15px;line-height:1.25;text-align:start;text-anchor:start"
+ id="tspan2278">regist<tspan
+ style="text-align:start;text-anchor:start"
+ id="tspan2280">er as mdev with &quot;share </tspan></tspan><tspan
+ sodipodi:role="line"
+ x="343.81625"
+ y="805.19141"
+ style="font-size:15px;line-height:1.25;text-align:start;text-anchor:start"
+ id="tspan2357"><tspan
+ style="text-align:start;text-anchor:start"
+ id="tspan2359">parent iommu&quot; attribu</tspan>te</tspan></text>
+ <text
+ xml:space="preserve"
+ style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+ x="29.145819"
+ y="833.44244"
+ id="text4138-6-2-6-1-6-2-5-7-5-2"><tspan
+ sodipodi:role="line"
+ x="29.145819"
+ y="833.44244"
+ id="tspan4301"
+ style="font-size:15px;line-height:1.25">register to other subsystem</tspan></text>
+ <text
+ xml:space="preserve"
+ style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+ x="301.20813"
+ y="597.29437"
+ id="text4138-6-2-6-36-1"><tspan
+ sodipodi:role="line"
+ id="tspan4140-1-9-1-7-2"
+ x="301.20813"
+ y="597.29437"
+ style="font-size:10px;line-height:1.25">&lt;&lt;user_lib&gt;&gt;</tspan></text>
+ <text
+ xml:space="preserve"
+ style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+ x="649.09613"
+ y="774.4798"
+ id="text4138-6-2-6-1-6-2-5-3"><tspan
+ sodipodi:role="line"
+ id="tspan4140-1-9-1-0-3-0-4-6"
+ x="649.09613"
+ y="774.4798"
+ style="font-size:15px;line-height:1.25">&lt;&lt;vfio&gt;&gt;</tspan><tspan
+ sodipodi:role="line"
+ x="649.09613"
+ y="793.2298"
+ id="tspan4274-7"
+ style="font-size:15px;line-height:1.25">Hardware Accessing</tspan></text>
+ <text
+ xml:space="preserve"
+ style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+ x="371.01291"
+ y="529.23682"
+ id="text4138-6-2-6-1-6-2-5-36"><tspan
+ sodipodi:role="line"
+ x="371.01291"
+ y="529.23682"
+ id="tspan4305-3"
+ style="font-size:15px;line-height:1.25">wd user api</tspan></text>
+ <path
+ style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+ d="m 328.19325,585.87943 0,-23.57142"
+ id="path4348"
+ inkscape:connector-curvature="0" />
+ <ellipse
+ style="opacity:1;fill:#ffffff;fill-opacity:1;fill-rule:evenodd;stroke:#000000;stroke-width:1;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0"
+ id="path4350"
+ cx="328.01468"
+ cy="551.95081"
+ rx="11.607142"
+ ry="10.357142" />
+ <path
+ style="opacity:0.444;fill:url(#linearGradient6836);fill-opacity:1;fill-rule:evenodd;stroke:none;stroke-width:1;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;filter:url(#filter5382)"
+ id="path4350-2"
+ sodipodi:type="arc"
+ sodipodi:cx="329.44327"
+ sodipodi:cy="553.37933"
+ sodipodi:rx="11.607142"
+ sodipodi:ry="10.357142"
+ sodipodi:start="0"
+ sodipodi:end="6.2509098"
+ d="m 341.05041,553.37933 a 11.607142,10.357142 0 0 1 -11.51349,10.35681 11.607142,10.357142 0 0 1 -11.69928,-10.18967 11.607142,10.357142 0 0 1 11.32469,-10.52124 11.607142,10.357142 0 0 1 11.88204,10.01988"
+ sodipodi:open="true" />
+ <text
+ xml:space="preserve"
+ style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+ x="543.91455"
+ y="978.22363"
+ id="text4138-6-2-6-1-6-2-5-36-3"><tspan
+ sodipodi:role="line"
+ x="543.91455"
+ y="978.22363"
+ id="tspan4305-3-67"
+ style="font-size:15px;line-height:1.25">Device(Hardware)</tspan></text>
+ <path
+ style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;marker-end:url(#marker4825-6-2-6-2)"
+ d="m 347.51164,865.4527 153.19752,91.52439"
+ id="path4661-3-5-1"
+ inkscape:connector-curvature="0"
+ sodipodi:nodetypes="cc" />
+ <text
+ xml:space="preserve"
+ style="font-style:normal;font-weight:normal;font-size:12px;line-height:0%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+ x="343.6398"
+ y="716.47754"
+ id="text4138-6-2-6-1-6-2-5-7-5-2-6"><tspan
+ sodipodi:role="line"
+ x="343.6398"
+ y="716.47754"
+ id="tspan4301-4"
+ style="font-style:italic;font-variant:normal;font-weight:normal;font-stretch:normal;font-size:15px;line-height:1.25;font-family:sans-serif;-inkscape-font-specification:'sans-serif Italic';stroke-width:1px">Share Parent's IOMMU mdev</tspan></text>
+ </g>
+</svg>
diff --git a/Documentation/warpdrive/wd.svg b/Documentation/warpdrive/wd.svg
new file mode 100644
index 000000000000..87ab92ebfbc6
--- /dev/null
+++ b/Documentation/warpdrive/wd.svg
@@ -0,0 +1,526 @@
+<?xml version="1.0" encoding="UTF-8" standalone="no"?>
+<!-- Created with Inkscape (http://www.inkscape.org/) -->
+
+<svg
+ xmlns:dc="http://purl.org/dc/elements/1.1/"
+ xmlns:cc="http://creativecommons.org/ns#"
+ xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
+ xmlns:svg="http://www.w3.org/2000/svg"
+ xmlns="http://www.w3.org/2000/svg"
+ xmlns:xlink="http://www.w3.org/1999/xlink"
+ xmlns:sodipodi="http://sodipodi.sourceforge.net/DTD/sodipodi-0.dtd"
+ xmlns:inkscape="http://www.inkscape.org/namespaces/inkscape"
+ width="210mm"
+ height="116mm"
+ viewBox="0 0 744.09449 411.02338"
+ id="svg2"
+ version="1.1"
+ inkscape:version="0.92.3 (2405546, 2018-03-11)"
+ sodipodi:docname="wd.svg">
+ <defs
+ id="defs4">
+ <linearGradient
+ inkscape:collect="always"
+ id="linearGradient5026">
+ <stop
+ style="stop-color:#f2f2f2;stop-opacity:1;"
+ offset="0"
+ id="stop5028" />
+ <stop
+ style="stop-color:#f2f2f2;stop-opacity:0;"
+ offset="1"
+ id="stop5030" />
+ </linearGradient>
+ <linearGradient
+ inkscape:collect="always"
+ xlink:href="#linearGradient5026"
+ id="linearGradient5032-3"
+ x1="353"
+ y1="211.3622"
+ x2="565.5"
+ y2="174.8622"
+ gradientUnits="userSpaceOnUse"
+ gradientTransform="matrix(2.7384117,0,0,0.91666329,-952.8283,571.10143)" />
+ <filter
+ inkscape:collect="always"
+ style="color-interpolation-filters:sRGB"
+ id="filter4169-3-5"
+ x="-0.031597666"
+ width="1.0631953"
+ y="-0.099812768"
+ height="1.1996255">
+ <feGaussianBlur
+ inkscape:collect="always"
+ stdDeviation="1.3307599"
+ id="feGaussianBlur4171-6-3" />
+ </filter>
+ <marker
+ markerWidth="18.960653"
+ markerHeight="11.194658"
+ refX="9.4803267"
+ refY="5.5973287"
+ orient="auto"
+ id="marker4613">
+ <rect
+ y="-5.1589785"
+ x="5.8504119"
+ height="10.317957"
+ width="10.317957"
+ id="rect4212"
+ style="fill:#ffffff;stroke:#000000;stroke-width:0.69143367;stroke-miterlimit:4;stroke-dasharray:none"
+ transform="matrix(0.86111274,0.50841405,-0.86111274,0.50841405,0,0)">
+ <title
+ id="title4262">generation</title>
+ </rect>
+ </marker>
+ <marker
+ markerWidth="11.227358"
+ markerHeight="12.355258"
+ refX="10"
+ refY="6.177629"
+ orient="auto"
+ id="marker4825">
+ <path
+ inkscape:connector-curvature="0"
+ id="path4757"
+ d="M 0.42024733,0.42806444 10.231357,6.3500844 0.24347733,11.918544"
+ style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
+ </marker>
+ <marker
+ markerWidth="11.227358"
+ markerHeight="12.355258"
+ refX="10"
+ refY="6.177629"
+ orient="auto"
+ id="marker4825-6">
+ <path
+ inkscape:connector-curvature="0"
+ id="path4757-1"
+ d="M 0.42024733,0.42806444 10.231357,6.3500844 0.24347733,11.918544"
+ style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
+ </marker>
+ <linearGradient
+ inkscape:collect="always"
+ xlink:href="#linearGradient5026"
+ id="linearGradient5032-3-9"
+ x1="353"
+ y1="211.3622"
+ x2="565.5"
+ y2="174.8622"
+ gradientUnits="userSpaceOnUse"
+ gradientTransform="matrix(1.2452511,0,0,0.98513016,-190.95632,540.33156)" />
+ <marker
+ markerWidth="11.227358"
+ markerHeight="12.355258"
+ refX="10"
+ refY="6.177629"
+ orient="auto"
+ id="marker4825-6-2">
+ <path
+ inkscape:connector-curvature="0"
+ id="path4757-1-9"
+ d="M 0.42024733,0.42806444 10.231357,6.3500844 0.24347733,11.918544"
+ style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
+ </marker>
+ <marker
+ markerWidth="11.227358"
+ markerHeight="12.355258"
+ refX="10"
+ refY="6.177629"
+ orient="auto"
+ id="marker4825-6-2-1">
+ <path
+ inkscape:connector-curvature="0"
+ id="path4757-1-9-9"
+ d="M 0.42024733,0.42806444 10.231357,6.3500844 0.24347733,11.918544"
+ style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
+ </marker>
+ <marker
+ markerWidth="11.227358"
+ markerHeight="12.355258"
+ refX="10"
+ refY="6.177629"
+ orient="auto"
+ id="marker4825-6-2-6">
+ <path
+ inkscape:connector-curvature="0"
+ id="path4757-1-9-1"
+ d="M 0.42024733,0.42806444 10.231357,6.3500844 0.24347733,11.918544"
+ style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
+ </marker>
+ <marker
+ markerWidth="11.227358"
+ markerHeight="12.355258"
+ refX="10"
+ refY="6.177629"
+ orient="auto"
+ id="marker4825-6-2-1-8">
+ <path
+ inkscape:connector-curvature="0"
+ id="path4757-1-9-9-6"
+ d="M 0.42024733,0.42806444 10.231357,6.3500844 0.24347733,11.918544"
+ style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
+ </marker>
+ <marker
+ markerWidth="11.227358"
+ markerHeight="12.355258"
+ refX="10"
+ refY="6.177629"
+ orient="auto"
+ id="marker4825-6-2-1-8-8">
+ <path
+ inkscape:connector-curvature="0"
+ id="path4757-1-9-9-6-9"
+ d="M 0.42024733,0.42806444 10.231357,6.3500844 0.24347733,11.918544"
+ style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
+ </marker>
+ <marker
+ markerWidth="11.227358"
+ markerHeight="12.355258"
+ refX="10"
+ refY="6.177629"
+ orient="auto"
+ id="marker4825-6-0">
+ <path
+ inkscape:connector-curvature="0"
+ id="path4757-1-93"
+ d="M 0.42024733,0.42806444 10.231357,6.3500844 0.24347733,11.918544"
+ style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
+ </marker>
+ <marker
+ markerWidth="11.227358"
+ markerHeight="12.355258"
+ refX="10"
+ refY="6.177629"
+ orient="auto"
+ id="marker4825-6-0-2">
+ <path
+ inkscape:connector-curvature="0"
+ id="path4757-1-93-6"
+ d="M 0.42024733,0.42806444 10.231357,6.3500844 0.24347733,11.918544"
+ style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
+ </marker>
+ <marker
+ markerWidth="11.227358"
+ markerHeight="12.355258"
+ refX="10"
+ refY="6.177629"
+ orient="auto"
+ id="marker4825-6-2-6-2">
+ <path
+ inkscape:connector-curvature="0"
+ id="path4757-1-9-1-9"
+ d="M 0.42024733,0.42806444 10.231357,6.3500844 0.24347733,11.918544"
+ style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
+ </marker>
+ <linearGradient
+ inkscape:collect="always"
+ xlink:href="#linearGradient5026"
+ id="linearGradient5032-3-8"
+ x1="353"
+ y1="211.3622"
+ x2="565.5"
+ y2="174.8622"
+ gradientUnits="userSpaceOnUse"
+ gradientTransform="matrix(1.0104674,0,0,1.0052679,-218.642,661.15448)" />
+ <filter
+ inkscape:collect="always"
+ style="color-interpolation-filters:sRGB"
+ id="filter4169-3-5-8"
+ x="-0.031597666"
+ width="1.0631953"
+ y="-0.099812768"
+ height="1.1996255">
+ <feGaussianBlur
+ inkscape:collect="always"
+ stdDeviation="1.3307599"
+ id="feGaussianBlur4171-6-3-9" />
+ </filter>
+ <linearGradient
+ inkscape:collect="always"
+ xlink:href="#linearGradient5026"
+ id="linearGradient5032-3-8-2"
+ x1="353"
+ y1="211.3622"
+ x2="565.5"
+ y2="174.8622"
+ gradientUnits="userSpaceOnUse"
+ gradientTransform="matrix(2.1450559,0,0,1.0052679,-521.97704,740.76422)" />
+ <filter
+ inkscape:collect="always"
+ style="color-interpolation-filters:sRGB"
+ id="filter4169-3-5-8-5"
+ x="-0.031597666"
+ width="1.0631953"
+ y="-0.099812768"
+ height="1.1996255">
+ <feGaussianBlur
+ inkscape:collect="always"
+ stdDeviation="1.3307599"
+ id="feGaussianBlur4171-6-3-9-1" />
+ </filter>
+ <linearGradient
+ inkscape:collect="always"
+ xlink:href="#linearGradient5026"
+ id="linearGradient5032-3-8-0"
+ x1="353"
+ y1="211.3622"
+ x2="565.5"
+ y2="174.8622"
+ gradientUnits="userSpaceOnUse"
+ gradientTransform="matrix(1.0104674,0,0,1.0052679,83.456748,660.20747)" />
+ <filter
+ inkscape:collect="always"
+ style="color-interpolation-filters:sRGB"
+ id="filter4169-3-5-8-6"
+ x="-0.031597666"
+ width="1.0631953"
+ y="-0.099812768"
+ height="1.1996255">
+ <feGaussianBlur
+ inkscape:collect="always"
+ stdDeviation="1.3307599"
+ id="feGaussianBlur4171-6-3-9-2" />
+ </filter>
+ <linearGradient
+ inkscape:collect="always"
+ xlink:href="#linearGradient5026"
+ id="linearGradient5032-3-84"
+ x1="353"
+ y1="211.3622"
+ x2="565.5"
+ y2="174.8622"
+ gradientUnits="userSpaceOnUse"
+ gradientTransform="matrix(1.9884948,0,0,0.94903536,-318.42665,564.37696)" />
+ <filter
+ inkscape:collect="always"
+ style="color-interpolation-filters:sRGB"
+ id="filter4169-3-5-4"
+ x="-0.031597666"
+ width="1.0631953"
+ y="-0.099812768"
+ height="1.1996255">
+ <feGaussianBlur
+ inkscape:collect="always"
+ stdDeviation="1.3307599"
+ id="feGaussianBlur4171-6-3-0" />
+ </filter>
+ <marker
+ markerWidth="11.227358"
+ markerHeight="12.355258"
+ refX="10"
+ refY="6.177629"
+ orient="auto"
+ id="marker4825-6-0-0">
+ <path
+ inkscape:connector-curvature="0"
+ id="path4757-1-93-8"
+ d="M 0.42024733,0.42806444 10.231357,6.3500844 0.24347733,11.918544"
+ style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
+ </marker>
+ <marker
+ markerWidth="11.227358"
+ markerHeight="12.355258"
+ refX="10"
+ refY="6.177629"
+ orient="auto"
+ id="marker4825-6-3">
+ <path
+ inkscape:connector-curvature="0"
+ id="path4757-1-1"
+ d="M 0.42024733,0.42806444 10.231357,6.3500844 0.24347733,11.918544"
+ style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
+ </marker>
+ </defs>
+ <sodipodi:namedview
+ id="base"
+ pagecolor="#ffffff"
+ bordercolor="#666666"
+ borderopacity="1.0"
+ inkscape:pageopacity="0.0"
+ inkscape:pageshadow="2"
+ inkscape:zoom="0.98994949"
+ inkscape:cx="457.47339"
+ inkscape:cy="250.14781"
+ inkscape:document-units="px"
+ inkscape:current-layer="layer1"
+ showgrid="false"
+ inkscape:window-width="1916"
+ inkscape:window-height="1033"
+ inkscape:window-x="0"
+ inkscape:window-y="22"
+ inkscape:window-maximized="0"
+ fit-margin-right="0.3" />
+ <metadata
+ id="metadata7">
+ <rdf:RDF>
+ <cc:Work
+ rdf:about="">
+ <dc:format>image/svg+xml</dc:format>
+ <dc:type
+ rdf:resource="http://purl.org/dc/dcmitype/StillImage" />
+ <dc:title></dc:title>
+ </cc:Work>
+ </rdf:RDF>
+ </metadata>
+ <g
+ inkscape:label="Layer 1"
+ inkscape:groupmode="layer"
+ id="layer1"
+ transform="translate(0,-641.33861)">
+ <rect
+ style="fill:#000000;stroke:#000000;stroke-width:0.6465112;filter:url(#filter4169-3-5)"
+ id="rect4136-3-6-5"
+ width="101.07784"
+ height="31.998148"
+ x="128.74678"
+ y="80.648842"
+ transform="matrix(2.7384116,0,0,0.91666328,-284.06895,664.79751)" />
+ <rect
+ style="fill:url(#linearGradient5032-3);fill-opacity:1;stroke:#000000;stroke-width:1.02430749"
+ id="rect4136-2-6"
+ width="276.79272"
+ height="29.331528"
+ x="64.723419"
+ y="736.84473" />
+ <text
+ xml:space="preserve"
+ style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+ x="78.223282"
+ y="756.79803"
+ id="text4138-6-2"><tspan
+ sodipodi:role="line"
+ id="tspan4140-1-9"
+ x="78.223282"
+ y="756.79803"
+ style="font-size:15px;line-height:1.25">user application (running by the CPU</tspan></text>
+ <path
+ style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;marker-end:url(#marker4825-6)"
+ d="m 217.67507,876.6738 113.40331,45.0758"
+ id="path4661"
+ inkscape:connector-curvature="0"
+ sodipodi:nodetypes="cc" />
+ <path
+ style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;marker-end:url(#marker4825-6-0)"
+ d="m 208.10197,767.69811 0.29362,76.03656"
+ id="path4661-6"
+ inkscape:connector-curvature="0"
+ sodipodi:nodetypes="cc" />
+ <rect
+ style="fill:#000000;stroke:#000000;stroke-width:0.6465112;filter:url(#filter4169-3-5-8)"
+ id="rect4136-3-6-5-3"
+ width="101.07784"
+ height="31.998148"
+ x="128.74678"
+ y="80.648842"
+ transform="matrix(1.0104673,0,0,1.0052679,28.128628,763.90722)" />
+ <rect
+ style="fill:url(#linearGradient5032-3-8);fill-opacity:1;stroke:#000000;stroke-width:0.65159565"
+ id="rect4136-2-6-6"
+ width="102.13586"
+ height="32.16671"
+ x="156.83217"
+ y="842.91852" />
+ <text
+ xml:space="preserve"
+ style="font-style:normal;font-weight:normal;font-size:12px;line-height:0%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+ x="188.58519"
+ y="864.47125"
+ id="text4138-6-2-8"><tspan
+ sodipodi:role="line"
+ id="tspan4140-1-9-0"
+ x="188.58519"
+ y="864.47125"
+ style="font-size:15px;line-height:1.25;stroke-width:1px">MMU</tspan></text>
+ <rect
+ style="fill:#000000;stroke:#000000;stroke-width:0.6465112;filter:url(#filter4169-3-5-8-5)"
+ id="rect4136-3-6-5-3-1"
+ width="101.07784"
+ height="31.998148"
+ x="128.74678"
+ y="80.648842"
+ transform="matrix(2.1450556,0,0,1.0052679,1.87637,843.51696)" />
+ <rect
+ style="fill:url(#linearGradient5032-3-8-2);fill-opacity:1;stroke:#000000;stroke-width:0.94937181"
+ id="rect4136-2-6-6-0"
+ width="216.8176"
+ height="32.16671"
+ x="275.09283"
+ y="922.5282" />
+ <text
+ xml:space="preserve"
+ style="font-style:normal;font-weight:normal;font-size:12px;line-height:0%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+ x="347.81482"
+ y="943.23291"
+ id="text4138-6-2-8-8"><tspan
+ sodipodi:role="line"
+ id="tspan4140-1-9-0-5"
+ x="347.81482"
+ y="943.23291"
+ style="font-size:15px;line-height:1.25;stroke-width:1px">Memory</tspan></text>
+ <rect
+ style="fill:#000000;stroke:#000000;stroke-width:0.6465112;filter:url(#filter4169-3-5-8-6)"
+ id="rect4136-3-6-5-3-5"
+ width="101.07784"
+ height="31.998148"
+ x="128.74678"
+ y="80.648842"
+ transform="matrix(1.0104673,0,0,1.0052679,330.22737,762.9602)" />
+ <rect
+ style="fill:url(#linearGradient5032-3-8-0);fill-opacity:1;stroke:#000000;stroke-width:0.65159565"
+ id="rect4136-2-6-6-8"
+ width="102.13586"
+ height="32.16671"
+ x="458.93091"
+ y="841.9715" />
+ <text
+ xml:space="preserve"
+ style="font-style:normal;font-weight:normal;font-size:12px;line-height:0%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+ x="490.68393"
+ y="863.52423"
+ id="text4138-6-2-8-6"><tspan
+ sodipodi:role="line"
+ id="tspan4140-1-9-0-2"
+ x="490.68393"
+ y="863.52423"
+ style="font-size:15px;line-height:1.25;stroke-width:1px">IOMMU</tspan></text>
+ <rect
+ style="fill:#000000;stroke:#000000;stroke-width:0.6465112;filter:url(#filter4169-3-5-4)"
+ id="rect4136-3-6-5-6"
+ width="101.07784"
+ height="31.998148"
+ x="128.74678"
+ y="80.648842"
+ transform="matrix(1.9884947,0,0,0.94903537,167.19229,661.38193)" />
+ <rect
+ style="fill:url(#linearGradient5032-3-84);fill-opacity:1;stroke:#000000;stroke-width:0.88813609"
+ id="rect4136-2-6-2"
+ width="200.99274"
+ height="30.367374"
+ x="420.4675"
+ y="735.97351" />
+ <text
+ xml:space="preserve"
+ style="font-style:normal;font-weight:normal;font-size:12px;line-height:0%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+ x="441.95297"
+ y="755.9068"
+ id="text4138-6-2-9"><tspan
+ sodipodi:role="line"
+ id="tspan4140-1-9-9"
+ x="441.95297"
+ y="755.9068"
+ style="font-size:15px;line-height:1.25;stroke-width:1px">Hardware Accelerator</tspan></text>
+ <path
+ style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;marker-end:url(#marker4825-6-0-0)"
+ d="m 508.2914,766.55885 0.29362,76.03656"
+ id="path4661-6-1"
+ inkscape:connector-curvature="0"
+ sodipodi:nodetypes="cc" />
+ <path
+ style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;marker-end:url(#marker4825-6-3)"
+ d="M 499.70201,876.47297 361.38296,920.80258"
+ id="path4661-1"
+ inkscape:connector-curvature="0"
+ sodipodi:nodetypes="cc" />
+ </g>
+</svg>
--
2.17.1

2018-08-01 10:22:16

by Kenneth Lee

Subject: [RFC PATCH 2/7] iommu: Add share domain interface in iommu for spimdev

From: Kenneth Lee <[email protected]>

This patch adds a sharing interface for an iommu_group. The new interfaces:

iommu_group_share_domain()
iommu_group_unshare_domain()

can be used by a virtual iommu_group (such as the iommu_group for spimdev)
to share its parent group's domain.

When the domain of a group is shared, it cannot be changed until it is
unshared. In the future, a notification can be added if updating is required.
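
As a sketch of the intended use (the spimdev-side names below are
illustrative, not part of this patch), a child device's driver would bracket
its use of the parent's DMA mappings like this:

        struct iommu_domain *domain;

        /* borrow the parent group's domain while a queue is in use */
        domain = iommu_group_share_domain(parent_group);
        if (IS_ERR(domain))
                return PTR_ERR(domain);

        /* ... set up DMA through the shared domain ... */

        /* when the queue is released, drop the reference again */
        iommu_group_unshare_domain(parent_group);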

Signed-off-by: Kenneth Lee <[email protected]>
---
drivers/iommu/iommu.c | 28 +++++++++++++++++++++++++++-
include/linux/iommu.h | 2 ++
2 files changed, 29 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 63b37563db7e..a832aafe660d 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -54,6 +54,9 @@ struct iommu_group {
int id;
struct iommu_domain *default_domain;
struct iommu_domain *domain;
+ atomic_t domain_shared_ref; /* Number of user of current domain.
+ * The domain cannot be modified if ref > 0
+ */
};

struct group_device {
@@ -353,6 +356,7 @@ struct iommu_group *iommu_group_alloc(void)
return ERR_PTR(ret);
}
group->id = ret;
+ atomic_set(&group->domain_shared_ref, 0);

ret = kobject_init_and_add(&group->kobj, &iommu_group_ktype,
NULL, "%d", group->id);
@@ -482,6 +486,25 @@ int iommu_group_set_name(struct iommu_group *group, const char *name)
}
EXPORT_SYMBOL_GPL(iommu_group_set_name);

+struct iommu_domain *iommu_group_share_domain(struct iommu_group *group)
+{
+ /* the domain can be shared only when the default domain is used */
+ /* todo: more shareable check */
+ if (group->domain != group->default_domain)
+ return ERR_PTR(-EINVAL);
+
+ atomic_inc(&group->domain_shared_ref);
+ return group->domain;
+}
+EXPORT_SYMBOL_GPL(iommu_group_share_domain);
+
+void iommu_group_unshare_domain(struct iommu_group *group)
+{
+ atomic_dec(&group->domain_shared_ref);
+ WARN_ON(atomic_read(&group->domain_shared_ref) < 0);
+}
+EXPORT_SYMBOL_GPL(iommu_group_unshare_domain);
+
static int iommu_group_create_direct_mappings(struct iommu_group *group,
struct device *dev)
{
@@ -1401,7 +1424,8 @@ static int __iommu_attach_group(struct iommu_domain *domain,
{
int ret;

- if (group->default_domain && group->domain != group->default_domain)
+ if ((group->default_domain && group->domain != group->default_domain) ||
+ atomic_read(&group->domain_shared_ref) > 0)
return -EBUSY;

ret = __iommu_group_for_each_dev(group, domain,
@@ -1438,6 +1462,8 @@ static void __iommu_detach_group(struct iommu_domain *domain,
{
int ret;

+ WARN_ON(atomic_read(&group->domain_shared_ref) > 0);
+
if (!group->default_domain) {
__iommu_group_for_each_dev(group, domain,
iommu_group_do_detach_device);
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 19938ee6eb31..278d60e3ec39 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -349,6 +349,8 @@ extern int iommu_domain_get_attr(struct iommu_domain *domain, enum iommu_attr,
void *data);
extern int iommu_domain_set_attr(struct iommu_domain *domain, enum iommu_attr,
void *data);
+extern struct iommu_domain *iommu_group_share_domain(struct iommu_group *group);
+extern void iommu_group_unshare_domain(struct iommu_group *group);

/* Window handling function prototypes */
extern int iommu_domain_window_enable(struct iommu_domain *domain, u32 wnd_nr,
--
2.17.1

2018-08-01 10:22:17

by Kenneth Lee

[permalink] [raw]
Subject: [RFC PATCH 3/7] vfio: add spimdev support

From: Kenneth Lee <[email protected]>

SPIMDEV stands for "Share Parent IOMMU Mdev". It is a vfio-mdev, but it
differs from a general vfio-mdev in two ways:

1. It shares its parent's IOMMU.
2. No hardware resource is attached when the mdev is created. The
hardware resource (a `queue') is allocated only when the mdev is
opened.

Currently only the vfio type-1 driver is updated to be aware of it.
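
As an illustration, a hypothetical parent driver would register itself
roughly as below (all "my_*" names and the VFIO_TYPE1_IOMMU value are only
examples; the vfio_spimdev structure, its ops and vfio_spimdev_register()
are what this patch introduces):

	static const struct vfio_spimdev_ops my_ops = {
		.get_queue	= my_get_queue,	/* allocate the queue at open time */
		.put_queue	= my_put_queue,
		.is_q_updated	= my_is_q_updated,
		.mmap		= my_mmap,
		.get_available_instances = my_get_available_instances,
	};

	static int my_probe(struct my_hw *hw)
	{
		struct vfio_spimdev *spimdev = &hw->spimdev;

		spimdev->dev = hw->dev;	/* parent device whose IOMMU is shared */
		spimdev->owner = THIS_MODULE;
		spimdev->ops = &my_ops;
		spimdev->iommu_type = VFIO_TYPE1_IOMMU;	/* example value only */
		spimdev->api_ver = "my_api_v1";
		spimdev->mdev_fops.supported_type_groups = my_type_groups;

		return vfio_spimdev_register(spimdev);
	}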

Signed-off-by: Kenneth Lee <[email protected]>
Signed-off-by: Zaibo Xu <[email protected]>
Signed-off-by: Zhou Wang <[email protected]>
---
drivers/vfio/Kconfig | 1 +
drivers/vfio/Makefile | 1 +
drivers/vfio/spimdev/Kconfig | 10 +
drivers/vfio/spimdev/Makefile | 3 +
drivers/vfio/spimdev/vfio_spimdev.c | 421 ++++++++++++++++++++++++++++
drivers/vfio/vfio_iommu_type1.c | 136 ++++++++-
include/linux/vfio_spimdev.h | 95 +++++++
include/uapi/linux/vfio_spimdev.h | 28 ++
8 files changed, 689 insertions(+), 6 deletions(-)
create mode 100644 drivers/vfio/spimdev/Kconfig
create mode 100644 drivers/vfio/spimdev/Makefile
create mode 100644 drivers/vfio/spimdev/vfio_spimdev.c
create mode 100644 include/linux/vfio_spimdev.h
create mode 100644 include/uapi/linux/vfio_spimdev.h

diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
index c84333eb5eb5..3719eba72ef1 100644
--- a/drivers/vfio/Kconfig
+++ b/drivers/vfio/Kconfig
@@ -47,4 +47,5 @@ menuconfig VFIO_NOIOMMU
source "drivers/vfio/pci/Kconfig"
source "drivers/vfio/platform/Kconfig"
source "drivers/vfio/mdev/Kconfig"
+source "drivers/vfio/spimdev/Kconfig"
source "virt/lib/Kconfig"
diff --git a/drivers/vfio/Makefile b/drivers/vfio/Makefile
index de67c4725cce..28f3ef0cdce1 100644
--- a/drivers/vfio/Makefile
+++ b/drivers/vfio/Makefile
@@ -9,3 +9,4 @@ obj-$(CONFIG_VFIO_SPAPR_EEH) += vfio_spapr_eeh.o
obj-$(CONFIG_VFIO_PCI) += pci/
obj-$(CONFIG_VFIO_PLATFORM) += platform/
obj-$(CONFIG_VFIO_MDEV) += mdev/
+obj-$(CONFIG_VFIO_SPIMDEV) += spimdev/
diff --git a/drivers/vfio/spimdev/Kconfig b/drivers/vfio/spimdev/Kconfig
new file mode 100644
index 000000000000..1226301f9d0e
--- /dev/null
+++ b/drivers/vfio/spimdev/Kconfig
@@ -0,0 +1,10 @@
+# SPDX-License-Identifier: GPL-2.0
+config VFIO_SPIMDEV
+ tristate "Support for Share Parent IOMMU MDEV"
+ depends on VFIO_MDEV_DEVICE
+ help
+ Support for VFIO Share Parent IOMMU MDEV, which enables the kernel to
+ support the lightweight hardware accelerator framework, WarpDrive.
+
+ To compile this as a module, choose M here: the module will be called
+ spimdev.
diff --git a/drivers/vfio/spimdev/Makefile b/drivers/vfio/spimdev/Makefile
new file mode 100644
index 000000000000..d02fb69c37e4
--- /dev/null
+++ b/drivers/vfio/spimdev/Makefile
@@ -0,0 +1,3 @@
+# SPDX-License-Identifier: GPL-2.0
+spimdev-y := spimdev.o
+obj-$(CONFIG_VFIO_SPIMDEV) += vfio_spimdev.o
diff --git a/drivers/vfio/spimdev/vfio_spimdev.c b/drivers/vfio/spimdev/vfio_spimdev.c
new file mode 100644
index 000000000000..1b6910c9d27d
--- /dev/null
+++ b/drivers/vfio/spimdev/vfio_spimdev.c
@@ -0,0 +1,421 @@
+// SPDX-License-Identifier: GPL-2.0+
+#include <linux/anon_inodes.h>
+#include <linux/idr.h>
+#include <linux/module.h>
+#include <linux/poll.h>
+#include <linux/vfio_spimdev.h>
+
+struct spimdev_mdev_state {
+ struct vfio_spimdev *spimdev;
+};
+
+static struct class *spimdev_class;
+static DEFINE_IDR(spimdev_idr);
+
+static int vfio_spimdev_dev_exist(struct device *dev, void *data)
+{
+ return !strcmp(dev_name(dev), dev_name((struct device *)data));
+}
+
+#ifdef CONFIG_IOMMU_SVA
+static bool vfio_spimdev_is_valid_pasid(int pasid)
+{
+ struct mm_struct *mm;
+
+ mm = iommu_sva_find(pasid);
+ if (mm) {
+ mmput(mm);
+ return mm == current->mm;
+ }
+
+ return false;
+}
+#endif
+
+/* Check if the device is a mediated device belonging to vfio_spimdev */
+int vfio_spimdev_is_spimdev(struct device *dev)
+{
+ struct mdev_device *mdev;
+ struct device *pdev;
+
+ mdev = mdev_from_dev(dev);
+ if (!mdev)
+ return 0;
+
+ pdev = mdev_parent_dev(mdev);
+ if (!pdev)
+ return 0;
+
+ return class_for_each_device(spimdev_class, NULL, pdev,
+ vfio_spimdev_dev_exist);
+}
+EXPORT_SYMBOL_GPL(vfio_spimdev_is_spimdev);
+
+struct vfio_spimdev *vfio_spimdev_pdev_spimdev(struct device *dev)
+{
+ struct device *class_dev;
+
+ if (!dev)
+ return ERR_PTR(-EINVAL);
+
+ class_dev = class_find_device(spimdev_class, NULL, dev,
+ (int(*)(struct device *, const void *))vfio_spimdev_dev_exist);
+ if (!class_dev)
+ return ERR_PTR(-ENODEV);
+
+ return container_of(class_dev, struct vfio_spimdev, cls_dev);
+}
+EXPORT_SYMBOL_GPL(vfio_spimdev_pdev_spimdev);
+
+struct vfio_spimdev *mdev_spimdev(struct mdev_device *mdev)
+{
+ struct device *pdev = mdev_parent_dev(mdev);
+
+ return vfio_spimdev_pdev_spimdev(pdev);
+}
+EXPORT_SYMBOL_GPL(mdev_spimdev);
+
+static ssize_t iommu_type_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct vfio_spimdev *spimdev = vfio_spimdev_pdev_spimdev(dev);
+
+ if (!spimdev)
+ return -ENODEV;
+
+ return sprintf(buf, "%d\n", spimdev->iommu_type);
+}
+
+static DEVICE_ATTR_RO(iommu_type);
+
+static ssize_t dma_flag_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct vfio_spimdev *spimdev = vfio_spimdev_pdev_spimdev(dev);
+
+ if (!spimdev)
+ return -ENODEV;
+
+ return sprintf(buf, "%d\n", spimdev->dma_flag);
+}
+
+static DEVICE_ATTR_RO(dma_flag);
+
+/* mdev->dev_attr_groups */
+static struct attribute *vfio_spimdev_attrs[] = {
+ &dev_attr_iommu_type.attr,
+ &dev_attr_dma_flag.attr,
+ NULL,
+};
+static const struct attribute_group vfio_spimdev_group = {
+ .name = VFIO_SPIMDEV_PDEV_ATTRS_GRP_NAME,
+ .attrs = vfio_spimdev_attrs,
+};
+const struct attribute_group *vfio_spimdev_groups[] = {
+ &vfio_spimdev_group,
+ NULL,
+};
+
+/* default attributes for mdev->supported_type_groups, used by the registering driver */
+#define MDEV_TYPE_ATTR_RO_EXPORT(name) \
+ MDEV_TYPE_ATTR_RO(name); \
+ EXPORT_SYMBOL_GPL(mdev_type_attr_##name);
+
+#define DEF_SIMPLE_SPIMDEV_ATTR(_name, spimdev_member, format) \
+static ssize_t _name##_show(struct kobject *kobj, struct device *dev, \
+ char *buf) \
+{ \
+ struct vfio_spimdev *spimdev = vfio_spimdev_pdev_spimdev(dev); \
+ if (!spimdev) \
+ return -ENODEV; \
+ return sprintf(buf, format, spimdev->spimdev_member); \
+} \
+MDEV_TYPE_ATTR_RO_EXPORT(_name)
+
+DEF_SIMPLE_SPIMDEV_ATTR(flags, flags, "%d");
+DEF_SIMPLE_SPIMDEV_ATTR(name, name, "%s"); /* this should be algorithm name, */
+ /* but you would not care if you have only one algorithm */
+DEF_SIMPLE_SPIMDEV_ATTR(device_api, api_ver, "%s");
+
+/* this returns the total queues left, not the mdevs left */
+static ssize_t
+available_instances_show(struct kobject *kobj, struct device *dev, char *buf)
+{
+ struct vfio_spimdev *spimdev = vfio_spimdev_pdev_spimdev(dev);
+
+ return sprintf(buf, "%d",
+ spimdev->ops->get_available_instances(spimdev));
+}
+MDEV_TYPE_ATTR_RO_EXPORT(available_instances);
+
+static int vfio_spimdev_mdev_create(struct kobject *kobj,
+ struct mdev_device *mdev)
+{
+ struct device *dev = mdev_dev(mdev);
+ struct device *pdev = mdev_parent_dev(mdev);
+ struct spimdev_mdev_state *mdev_state;
+ struct vfio_spimdev *spimdev = mdev_spimdev(mdev);
+
+ if (!spimdev->ops->get_queue)
+ return -ENODEV;
+
+ mdev_state = devm_kzalloc(dev, sizeof(struct spimdev_mdev_state),
+ GFP_KERNEL);
+ if (!mdev_state)
+ return -ENOMEM;
+ mdev_set_drvdata(mdev, mdev_state);
+ mdev_state->spimdev = spimdev;
+ dev->iommu_fwspec = pdev->iommu_fwspec;
+ get_device(pdev);
+ __module_get(spimdev->owner);
+
+ return 0;
+}
+
+static int vfio_spimdev_mdev_remove(struct mdev_device *mdev)
+{
+ struct device *dev = mdev_dev(mdev);
+ struct device *pdev = mdev_parent_dev(mdev);
+ struct vfio_spimdev *spimdev = mdev_spimdev(mdev);
+
+ put_device(pdev);
+ module_put(spimdev->owner);
+ dev->iommu_fwspec = NULL;
+ mdev_set_drvdata(mdev, NULL);
+
+ return 0;
+}
+
+/* Wake up the process that is waiting on this queue */
+void vfio_spimdev_wake_up(struct vfio_spimdev_queue *q)
+{
+ wake_up(&q->wait);
+}
+EXPORT_SYMBOL_GPL(vfio_spimdev_wake_up);
+
+static int vfio_spimdev_q_file_open(struct inode *inode, struct file *file)
+{
+ return 0;
+}
+
+static int vfio_spimdev_q_file_release(struct inode *inode, struct file *file)
+{
+ struct vfio_spimdev_queue *q =
+ (struct vfio_spimdev_queue *)file->private_data;
+ struct vfio_spimdev *spimdev = q->spimdev;
+ int ret;
+
+ ret = spimdev->ops->put_queue(q);
+ if (ret) {
+ dev_err(spimdev->dev, "drv put queue fail (%d)!\n", ret);
+ return ret;
+ }
+
+ put_device(mdev_dev(q->mdev));
+
+ return 0;
+}
+
+static long vfio_spimdev_q_file_ioctl(struct file *file, unsigned int cmd,
+ unsigned long arg)
+{
+ struct vfio_spimdev_queue *q =
+ (struct vfio_spimdev_queue *)file->private_data;
+ struct vfio_spimdev *spimdev = q->spimdev;
+
+ if (spimdev->ops->ioctl)
+ return spimdev->ops->ioctl(q, cmd, arg);
+
+ dev_err(spimdev->dev, "ioctl cmd (%d) is not supported!\n", cmd);
+
+ return -EINVAL;
+}
+
+static int vfio_spimdev_q_file_mmap(struct file *file,
+ struct vm_area_struct *vma)
+{
+ struct vfio_spimdev_queue *q =
+ (struct vfio_spimdev_queue *)file->private_data;
+ struct vfio_spimdev *spimdev = q->spimdev;
+
+ if (spimdev->ops->mmap)
+ return spimdev->ops->mmap(q, vma);
+
+ dev_err(spimdev->dev, "no driver mmap!\n");
+ return -EINVAL;
+}
+
+static __poll_t vfio_spimdev_q_file_poll(struct file *file, poll_table *wait)
+{
+ struct vfio_spimdev_queue *q =
+ (struct vfio_spimdev_queue *)file->private_data;
+ struct vfio_spimdev *spimdev = q->spimdev;
+
+ poll_wait(file, &q->wait, wait);
+ if (spimdev->ops->is_q_updated(q))
+ return EPOLLIN | EPOLLRDNORM;
+
+ return 0;
+}
+
+static const struct file_operations spimdev_q_file_ops = {
+ .owner = THIS_MODULE,
+ .open = vfio_spimdev_q_file_open,
+ .unlocked_ioctl = vfio_spimdev_q_file_ioctl,
+ .release = vfio_spimdev_q_file_release,
+ .poll = vfio_spimdev_q_file_poll,
+ .mmap = vfio_spimdev_q_file_mmap,
+};
+
+static long vfio_spimdev_mdev_get_queue(struct mdev_device *mdev,
+ struct vfio_spimdev *spimdev, unsigned long arg)
+{
+ struct vfio_spimdev_queue *q;
+ int ret;
+
+#ifdef CONFIG_IOMMU_SVA
+ int pasid = arg;
+
+ if (!vfio_spimdev_is_valid_pasid(pasid))
+ return -EINVAL;
+#endif
+
+ if (!spimdev->ops->get_queue)
+ return -EINVAL;
+
+ ret = spimdev->ops->get_queue(spimdev, arg, &q);
+ if (ret < 0) {
+ dev_err(spimdev->dev, "get_queue failed\n");
+ return -ENODEV;
+ }
+
+ ret = anon_inode_getfd("spimdev_q", &spimdev_q_file_ops,
+ q, O_CLOEXEC | O_RDWR);
+ if (ret < 0) {
+ dev_err(spimdev->dev, "getfd fail %d\n", ret);
+ goto err_with_queue;
+ }
+
+ q->fd = ret;
+ q->spimdev = spimdev;
+ q->mdev = mdev;
+ q->container = arg;
+ init_waitqueue_head(&q->wait);
+ get_device(mdev_dev(mdev));
+
+ return ret;
+
+err_with_queue:
+ spimdev->ops->put_queue(q);
+ return ret;
+}
+
+static long vfio_spimdev_mdev_ioctl(struct mdev_device *mdev, unsigned int cmd,
+ unsigned long arg)
+{
+ struct spimdev_mdev_state *mdev_state;
+ struct vfio_spimdev *spimdev;
+
+ if (!mdev)
+ return -ENODEV;
+
+ mdev_state = mdev_get_drvdata(mdev);
+ if (!mdev_state)
+ return -ENODEV;
+
+ spimdev = mdev_state->spimdev;
+ if (!spimdev)
+ return -ENODEV;
+
+ if (cmd == VFIO_SPIMDEV_CMD_GET_Q)
+ return vfio_spimdev_mdev_get_queue(mdev, spimdev, arg);
+
+ dev_err(spimdev->dev,
+ "%s, ioctl cmd (0x%x) is not supported!\n", __func__, cmd);
+ return -EINVAL;
+}
+
+static void vfio_spimdev_release(struct device *dev) { }
+static void vfio_spimdev_mdev_release(struct mdev_device *mdev) { }
+static int vfio_spimdev_mdev_open(struct mdev_device *mdev) { return 0; }
+
+/**
+ * vfio_spimdev_register - register a spimdev
+ * @spimdev: device structure
+ */
+int vfio_spimdev_register(struct vfio_spimdev *spimdev)
+{
+ int ret;
+ const char *drv_name;
+
+ if (!spimdev->dev)
+ return -ENODEV;
+
+ drv_name = dev_driver_string(spimdev->dev);
+ if (strstr(drv_name, "-")) {
+ pr_err("spimdev: parent driver name cannot include '-'!\n");
+ return -EINVAL;
+ }
+
+ spimdev->dev_id = idr_alloc(&spimdev_idr, spimdev, 0, 0, GFP_KERNEL);
+ if (spimdev->dev_id < 0)
+ return spimdev->dev_id;
+
+ atomic_set(&spimdev->ref, 0);
+ spimdev->cls_dev.parent = spimdev->dev;
+ spimdev->cls_dev.class = spimdev_class;
+ spimdev->cls_dev.release = vfio_spimdev_release;
+ dev_set_name(&spimdev->cls_dev, "%s", dev_name(spimdev->dev));
+ ret = device_register(&spimdev->cls_dev);
+ if (ret)
+ return ret;
+
+ spimdev->mdev_fops.owner = spimdev->owner;
+ spimdev->mdev_fops.dev_attr_groups = vfio_spimdev_groups;
+ WARN_ON(!spimdev->mdev_fops.supported_type_groups);
+ spimdev->mdev_fops.create = vfio_spimdev_mdev_create;
+ spimdev->mdev_fops.remove = vfio_spimdev_mdev_remove;
+ spimdev->mdev_fops.ioctl = vfio_spimdev_mdev_ioctl;
+ spimdev->mdev_fops.open = vfio_spimdev_mdev_open;
+ spimdev->mdev_fops.release = vfio_spimdev_mdev_release;
+
+ ret = mdev_register_device(spimdev->dev, &spimdev->mdev_fops);
+ if (ret)
+ device_unregister(&spimdev->cls_dev);
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(vfio_spimdev_register);
+
+/**
+ * vfio_spimdev_unregister - unregisters a spimdev
+ * @spimdev: device to unregister
+ *
+ * Unregister a spimdev that was previously successfully registered
+ * with vfio_spimdev_register().
+ */
+void vfio_spimdev_unregister(struct vfio_spimdev *spimdev)
+{
+ mdev_unregister_device(spimdev->dev);
+ device_unregister(&spimdev->cls_dev);
+}
+EXPORT_SYMBOL_GPL(vfio_spimdev_unregister);
+
+static int __init vfio_spimdev_init(void)
+{
+ spimdev_class = class_create(THIS_MODULE, VFIO_SPIMDEV_CLASS_NAME);
+ return PTR_ERR_OR_ZERO(spimdev_class);
+}
+
+static __exit void vfio_spimdev_exit(void)
+{
+ class_destroy(spimdev_class);
+ idr_destroy(&spimdev_idr);
+}
+
+module_init(vfio_spimdev_init);
+module_exit(vfio_spimdev_exit);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Hisilicon Tech. Co., Ltd.");
+MODULE_DESCRIPTION("VFIO Share Parent's IOMMU Mediated Device");
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 3e5b17710a4f..0ec38a17c98c 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -41,6 +41,7 @@
#include <linux/notifier.h>
#include <linux/dma-iommu.h>
#include <linux/irqdomain.h>
+#include <linux/vfio_spimdev.h>

#define DRIVER_VERSION "0.2"
#define DRIVER_AUTHOR "Alex Williamson <[email protected]>"
@@ -89,6 +90,8 @@ struct vfio_dma {
};

struct vfio_group {
+ /* iommu_group of mdev's parent device */
+ struct iommu_group *parent_group;
struct iommu_group *iommu_group;
struct list_head next;
};
@@ -1327,6 +1330,109 @@ static bool vfio_iommu_has_sw_msi(struct iommu_group *group, phys_addr_t *base)
return ret;
}

+/* return 0 if the device is not a spimdev.
+ * return 1 if the device is a spimdev; the data will be updated with the
+ * parent device's group.
+ * return -errno on any other error.
+ */
+static int vfio_spimdev_type(struct device *dev, void *data)
+{
+ struct iommu_group **group = data;
+ struct iommu_group *pgroup;
+ int (*spimdev_mdev)(struct device *dev);
+ struct device *pdev;
+ int ret = 1;
+
+ /* vfio_spimdev module is not configured */
+ spimdev_mdev = symbol_get(vfio_spimdev_is_spimdev);
+ if (!spimdev_mdev)
+ return 0;
+
+ /* check if it belongs to vfio_spimdev device */
+ if (!spimdev_mdev(dev)) {
+ ret = 0;
+ goto get_exit;
+ }
+
+ pdev = dev->parent;
+ pgroup = iommu_group_get(pdev);
+ if (!pgroup) {
+ ret = -ENODEV;
+ goto get_exit;
+ }
+
+ if (group) {
+ /* check if all parent devices are the same */
+ if (*group && *group != pgroup)
+ ret = -ENODEV;
+ else
+ *group = pgroup;
+ }
+
+ iommu_group_put(pgroup);
+
+get_exit:
+ symbol_put(vfio_spimdev_is_spimdev);
+
+ return ret;
+}
+
+/* return 0 or -errno */
+static int vfio_spimdev_bus(struct device *dev, void *data)
+{
+ struct bus_type **bus = data;
+
+ if (!dev->bus)
+ return -ENODEV;
+
+ /* ensure all devices have the same bus_type */
+ if (*bus && *bus != dev->bus)
+ return -EINVAL;
+
+ *bus = dev->bus;
+ return 0;
+}
+
+/* return 0 if it is not a spimdev group, 1 if it is, or -EXXX on error */
+static int vfio_iommu_type1_attach_spigroup(struct vfio_domain *domain,
+ struct vfio_group *group,
+ struct iommu_group *iommu_group)
+{
+ int ret;
+ struct bus_type *pbus = NULL;
+ struct iommu_group *pgroup = NULL;
+
+ ret = iommu_group_for_each_dev(iommu_group, &pgroup,
+ vfio_spimdev_type);
+ if (ret < 0)
+ goto out;
+ else if (ret > 0) {
+ domain->domain = iommu_group_share_domain(pgroup);
+ if (IS_ERR(domain->domain))
+ goto out;
+ ret = iommu_group_for_each_dev(pgroup, &pbus,
+ vfio_spimdev_bus);
+ if (ret < 0)
+ goto err_with_share_domain;
+
+ if (pbus && iommu_capable(pbus, IOMMU_CAP_CACHE_COHERENCY))
+ domain->prot |= IOMMU_CACHE;
+
+ group->parent_group = pgroup;
+ INIT_LIST_HEAD(&domain->group_list);
+ list_add(&group->next, &domain->group_list);
+
+ return 1;
+ }
+
+ return 0;
+
+err_with_share_domain:
+ iommu_group_unshare_domain(pgroup);
+out:
+ return ret;
+}
+
static int vfio_iommu_type1_attach_group(void *iommu_data,
struct iommu_group *iommu_group)
{
@@ -1335,8 +1441,8 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
struct vfio_domain *domain, *d;
struct bus_type *bus = NULL, *mdev_bus;
int ret;
- bool resv_msi, msi_remap;
- phys_addr_t resv_msi_base;
+ bool resv_msi = false, msi_remap;
+ phys_addr_t resv_msi_base = 0;

mutex_lock(&iommu->lock);

@@ -1373,6 +1479,14 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
if (mdev_bus) {
if ((bus == mdev_bus) && !iommu_present(bus)) {
symbol_put(mdev_bus_type);
+
+ ret = vfio_iommu_type1_attach_spigroup(domain, group,
+ iommu_group);
+ if (ret < 0)
+ goto out_free;
+ else if (ret > 0)
+ goto replay_check;
+
if (!iommu->external_domain) {
INIT_LIST_HEAD(&domain->group_list);
iommu->external_domain = domain;
@@ -1451,12 +1565,13 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,

vfio_test_domain_fgsp(domain);

+replay_check:
/* replay mappings on new domains */
ret = vfio_iommu_replay(iommu, domain);
if (ret)
goto out_detach;

- if (resv_msi) {
+ if (!group->parent_group && resv_msi) {
ret = iommu_get_msi_cookie(domain->domain, resv_msi_base);
if (ret)
goto out_detach;
@@ -1471,7 +1586,10 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
out_detach:
iommu_detach_group(domain->domain, iommu_group);
out_domain:
- iommu_domain_free(domain->domain);
+ if (group->parent_group)
+ iommu_group_unshare_domain(group->parent_group);
+ else
+ iommu_domain_free(domain->domain);
out_free:
kfree(domain);
kfree(group);
@@ -1533,6 +1651,7 @@ static void vfio_iommu_type1_detach_group(void *iommu_data,
struct vfio_iommu *iommu = iommu_data;
struct vfio_domain *domain;
struct vfio_group *group;
+ int ret;

mutex_lock(&iommu->lock);

@@ -1560,7 +1679,11 @@ static void vfio_iommu_type1_detach_group(void *iommu_data,
if (!group)
continue;

- iommu_detach_group(domain->domain, iommu_group);
+ if (group->parent_group)
+ iommu_group_unshare_domain(group->parent_group);
+ else
+ iommu_detach_group(domain->domain, iommu_group);
+
list_del(&group->next);
kfree(group);
/*
@@ -1577,7 +1700,8 @@ static void vfio_iommu_type1_detach_group(void *iommu_data,
else
vfio_iommu_unmap_unpin_reaccount(iommu);
}
- iommu_domain_free(domain->domain);
+ if (!ret)
+ iommu_domain_free(domain->domain);
list_del(&domain->next);
kfree(domain);
}
diff --git a/include/linux/vfio_spimdev.h b/include/linux/vfio_spimdev.h
new file mode 100644
index 000000000000..f7e7d90013e1
--- /dev/null
+++ b/include/linux/vfio_spimdev.h
@@ -0,0 +1,95 @@
+/* SPDX-License-Identifier: GPL-2.0+ */
+#ifndef __VFIO_SPIMDEV_H
+#define __VFIO_SPIMDEV_H
+
+#include <linux/device.h>
+#include <linux/iommu.h>
+#include <linux/mdev.h>
+#include <linux/vfio.h>
+#include <uapi/linux/vfio_spimdev.h>
+
+struct vfio_spimdev_queue;
+struct vfio_spimdev;
+
+/**
+ * struct vfio_spimdev_ops - WD device operations
+ * @get_queue: get a queue from the device according to algorithm
+ * @put_queue: free a queue to the device
+ * @is_q_updated: check whether the task is finished
+ * @mask_notify: mask the task irq of queue
+ * @mmap: mmap addresses of queue to user space
+ * @reset: reset the WD device
+ * @reset_queue: reset the queue
+ * @ioctl: ioctl for user space users of the queue
+ * @get_available_instances: get numbers of the queue remained
+ */
+struct vfio_spimdev_ops {
+ int (*get_queue)(struct vfio_spimdev *spimdev, unsigned long arg,
+ struct vfio_spimdev_queue **q);
+ int (*put_queue)(struct vfio_spimdev_queue *q);
+ int (*is_q_updated)(struct vfio_spimdev_queue *q);
+ void (*mask_notify)(struct vfio_spimdev_queue *q, int event_mask);
+ int (*mmap)(struct vfio_spimdev_queue *q, struct vm_area_struct *vma);
+ int (*reset)(struct vfio_spimdev *spimdev);
+ int (*reset_queue)(struct vfio_spimdev_queue *q);
+ long (*ioctl)(struct vfio_spimdev_queue *q, unsigned int cmd,
+ unsigned long arg);
+ int (*get_available_instances)(struct vfio_spimdev *spimdev);
+};
+
+struct vfio_spimdev_queue {
+ struct mutex mutex;
+ struct vfio_spimdev *spimdev;
+ int qid;
+ __u32 flags;
+ void *priv;
+ wait_queue_head_t wait;
+ struct mdev_device *mdev;
+ int fd;
+ int container;
+#ifdef CONFIG_IOMMU_SVA
+ int pasid;
+#endif
+};
+
+struct vfio_spimdev {
+ const char *name;
+ int status;
+ atomic_t ref;
+ struct module *owner;
+ const struct vfio_spimdev_ops *ops;
+ struct device *dev;
+ struct device cls_dev;
+ bool is_vf;
+ u32 iommu_type;
+ u32 dma_flag;
+ u32 dev_id;
+ void *priv;
+ int flags;
+ const char *api_ver;
+ struct mdev_parent_ops mdev_fops;
+};
+
+int vfio_spimdev_register(struct vfio_spimdev *spimdev);
+void vfio_spimdev_unregister(struct vfio_spimdev *spimdev);
+void vfio_spimdev_wake_up(struct vfio_spimdev_queue *q);
+int vfio_spimdev_is_spimdev(struct device *dev);
+struct vfio_spimdev *vfio_spimdev_pdev_spimdev(struct device *dev);
+int vfio_spimdev_pasid_pri_check(int pasid);
+int vfio_spimdev_get(struct device *dev);
+int vfio_spimdev_put(struct device *dev);
+struct vfio_spimdev *mdev_spimdev(struct mdev_device *mdev);
+
+extern struct mdev_type_attribute mdev_type_attr_flags;
+extern struct mdev_type_attribute mdev_type_attr_name;
+extern struct mdev_type_attribute mdev_type_attr_device_api;
+extern struct mdev_type_attribute mdev_type_attr_available_instances;
+#define VFIO_SPIMDEV_DEFAULT_MDEV_TYPE_ATTRS \
+ &mdev_type_attr_name.attr, \
+ &mdev_type_attr_device_api.attr, \
+ &mdev_type_attr_available_instances.attr, \
+ &mdev_type_attr_flags.attr
+
+#define _VFIO_SPIMDEV_REGION(vm_pgoff) (vm_pgoff & 0xf)
+
+#endif
diff --git a/include/uapi/linux/vfio_spimdev.h b/include/uapi/linux/vfio_spimdev.h
new file mode 100644
index 000000000000..3435e5c345b4
--- /dev/null
+++ b/include/uapi/linux/vfio_spimdev.h
@@ -0,0 +1,28 @@
+/* SPDX-License-Identifier: GPL-2.0+ */
+#ifndef _UAPIVFIO_SPIMDEV_H
+#define _UAPIVFIO_SPIMDEV_H
+
+#include <linux/ioctl.h>
+
+#define VFIO_SPIMDEV_CLASS_NAME "spimdev"
+
+/* Device ATTRs in parent dev SYSFS DIR */
+#define VFIO_SPIMDEV_PDEV_ATTRS_GRP_NAME "params"
+
+/* Parent device attributes */
+#define SPIMDEV_IOMMU_TYPE "iommu_type"
+#define SPIMDEV_DMA_FLAG "dma_flag"
+
+/* Maximum length of algorithm name string */
+#define VFIO_SPIMDEV_ALG_NAME_SIZE 64
+
+/* the bits used in SPIMDEV_DMA_FLAG attributes */
+#define VFIO_SPIMDEV_DMA_INVALID 0
+#define VFIO_SPIMDEV_DMA_SINGLE_PROC_MAP 1
+#define VFIO_SPIMDEV_DMA_MULTI_PROC_MAP 2
+#define VFIO_SPIMDEV_DMA_SVM 4
+#define VFIO_SPIMDEV_DMA_SVM_NO_FAULT 8
+#define VFIO_SPIMDEV_DMA_PHY 16
+
+#define VFIO_SPIMDEV_CMD_GET_Q _IO('W', 1)
+#endif
--
2.17.1

2018-08-01 10:22:18

by Kenneth Lee

[permalink] [raw]
Subject: [RFC PATCH 4/7] crypto: add hisilicon Queue Manager driver

From: Kenneth Lee <[email protected]>

Hisilicon QM is a general IP block used by several Hisilicon accelerators. It
provides a generic PCIe interface for the CPU and the accelerator to share
a group of queues.

This commit includes a library used by the accelerator driver to access
the QM hardware.
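
For orientation, the intended call flow from an accelerator driver looks
roughly like this (a sketch only: "my_acc", MY_SQE_SIZE and the error paths
are assumptions; the qm_info fields and the hisi_qm_*/hisi_qp_* calls are
the library interface added here):

	static int my_acc_probe(struct pci_dev *pdev, struct qm_info *qm)
	{
		struct hisi_qp *qp;
		int ret;

		qm->pdev = pdev;
		qm->sqe_size = MY_SQE_SIZE;	/* size of one submission queue element */
		qm->qp_base = 0;
		qm->qp_num = 64;
		qm->ver = 1;

		ret = hisi_qm_init("my_acc", qm);	/* map the BAR, set up MSI */
		if (ret)
			return ret;

		ret = hisi_qm_start(qm);	/* program the VFT, init eq/sqc/cqc tables */
		if (ret)
			goto err_uninit;

		qp = hisi_qm_create_qp(qm, 0);	/* alg_type 0 */
		if (IS_ERR(qp)) {
			ret = PTR_ERR(qp);
			goto err_stop;
		}

		ret = hisi_qm_start_qp(qp, 0);	/* no PASID in this sketch */
		if (ret < 0)
			goto err_release;

		return 0;

	err_release:
		hisi_qm_release_qp(qp);
	err_stop:
		hisi_qm_stop(qm);
	err_uninit:
		hisi_qm_uninit(qm);
		return ret;
	}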

Signed-off-by: Kenneth Lee <[email protected]>
Signed-off-by: Zhou Wang <[email protected]>
Signed-off-by: Hao Fang <[email protected]>
---
drivers/crypto/Kconfig | 2 +
drivers/crypto/Makefile | 1 +
drivers/crypto/hisilicon/Kconfig | 8 +
drivers/crypto/hisilicon/Makefile | 1 +
drivers/crypto/hisilicon/qm.c | 855 ++++++++++++++++++++++++++++++
drivers/crypto/hisilicon/qm.h | 111 ++++
6 files changed, 978 insertions(+)
create mode 100644 drivers/crypto/hisilicon/Kconfig
create mode 100644 drivers/crypto/hisilicon/Makefile
create mode 100644 drivers/crypto/hisilicon/qm.c
create mode 100644 drivers/crypto/hisilicon/qm.h

diff --git a/drivers/crypto/Kconfig b/drivers/crypto/Kconfig
index 43cccf6aff61..8da1e3170eb4 100644
--- a/drivers/crypto/Kconfig
+++ b/drivers/crypto/Kconfig
@@ -746,4 +746,6 @@ config CRYPTO_DEV_CCREE
cryptographic operations on the system REE.
If unsure say Y.

+source "drivers/crypto/hisilicon/Kconfig"
+
endif # CRYPTO_HW
diff --git a/drivers/crypto/Makefile b/drivers/crypto/Makefile
index 7ae87b4f6c8d..32e9bf64a42f 100644
--- a/drivers/crypto/Makefile
+++ b/drivers/crypto/Makefile
@@ -45,3 +45,4 @@ obj-$(CONFIG_CRYPTO_DEV_VMX) += vmx/
obj-$(CONFIG_CRYPTO_DEV_BCM_SPU) += bcm/
obj-$(CONFIG_CRYPTO_DEV_SAFEXCEL) += inside-secure/
obj-$(CONFIG_CRYPTO_DEV_ARTPEC6) += axis/
+obj-$(CONFIG_CRYPTO_DEV_HISILICON) += hisilicon/
diff --git a/drivers/crypto/hisilicon/Kconfig b/drivers/crypto/hisilicon/Kconfig
new file mode 100644
index 000000000000..0dd30f84b90e
--- /dev/null
+++ b/drivers/crypto/hisilicon/Kconfig
@@ -0,0 +1,8 @@
+config CRYPTO_DEV_HISILICON
+ tristate "Support for HISILICON CRYPTO ACCELERATOR"
+ help
+ Enable this to use Hisilicon Hardware Accelerators
+
+config CRYPTO_DEV_HISI_QM
+ tristate
+ depends on ARM64 && PCI
diff --git a/drivers/crypto/hisilicon/Makefile b/drivers/crypto/hisilicon/Makefile
new file mode 100644
index 000000000000..3378afc11703
--- /dev/null
+++ b/drivers/crypto/hisilicon/Makefile
@@ -0,0 +1 @@
+obj-$(CONFIG_CRYPTO_DEV_HISI_QM) += qm.o
diff --git a/drivers/crypto/hisilicon/qm.c b/drivers/crypto/hisilicon/qm.c
new file mode 100644
index 000000000000..e779bc661500
--- /dev/null
+++ b/drivers/crypto/hisilicon/qm.c
@@ -0,0 +1,855 @@
+// SPDX-License-Identifier: GPL-2.0+
+#include <asm/page.h>
+#include <linux/bitmap.h>
+#include <linux/dma-mapping.h>
+#include <linux/io.h>
+#include <linux/irqreturn.h>
+#include <linux/log2.h>
+#include "qm.h"
+
+#define QM_DEF_Q_NUM 128
+
+/* eq/aeq irq enable */
+#define QM_VF_AEQ_INT_SOURCE 0x0
+#define QM_VF_AEQ_INT_MASK 0x4
+#define QM_VF_EQ_INT_SOURCE 0x8
+#define QM_VF_EQ_INT_MASK 0xc
+
+/* mailbox */
+#define MAILBOX_CMD_SQC 0x0
+#define MAILBOX_CMD_CQC 0x1
+#define MAILBOX_CMD_EQC 0x2
+#define MAILBOX_CMD_AEQC 0x3
+#define MAILBOX_CMD_SQC_BT 0x4
+#define MAILBOX_CMD_CQC_BT 0x5
+
+#define MAILBOX_CMD_SEND_BASE 0x300
+#define MAILBOX_EVENT_SHIFT 8
+#define MAILBOX_STATUS_SHIFT 9
+#define MAILBOX_BUSY_SHIFT 13
+#define MAILBOX_OP_SHIFT 14
+#define MAILBOX_QUEUE_SHIFT 16
+
+/* sqc shift */
+#define SQ_HEAD_SHIFT 0
+#define SQ_TAIL_SHIFI 16
+#define SQ_HOP_NUM_SHIFT 0
+#define SQ_PAGE_SIZE_SHIFT 4
+#define SQ_BUF_SIZE_SHIFT 8
+#define SQ_SQE_SIZE_SHIFT 12
+#define SQ_HEAD_IDX_SIG_SHIFT 0
+#define SQ_TAIL_IDX_SIG_SHIFT 0
+#define SQ_CQN_SHIFT 0
+#define SQ_PRIORITY_SHIFT 0
+#define SQ_ORDERS_SHIFT 4
+#define SQ_TYPE_SHIFT 8
+
+#define SQ_TYPE_MASK 0xf
+
+/* cqc shift */
+#define CQ_HEAD_SHIFT 0
+#define CQ_TAIL_SHIFI 16
+#define CQ_HOP_NUM_SHIFT 0
+#define CQ_PAGE_SIZE_SHIFT 4
+#define CQ_BUF_SIZE_SHIFT 8
+#define CQ_SQE_SIZE_SHIFT 12
+#define CQ_PASID 0
+#define CQ_HEAD_IDX_SIG_SHIFT 0
+#define CQ_TAIL_IDX_SIG_SHIFT 0
+#define CQ_CQN_SHIFT 0
+#define CQ_PRIORITY_SHIFT 16
+#define CQ_ORDERS_SHIFT 0
+#define CQ_TYPE_SHIFT 0
+#define CQ_PHASE_SHIFT 0
+#define CQ_FLAG_SHIFT 1
+
+#define CQC_HEAD_INDEX(cqc) ((cqc)->cq_head)
+#define CQC_PHASE(cqc) (((cqc)->dw6) & 0x1)
+#define CQC_CQ_ADDRESS(cqc) (((u64)((cqc)->cq_base_h) << 32) | \
+ ((cqc)->cq_base_l))
+#define CQC_PHASE_BIT 0x1
+
+/* eqc shift */
+#define MB_EQC_EQE_SHIFT 12
+#define MB_EQC_PHASE_SHIFT 16
+
+#define EQC_HEAD_INDEX(eqc) ((eqc)->eq_head)
+#define EQC_TAIL_INDEX(eqc) ((eqc)->eq_tail)
+#define EQC_PHASE(eqc) ((((eqc)->dw6) >> 16) & 0x1)
+
+#define EQC_PHASE_BIT 0x00010000
+
+/* aeqc shift */
+#define MB_AEQC_AEQE_SHIFT 12
+#define MB_AEQC_PHASE_SHIFT 16
+
+/* cqe shift */
+#define CQE_PHASE(cqe) ((cqe)->w7 & 0x1)
+#define CQE_SQ_NUM(cqe) ((cqe)->sq_num)
+#define CQE_SQ_HEAD_INDEX(cqe) ((cqe)->sq_head)
+
+/* eqe shift */
+#define EQE_PHASE(eqe) (((eqe)->dw0 >> 16) & 0x1)
+#define EQE_CQN(eqe) (((eqe)->dw0) & 0xffff)
+
+#define QM_EQE_CQN_MASK 0xffff
+
+/* doorbell */
+#define DOORBELL_CMD_SQ 0
+#define DOORBELL_CMD_CQ 1
+#define DOORBELL_CMD_EQ 2
+#define DOORBELL_CMD_AEQ 3
+
+#define DOORBELL_CMD_SEND_BASE 0x340
+
+#define QM_MEM_START_INIT 0x100040
+#define QM_MEM_INIT_DONE 0x100044
+#define QM_VFT_CFG_RDY 0x10006c
+#define QM_VFT_CFG_OP_WR 0x100058
+#define QM_VFT_CFG_TYPE 0x10005c
+#define QM_SQC_VFT 0x0
+#define QM_CQC_VFT 0x1
+#define QM_VFT_CFG_ADDRESS 0x100060
+#define QM_VFT_CFG_OP_ENABLE 0x100054
+
+#define QM_VFT_CFG_DATA_L 0x100064
+#define QM_VFT_CFG_DATA_H 0x100068
+#define QM_SQC_VFT_BUF_SIZE (7ULL << 8)
+#define QM_SQC_VFT_SQC_SIZE (5ULL << 12)
+#define QM_SQC_VFT_INDEX_NUMBER (1ULL << 16)
+#define QM_SQC_VFT_BT_INDEX_SHIFT 22
+#define QM_SQC_VFT_START_SQN_SHIFT 28
+#define QM_SQC_VFT_VALID (1ULL << 44)
+#define QM_CQC_VFT_BUF_SIZE (7ULL << 8)
+#define QM_CQC_VFT_SQC_SIZE (5ULL << 12)
+#define QM_CQC_VFT_INDEX_NUMBER (1ULL << 16)
+#define QM_CQC_VFT_BT_INDEX_SHIFT 22
+#define QM_CQC_VFT_VALID (1ULL << 28)
+
+struct cqe {
+ __le32 rsvd0;
+ __le16 cmd_id;
+ __le16 rsvd1;
+ __le16 sq_head;
+ __le16 sq_num;
+ __le16 rsvd2;
+ __le16 w7;
+};
+
+struct eqe {
+ __le32 dw0;
+};
+
+struct aeqe {
+ __le32 dw0;
+};
+
+struct sqc {
+ __le16 head;
+ __le16 tail;
+ __le32 base_l;
+ __le32 base_h;
+ __le32 dw3;
+ __le16 qes;
+ __le16 rsvd0;
+ __le16 pasid;
+ __le16 w11;
+ __le16 cq_num;
+ __le16 w13;
+ __le32 rsvd1;
+};
+
+struct cqc {
+ __le16 head;
+ __le16 tail;
+ __le32 base_l;
+ __le32 base_h;
+ __le32 dw3;
+ __le16 qes;
+ __le16 rsvd0;
+ __le16 pasid;
+ __le16 w11;
+ __le32 dw6;
+ __le32 rsvd1;
+};
+
+#define INIT_QC(qc, base) do { \
+ (qc)->head = 0; \
+ (qc)->tail = 0; \
+ (qc)->base_l = lower_32_bits(base); \
+ (qc)->base_h = upper_32_bits(base); \
+ (qc)->pasid = 0; \
+ (qc)->w11 = 0; \
+ (qc)->rsvd1 = 0; \
+ (qc)->qes = QM_Q_DEPTH - 1; \
+} while (0)
+
+struct eqc {
+ __le16 head;
+ __le16 tail;
+ __le32 base_l;
+ __le32 base_h;
+ __le32 dw3;
+ __le32 rsvd[2];
+ __le32 dw6;
+};
+
+struct aeqc {
+ __le16 head;
+ __le16 tail;
+ __le32 base_l;
+ __le32 base_h;
+ __le32 rsvd[3];
+ __le32 dw6;
+};
+
+struct mailbox {
+ __le16 w0;
+ __le16 queue_num;
+ __le32 base_l;
+ __le32 base_h;
+ __le32 rsvd;
+};
+
+struct doorbell {
+ __le16 queue_num;
+ __le16 cmd;
+ __le16 index;
+ __le16 priority;
+};
+
+#define QM_DMA_BUF(p, buf) ((struct buf *)(p)->buf.addr)
+#define QM_SQC(p) QM_DMA_BUF(p, sqc)
+#define QM_CQC(p) QM_DMA_BUF(p, cqc)
+#define QM_EQC(p) QM_DMA_BUF(p, eqc)
+#define QM_EQE(p) QM_DMA_BUF(p, eqe)
+#define QM_AEQC(p) QM_DMA_BUF(p, aeqc)
+#define QM_AEQE(p) QM_DMA_BUF(p, aeqe)
+
+#define QP_SQE_DMA(qp) ((qp)->scqe.dma)
+#define QP_CQE(qp) ((struct cqe *)((qp)->scqe.addr + \
+ qp->qm->sqe_size * QM_Q_DEPTH))
+#define QP_CQE_DMA(qp) ((qp)->scqe.dma + qp->qm->sqe_size * QM_Q_DEPTH)
+
+static inline void qm_writel(struct qm_info *qm, u32 val, u32 offset)
+{
+ writel(val, qm->io_base + offset);
+}
+
+struct qm_info;
+
+struct hisi_acc_qm_hw_ops {
+ int (*vft_config)(struct qm_info *qm, u16 base, u32 number);
+};
+
+static inline int hacc_qm_mb_is_busy(struct qm_info *qm)
+{
+ u32 val;
+
+ return readl_relaxed_poll_timeout(QM_ADDR(qm, MAILBOX_CMD_SEND_BASE),
+ val, !((val >> MAILBOX_BUSY_SHIFT) & 0x1), 10, 1000);
+}
+
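+/* The 128-bit mailbox message must reach the hardware as a single access,
+ * so it is written with an ARM64 ldp/stp pair followed by a dsb barrier.
+ */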
+static inline void qm_mb_write(struct qm_info *qm, void *src)
+{
+ void __iomem *fun_base = QM_ADDR(qm, MAILBOX_CMD_SEND_BASE);
+ unsigned long tmp0 = 0, tmp1 = 0;
+
+ asm volatile("ldp %0, %1, %3\n"
+ "stp %0, %1, %2\n"
+ "dsb sy\n"
+ : "=&r" (tmp0),
+ "=&r" (tmp1),
+ "+Q" (*((char *)fun_base))
+ : "Q" (*((char *)src))
+ : "memory");
+}
+
+static int qm_mb(struct qm_info *qm, u8 cmd, u64 phys_addr, u16 queue,
+ bool op, bool event)
+{
+ struct mailbox mailbox;
+ int i = 0;
+ int ret = 0;
+
+ memset(&mailbox, 0, sizeof(struct mailbox));
+
+ mailbox.w0 = cmd |
+ (event ? 0x1 << MAILBOX_EVENT_SHIFT : 0) |
+ (op ? 0x1 << MAILBOX_OP_SHIFT : 0) |
+ (0x1 << MAILBOX_BUSY_SHIFT);
+ mailbox.queue_num = queue;
+ mailbox.base_l = lower_32_bits(phys_addr);
+ mailbox.base_h = upper_32_bits(phys_addr);
+ mailbox.rsvd = 0;
+
+ mutex_lock(&qm->mailbox_lock);
+
+ while (hacc_qm_mb_is_busy(qm) && i < 10)
+ i++;
+ if (i >= 10) {
+ ret = -EBUSY;
+ dev_err(&qm->pdev->dev, "QM mail box is busy!");
+ goto busy_unlock;
+ }
+ qm_mb_write(qm, &mailbox);
+ i = 0;
+ while (hacc_qm_mb_is_busy(qm) && i < 10)
+ i++;
+ if (i >= 10) {
+ ret = -EBUSY;
+ dev_err(&qm->pdev->dev, "QM mail box is still busy!");
+ goto busy_unlock;
+ }
+
+busy_unlock:
+ mutex_unlock(&qm->mailbox_lock);
+
+ return ret;
+}
+
+static int qm_db(struct qm_info *qm, u16 qn, u8 cmd, u16 index, u8 priority)
+{
+ u64 doorbell = 0;
+
+ doorbell = (u64)qn | ((u64)cmd << 16);
+ doorbell |= ((u64)index | ((u64)priority << 16)) << 32;
+
+ writeq(doorbell, QM_ADDR(qm, DOORBELL_CMD_SEND_BASE));
+
+ return 0;
+}
+
+/* @return 0 - cq/eq event, 1 - async event, 2 - abnormal error */
+static u32 qm_get_irq_source(struct qm_info *qm)
+{
+ return readl(QM_ADDR(qm, QM_VF_EQ_INT_SOURCE));
+}
+
+static inline struct hisi_qp *to_hisi_qp(struct qm_info *qm, struct eqe *eqe)
+{
+ u16 cqn = eqe->dw0 & QM_EQE_CQN_MASK;
+ struct hisi_qp *qp;
+
+ read_lock(&qm->qps_lock);
+ qp = qm->qp_array[cqn];
+ read_unlock(&qm->qps_lock);
+
+ return qm->qp_array[cqn];
+}
+
+static inline void qm_cq_head_update(struct hisi_qp *qp)
+{
+ if (qp->qp_status.cq_head == QM_Q_DEPTH - 1) {
+ QM_CQC(qp)->dw6 = QM_CQC(qp)->dw6 ^ CQC_PHASE_BIT;
+ qp->qp_status.cq_head = 0;
+ } else {
+ qp->qp_status.cq_head++;
+ }
+}
+
+static inline void qm_poll_qp(struct hisi_qp *qp, struct qm_info *qm)
+{
+ struct cqe *cqe;
+
+ cqe = QP_CQE(qp) + qp->qp_status.cq_head;
+
+ if (qp->req_cb) {
+ while (CQE_PHASE(cqe) == CQC_PHASE(QM_CQC(qp))) {
+ dma_rmb();
+ qp->req_cb(qp, QP_SQE_ADDR(qp) +
+ qm->sqe_size *
+ CQE_SQ_HEAD_INDEX(cqe));
+ qm_cq_head_update(qp);
+ cqe = QP_CQE(qp) + qp->qp_status.cq_head;
+ }
+ } else if (qp->event_cb) {
+ qp->event_cb(qp);
+ qm_cq_head_update(qp);
+ cqe = QP_CQE(qp) + qp->qp_status.cq_head;
+ }
+
+ qm_db(qm, qp->queue_id, DOORBELL_CMD_CQ,
+ qp->qp_status.cq_head, 0);
+
+ /* set c_flag */
+ qm_db(qm, qp->queue_id, DOORBELL_CMD_CQ,
+ qp->qp_status.cq_head, 1);
+}
+
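+/* Walk the event queue: an EQE is valid while its phase bit still matches
+ * the EQC phase; the phase flips each time the queue wraps around.
+ */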
+static irqreturn_t qm_irq_thread(int irq, void *data)
+{
+ struct qm_info *qm = data;
+ struct eqe *eqe = QM_EQE(qm) + qm->eq_head;
+ struct eqc *eqc = QM_EQC(qm);
+ struct hisi_qp *qp;
+
+ while (EQE_PHASE(eqe) == EQC_PHASE(eqc)) {
+ qp = to_hisi_qp(qm, eqe);
+ if (qp) {
+ qm_poll_qp(qp, qm);
+ }
+
+ if (qm->eq_head == QM_Q_DEPTH - 1) {
+ eqc->dw6 = eqc->dw6 ^ EQC_PHASE_BIT;
+ eqe = QM_EQE(qm);
+ qm->eq_head = 0;
+ } else {
+ eqe++;
+ qm->eq_head++;
+ }
+ }
+
+ qm_db(qm, 0, DOORBELL_CMD_EQ, qm->eq_head, 0);
+
+ return IRQ_HANDLED;
+}
+
+static void qm_init_qp_status(struct hisi_qp *qp)
+{
+ struct hisi_acc_qp_status *qp_status = &qp->qp_status;
+
+ qp_status->sq_tail = 0;
+ qp_status->sq_head = 0;
+ qp_status->cq_head = 0;
+ qp_status->sqn = 0;
+ qp_status->cqc_phase = 1;
+ qp_status->is_sq_full = 0;
+}
+
+/* check if bit in regs is 1 */
+static inline int qm_acc_check(struct qm_info *qm, u32 offset, u32 bit)
+{
+ int val;
+
+ return readl_relaxed_poll_timeout(QM_ADDR(qm, offset), val,
+ val & BIT(bit), 10, 1000);
+}
+
+static inline int qm_init_q_buffer(struct device *dev, size_t size,
+ struct qm_dma_buffer *db)
+{
+ db->size = size;
+ db->addr = dma_alloc_coherent(dev, size, &db->dma, GFP_KERNEL);
+ if (!db->addr)
+ return -ENOMEM;
+ memset(db->addr, 0, size);
+ return 0;
+}
+
+static inline void qm_uninit_q_buffer(struct device *dev,
+ struct qm_dma_buffer *db)
+{
+ dma_free_coherent(dev, db->size, db->addr, db->dma);
+}
+
+static inline int qm_init_bt(struct qm_info *qm, struct device *dev,
+ size_t size, struct qm_dma_buffer *db, int mb_cmd)
+{
+ int ret;
+
+ ret = qm_init_q_buffer(dev, size, db);
+ if (ret)
+ return -ENOMEM;
+
+ ret = qm_mb(qm, mb_cmd, db->dma, 0, 0, 0);
+ if (ret) {
+ qm_uninit_q_buffer(dev, db);
+ return ret;
+ }
+
+ return 0;
+}
+
+/* the config should be conducted after hisi_qm_mem_start() */
+static int qm_vft_common_config(struct qm_info *qm, u16 base, u32 number)
+{
+ u64 tmp;
+ int ret;
+
+ ret = qm_acc_check(qm, QM_VFT_CFG_RDY, 0);
+ if (ret)
+ return ret;
+ qm_writel(qm, 0x0, QM_VFT_CFG_OP_WR);
+ qm_writel(qm, QM_SQC_VFT, QM_VFT_CFG_TYPE);
+ qm_writel(qm, qm->pdev->devfn, QM_VFT_CFG_ADDRESS);
+
+ tmp = QM_SQC_VFT_BUF_SIZE |
+ QM_SQC_VFT_SQC_SIZE |
+ QM_SQC_VFT_INDEX_NUMBER |
+ QM_SQC_VFT_VALID |
+ (u64)base << QM_SQC_VFT_START_SQN_SHIFT;
+
+ qm_writel(qm, tmp & 0xffffffff, QM_VFT_CFG_DATA_L);
+ qm_writel(qm, tmp >> 32, QM_VFT_CFG_DATA_H);
+
+ qm_writel(qm, 0x0, QM_VFT_CFG_RDY);
+ qm_writel(qm, 0x1, QM_VFT_CFG_OP_ENABLE);
+ ret = qm_acc_check(qm, QM_VFT_CFG_RDY, 0);
+ if (ret)
+ return ret;
+ tmp = 0;
+
+ qm_writel(qm, 0x0, QM_VFT_CFG_OP_WR);
+ qm_writel(qm, QM_CQC_VFT, QM_VFT_CFG_TYPE);
+ qm_writel(qm, qm->pdev->devfn, QM_VFT_CFG_ADDRESS);
+
+ tmp = QM_CQC_VFT_BUF_SIZE |
+ QM_CQC_VFT_SQC_SIZE |
+ QM_CQC_VFT_INDEX_NUMBER |
+ QM_CQC_VFT_VALID;
+
+ qm_writel(qm, tmp & 0xffffffff, QM_VFT_CFG_DATA_L);
+ qm_writel(qm, tmp >> 32, QM_VFT_CFG_DATA_H);
+
+ qm_writel(qm, 0x0, QM_VFT_CFG_RDY);
+ qm_writel(qm, 0x1, QM_VFT_CFG_OP_ENABLE);
+ ret = qm_acc_check(qm, QM_VFT_CFG_RDY, 0);
+ if (ret)
+ return ret;
+ return 0;
+}
+
+static struct hisi_acc_qm_hw_ops qm_hw_ops_v1 = {
+ .vft_config = qm_vft_common_config,
+};
+
+struct hisi_qp *hisi_qm_create_qp(struct qm_info *qm, u8 alg_type)
+{
+ struct hisi_qp *qp;
+ int qp_index;
+ int ret;
+
+ write_lock(&qm->qps_lock);
+ qp_index = find_first_zero_bit(qm->qp_bitmap, qm->qp_num);
+ if (qp_index >= qm->qp_num) {
+ write_unlock(&qm->qps_lock);
+ return ERR_PTR(-EBUSY);
+ }
+ set_bit(qp_index, qm->qp_bitmap);
+
+ qp = kzalloc(sizeof(*qp), GFP_KERNEL);
+ if (!qp) {
+ ret = -ENOMEM;
+ write_unlock(&qm->qps_lock);
+ goto err_with_bitset;
+ }
+
+ qp->queue_id = qp_index;
+ qp->qm = qm;
+ qp->alg_type = alg_type;
+ qm_init_qp_status(qp);
+
+ write_unlock(&qm->qps_lock);
+ return qp;
+
+err_with_bitset:
+ clear_bit(qp_index, qm->qp_bitmap);
+ write_unlock(&qm->qps_lock);
+
+ return ERR_PTR(ret);
+}
+EXPORT_SYMBOL_GPL(hisi_qm_create_qp);
+
+int hisi_qm_start_qp(struct hisi_qp *qp, unsigned long arg)
+{
+ struct qm_info *qm = qp->qm;
+ struct device *dev = &qm->pdev->dev;
+ int ret;
+ struct sqc *sqc;
+ struct cqc *cqc;
+ int qp_index = qp->queue_id;
+ int pasid = arg;
+
+ /* set sq and cq context */
+ qp->sqc.addr = QM_SQC(qm) + qp_index;
+ qp->sqc.dma = qm->sqc.dma + qp_index * sizeof(struct sqc);
+ sqc = QM_SQC(qp);
+
+ qp->cqc.addr = QM_CQC(qm) + qp_index;
+ qp->cqc.dma = qm->cqc.dma + qp_index * sizeof(struct cqc);
+ cqc = QM_CQC(qp);
+
+ /* allocate sq and cq */
+ ret = qm_init_q_buffer(dev,
+ qm->sqe_size * QM_Q_DEPTH + sizeof(struct cqe) * QM_Q_DEPTH,
+ &qp->scqe);
+ if (ret)
+ return ret;
+
+ INIT_QC(sqc, qp->scqe.dma);
+ sqc->pasid = pasid;
+ sqc->dw3 = (0 << SQ_HOP_NUM_SHIFT) |
+ (0 << SQ_PAGE_SIZE_SHIFT) |
+ (0 << SQ_BUF_SIZE_SHIFT) |
+ (ilog2(qm->sqe_size) << SQ_SQE_SIZE_SHIFT);
+ sqc->cq_num = qp_index;
+ sqc->w13 = 0 << SQ_PRIORITY_SHIFT |
+ 1 << SQ_ORDERS_SHIFT |
+ (qp->alg_type & SQ_TYPE_MASK) << SQ_TYPE_SHIFT;
+
+ ret = qm_mb(qm, MAILBOX_CMD_SQC, qp->sqc.dma, qp_index, 0, 0);
+ if (ret)
+ return ret;
+
+ INIT_QC(cqc, qp->scqe.dma + qm->sqe_size * QM_Q_DEPTH);
+ cqc->dw3 = (0 << CQ_HOP_NUM_SHIFT) |
+ (0 << CQ_PAGE_SIZE_SHIFT) |
+ (0 << CQ_BUF_SIZE_SHIFT) |
+ (4 << CQ_SQE_SIZE_SHIFT);
+ cqc->dw6 = 1 << CQ_PHASE_SHIFT | 1 << CQ_FLAG_SHIFT;
+
+ ret = qm_mb(qm, MAILBOX_CMD_CQC, qp->cqc.dma, qp_index, 0, 0);
+ if (ret)
+ return ret;
+
+ write_lock(&qm->qps_lock);
+ qm->qp_array[qp_index] = qp;
+ init_completion(&qp->completion);
+ write_unlock(&qm->qps_lock);
+
+ return qp_index;
+}
+EXPORT_SYMBOL_GPL(hisi_qm_start_qp);
+
+void hisi_qm_release_qp(struct hisi_qp *qp)
+{
+ struct qm_info *qm = qp->qm;
+ struct device *dev = &qm->pdev->dev;
+ int qid = qp->queue_id;
+
+ write_lock(&qm->qps_lock);
+ qm->qp_array[qp->queue_id] = NULL;
+ write_unlock(&qm->qps_lock);
+
+ qm_uninit_q_buffer(dev, &qp->scqe);
+ kfree(qp);
+ bitmap_clear(qm->qp_bitmap, qid, 1);
+}
+EXPORT_SYMBOL_GPL(hisi_qm_release_qp);
+
+static void *qm_get_avail_sqe(struct hisi_qp *qp)
+{
+ struct hisi_acc_qp_status *qp_status = &qp->qp_status;
+ void *sq_base = QP_SQE_ADDR(qp);
+ u16 sq_tail = qp_status->sq_tail;
+
+ if (qp_status->is_sq_full == 1)
+ return NULL;
+
+ return sq_base + sq_tail * qp->qm->sqe_size;
+}
+
+int hisi_qp_send(struct hisi_qp *qp, void *msg)
+{
+ struct hisi_acc_qp_status *qp_status = &qp->qp_status;
+ u16 sq_tail = qp_status->sq_tail;
+ u16 sq_tail_next = (sq_tail + 1) % QM_Q_DEPTH;
+ unsigned long timeout = 100;
+ void *sqe = qm_get_avail_sqe(qp);
+
+ if (sqe == NULL)
+ return -1;
+
+ memcpy(sqe, msg, qp->qm->sqe_size);
+
+ qm_db(qp->qm, qp->queue_id, DOORBELL_CMD_SQ, sq_tail_next, 0);
+
+ /* wait until job finished */
+ wait_for_completion_timeout(&qp->completion, timeout);
+
+ qp_status->sq_tail = sq_tail_next;
+
+ if (qp_status->sq_tail == qp_status->sq_head)
+ qp_status->is_sq_full = 1;
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(hisi_qp_send);
+
+int hisi_qm_init(const char *dev_name, struct qm_info *qm)
+{
+ int ret;
+ u16 ecam_val16;
+ struct pci_dev *pdev = qm->pdev;
+
+ pci_set_power_state(pdev, PCI_D0);
+ ecam_val16 = PCI_COMMAND_MASTER | PCI_COMMAND_MEMORY;
+ pci_write_config_word(pdev, PCI_COMMAND, ecam_val16);
+
+ ret = pci_enable_device_mem(pdev);
+ if (ret < 0) {
+ dev_err(&pdev->dev, "Can't enable device mem!\n");
+ return ret;
+ }
+
+ ret = pci_request_mem_regions(pdev, dev_name);
+ if (ret < 0) {
+ dev_err(&pdev->dev, "Can't request mem regions!\n");
+ goto err_with_pcidev;
+ }
+
+ qm->dev_name = dev_name;
+ qm->phys_base = pci_resource_start(pdev, 2);
+ qm->size = pci_resource_len(qm->pdev, 2);
+ qm->io_base = devm_ioremap(&pdev->dev, qm->phys_base, qm->size);
+ if (!qm->io_base) {
+ ret = -EIO;
+ goto err_with_mem_regions;
+ }
+
+ dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64));
+ pci_set_master(pdev);
+
+ ret = pci_alloc_irq_vectors(pdev, 1, 1, PCI_IRQ_MSI);
+ if (ret < 0) {
+ dev_err(&pdev->dev, "Enable MSI vectors fail!\n");
+ goto err_with_mem_regions;
+ }
+
+ qm->eq_head = 0;
+ mutex_init(&qm->mailbox_lock);
+ rwlock_init(&qm->qps_lock);
+
+ if (qm->ver)
+ qm->ops = &qm_hw_ops_v1;
+
+ return 0;
+
+err_with_mem_regions:
+ pci_release_mem_regions(pdev);
+err_with_pcidev:
+ pci_disable_device(pdev);
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(hisi_qm_init);
+
+void hisi_qm_uninit(struct qm_info *qm)
+{
+ struct pci_dev *pdev = qm->pdev;
+
+ pci_free_irq_vectors(pdev);
+ pci_release_mem_regions(pdev);
+ pci_disable_device(pdev);
+}
+EXPORT_SYMBOL_GPL(hisi_qm_uninit);
+
+static irqreturn_t qm_irq(int irq, void *data)
+{
+ struct qm_info *qm = data;
+ u32 int_source;
+
+ int_source = qm_get_irq_source(qm);
+ if (int_source)
+ return IRQ_WAKE_THREAD;
+
+ dev_err(&qm->pdev->dev, "invalid int source %d\n", int_source);
+
+ return IRQ_HANDLED;
+}
+
+int hisi_qm_start(struct qm_info *qm)
+{
+ struct pci_dev *pdev = qm->pdev;
+ struct device *dev = &pdev->dev;
+ int ret;
+
+ if (qm->pdev->is_physfn)
+ qm->ops->vft_config(qm, qm->qp_base, qm->qp_num);
+
+ ret = qm_init_q_buffer(dev,
+ max_t(size_t, sizeof(struct eqc), sizeof(struct aeqc)),
+ &qm->eqc);
+ if (ret)
+ goto err_out;
+
+ ret = qm_init_q_buffer(dev, sizeof(struct eqe) * QM_Q_DEPTH, &qm->eqe);
+ if (ret)
+ goto err_with_eqc;
+
+ QM_EQC(qm)->base_l = lower_32_bits(qm->eqe.dma);
+ QM_EQC(qm)->base_h = upper_32_bits(qm->eqe.dma);
+ QM_EQC(qm)->dw3 = 2 << MB_EQC_EQE_SHIFT;
+ QM_EQC(qm)->dw6 = (QM_Q_DEPTH - 1) | (1 << MB_EQC_PHASE_SHIFT);
+ ret = qm_mb(qm, MAILBOX_CMD_EQC, qm->eqc.dma, 0, 0, 0);
+ if (ret)
+ goto err_with_eqe;
+
+ qm->qp_bitmap = kcalloc(BITS_TO_LONGS(qm->qp_num), sizeof(long),
+ GFP_KERNEL);
+ if (!qm->qp_bitmap)
+ goto err_with_eqe;
+
+ qm->qp_array = kcalloc(qm->qp_num, sizeof(struct hisi_qp *),
+ GFP_KERNEL);
+ if (!qm->qp_array)
+ goto err_with_bitmap;
+
+ /* Init sqc_bt */
+ ret = qm_init_bt(qm, dev, sizeof(struct sqc) * qm->qp_num, &qm->sqc,
+ MAILBOX_CMD_SQC_BT);
+ if (ret)
+ goto err_with_qp_array;
+
+ /* Init cqc_bt */
+ ret = qm_init_bt(qm, dev, sizeof(struct cqc) * qm->qp_num, &qm->cqc,
+ MAILBOX_CMD_CQC_BT);
+ if (ret)
+ goto err_with_sqc;
+
+ ret = request_threaded_irq(pci_irq_vector(pdev, 0), qm_irq,
+ qm_irq_thread, IRQF_SHARED, qm->dev_name,
+ qm);
+ if (ret)
+ goto err_with_cqc;
+
+ writel(0x0, QM_ADDR(qm, QM_VF_EQ_INT_MASK));
+
+ return 0;
+
+err_with_cqc:
+ qm_uninit_q_buffer(dev, &qm->cqc);
+err_with_sqc:
+ qm_uninit_q_buffer(dev, &qm->sqc);
+err_with_qp_array:
+ kfree(qm->qp_array);
+err_with_bitmap:
+ kfree(qm->qp_bitmap);
+err_with_eqe:
+ qm_uninit_q_buffer(dev, &qm->eqe);
+err_with_eqc:
+ qm_uninit_q_buffer(dev, &qm->eqc);
+err_out:
+ return ret;
+}
+EXPORT_SYMBOL_GPL(hisi_qm_start);
+
+void hisi_qm_stop(struct qm_info *qm)
+{
+ struct pci_dev *pdev = qm->pdev;
+ struct device *dev = &pdev->dev;
+
+ free_irq(pci_irq_vector(pdev, 0), qm);
+ qm_uninit_q_buffer(dev, &qm->cqc);
+ kfree(qm->qp_array);
+ kfree(qm->qp_bitmap);
+ qm_uninit_q_buffer(dev, &qm->eqe);
+ qm_uninit_q_buffer(dev, &qm->eqc);
+}
+EXPORT_SYMBOL_GPL(hisi_qm_stop);
+
+/* put the qm into the init state, so the accelerator config becomes available */
+int hisi_qm_mem_start(struct qm_info *qm)
+{
+ u32 val;
+
+ qm_writel(qm, 0x1, QM_MEM_START_INIT);
+ return readl_relaxed_poll_timeout(QM_ADDR(qm, QM_MEM_INIT_DONE), val,
+ val & BIT(0), 10, 1000);
+}
+EXPORT_SYMBOL_GPL(hisi_qm_mem_start);
+
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR("Zhou Wang <[email protected]>");
+MODULE_DESCRIPTION("HiSilicon Accelerator queue manager driver");
diff --git a/drivers/crypto/hisilicon/qm.h b/drivers/crypto/hisilicon/qm.h
new file mode 100644
index 000000000000..b3c5c34a0d13
--- /dev/null
+++ b/drivers/crypto/hisilicon/qm.h
@@ -0,0 +1,111 @@
+/* SPDX-License-Identifier: GPL-2.0+ */
+#ifndef HISI_ACC_QM_H
+#define HISI_ACC_QM_H
+
+#include <linux/dmapool.h>
+#include <linux/iopoll.h>
+#include <linux/module.h>
+#include <linux/pci.h>
+#include <linux/slab.h>
+
+#define QM_CQE_SIZE 16
+/* default queue depth for sq/cq/eq */
+#define QM_Q_DEPTH 1024
+
+/* qm user domain */
+#define QM_ARUSER_M_CFG_1 0x100088
+#define QM_ARUSER_M_CFG_ENABLE 0x100090
+#define QM_AWUSER_M_CFG_1 0x100098
+#define QM_AWUSER_M_CFG_ENABLE 0x1000a0
+#define QM_WUSER_M_CFG_ENABLE 0x1000a8
+
+/* qm cache */
+#define QM_CACHE_CTL 0x100050
+#define QM_AXI_M_CFG 0x1000ac
+#define QM_AXI_M_CFG_ENABLE 0x1000b0
+#define QM_PEH_AXUSER_CFG 0x1000cc
+#define QM_PEH_AXUSER_CFG_ENABLE 0x1000d0
+
+#define QP_SQE_ADDR(qp) ((qp)->scqe.addr)
+
+struct qm_dma_buffer {
+ int size;
+ void *addr;
+ dma_addr_t dma;
+};
+
+struct qm_info {
+ int ver;
+ const char *dev_name;
+ struct pci_dev *pdev;
+
+ resource_size_t phys_base;
+ resource_size_t size;
+ void __iomem *io_base;
+
+ u32 sqe_size;
+ u32 qp_base;
+ u32 qp_num;
+
+ struct qm_dma_buffer sqc, cqc, eqc, eqe, aeqc, aeqe;
+
+ u32 eq_head;
+
+ rwlock_t qps_lock;
+ unsigned long *qp_bitmap;
+ struct hisi_qp **qp_array;
+
+ struct mutex mailbox_lock;
+
+ struct hisi_acc_qm_hw_ops *ops;
+
+};
+#define QM_ADDR(qm, off) ((qm)->io_base + off)
+
+struct hisi_acc_qp_status {
+ u16 sq_tail;
+ u16 sq_head;
+ u16 cq_head;
+ u16 sqn;
+ bool cqc_phase;
+ int is_sq_full;
+};
+
+struct hisi_qp;
+
+struct hisi_qp_ops {
+ int (*fill_sqe)(void *sqe, void *q_parm, void *d_parm);
+};
+
+struct hisi_qp {
+ /* sq number in this function */
+ u32 queue_id;
+ u8 alg_type;
+ u8 req_type;
+
+ struct qm_dma_buffer sqc, cqc;
+ struct qm_dma_buffer scqe;
+
+ struct hisi_acc_qp_status qp_status;
+
+ struct qm_info *qm;
+
+ /* for crypto sync API */
+ struct completion completion;
+
+ struct hisi_qp_ops *hw_ops;
+ void *qp_ctx;
+ void (*event_cb)(struct hisi_qp *qp);
+ void (*req_cb)(struct hisi_qp *qp, void *data);
+};
+
+extern int hisi_qm_init(const char *dev_name, struct qm_info *qm);
+extern void hisi_qm_uninit(struct qm_info *qm);
+extern int hisi_qm_start(struct qm_info *qm);
+extern void hisi_qm_stop(struct qm_info *qm);
+extern int hisi_qm_mem_start(struct qm_info *qm);
+extern struct hisi_qp *hisi_qm_create_qp(struct qm_info *qm, u8 alg_type);
+extern int hisi_qm_start_qp(struct hisi_qp *qp, unsigned long arg);
+extern void hisi_qm_release_qp(struct hisi_qp *qp);
+extern int hisi_qp_send(struct hisi_qp *qp, void *msg);
+#endif
--
2.17.1

2018-08-01 10:22:19

by Kenneth Lee

[permalink] [raw]
Subject: [RFC PATCH 5/7] crypto: Add Hisilicon Zip driver

From: Kenneth Lee <[email protected]>

The Hisilicon ZIP accelerator implements the zlib and gzip algorithms in
hardware. It uses the Hisilicon QM as its interface to the CPU, so it
shows up to the CPU as a PCIe device with a group of queues.

This commit provides the PCIe driver for the accelerator and registers it
with the crypto subsystem.
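
Once the driver is loaded, another kernel user can reach the accelerator
through the generic crypto API. A minimal sketch, assuming the driver
registers "zlib-deflate" and "gzip" through the legacy compression
interface as the tfm-based context setup below suggests (buffers and
lengths are placeholders):

	static int my_deflate(const u8 *src, unsigned int src_len,
			      u8 *dst, unsigned int dst_len)
	{
		struct crypto_comp *tfm;
		unsigned int dlen = dst_len;
		int ret;

		tfm = crypto_alloc_comp("zlib-deflate", 0, 0);
		if (IS_ERR(tfm))
			return PTR_ERR(tfm);

		ret = crypto_comp_compress(tfm, src, src_len, dst, &dlen);
		crypto_free_comp(tfm);

		return ret;
	}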

Signed-off-by: Kenneth Lee <[email protected]>
Signed-off-by: Zhou Wang <[email protected]>
Signed-off-by: Hao Fang <[email protected]>
---
drivers/crypto/hisilicon/Kconfig | 7 +
drivers/crypto/hisilicon/Makefile | 1 +
drivers/crypto/hisilicon/zip/Makefile | 2 +
drivers/crypto/hisilicon/zip/zip.h | 55 ++++
drivers/crypto/hisilicon/zip/zip_crypto.c | 358 ++++++++++++++++++++++
drivers/crypto/hisilicon/zip/zip_crypto.h | 18 ++
drivers/crypto/hisilicon/zip/zip_main.c | 182 +++++++++++
7 files changed, 623 insertions(+)
create mode 100644 drivers/crypto/hisilicon/zip/Makefile
create mode 100644 drivers/crypto/hisilicon/zip/zip.h
create mode 100644 drivers/crypto/hisilicon/zip/zip_crypto.c
create mode 100644 drivers/crypto/hisilicon/zip/zip_crypto.h
create mode 100644 drivers/crypto/hisilicon/zip/zip_main.c

diff --git a/drivers/crypto/hisilicon/Kconfig b/drivers/crypto/hisilicon/Kconfig
index 0dd30f84b90e..48ad682e0a52 100644
--- a/drivers/crypto/hisilicon/Kconfig
+++ b/drivers/crypto/hisilicon/Kconfig
@@ -6,3 +6,10 @@ config CRYPTO_DEV_HISILICON
config CRYPTO_DEV_HISI_QM
tristate
depends on ARM64 && PCI
+
+config CRYPTO_DEV_HISI_ZIP
+ tristate "Support for HISI ZIP Driver"
+ depends on ARM64 && CRYPTO_DEV_HISILICON
+ select CRYPTO_DEV_HISI_QM
+ help
+ Support for HiSilicon HIP08 ZIP Driver
diff --git a/drivers/crypto/hisilicon/Makefile b/drivers/crypto/hisilicon/Makefile
index 3378afc11703..62e40b093c49 100644
--- a/drivers/crypto/hisilicon/Makefile
+++ b/drivers/crypto/hisilicon/Makefile
@@ -1 +1,2 @@
obj-$(CONFIG_CRYPTO_DEV_HISI_QM) += qm.o
+obj-$(CONFIG_CRYPTO_DEV_HISI_ZIP) += zip/
diff --git a/drivers/crypto/hisilicon/zip/Makefile b/drivers/crypto/hisilicon/zip/Makefile
new file mode 100644
index 000000000000..a936f099ee22
--- /dev/null
+++ b/drivers/crypto/hisilicon/zip/Makefile
@@ -0,0 +1,2 @@
+obj-$(CONFIG_CRYPTO_DEV_HISI_ZIP) += hisi_zip.o
+hisi_zip-objs = zip_main.o zip_crypto.o
diff --git a/drivers/crypto/hisilicon/zip/zip.h b/drivers/crypto/hisilicon/zip/zip.h
new file mode 100644
index 000000000000..a0c56e4aeb51
--- /dev/null
+++ b/drivers/crypto/hisilicon/zip/zip.h
@@ -0,0 +1,55 @@
+/* SPDX-License-Identifier: GPL-2.0+ */
+#ifndef HISI_ZIP_H
+#define HISI_ZIP_H
+
+#include <linux/list.h>
+#include "../qm.h"
+
+#define HZIP_SQE_SIZE 128
+#define HZIP_SQ_SIZE (HZIP_SQE_SIZE * QM_Q_DEPTH)
+#define QM_CQ_SIZE (QM_CQE_SIZE * QM_Q_DEPTH)
+#define HZIP_PF_DEF_Q_NUM 64
+#define HZIP_PF_DEF_Q_BASE 0
+
+struct hisi_zip {
+ struct qm_info qm;
+ struct list_head list;
+
+#ifdef CONFIG_CRYPTO_DEV_HISI_SPIMDEV
+ struct vfio_spimdev *spimdev;
+#endif
+};
+
+struct hisi_zip_sqe {
+ __u32 consumed;
+ __u32 produced;
+ __u32 comp_data_length;
+ __u32 dw3;
+ __u32 input_data_length;
+ __u32 lba_l;
+ __u32 lba_h;
+ __u32 dw7;
+ __u32 dw8;
+ __u32 dw9;
+ __u32 dw10;
+ __u32 priv_info;
+ __u32 dw12;
+ __u32 tag;
+ __u32 dest_avail_out;
+ __u32 rsvd0;
+ __u32 comp_head_addr_l;
+ __u32 comp_head_addr_h;
+ __u32 source_addr_l;
+ __u32 source_addr_h;
+ __u32 dest_addr_l;
+ __u32 dest_addr_h;
+ __u32 stream_ctx_addr_l;
+ __u32 stream_ctx_addr_h;
+ __u32 cipher_key1_addr_l;
+ __u32 cipher_key1_addr_h;
+ __u32 cipher_key2_addr_l;
+ __u32 cipher_key2_addr_h;
+ __u32 rsvd1[4];
+};
+
+#endif
diff --git a/drivers/crypto/hisilicon/zip/zip_crypto.c b/drivers/crypto/hisilicon/zip/zip_crypto.c
new file mode 100644
index 000000000000..a3a5a6a6554d
--- /dev/null
+++ b/drivers/crypto/hisilicon/zip/zip_crypto.c
@@ -0,0 +1,358 @@
+/* SPDX-License-Identifier: GPL-2.0+ */
+/*
+ * Copyright 2018 (c) HiSilicon Limited.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include <linux/crypto.h>
+#include <linux/dma-mapping.h>
+#include <linux/pci.h>
+#include <linux/topology.h>
+#include "../qm.h"
+#include "zip.h"
+
+#define INPUT_BUFFER_SIZE (64 * 1024)
+#define OUTPUT_BUFFER_SIZE (64 * 1024)
+
+extern struct list_head hisi_zip_list;
+
+#define COMP_NAME_TO_TYPE(alg_name) \
+ (!strcmp((alg_name), "zlib-deflate") ? 0x02 : \
+ !strcmp((alg_name), "gzip") ? 0x03 : 0) \
+
+struct hisi_zip_buffer {
+ u8 *input;
+ dma_addr_t input_dma;
+ u8 *output;
+ dma_addr_t output_dma;
+};
+
+struct hisi_zip_qp_ctx {
+ struct hisi_zip_buffer buffer;
+ struct hisi_qp *qp;
+ struct hisi_zip_sqe zip_sqe;
+};
+
+struct hisi_zip_ctx {
+#define QPC_COMP 0
+#define QPC_DECOMP 1
+ struct hisi_zip_qp_ctx qp_ctx[2];
+};
+
+static struct hisi_zip *find_zip_device(int node)
+{
+ struct hisi_zip *hisi_zip, *ret = NULL;
+ struct device *dev;
+ int min_distance = 100;
+
+ list_for_each_entry(hisi_zip, &hisi_zip_list, list) {
+ dev = &hisi_zip->qm.pdev->dev;
+ if (node_distance(dev->numa_node, node) < min_distance) {
+ ret = hisi_zip;
+ min_distance = node_distance(dev->numa_node, node);
+ }
+ }
+
+ return ret;
+}
+
+static void hisi_zip_qp_event_notifier(struct hisi_qp *qp)
+{
+ complete(&qp->completion);
+}
+
+static int hisi_zip_fill_sqe_v1(void *sqe, void *q_parm, u32 len)
+{
+ struct hisi_zip_sqe *zip_sqe = (struct hisi_zip_sqe *)sqe;
+ struct hisi_zip_qp_ctx *qp_ctx = (struct hisi_zip_qp_ctx *)q_parm;
+ struct hisi_zip_buffer *buffer = &qp_ctx->buffer;
+
+ memset(zip_sqe, 0, sizeof(struct hisi_zip_sqe));
+
+ zip_sqe->input_data_length = len;
+ zip_sqe->dw9 = qp_ctx->qp->req_type;
+ zip_sqe->dest_avail_out = OUTPUT_BUFFER_SIZE;
+ zip_sqe->source_addr_l = lower_32_bits(buffer->input_dma);
+ zip_sqe->source_addr_h = upper_32_bits(buffer->input_dma);
+ zip_sqe->dest_addr_l = lower_32_bits(buffer->output_dma);
+ zip_sqe->dest_addr_h = upper_32_bits(buffer->output_dma);
+
+ return 0;
+}
+
+/* let's allocate one buffer now, may have problem in async case */
+static int hisi_zip_alloc_qp_buffer(struct hisi_zip_qp_ctx *hisi_zip_qp_ctx)
+{
+ struct hisi_zip_buffer *buffer = &hisi_zip_qp_ctx->buffer;
+ struct hisi_qp *qp = hisi_zip_qp_ctx->qp;
+ struct device *dev = &qp->qm->pdev->dev;
+ int ret;
+
+ buffer->input = dma_alloc_coherent(dev, INPUT_BUFFER_SIZE,
+ &buffer->input_dma, GFP_KERNEL);
+ if (!buffer->input)
+ return -ENOMEM;
+
+ buffer->output = dma_alloc_coherent(dev, OUTPUT_BUFFER_SIZE,
+ &buffer->output_dma, GFP_KERNEL);
+ if (!buffer->output) {
+ ret = -ENOMEM;
+ goto err_alloc_output_buffer;
+ }
+
+ return 0;
+
+err_alloc_output_buffer:
+ dma_free_coherent(dev, INPUT_BUFFER_SIZE, buffer->input,
+ buffer->input_dma);
+ return ret;
+}
+
+static void hisi_zip_free_qp_buffer(struct hisi_zip_qp_ctx *hisi_zip_qp_ctx)
+{
+ struct hisi_zip_buffer *buffer = &hisi_zip_qp_ctx->buffer;
+ struct hisi_qp *qp = hisi_zip_qp_ctx->qp;
+ struct device *dev = &qp->qm->pdev->dev;
+
+ dma_free_coherent(dev, INPUT_BUFFER_SIZE, buffer->input,
+ buffer->input_dma);
+ dma_free_coherent(dev, OUTPUT_BUFFER_SIZE, buffer->output,
+ buffer->output_dma);
+}
+
+static int hisi_zip_create_qp(struct qm_info *qm, struct hisi_zip_qp_ctx *ctx,
+ int alg_type, int req_type)
+{
+ struct hisi_qp *qp;
+ int ret;
+
+ qp = hisi_qm_create_qp(qm, alg_type);
+
+ if (IS_ERR(qp))
+ return PTR_ERR(qp);
+
+ qp->event_cb = hisi_zip_qp_event_notifier;
+ qp->req_type = req_type;
+
+ qp->qp_ctx = ctx;
+ ctx->qp = qp;
+
+ ret = hisi_zip_alloc_qp_buffer(ctx);
+ if (ret)
+ goto err_with_qp;
+
+ ret = hisi_qm_start_qp(qp, 0);
+ if (ret < 0)
+ goto err_with_qp_buffer;
+
+ return 0;
+err_with_qp_buffer:
+ hisi_zip_free_qp_buffer(ctx);
+err_with_qp:
+ hisi_qm_release_qp(qp);
+ return ret;
+}
+
+static void hisi_zip_release_qp(struct hisi_zip_qp_ctx *ctx)
+{
+ hisi_qm_release_qp(ctx->qp);
+ hisi_zip_free_qp_buffer(ctx);
+}
+
+static int hisi_zip_alloc_comp_ctx(struct crypto_tfm *tfm)
+{
+ struct hisi_zip_ctx *hisi_zip_ctx = crypto_tfm_ctx(tfm);
+ const char *alg_name = crypto_tfm_alg_name(tfm);
+ struct hisi_zip *hisi_zip;
+ struct qm_info *qm;
+ int ret, i, j;
+
+ u8 req_type = COMP_NAME_TO_TYPE(alg_name);
+
+ /* find the proper zip device */
+ hisi_zip = find_zip_device(cpu_to_node(smp_processor_id()));
+ if (!hisi_zip) {
+ pr_err("Can not find proper ZIP device!\n");
+ return -1;
+ }
+ qm = &hisi_zip->qm;
+
+ for (i = 0; i < 2; i++) {
+ /* it is just happen that 0 is compress, 1 is decompress on alg_type */
+ ret = hisi_zip_create_qp(qm, &hisi_zip_ctx->qp_ctx[i], i,
+ req_type);
+ if (ret)
+ goto err;
+ }
+
+ return 0;
+err:
+ for (j = i-1; j >= 0; j--)
+ hisi_zip_release_qp(&hisi_zip_ctx->qp_ctx[j]);
+
+ return ret;
+}
+
+static void hisi_zip_free_comp_ctx(struct crypto_tfm *tfm)
+{
+ struct hisi_zip_ctx *hisi_zip_ctx = crypto_tfm_ctx(tfm);
+ int i;
+
+ /* release the qp */
+ for (i = 1; i >= 0; i--)
+ hisi_zip_release_qp(&hisi_zip_ctx->qp_ctx[i]);
+}
+
+static int hisi_zip_copy_data_to_buffer(struct hisi_zip_qp_ctx *qp_ctx,
+ const u8 *src, unsigned int slen)
+{
+ struct hisi_zip_buffer *buffer = &qp_ctx->buffer;
+
+ if (slen > INPUT_BUFFER_SIZE)
+ return -EINVAL;
+
+ memcpy(buffer->input, src, slen);
+
+ return 0;
+}
+
+static struct hisi_zip_sqe *hisi_zip_get_writeback_sqe(struct hisi_qp *qp)
+{
+ struct hisi_acc_qp_status *qp_status = &qp->qp_status;
+ struct hisi_zip_sqe *sq_base = QP_SQE_ADDR(qp);
+ u16 sq_head = qp_status->sq_head;
+
+ return sq_base + sq_head;
+}
+
+static int hisi_zip_copy_data_from_buffer(struct hisi_zip_qp_ctx *qp_ctx,
+ u8 *dst, unsigned int *dlen)
+{
+ struct hisi_zip_buffer *buffer = &qp_ctx->buffer;
+ struct hisi_qp *qp = qp_ctx->qp;
+ struct hisi_zip_sqe *zip_sqe = hisi_zip_get_writeback_sqe(qp);
+ u32 status = zip_sqe->dw3 & 0xff;
+
+ if (status != 0) {
+ pr_err("hisi zip: %s fail!\n", (qp->alg_type == 0) ?
+ "compression" : "decompression");
+ return status;
+ }
+
+ if (zip_sqe->produced > OUTPUT_BUFFER_SIZE)
+ return -ENOMEM;
+
+ memcpy(dst, buffer->output, zip_sqe->produced);
+ *dlen = zip_sqe->produced;
+ qp->qp_status.sq_head++;
+
+ return 0;
+}
+
+static int hisi_zip_compress(struct crypto_tfm *tfm, const u8 *src,
+ unsigned int slen, u8 *dst, unsigned int *dlen)
+{
+ struct hisi_zip_ctx *hisi_zip_ctx = crypto_tfm_ctx(tfm);
+ struct hisi_zip_qp_ctx *qp_ctx = &hisi_zip_ctx->qp_ctx[QPC_COMP];
+ struct hisi_qp *qp = qp_ctx->qp;
+ struct hisi_zip_sqe *zip_sqe = &qp_ctx->zip_sqe;
+ int ret;
+
+ ret = hisi_zip_copy_data_to_buffer(qp_ctx, src, slen);
+ if (ret < 0)
+ return ret;
+
+ hisi_zip_fill_sqe_v1(zip_sqe, qp_ctx, slen);
+
+ /* send command to start the compress job */
+ hisi_qp_send(qp, zip_sqe);
+
+ return hisi_zip_copy_data_from_buffer(qp_ctx, dst, dlen);
+}
+
+static int hisi_zip_decompress(struct crypto_tfm *tfm, const u8 *src,
+ unsigned int slen, u8 *dst, unsigned int *dlen)
+{
+ struct hisi_zip_ctx *hisi_zip_ctx = crypto_tfm_ctx(tfm);
+ struct hisi_zip_qp_ctx *qp_ctx = &hisi_zip_ctx->qp_ctx[QPC_DECOMP];
+ struct hisi_qp *qp = qp_ctx->qp;
+ struct hisi_zip_sqe *zip_sqe = &qp_ctx->zip_sqe;
+ int ret;
+
+ ret = hisi_zip_copy_data_to_buffer(qp_ctx, src, slen);
+ if (ret < 0)
+ return ret;
+
+ hisi_zip_fill_sqe_v1(zip_sqe, qp_ctx, slen);
+
+ /* send command to start the compress job */
+ hisi_qp_send(qp, zip_sqe);
+
+ return hisi_zip_copy_data_from_buffer(qp_ctx, dst, dlen);
+}
+
+static struct crypto_alg hisi_zip_zlib = {
+ .cra_name = "zlib-deflate",
+ .cra_flags = CRYPTO_ALG_TYPE_COMPRESS,
+ .cra_ctxsize = sizeof(struct hisi_zip_ctx),
+ .cra_priority = 300,
+ .cra_module = THIS_MODULE,
+ .cra_init = hisi_zip_alloc_comp_ctx,
+ .cra_exit = hisi_zip_free_comp_ctx,
+ .cra_u = {
+ .compress = {
+ .coa_compress = hisi_zip_compress,
+ .coa_decompress = hisi_zip_decompress
+ }
+ }
+};
+
+static struct crypto_alg hisi_zip_gzip = {
+ .cra_name = "gzip",
+ .cra_flags = CRYPTO_ALG_TYPE_COMPRESS,
+ .cra_ctxsize = sizeof(struct hisi_zip_ctx),
+ .cra_priority = 300,
+ .cra_module = THIS_MODULE,
+ .cra_init = hisi_zip_alloc_comp_ctx,
+ .cra_exit = hisi_zip_free_comp_ctx,
+ .cra_u = {
+ .compress = {
+ .coa_compress = hisi_zip_compress,
+ .coa_decompress = hisi_zip_decompress
+ }
+ }
+};
+
+int hisi_zip_register_to_crypto(void)
+{
+ int ret;
+
+ ret = crypto_register_alg(&hisi_zip_zlib);
+ if (ret < 0) {
+ pr_err("Zlib algorithm registration failed\n");
+ return ret;
+ }
+
+ ret = crypto_register_alg(&hisi_zip_gzip);
+ if (ret < 0) {
+ pr_err("Gzip algorithm registration failed\n");
+ goto err_unregister_zlib;
+ }
+
+ return 0;
+
+err_unregister_zlib:
+ crypto_unregister_alg(&hisi_zip_zlib);
+
+ return ret;
+}
+
+void hisi_zip_unregister_from_crypto(void)
+{
+ crypto_unregister_alg(&hisi_zip_zlib);
+ crypto_unregister_alg(&hisi_zip_gzip);
+}
diff --git a/drivers/crypto/hisilicon/zip/zip_crypto.h b/drivers/crypto/hisilicon/zip/zip_crypto.h
new file mode 100644
index 000000000000..6fae34b4df3a
--- /dev/null
+++ b/drivers/crypto/hisilicon/zip/zip_crypto.h
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: GPL-2.0+ */
+/*
+ * Copyright (c) 2018 HiSilicon Limited.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ */
+
+#ifndef HISI_ZIP_CRYPTO_H
+#define HISI_ZIP_CRYPTO_H
+
+int hisi_zip_register_to_crypto(void);
+void hisi_zip_unregister_from_crypto(void);
+
+#endif
diff --git a/drivers/crypto/hisilicon/zip/zip_main.c b/drivers/crypto/hisilicon/zip/zip_main.c
new file mode 100644
index 000000000000..2478cf6c5350
--- /dev/null
+++ b/drivers/crypto/hisilicon/zip/zip_main.c
@@ -0,0 +1,182 @@
+// SPDX-License-Identifier: GPL-2.0+
+#include <linux/io.h>
+#include <linux/bitops.h>
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/pci.h>
+#include <linux/vfio_spimdev.h>
+#include "zip.h"
+#include "zip_crypto.h"
+
+#define HZIP_VF_NUM 63
+#define HZIP_QUEUE_NUM_V1 4096
+#define HZIP_QUEUE_NUM_V2 1024
+
+#define HZIP_FSM_MAX_CNT 0x301008
+
+#define HZIP_PORT_ARCA_CHE_0 0x301040
+#define HZIP_PORT_ARCA_CHE_1 0x301044
+#define HZIP_PORT_AWCA_CHE_0 0x301060
+#define HZIP_PORT_AWCA_CHE_1 0x301064
+
+#define HZIP_BD_RUSER_32_63 0x301110
+#define HZIP_SGL_RUSER_32_63 0x30111c
+#define HZIP_DATA_RUSER_32_63 0x301128
+#define HZIP_DATA_WUSER_32_63 0x301134
+#define HZIP_BD_WUSER_32_63 0x301140
+
+LIST_HEAD(hisi_zip_list);
+DEFINE_MUTEX(hisi_zip_list_lock);
+
+static const char hisi_zip_name[] = "hisi_zip";
+
+static const struct pci_device_id hisi_zip_dev_ids[] = {
+ { PCI_DEVICE(PCI_VENDOR_ID_HUAWEI, 0xa250) },
+ { 0, }
+};
+
+static inline void hisi_zip_add_to_list(struct hisi_zip *hisi_zip)
+{
+ mutex_lock(&hisi_zip_list_lock);
+ list_add_tail(&hisi_zip->list, &hisi_zip_list);
+ mutex_unlock(&hisi_zip_list_lock);
+}
+
+static void hisi_zip_set_user_domain_and_cache(struct hisi_zip *hisi_zip)
+{
+ u32 val;
+
+ /* qm user domain */
+ writel(0x40001070, hisi_zip->qm.io_base + QM_ARUSER_M_CFG_1);
+ writel(0xfffffffe, hisi_zip->qm.io_base + QM_ARUSER_M_CFG_ENABLE);
+ writel(0x40001070, hisi_zip->qm.io_base + QM_AWUSER_M_CFG_1);
+ writel(0xfffffffe, hisi_zip->qm.io_base + QM_AWUSER_M_CFG_ENABLE);
+ writel(0xffffffff, hisi_zip->qm.io_base + QM_WUSER_M_CFG_ENABLE);
+ writel(0x4893, hisi_zip->qm.io_base + QM_CACHE_CTL);
+
+ val = readl(hisi_zip->qm.io_base + QM_PEH_AXUSER_CFG);
+ val |= (1 << 11);
+ writel(val, hisi_zip->qm.io_base + QM_PEH_AXUSER_CFG);
+
+ /* qm cache */
+ writel(0xffff, hisi_zip->qm.io_base + QM_AXI_M_CFG);
+ writel(0xffffffff, hisi_zip->qm.io_base + QM_AXI_M_CFG_ENABLE);
+ writel(0xffffffff, hisi_zip->qm.io_base + QM_PEH_AXUSER_CFG_ENABLE);
+
+ /* cache */
+ writel(0xffffffff, hisi_zip->qm.io_base + HZIP_PORT_ARCA_CHE_0);
+ writel(0xffffffff, hisi_zip->qm.io_base + HZIP_PORT_ARCA_CHE_1);
+ writel(0xffffffff, hisi_zip->qm.io_base + HZIP_PORT_AWCA_CHE_0);
+ writel(0xffffffff, hisi_zip->qm.io_base + HZIP_PORT_AWCA_CHE_1);
+ /* user domain configurations */
+ writel(0x40001070, hisi_zip->qm.io_base + HZIP_BD_RUSER_32_63);
+ writel(0x40001070, hisi_zip->qm.io_base + HZIP_SGL_RUSER_32_63);
+ writel(0x40001071, hisi_zip->qm.io_base + HZIP_DATA_RUSER_32_63);
+ writel(0x40001071, hisi_zip->qm.io_base + HZIP_DATA_WUSER_32_63);
+ writel(0x40001070, hisi_zip->qm.io_base + HZIP_BD_WUSER_32_63);
+
+ /* fsm count */
+ writel(0xfffffff, hisi_zip->qm.io_base + HZIP_FSM_MAX_CNT);
+
+ /* clock gating, core, decompress verify enable */
+ writel(0x10005, hisi_zip->qm.io_base + 0x301004);
+}
+
+static int hisi_zip_probe(struct pci_dev *pdev, const struct pci_device_id *id)
+{
+ struct hisi_zip *hisi_zip;
+ struct qm_info *qm;
+ int ret;
+ u8 rev_id;
+
+ hisi_zip = devm_kzalloc(&pdev->dev, sizeof(*hisi_zip), GFP_KERNEL);
+ if (!hisi_zip)
+ return -ENOMEM;
+ hisi_zip_add_to_list(hisi_zip);
+
+ qm = &hisi_zip->qm;
+ qm->pdev = pdev;
+
+ pci_read_config_byte(pdev, PCI_REVISION_ID, &rev_id);
+ if (rev_id == 0x20)
+ qm->ver = 1;
+ qm->sqe_size = HZIP_SQE_SIZE;
+ ret = hisi_qm_init(hisi_zip_name, qm);
+ if (ret)
+ goto err_with_hisi_zip;
+
+ if (pdev->is_physfn) {
+ ret = hisi_qm_mem_start(qm);
+ if (ret)
+ goto err_with_qm_init;
+
+ hisi_zip_set_user_domain_and_cache(hisi_zip);
+
+ qm->qp_base = HZIP_PF_DEF_Q_BASE;
+ qm->qp_num = HZIP_PF_DEF_Q_NUM;
+ }
+
+ ret = hisi_qm_start(qm);
+ if (ret)
+ goto err_with_qm_init;
+
+ return 0;
+
+err_with_qm_init:
+ hisi_qm_uninit(qm);
+err_with_hisi_zip:
+ kfree(hisi_zip);
+ return ret;
+}
+
+static void hisi_zip_remove(struct pci_dev *pdev)
+{
+ struct hisi_zip *hisi_zip = pci_get_drvdata(pdev);
+ struct qm_info *qm = &hisi_zip->qm;
+
+ hisi_qm_stop(qm);
+ hisi_qm_uninit(qm);
+ kfree(hisi_zip);
+}
+
+static struct pci_driver hisi_zip_pci_driver = {
+ .name = "hisi_zip",
+ .id_table = hisi_zip_dev_ids,
+ .probe = hisi_zip_probe,
+ .remove = hisi_zip_remove,
+};
+
+static int __init hisi_zip_init(void)
+{
+ int ret;
+
+ ret = pci_register_driver(&hisi_zip_pci_driver);
+ if (ret < 0) {
+ pr_err("zip: can't register hisi zip driver.\n");
+ return ret;
+ }
+
+ ret = hisi_zip_register_to_crypto();
+ if (ret < 0) {
+ pr_err("zip: can't register hisi zip to crypto.\n");
+ pci_unregister_driver(&hisi_zip_pci_driver);
+ return ret;
+ }
+
+ return 0;
+}
+
+static void __exit hisi_zip_exit(void)
+{
+ hisi_zip_unregister_from_crypto();
+ pci_unregister_driver(&hisi_zip_pci_driver);
+}
+
+module_init(hisi_zip_init);
+module_exit(hisi_zip_exit);
+
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR("Zhou Wang <[email protected]>");
+MODULE_DESCRIPTION("Driver for HiSilicon ZIP accelerator");
+MODULE_DEVICE_TABLE(pci, hisi_zip_dev_ids);
--
2.17.1

2018-08-01 10:22:20

by Kenneth Lee

[permalink] [raw]
Subject: [RFC PATCH 6/7] crypto: add spimdev support to Hisilicon QM

From: Kenneth Lee <[email protected]>

This commit adds spimdev support to the Hisilicon QM driver, so any
accelerator that uses the QM can share its queues with user space.
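
As a rough sketch (not part of the diff below), this is all a QM-based
driver has to do to pick up the new support, since the spimdev registration
happens inside hisi_qm_start() when CONFIG_CRYPTO_DEV_HISI_SPIMDEV is set.
The my_acc_* names are made up for illustration; the real example is the
ZIP driver in patch 5:

/* Hypothetical probe for an accelerator built on the QM. Once the QM is
 * initialized and started, qm_register_spimdev() runs from hisi_qm_start(),
 * exposing this device's queues to user space with no extra code here.
 */
static int my_acc_probe(struct pci_dev *pdev, const struct pci_device_id *id)
{
	struct qm_info *qm;
	int ret;

	qm = devm_kzalloc(&pdev->dev, sizeof(*qm), GFP_KERNEL);
	if (!qm)
		return -ENOMEM;

	qm->pdev = pdev;
	qm->sqe_size = MY_ACC_SQE_SIZE;	/* hypothetical per-device SQE size */

	ret = hisi_qm_init("my_acc", qm);
	if (ret)
		return ret;

	ret = hisi_qm_start(qm);	/* also registers the spimdev */
	if (ret)
		hisi_qm_uninit(qm);

	return ret;
}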

Signed-off-by: Kenneth Lee <[email protected]>
Signed-off-by: Zhou Wang <[email protected]>
Signed-off-by: Hao Fang <[email protected]>
Signed-off-by: Zaibo Xu <[email protected]>
---
drivers/crypto/hisilicon/qm.c | 150 ++++++++++++++++++++++++++++++++++
drivers/crypto/hisilicon/qm.h | 12 +++
2 files changed, 162 insertions(+)

diff --git a/drivers/crypto/hisilicon/qm.c b/drivers/crypto/hisilicon/qm.c
index e779bc661500..06da8387dc58 100644
--- a/drivers/crypto/hisilicon/qm.c
+++ b/drivers/crypto/hisilicon/qm.c
@@ -667,6 +667,146 @@ int hisi_qp_send(struct hisi_qp *qp, void *msg)
}
EXPORT_SYMBOL_GPL(hisi_qp_send);

+#ifdef CONFIG_CRYPTO_DEV_HISI_SPIMDEV
+/* mdev->supported_type_groups */
+static struct attribute *hisi_qm_type_attrs[] = {
+ VFIO_SPIMDEV_DEFAULT_MDEV_TYPE_ATTRS,
+ NULL,
+};
+static struct attribute_group hisi_qm_type_group = {
+ .attrs = hisi_qm_type_attrs,
+};
+static struct attribute_group *mdev_type_groups[] = {
+ &hisi_qm_type_group,
+ NULL,
+};
+
+static void qm_qp_event_notifier(struct hisi_qp *qp)
+{
+ vfio_spimdev_wake_up(qp->spimdev_q);
+}
+
+static int hisi_qm_get_queue(struct vfio_spimdev *spimdev, unsigned long arg,
+ struct vfio_spimdev_queue **q)
+{
+ struct qm_info *qm = spimdev->priv;
+ struct hisi_qp *qp = NULL;
+ struct vfio_spimdev_queue *wd_q;
+ u8 alg_type = 0; /* fix me here */
+ int ret;
+ int pasid = arg;
+
+ qp = hisi_qm_create_qp(qm, alg_type);
+ if (IS_ERR(qp))
+ return PTR_ERR(qp);
+
+ wd_q = kzalloc(sizeof(struct vfio_spimdev_queue), GFP_KERNEL);
+ if (!wd_q) {
+ ret = -ENOMEM;
+ goto err_with_qp;
+ }
+
+ wd_q->priv = qp;
+ wd_q->spimdev = spimdev;
+ wd_q->qid = (u16)ret;
+ *q = wd_q;
+ qp->spimdev_q = wd_q;
+ qp->event_cb = qm_qp_event_notifier;
+
+ ret = hisi_qm_start_qp(qp, arg);
+ if (ret < 0)
+ goto err_with_wd_q;
+
+ return ret;
+
+err_with_wd_q:
+ kfree(wd_q);
+err_with_qp:
+ hisi_qm_release_qp(qp);
+ return ret;
+}
+
+static int hisi_qm_put_queue(struct vfio_spimdev_queue *q)
+{
+ struct hisi_qp *qp = q->priv;
+
+ /* need to stop hardware, but can not support in v1 */
+ hisi_qm_release_qp(qp);
+ kfree(q);
+ return 0;
+}
+
+/* map sq/cq/doorbell to user space */
+static int hisi_qm_mmap(struct vfio_spimdev_queue *q,
+ struct vm_area_struct *vma)
+{
+ struct hisi_qp *qp = (struct hisi_qp *)q->priv;
+ struct qm_info *qm = qp->qm;
+ struct device *dev = &qm->pdev->dev;
+ size_t sz = vma->vm_end - vma->vm_start;
+ u8 region;
+
+ vma->vm_flags |= (VM_IO | VM_LOCKED | VM_DONTEXPAND | VM_DONTDUMP);
+ region = _VFIO_SPIMDEV_REGION(vma->vm_pgoff);
+
+ switch (region) {
+ case 0:
+ if (sz > PAGE_SIZE)
+ return -EINVAL;
+ /*
+ * Warning: This is not safe as multiple queues use the same
+ * doorbell, v1 hardware interface problem. v2 will fix it
+ */
+ return remap_pfn_range(vma, vma->vm_start,
+ qm->phys_base >> PAGE_SHIFT,
+ sz, pgprot_noncached(vma->vm_page_prot));
+ case 1:
+ vma->vm_pgoff = 0;
+ if (sz > qp->scqe.size)
+ return -EINVAL;
+
+ return dma_mmap_coherent(dev, vma, qp->scqe.addr, qp->scqe.dma,
+ sz);
+
+ default:
+ return -EINVAL;
+ }
+}
+
+static const struct vfio_spimdev_ops qm_ops = {
+ .get_queue = hisi_qm_get_queue,
+ .put_queue = hisi_qm_put_queue,
+ .mmap = hisi_qm_mmap,
+};
+
+static int qm_register_spimdev(struct qm_info *qm)
+{
+ struct pci_dev *pdev = qm->pdev;
+ struct vfio_spimdev *spimdev = &qm->spimdev;
+
+ spimdev->iommu_type = VFIO_TYPE1_IOMMU;
+#ifdef CONFIG_IOMMU_SVA
+ spimdev->dma_flag = VFIO_SPIMDEV_DMA_MULTI_PROC_MAP;
+#else
+ spimdev->dma_flag = VFIO_SPIMDEV_DMA_SINGLE_PROC_MAP;
+#endif
+ spimdev->owner = THIS_MODULE;
+ spimdev->name = qm->dev_name;
+ spimdev->dev = &pdev->dev;
+ spimdev->is_vf = pdev->is_virtfn;
+ spimdev->priv = qm;
+ spimdev->api_ver = "hisi_qm_v1";
+ spimdev->flags = 0;
+
+ spimdev->mdev_fops.mdev_attr_groups = qm->mdev_dev_groups;
+ hisi_qm_type_group.name = qm->dev_name;
+ spimdev->mdev_fops.supported_type_groups = mdev_type_groups;
+ spimdev->ops = &qm_ops;
+
+ return vfio_spimdev_register(spimdev);
+}
+#endif
+
int hisi_qm_init(const char *dev_name, struct qm_info *qm)
{
int ret;
@@ -804,6 +944,12 @@ int hisi_qm_start(struct qm_info *qm)
if (ret)
goto err_with_cqc;

+#ifdef CONFIG_CRYPTO_DEV_HISI_SPIMDEV
+ ret = qm_register_spimdev(qm);
+ if (ret)
+ goto err_with_irq;
+#endif
+
writel(0x0, QM_ADDR(qm, QM_VF_EQ_INT_MASK));

return 0;
@@ -830,6 +976,10 @@ void hisi_qm_stop(struct qm_info *qm)
struct pci_dev *pdev = qm->pdev;
struct device *dev = &pdev->dev;

+#ifdef CONFIG_CRYPTO_DEV_HISI_SPIMDEV
+ vfio_spimdev_unregister(&qm->spimdev);
+#endif
+
free_irq(pci_irq_vector(pdev, 0), qm);
qm_uninit_q_buffer(dev, &qm->cqc);
kfree(qm->qp_array);
diff --git a/drivers/crypto/hisilicon/qm.h b/drivers/crypto/hisilicon/qm.h
index b3c5c34a0d13..f73c08098b82 100644
--- a/drivers/crypto/hisilicon/qm.h
+++ b/drivers/crypto/hisilicon/qm.h
@@ -8,6 +8,10 @@
#include <linux/pci.h>
#include <linux/slab.h>

+#ifdef CONFIG_CRYPTO_DEV_HISI_SPIMDEV
+#include <linux/vfio_spimdev.h>
+#endif
+
#define QM_CQE_SIZE 16
/* default queue depth for sq/cq/eq */
#define QM_Q_DEPTH 1024
@@ -59,6 +63,10 @@ struct qm_info {

struct hisi_acc_qm_hw_ops *ops;

+#ifdef CONFIG_CRYPTO_DEV_HISI_SPIMDEV
+ struct vfio_spimdev spimdev;
+ const struct attribute_group **mdev_dev_groups;
+#endif
};
#define QM_ADDR(qm, off) ((qm)->io_base + off)

@@ -90,6 +98,10 @@ struct hisi_qp {

struct qm_info *qm;

+#ifdef CONFIG_CRYPTO_DEV_HISI_SPIMDEV
+ struct vfio_spimdev_queue *spimdev_q;
+#endif
+
/* for crypto sync API */
struct completion completion;

--
2.17.1

2018-08-01 10:22:21

by Kenneth Lee

[permalink] [raw]
Subject: [RFC PATCH 7/7] vfio/spimdev: add user sample for spimdev

From: Kenneth Lee <[email protected]>

This is sample code to demonstrate how a WrapDrive user application
should be written.

It contains:

1. wd.[ch], the common library that provides the WrapDrive interface.
2. drv/*, the user driver that accesses the hardware via spimdev.
3. test/*, the test application that uses the WrapDrive interface to access
the hardware queue(s) of the accelerator.

The Hisilicon HIP08 ZIP accelerator is used in this sample.
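
As a rough outline of the intended flow (condensed from
test/test_hisi_zip.c below; the mdev UUID and the vfio/sysfs paths are
placeholders that depend on the actual system):

/* Minimal WrapDrive user flow: one synchronous compression request. */
#include "wd.h"
#include "drv/hisi_qm_udrv.h"

static int wd_demo(void *src, void *dst, size_t len)
{
	struct hisi_qm_msg msg = {0}, *resp;
	struct wd_queue q = {
		.container        = -1,
		.mdev_name        = "<mdev-uuid>",
		.vfio_group_path  = "/dev/vfio/<group>",
		.iommu_ext_path   = "<spimdev device>/params/iommu_type",
		.dmaflag_ext_path = "<spimdev device>/params/dma_flag",
		.device_api_path  = "<mdev type>/device_api",
	};
	int ret;

	ret = wd_request_queue(&q);		/* VFIO group/container setup, get the queue */
	if (ret)
		return ret;

	ret = wd_mem_share(&q, src, len, 0);	/* make the buffers DMA-able */
	if (ret)
		goto out_q;
	ret = wd_mem_share(&q, dst, len, 0);
	if (ret)
		goto out_src;

	msg.input_date_length = len;		/* field spelled as in hisi_qm_udrv.h */
	msg.dw9 = 2;				/* 2 = zlib-deflate, 3 = gzip */
	msg.dest_avail_out = len;
	msg.source_addr_l = (__u64)src & 0xffffffff;
	msg.source_addr_h = (__u64)src >> 32;
	msg.dest_addr_l = (__u64)dst & 0xffffffff;
	msg.dest_addr_h = (__u64)dst >> 32;

	ret = wd_send(&q, &msg);		/* fill the SQE and ring the doorbell */
	if (ret)
		goto out_dst;
	do {
		ret = wd_recv(&q, (void **)&resp);
	} while (ret == -EAGAIN);		/* poll until the hardware completes */

out_dst:
	wd_mem_unshare(&q, dst, len);
out_src:
	wd_mem_unshare(&q, src, len);
out_q:
	wd_release_queue(&q);
	return ret;
}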

Signed-off-by: Zaibo Xu <[email protected]>
Signed-off-by: Kenneth Lee <[email protected]>
Signed-off-by: Hao Fang <[email protected]>
Signed-off-by: Zhou Wang <[email protected]>
---
samples/warpdrive/AUTHORS | 2 +
samples/warpdrive/ChangeLog | 1 +
samples/warpdrive/Makefile.am | 9 +
samples/warpdrive/NEWS | 1 +
samples/warpdrive/README | 32 +++
samples/warpdrive/autogen.sh | 3 +
samples/warpdrive/cleanup.sh | 13 +
samples/warpdrive/configure.ac | 52 ++++
samples/warpdrive/drv/hisi_qm_udrv.c | 223 +++++++++++++++++
samples/warpdrive/drv/hisi_qm_udrv.h | 53 ++++
samples/warpdrive/test/Makefile.am | 7 +
samples/warpdrive/test/comp_hw.h | 23 ++
samples/warpdrive/test/test_hisi_zip.c | 204 ++++++++++++++++
samples/warpdrive/wd.c | 325 +++++++++++++++++++++++++
samples/warpdrive/wd.h | 153 ++++++++++++
samples/warpdrive/wd_adapter.c | 74 ++++++
samples/warpdrive/wd_adapter.h | 43 ++++
17 files changed, 1218 insertions(+)
create mode 100644 samples/warpdrive/AUTHORS
create mode 100644 samples/warpdrive/ChangeLog
create mode 100644 samples/warpdrive/Makefile.am
create mode 100644 samples/warpdrive/NEWS
create mode 100644 samples/warpdrive/README
create mode 100755 samples/warpdrive/autogen.sh
create mode 100755 samples/warpdrive/cleanup.sh
create mode 100644 samples/warpdrive/configure.ac
create mode 100644 samples/warpdrive/drv/hisi_qm_udrv.c
create mode 100644 samples/warpdrive/drv/hisi_qm_udrv.h
create mode 100644 samples/warpdrive/test/Makefile.am
create mode 100644 samples/warpdrive/test/comp_hw.h
create mode 100644 samples/warpdrive/test/test_hisi_zip.c
create mode 100644 samples/warpdrive/wd.c
create mode 100644 samples/warpdrive/wd.h
create mode 100644 samples/warpdrive/wd_adapter.c
create mode 100644 samples/warpdrive/wd_adapter.h

diff --git a/samples/warpdrive/AUTHORS b/samples/warpdrive/AUTHORS
new file mode 100644
index 000000000000..fe7dc2413b0d
--- /dev/null
+++ b/samples/warpdrive/AUTHORS
@@ -0,0 +1,2 @@
+Kenneth Lee<[email protected]>
+Zaibo Xu<[email protected]>
diff --git a/samples/warpdrive/ChangeLog b/samples/warpdrive/ChangeLog
new file mode 100644
index 000000000000..b1b716105590
--- /dev/null
+++ b/samples/warpdrive/ChangeLog
@@ -0,0 +1 @@
+init
diff --git a/samples/warpdrive/Makefile.am b/samples/warpdrive/Makefile.am
new file mode 100644
index 000000000000..41154a880a97
--- /dev/null
+++ b/samples/warpdrive/Makefile.am
@@ -0,0 +1,9 @@
+ACLOCAL_AMFLAGS = -I m4
+AUTOMAKE_OPTIONS = foreign subdir-objects
+AM_CFLAGS=-Wall -O0 -fno-strict-aliasing
+
+lib_LTLIBRARIES=libwd.la
+libwd_la_SOURCES=wd.c wd_adapter.c wd.h wd_adapter.h \
+ drv/hisi_qm_udrv.c drv/hisi_qm_udrv.h
+
+SUBDIRS=. test
diff --git a/samples/warpdrive/NEWS b/samples/warpdrive/NEWS
new file mode 100644
index 000000000000..b1b716105590
--- /dev/null
+++ b/samples/warpdrive/NEWS
@@ -0,0 +1 @@
+init
diff --git a/samples/warpdrive/README b/samples/warpdrive/README
new file mode 100644
index 000000000000..3adf66b112fc
--- /dev/null
+++ b/samples/warpdrive/README
@@ -0,0 +1,32 @@
+WD User Land Demonstration
+==========================
+
+This directory contains some applications and libraries to demonstrate how a
+
+WrapDrive application can be constructed.
+
+
+As a demo, we try to make it simple and clear for understanding. It is not
+
+supposed to be used in business scenario.
+
+
+The directory contains the following elements:
+
+wd.[ch]
+ A demonstration WrapDrive fundamental library which wraps the basic
+ operations to the WrapDrive-ed device.
+
+wd_adapter.[ch]
+ User driver adaptor for wd.[ch]
+
+wd_utils.[ch]
+ Some utitlities function used by WD and its drivers
+
+drv/*
+ User drivers. It helps to fulfill the semantic of wd.[ch] for
+ particular hardware
+
+test/*
+ Test applications to use the wrapdrive library
+
diff --git a/samples/warpdrive/autogen.sh b/samples/warpdrive/autogen.sh
new file mode 100755
index 000000000000..58deaf49de2a
--- /dev/null
+++ b/samples/warpdrive/autogen.sh
@@ -0,0 +1,3 @@
+#!/bin/sh -x
+
+autoreconf -i -f -v
diff --git a/samples/warpdrive/cleanup.sh b/samples/warpdrive/cleanup.sh
new file mode 100755
index 000000000000..c5f3d21e5dc1
--- /dev/null
+++ b/samples/warpdrive/cleanup.sh
@@ -0,0 +1,13 @@
+#!/bin/sh
+
+if [ -r Makefile ]; then
+ make distclean
+fi
+
+FILES="aclocal.m4 autom4te.cache compile config.guess config.h.in config.log \
+ config.status config.sub configure cscope.out depcomp install-sh \
+ libsrc/Makefile libsrc/Makefile.in libtool ltmain.sh Makefile \
+ ar-lib m4 \
+ Makefile.in missing src/Makefile src/Makefile.in test/Makefile.in"
+
+rm -vRf $FILES
diff --git a/samples/warpdrive/configure.ac b/samples/warpdrive/configure.ac
new file mode 100644
index 000000000000..53262f3197c2
--- /dev/null
+++ b/samples/warpdrive/configure.ac
@@ -0,0 +1,52 @@
+AC_PREREQ([2.69])
+AC_INIT([wrapdrive], [0.1], [[email protected]])
+AC_CONFIG_SRCDIR([wd.c])
+AM_INIT_AUTOMAKE([1.10 no-define])
+
+AC_CONFIG_MACRO_DIR([m4])
+AC_CONFIG_HEADERS([config.h])
+
+# Checks for programs.
+AC_PROG_CXX
+AC_PROG_AWK
+AC_PROG_CC
+AC_PROG_CPP
+AC_PROG_INSTALL
+AC_PROG_LN_S
+AC_PROG_MAKE_SET
+AC_PROG_RANLIB
+
+AM_PROG_AR
+AC_PROG_LIBTOOL
+AM_PROG_LIBTOOL
+LT_INIT
+AM_PROG_CC_C_O
+
+AC_DEFINE([HAVE_SVA], [0], [enable SVA support])
+AC_ARG_ENABLE([sva],
+ [ --enable-sva enable to support sva feature],
+ AC_DEFINE([HAVE_SVA], [1]))
+
+# Checks for libraries.
+
+# Checks for header files.
+AC_CHECK_HEADERS([fcntl.h stdint.h stdlib.h string.h sys/ioctl.h sys/time.h unistd.h])
+
+# Checks for typedefs, structures, and compiler characteristics.
+AC_CHECK_HEADER_STDBOOL
+AC_C_INLINE
+AC_TYPE_OFF_T
+AC_TYPE_SIZE_T
+AC_TYPE_UINT16_T
+AC_TYPE_UINT32_T
+AC_TYPE_UINT64_T
+AC_TYPE_UINT8_T
+
+# Checks for library functions.
+AC_FUNC_MALLOC
+AC_FUNC_MMAP
+AC_CHECK_FUNCS([memset munmap])
+
+AC_CONFIG_FILES([Makefile
+ test/Makefile])
+AC_OUTPUT
diff --git a/samples/warpdrive/drv/hisi_qm_udrv.c b/samples/warpdrive/drv/hisi_qm_udrv.c
new file mode 100644
index 000000000000..1213dff8daae
--- /dev/null
+++ b/samples/warpdrive/drv/hisi_qm_udrv.c
@@ -0,0 +1,223 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <stdlib.h>
+#include <unistd.h>
+#include <stdio.h>
+#include <sys/mman.h>
+#include <assert.h>
+#include <string.h>
+#include <stdint.h>
+#include <fcntl.h>
+#include <sys/stat.h>
+#include <sys/ioctl.h>
+#include <sys/epoll.h>
+#include <sys/eventfd.h>
+
+#include "hisi_qm_udrv.h"
+
+#if __AARCH64EL__ == 1
+#define mb() {asm volatile("dsb sy" : : : "memory"); }
+#else
+#warning "this file need to be used on AARCH64EL mode"
+#define mb()
+#endif
+
+#define QM_SQE_SIZE 128
+#define QM_CQE_SIZE 16
+#define QM_EQ_DEPTH 1024
+
+/* cqe shift */
+#define CQE_PHASE(cq) (((*((__u32 *)(cq) + 3)) >> 16) & 0x1)
+#define CQE_SQ_NUM(cq) ((*((__u32 *)(cq) + 2)) >> 16)
+#define CQE_SQ_HEAD_INDEX(cq) ((*((__u32 *)(cq) + 2)) & 0xffff)
+
+#define QM_IOMEM_SIZE 4096
+
+#define QM_DOORBELL_OFFSET 0x340
+
+struct cqe {
+ __le32 rsvd0;
+ __le16 cmd_id;
+ __le16 rsvd1;
+ __le16 sq_head;
+ __le16 sq_num;
+ __le16 rsvd2;
+ __le16 w7; /* phase, status */
+};
+
+struct hisi_qm_queue_info {
+ void *sq_base;
+ void *cq_base;
+ void *doorbell_base;
+ __u16 sq_tail_index;
+ __u16 sq_head_index;
+ __u16 cq_head_index;
+ __u16 sqn;
+ bool cqc_phase;
+ void *req_cache[QM_EQ_DEPTH];
+ int is_sq_full;
+};
+
+int hacc_db(struct hisi_qm_queue_info *q, __u8 cmd, __u16 index, __u8 priority)
+{
+ void *base = q->doorbell_base;
+ __u16 sqn = q->sqn;
+ __u64 doorbell = 0;
+
+ doorbell = (__u64)sqn | ((__u64)cmd << 16);
+ doorbell |= ((__u64)index | ((__u64)priority << 16)) << 32;
+
+ *((__u64 *)base) = doorbell;
+
+ return 0;
+}
+
+static int hisi_qm_fill_sqe(void *msg, struct hisi_qm_queue_info *info, __u16 i)
+{
+ struct hisi_qm_msg *sqe = (struct hisi_qm_msg *)info->sq_base + i;
+ memcpy((void *)sqe, msg, sizeof(struct hisi_qm_msg));
+ assert(!info->req_cache[i]);
+ info->req_cache[i] = msg;
+
+ return 0;
+}
+
+static int hisi_qm_recv_sqe(struct hisi_qm_msg *sqe, struct hisi_qm_queue_info *info, __u16 i)
+{
+ __u32 status = sqe->dw3 & 0xff;
+ __u32 type = sqe->dw9 & 0xff;
+
+ if (status != 0 && status != 0x0d) {
+ fprintf(stderr, "bad status (s=%d, t=%d)\n", status, type);
+ return -EIO;
+ }
+
+ assert(info->req_cache[i]);
+ memcpy((void *)info->req_cache[i], sqe, sizeof(struct hisi_qm_msg));
+ return 0;
+}
+
+int hisi_qm_set_queue_dio(struct wd_queue *q)
+{
+ struct hisi_qm_queue_info *info;
+ void *vaddr;
+ int ret;
+
+ alloc_obj(info);
+ if (!info)
+ return -1;
+
+ q->priv = info;
+
+ vaddr = mmap(NULL,
+ QM_SQE_SIZE * QM_EQ_DEPTH + QM_CQE_SIZE * QM_EQ_DEPTH,
+ PROT_READ | PROT_WRITE, MAP_SHARED, q->fd, 4096);
+ if (vaddr <= 0) {
+ ret = (intptr_t)vaddr;
+ goto err_with_info;
+ }
+ info->sq_base = vaddr;
+ info->cq_base = vaddr + QM_SQE_SIZE * QM_EQ_DEPTH;
+
+ vaddr = mmap(NULL, QM_IOMEM_SIZE,
+ PROT_READ | PROT_WRITE, MAP_SHARED, q->fd, 0);
+ if (vaddr <= 0) {
+ ret = (intptr_t)vaddr;
+ goto err_with_scq;
+ }
+ info->doorbell_base = vaddr + QM_DOORBELL_OFFSET;
+ info->sq_tail_index = 0;
+ info->sq_head_index = 0;
+ info->cq_head_index = 0;
+ info->cqc_phase = 1;
+
+ info->is_sq_full = 0;
+
+ return 0;
+
+err_with_scq:
+ munmap(info->sq_base,
+ QM_SQE_SIZE * QM_EQ_DEPTH + QM_CQE_SIZE * QM_EQ_DEPTH);
+err_with_info:
+ free(info);
+ return ret;
+}
+
+void hisi_qm_unset_queue_dio(struct wd_queue *q)
+{
+ struct hisi_qm_queue_info *info = (struct hisi_qm_queue_info *)q->priv;
+
+ munmap(info->doorbell_base - QM_DOORBELL_OFFSET, QM_IOMEM_SIZE);
+ munmap(info->cq_base, QM_CQE_SIZE * QM_EQ_DEPTH);
+ munmap(info->sq_base, QM_SQE_SIZE * QM_EQ_DEPTH);
+ free(info);
+ q->priv = NULL;
+}
+
+int hisi_qm_add_to_dio_q(struct wd_queue *q, void *req)
+{
+ struct hisi_qm_queue_info *info = (struct hisi_qm_queue_info *)q->priv;
+ __u16 i;
+
+ if (info->is_sq_full)
+ return -EBUSY;
+
+ i = info->sq_tail_index;
+
+ hisi_qm_fill_sqe(req, q->priv, i);
+
+ mb();
+
+ if (i == (QM_EQ_DEPTH - 1))
+ i = 0;
+ else
+ i++;
+
+ hacc_db(info, DOORBELL_CMD_SQ, i, 0);
+
+ info->sq_tail_index = i;
+
+ if (i == info->sq_head_index)
+ info->is_sq_full = 1;
+
+ return 0;
+}
+
+int hisi_qm_get_from_dio_q(struct wd_queue *q, void **resp)
+{
+ struct hisi_qm_queue_info *info = (struct hisi_qm_queue_info *)q->priv;
+ __u16 i = info->cq_head_index;
+ struct cqe *cq_base = info->cq_base;
+ struct hisi_qm_msg *sq_base = info->sq_base;
+ struct cqe *cqe = cq_base + i;
+ struct hisi_qm_msg *sqe;
+ int ret;
+
+ if (info->cqc_phase == CQE_PHASE(cqe)) {
+ sqe = sq_base + CQE_SQ_HEAD_INDEX(cqe);
+ ret = hisi_qm_recv_sqe(sqe, info, i);
+ if (ret < 0)
+ return -EIO;
+
+ if (info->is_sq_full)
+ info->is_sq_full = 0;
+ } else {
+ return -EAGAIN;
+ }
+
+ *resp = info->req_cache[i];
+ info->req_cache[i] = NULL;
+
+ if (i == (QM_EQ_DEPTH - 1)) {
+ info->cqc_phase = !(info->cqc_phase);
+ i = 0;
+ } else
+ i++;
+
+ hacc_db(info, DOORBELL_CMD_CQ, i, 0);
+
+ info->cq_head_index = i;
+ info->sq_head_index = i;
+
+
+ return ret;
+}
diff --git a/samples/warpdrive/drv/hisi_qm_udrv.h b/samples/warpdrive/drv/hisi_qm_udrv.h
new file mode 100644
index 000000000000..6a7a06a089c9
--- /dev/null
+++ b/samples/warpdrive/drv/hisi_qm_udrv.h
@@ -0,0 +1,53 @@
+// SPDX-License-Identifier: GPL-2.0
+#ifndef __HZIP_DRV_H__
+#define __HZIP_DRV_H__
+
+#include <linux/types.h>
+#include "../wd.h"
+
+/* this is unnecessary big, the hardware should optimize it */
+struct hisi_qm_msg {
+ __u32 consumed;
+ __u32 produced;
+ __u32 comp_date_length;
+ __u32 dw3;
+ __u32 input_date_length;
+ __u32 lba_l;
+ __u32 lba_h;
+ __u32 dw7; /* ... */
+ __u32 dw8; /* ... */
+ __u32 dw9; /* ... */
+ __u32 dw10; /* ... */
+ __u32 priv_info;
+ __u32 dw12; /* ... */
+ __u32 tag;
+ __u32 dest_avail_out;
+ __u32 rsvd0;
+ __u32 comp_head_addr_l;
+ __u32 comp_head_addr_h;
+ __u32 source_addr_l;
+ __u32 source_addr_h;
+ __u32 dest_addr_l;
+ __u32 dest_addr_h;
+ __u32 stream_ctx_addr_l;
+ __u32 stream_ctx_addr_h;
+ __u32 cipher_key1_addr_l;
+ __u32 cipher_key1_addr_h;
+ __u32 cipher_key2_addr_l;
+ __u32 cipher_key2_addr_h;
+ __u32 rsvd1[4];
+};
+
+struct hisi_acc_qm_sqc {
+ __u16 sqn;
+};
+
+#define DOORBELL_CMD_SQ 0
+#define DOORBELL_CMD_CQ 1
+
+int hisi_qm_set_queue_dio(struct wd_queue *q);
+void hisi_qm_unset_queue_dio(struct wd_queue *q);
+int hisi_qm_add_to_dio_q(struct wd_queue *q, void *req);
+int hisi_qm_get_from_dio_q(struct wd_queue *q, void **resp);
+
+#endif
diff --git a/samples/warpdrive/test/Makefile.am b/samples/warpdrive/test/Makefile.am
new file mode 100644
index 000000000000..ad80e80a47d7
--- /dev/null
+++ b/samples/warpdrive/test/Makefile.am
@@ -0,0 +1,7 @@
+AM_CFLAGS=-Wall -O0 -fno-strict-aliasing
+
+bin_PROGRAMS=test_hisi_zip
+
+test_hisi_zip_SOURCES=test_hisi_zip.c
+
+test_hisi_zip_LDADD=../.libs/libwd.a
diff --git a/samples/warpdrive/test/comp_hw.h b/samples/warpdrive/test/comp_hw.h
new file mode 100644
index 000000000000..79328fd0c1a0
--- /dev/null
+++ b/samples/warpdrive/test/comp_hw.h
@@ -0,0 +1,23 @@
+// SPDX-License-Identifier: GPL-2.0
+/**
+ * This file is shared bewteen user and kernel space Wrapdrive which is
+ * including algorithm attibutions that both user and driver are caring for
+ */
+
+#ifndef __VFIO_WDEV_COMP_H
+#define __VFIO_WDEV_COMP_H
+
+/* De-compressing algorithms' parameters */
+struct vfio_wdev_comp_param {
+ __u32 window_size;
+ __u32 comp_level;
+ __u32 mode;
+ __u32 alg;
+};
+
+/* WD defines all the De-compressing algorithm names here */
+#define VFIO_WDEV_ZLIB "zlib"
+#define VFIO_WDEV_GZIP "gzip"
+#define VFIO_WDEV_LZ4 "lz4"
+
+#endif
diff --git a/samples/warpdrive/test/test_hisi_zip.c b/samples/warpdrive/test/test_hisi_zip.c
new file mode 100644
index 000000000000..72dc2c712649
--- /dev/null
+++ b/samples/warpdrive/test/test_hisi_zip.c
@@ -0,0 +1,204 @@
+// SPDX-License-Identifier: GPL-2.0+
+#include <stdio.h>
+#include <string.h>
+#include <sys/mman.h>
+#include <sys/time.h>
+#include <unistd.h>
+#include "../wd.h"
+#include "comp_hw.h"
+#include "../drv/hisi_qm_udrv.h"
+
+#if defined(MSDOS) || defined(OS2) || defined(WIN32) || defined(__CYGWIN__)
+# include <fcntl.h>
+# include <io.h>
+# define SET_BINARY_MODE(file) setmode(fileno(file), O_BINARY)
+#else
+# define SET_BINARY_MODE(file)
+#endif
+
+#define PAGE_SHIFT 12
+#define PAGE_SIZE (1 << PAGE_SHIFT)
+
+#define ASIZE (8*512*4096) /*16MB*/
+
+#define SYS_ERR_COND(cond, msg) \
+do { \
+ if (cond) { \
+ perror(msg); \
+ exit(EXIT_FAILURE); \
+ } \
+} while (0)
+
+#define ZLIB 0
+#define GZIP 1
+
+#define CHUNK 65535
+
+
+int hizip_deflate(FILE *source, FILE *dest, int type)
+{
+ __u64 in, out;
+ struct wd_queue q;
+ struct hisi_qm_msg *msg, *recv_msg;
+ void *a, *b;
+ char *src, *dst;
+ int ret, total_len;
+ int output_num;
+ int fd, file_msize;
+
+ q.container = -1;
+ q.mdev_name = "22e09922-7a82-11e8-9cf6-d74cffa9e87b";
+ q.vfio_group_path = "/dev/vfio/7"; //fixme to the right path
+ q.iommu_ext_path = "/sys/class/spimdev/0000:75:00.0/device/params/iommu_type";
+ q.dmaflag_ext_path = "/sys/class/spimdev/0000:75:00.0/device/params/dma_flag";
+ q.device_api_path = "/sys/class/spimdev/0000:75:00.0/device/mdev_supported_types/hisi_zip-hisi_zip/device_api";
+ ret = wd_request_queue(&q);
+ SYS_ERR_COND(ret, "wd_request_queue");
+
+ fprintf(stderr, "pasid=%d, dma_flag=%d\n", q.pasid, q.dma_flag);
+ fd = fileno(source);
+ struct stat s;
+
+ if (fstat(fd, &s) < 0) {
+ close(fd);
+ perror("fd error\n");
+ return -1;
+ }
+ total_len = s.st_size;
+
+ if (!total_len) {
+ ret = -EINVAL;
+ SYS_ERR_COND(ret, "input file length zero");
+ }
+ if (total_len > 16*1024*1024) {
+ fputs("error, input file size too large(<16MB)!\n", stderr);
+ goto release_q;
+ }
+ file_msize = !(total_len%PAGE_SIZE) ? total_len :
+ (total_len/PAGE_SIZE+1)*PAGE_SIZE;
+ /* mmap file and DMA mapping */
+ a = mmap((void *)0x0, file_msize, PROT_READ | PROT_WRITE,
+ MAP_PRIVATE, fd, 0);
+ if (!a) {
+ fputs("mmap file fail!\n", stderr);
+ goto release_q;
+ }
+ ret = wd_mem_share(&q, a, file_msize, 0);
+ if (ret) {
+ fprintf(stderr, "wd_mem_share dma a buf fail!err=%d\n", -errno);
+ goto unmap_file;
+ }
+ /* Allocate some space and setup a DMA mapping */
+ b = mmap((void *)0x0, ASIZE, PROT_READ | PROT_WRITE,
+ MAP_PRIVATE | MAP_ANONYMOUS, 0, 0);
+ if (!b) {
+ fputs("mmap b fail!\n", stderr);
+ goto unshare_file;
+ }
+ memset(b, 0, ASIZE);
+ ret = wd_mem_share(&q, b, ASIZE, 0);
+ if (ret) {
+ fputs("wd_mem_share dma b buf fail!\n", stderr);
+ goto unmap_mem;
+ }
+ src = (char *)a;
+ dst = (char *)b;
+
+ msg = malloc(sizeof(*msg));
+ if (!msg) {
+ fputs("alloc msg fail!\n", stderr);
+ goto alloc_msg_fail;
+ }
+ memset((void *)msg, 0, sizeof(*msg));
+ msg->input_date_length = total_len;
+ if (type == ZLIB)
+ msg->dw9 = 2;
+ else
+ msg->dw9 = 3;
+ msg->dest_avail_out = 0x800000;
+
+ in = (__u64)src;
+ out = (__u64)dst;
+
+ msg->source_addr_l = in & 0xffffffff;
+ msg->source_addr_h = in >> 32;
+ msg->dest_addr_l = out & 0xffffffff;
+ msg->dest_addr_h = out >> 32;
+
+ ret = wd_send(&q, msg);
+ if (ret == -EBUSY) {
+ usleep(1);
+ goto recv_again;
+ }
+ SYS_ERR_COND(ret, "send fail!\n");
+recv_again:
+ ret = wd_recv(&q, (void **)&recv_msg);
+ if (ret == -EIO) {
+ fputs(" wd_recv fail!\n", stderr);
+ goto alloc_msg_fail;
+ /* synchronous mode, if get none, then get again */
+ } else if (ret == -EAGAIN)
+ goto recv_again;
+
+ output_num = recv_msg->produced;
+ /* add zlib compress head and write head + compressed date to a file */
+ char zip_head[2] = {0x78, 0x9c};
+
+ fwrite(zip_head, 1, 2, dest);
+ fwrite((char *)out, 1, output_num, dest);
+ fclose(dest);
+
+ free(msg);
+alloc_msg_fail:
+ wd_mem_unshare(&q, b, ASIZE);
+unmap_mem:
+ munmap(b, ASIZE);
+unshare_file:
+ wd_mem_unshare(&q, a, file_msize);
+unmap_file:
+ munmap(a, file_msize);
+release_q:
+ wd_release_queue(&q);
+
+ return ret;
+}
+
+int main(int argc, char *argv[])
+{
+ int alg_type = 0;
+
+ /* avoid end-of-line conversions */
+ SET_BINARY_MODE(stdin);
+ SET_BINARY_MODE(stdout);
+
+ if (!argv[1]) {
+ fputs("<<use ./test_hisi_zip -h get more details>>\n", stderr);
+ goto EXIT;
+ }
+
+ if (!strcmp(argv[1], "-z"))
+ alg_type = ZLIB;
+ else if (!strcmp(argv[1], "-g")) {
+ alg_type = GZIP;
+ } else if (!strcmp(argv[1], "-h")) {
+ fputs("[version]:1.0.2\n", stderr);
+ fputs("[usage]: ./test_hisi_zip [type] <src_file> dest_file\n",
+ stderr);
+ fputs(" [type]:\n", stderr);
+ fputs(" -z = zlib\n", stderr);
+ fputs(" -g = gzip\n", stderr);
+ fputs(" -h = usage\n", stderr);
+ fputs("Example:\n", stderr);
+ fputs("./test_hisi_zip -z < test.data > out.data\n", stderr);
+ goto EXIT;
+ } else {
+ fputs("Unknow option\n", stderr);
+ fputs("<<use ./test_comp_iommu -h get more details>>\n",
+ stderr);
+ goto EXIT;
+ }
+
+ hizip_deflate(stdin, stdout, alg_type);
+EXIT:
+ return EXIT_SUCCESS;
+}
diff --git a/samples/warpdrive/wd.c b/samples/warpdrive/wd.c
new file mode 100644
index 000000000000..e7855652374e
--- /dev/null
+++ b/samples/warpdrive/wd.c
@@ -0,0 +1,325 @@
+// SPDX-License-Identifier: GPL-2.0
+#include "config.h"
+#include <stdlib.h>
+#include <unistd.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <sys/queue.h>
+#include <fcntl.h>
+#include <sys/ioctl.h>
+#include <errno.h>
+#include <sys/mman.h>
+#include <string.h>
+#include <assert.h>
+#include <dirent.h>
+#include <sys/poll.h>
+#include "wd.h"
+#include "wd_adapter.h"
+
+#if (defined(HAVE_SVA) & HAVE_SVA)
+static int _wd_bind_process(struct wd_queue *q)
+{
+ struct bind_data {
+ struct vfio_iommu_type1_bind bind;
+ struct vfio_iommu_type1_bind_process data;
+ } wd_bind;
+ int ret;
+ __u32 flags = 0;
+
+ if (q->dma_flag & VFIO_SPIMDEV_DMA_MULTI_PROC_MAP)
+ flags = VFIO_IOMMU_BIND_PRIV;
+ else if (q->dma_flag & VFIO_SPIMDEV_DMA_SVM_NO_FAULT)
+ flags = VFIO_IOMMU_BIND_NOPF;
+
+ wd_bind.bind.flags = VFIO_IOMMU_BIND_PROCESS;
+ wd_bind.bind.argsz = sizeof(wd_bind);
+ wd_bind.data.flags = flags;
+ ret = ioctl(q->container, VFIO_IOMMU_BIND, &wd_bind);
+ if (ret)
+ return ret;
+ q->pasid = wd_bind.data.pasid;
+ return ret;
+}
+
+static int _wd_unbind_process(struct wd_queue *q)
+{
+ struct bind_data {
+ struct vfio_iommu_type1_bind bind;
+ struct vfio_iommu_type1_bind_process data;
+ } wd_bind;
+ __u32 flags = 0;
+
+ if (q->dma_flag & VFIO_SPIMDEV_DMA_MULTI_PROC_MAP)
+ flags = VFIO_IOMMU_BIND_PRIV;
+ else if (q->dma_flag & VFIO_SPIMDEV_DMA_SVM_NO_FAULT)
+ flags = VFIO_IOMMU_BIND_NOPF;
+
+ wd_bind.bind.flags = VFIO_IOMMU_BIND_PROCESS;
+ wd_bind.data.pasid = q->pasid;
+ wd_bind.data.flags = flags;
+ wd_bind.bind.argsz = sizeof(wd_bind);
+
+ return ioctl(q->container, VFIO_IOMMU_UNBIND, &wd_bind);
+}
+#endif
+
+int wd_request_queue(struct wd_queue *q)
+{
+ struct vfio_group_status group_status = {
+ .argsz = sizeof(group_status) };
+ int iommu_ext;
+ int ret;
+
+ if (!q->vfio_group_path ||
+ !q->device_api_path ||
+ !q->iommu_ext_path) {
+ WD_ERR("please set vfio_group_path,"
+ "device_api_path,and iommu_ext_path before call %s", __func__);
+ return -EINVAL;
+ }
+
+ q->hw_type_id = 0; /* this can be set according to the device api_version in the future */
+
+ q->group = open(q->vfio_group_path, O_RDWR);
+ if (q->group < 0) {
+ WD_ERR("open vfio group(%s) fail, errno=%d\n",
+ q->vfio_group_path, errno);
+ return -errno;
+ }
+
+ if (q->container <= 0) {
+ q->container = open("/dev/vfio/vfio", O_RDWR);
+ if (q->container < 0) {
+ WD_ERR("Create VFIO container fail!\n");
+ ret = -ENODEV;
+ goto err_with_group;
+ }
+ }
+
+ if (ioctl(q->container, VFIO_GET_API_VERSION) != VFIO_API_VERSION) {
+ WD_ERR("VFIO version check fail!\n");
+ ret = -EINVAL;
+ goto err_with_container;
+ }
+
+ q->dma_flag = _get_attr_int(q->dmaflag_ext_path);
+ if (q->dma_flag < 0) {
+ ret = q->dma_flag;
+ goto err_with_container;
+ }
+
+ iommu_ext = _get_attr_int(q->iommu_ext_path);
+ if (iommu_ext < 0) {
+ ret = iommu_ext;
+ goto err_with_container;
+ }
+ ret = ioctl(q->container, VFIO_CHECK_EXTENSION, iommu_ext);
+ if (!ret) {
+ WD_ERR("VFIO iommu check (%d) fail (%d)!\n", iommu_ext, ret);
+ goto err_with_container;
+ }
+
+ ret = _get_attr_str(q->device_api_path, q->hw_type);
+ if (ret)
+ goto err_with_container;
+
+ ret = ioctl(q->group, VFIO_GROUP_GET_STATUS, &group_status);
+ if (!(group_status.flags & VFIO_GROUP_FLAGS_VIABLE)) {
+ WD_ERR("VFIO group is not viable\n");
+ goto err_with_container;
+ }
+
+ ret = ioctl(q->group, VFIO_GROUP_SET_CONTAINER, &q->container);
+ if (ret) {
+ WD_ERR("VFIO group fail on VFIO_GROUP_SET_CONTAINER\n");
+ goto err_with_container;
+ }
+
+ ret = ioctl(q->container, VFIO_SET_IOMMU, iommu_ext);
+ if (ret) {
+ WD_ERR("VFIO fail on VFIO_SET_IOMMU(%d)\n", iommu_ext);
+ goto err_with_container;
+ }
+
+ q->mdev = ioctl(q->group, VFIO_GROUP_GET_DEVICE_FD, q->mdev_name);
+ if (q->mdev < 0) {
+ WD_ERR("VFIO fail on VFIO_GROUP_GET_DEVICE_FD (%d)\n", q->mdev);
+ ret = q->mdev;
+ goto err_with_container;
+ }
+#if (defined(HAVE_SVA) & HAVE_SVA)
+ if (!(q->dma_flag & (VFIO_SPIMDEV_DMA_PHY | VFIO_SPIMDEV_DMA_SINGLE_PROC_MAP))) {
+ ret = _wd_bind_process(q);
+ if (ret) {
+ close(q->mdev);
+ WD_ERR("VFIO fails to bind process!\n");
+ goto err_with_container;
+
+ }
+ }
+#endif
+ ret = q->fd = ioctl(q->mdev, VFIO_SPIMDEV_CMD_GET_Q, (unsigned long)q->pasid);
+ if (ret < 0) {
+ WD_ERR("get queue fail,ret=%d\n", errno);
+ goto err_with_mdev;
+ }
+
+ ret = drv_open(q);
+ if (ret)
+ goto err_with_queue;
+
+ return 0;
+
+err_with_queue:
+ close(q->fd);
+err_with_mdev:
+ close(q->mdev);
+err_with_container:
+ close(q->container);
+err_with_group:
+ close(q->group);
+ return ret;
+}
+
+void wd_release_queue(struct wd_queue *q)
+{
+ drv_close(q);
+ close(q->fd);
+#if (defined(HAVE_SVA) & HAVE_SVA)
+ if (!(q->dma_flag & (VFIO_SPIMDEV_DMA_PHY | VFIO_SPIMDEV_DMA_SINGLE_PROC_MAP))) {
+ if (q->pasid <= 0) {
+ WD_ERR("Wd queue pasid ! pasid=%d\n", q->pasid);
+ return;
+ }
+ if (_wd_unbind_process(q)) {
+ WD_ERR("VFIO fails to unbind process!\n");
+ return;
+ }
+ }
+#endif
+ close(q->mdev);
+ close(q->container);
+ close(q->group);
+}
+
+int wd_send(struct wd_queue *q, void *req)
+{
+ return drv_send(q, req);
+}
+
+int wd_recv(struct wd_queue *q, void **resp)
+{
+ return drv_recv(q, resp);
+}
+
+static int wd_flush_and_wait(struct wd_queue *q, __u16 ms)
+{
+ struct pollfd fds[1];
+
+ wd_flush(q);
+ fds[0].fd = q->fd;
+ fds[0].events = POLLIN;
+ return poll(fds, 1, ms);
+}
+
+int wd_send_sync(struct wd_queue *q, void *req, __u16 ms)
+{
+ int ret;
+
+ while (1) {
+ ret = wd_send(q, req);
+ if (ret == -EBUSY) {
+ ret = wd_flush_and_wait(q, ms);
+ if (ret)
+ return ret;
+ } else
+ return ret;
+ }
+}
+
+int wd_recv_sync(struct wd_queue *q, void **resp, __u16 ms)
+{
+ int ret;
+
+ while (1) {
+ ret = wd_recv(q, resp);
+ if (ret == -EBUSY) {
+ ret = wd_flush_and_wait(q, ms);
+ if (ret)
+ return ret;
+ } else
+ return ret;
+ }
+}
+
+void wd_flush(struct wd_queue *q)
+{
+ drv_flush(q);
+}
+
+static int _wd_mem_share_type1(struct wd_queue *q, const void *addr,
+ size_t size, int flags)
+{
+ struct vfio_iommu_type1_dma_map dma_map;
+
+ if (q->dma_flag & VFIO_SPIMDEV_DMA_SVM_NO_FAULT)
+ return mlock(addr, size);
+
+#if (defined(HAVE_SVA) & HAVE_SVA)
+ else if ((q->dma_flag & VFIO_SPIMDEV_DMA_MULTI_PROC_MAP) &&
+ (q->pasid > 0))
+ dma_map.pasid = q->pasid;
+#endif
+ else if ((q->dma_flag & VFIO_SPIMDEV_DMA_SINGLE_PROC_MAP))
+ ; //todo
+ else
+ return -1;
+
+ dma_map.vaddr = (__u64)addr;
+ dma_map.size = size;
+ dma_map.iova = (__u64)addr;
+ dma_map.flags =
+ VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE | flags;
+ dma_map.argsz = sizeof(dma_map);
+
+ return ioctl(q->container, VFIO_IOMMU_MAP_DMA, &dma_map);
+}
+
+static void _wd_mem_unshare_type1(struct wd_queue *q, const void *addr,
+ size_t size)
+{
+#if (defined(HAVE_SVA) & HAVE_SVA)
+ struct vfio_iommu_type1_dma_unmap dma_unmap;
+#endif
+
+ if (q->dma_flag & VFIO_SPIMDEV_DMA_SVM_NO_FAULT) {
+ (void)munlock(addr, size);
+ return;
+ }
+
+#if (defined(HAVE_SVA) & HAVE_SVA)
+ dma_unmap.iova = (__u64)addr;
+ if ((q->dma_flag & VFIO_SPIMDEV_DMA_MULTI_PROC_MAP) && (q->pasid > 0))
+ dma_unmap.flags = 0;
+ dma_unmap.size = size;
+ dma_unmap.argsz = sizeof(dma_unmap);
+ ioctl(q->container, VFIO_IOMMU_UNMAP_DMA, &dma_unmap);
+#endif
+}
+
+int wd_mem_share(struct wd_queue *q, const void *addr, size_t size, int flags)
+{
+ if (drv_can_do_mem_share(q))
+ return drv_share(q, addr, size, flags);
+ else
+ return _wd_mem_share_type1(q, addr, size, flags);
+}
+
+void wd_mem_unshare(struct wd_queue *q, const void *addr, size_t size)
+{
+ if (drv_can_do_mem_share(q))
+ drv_unshare(q, addr, size);
+ else
+ _wd_mem_unshare_type1(q, addr, size);
+}
+
diff --git a/samples/warpdrive/wd.h b/samples/warpdrive/wd.h
new file mode 100644
index 000000000000..d9f3b2fa6c73
--- /dev/null
+++ b/samples/warpdrive/wd.h
@@ -0,0 +1,153 @@
+// SPDX-License-Identifier: GPL-2.0
+#ifndef __WD_H
+#define __WD_H
+#include <stdlib.h>
+#include <errno.h>
+#include <stdio.h>
+#include <string.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <fcntl.h>
+#include <stdint.h>
+#include <unistd.h>
+#include "../../include/uapi/linux/vfio.h"
+#include "../../include/uapi/linux/vfio_spimdev.h"
+
+#define SYS_VAL_SIZE 16
+#define PATH_STR_SIZE 256
+#define WD_NAME_SIZE 64
+#define WD_MAX_MEMLIST_SZ 128
+
+
+#ifndef dma_addr_t
+#define dma_addr_t __u64
+#endif
+
+typedef int bool;
+
+#ifndef true
+#define true 1
+#endif
+
+#ifndef false
+#define false 0
+#endif
+
+/* the flags used by wd_capa->flags, the high 16bits are for algorithm
+ * and the low 16bits are for Framework
+ */
+#define WD_FLAGS_FW_PREFER_LOCAL_ZONE 1
+
+#define WD_FLAGS_FW_MASK 0x0000FFFF
+#ifndef WD_ERR
+#define WD_ERR(format, args...) fprintf(stderr, format, ##args)
+#endif
+
+/* Default page size should be 4k size */
+#define WDQ_MAP_REGION(region_index) ((region_index << 12) & 0xf000)
+#define WDQ_MAP_Q(q_index) ((q_index << 16) & 0xffff0000)
+
+static inline void wd_reg_write(void *reg_addr, uint32_t value)
+{
+ *((volatile uint32_t *)reg_addr) = value;
+}
+
+static inline uint32_t wd_reg_read(void *reg_addr)
+{
+ uint32_t temp;
+
+ temp = *((volatile uint32_t *)reg_addr);
+
+ return temp;
+}
+
+static inline int _get_attr_str(const char *path, char value[PATH_STR_SIZE])
+{
+ int fd, ret;
+
+ fd = open(path, O_RDONLY);
+ if (fd < 0) {
+ WD_ERR("open %s fail\n", path);
+ return fd;
+ }
+ memset(value, 0, PATH_STR_SIZE);
+ ret = read(fd, value, PATH_STR_SIZE);
+ if (ret > 0) {
+ close(fd);
+ return 0;
+ }
+ close(fd);
+
+ WD_ERR("read nothing from %s\n", path);
+ return -EINVAL;
+}
+
+static inline int _get_attr_int(const char *path)
+{
+ char value[PATH_STR_SIZE];
+ _get_attr_str(path, value);
+ return atoi(value);
+}
+
+/* Memory in accelerating message can be different */
+enum wd_addr_flags {
+ WD_AATTR_INVALID = 0,
+
+ /* Common user virtual memory */
+ _WD_AATTR_COM_VIRT = 1,
+
+ /* Physical address*/
+ _WD_AATTR_PHYS = 2,
+
+ /* I/O virtual address*/
+ _WD_AATTR_IOVA = 4,
+
+ /* SGL, user cares for */
+ WD_AATTR_SGL = 8,
+};
+
+#define WD_CAPA_PRIV_DATA_SIZE 64
+
+#define alloc_obj(objp) do { \
+ objp = malloc(sizeof(*objp)); \
+ memset(objp, 0, sizeof(*objp)); \
+}while(0)
+#define free_obj(objp) if (objp)free(objp)
+
+struct wd_queue {
+ const char *mdev_name;
+ char hw_type[PATH_STR_SIZE];
+ int hw_type_id;
+ int dma_flag;
+ void *priv; /* private data used by the drv layer */
+ int container;
+ int group;
+ int mdev;
+ int fd;
+ int pasid;
+ int iommu_type;
+ char *vfio_group_path;
+ char *iommu_ext_path;
+ char *dmaflag_ext_path;
+ char *device_api_path;
+};
+
+extern int wd_request_queue(struct wd_queue *q);
+extern void wd_release_queue(struct wd_queue *q);
+extern int wd_send(struct wd_queue *q, void *req);
+extern int wd_recv(struct wd_queue *q, void **resp);
+extern void wd_flush(struct wd_queue *q);
+extern int wd_send_sync(struct wd_queue *q, void *req, __u16 ms);
+extern int wd_recv_sync(struct wd_queue *q, void **resp, __u16 ms);
+extern int wd_mem_share(struct wd_queue *q, const void *addr,
+ size_t size, int flags);
+extern void wd_mem_unshare(struct wd_queue *q, const void *addr, size_t size);
+
+/* for debug only */
+extern int wd_dump_all_algos(void);
+
+/* this is only for drv used */
+extern int wd_set_queue_attr(struct wd_queue *q, const char *name,
+ char *value);
+extern int __iommu_type(struct wd_queue *q);
+#endif
diff --git a/samples/warpdrive/wd_adapter.c b/samples/warpdrive/wd_adapter.c
new file mode 100644
index 000000000000..d8d55b75e99a
--- /dev/null
+++ b/samples/warpdrive/wd_adapter.c
@@ -0,0 +1,74 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <stdio.h>
+#include <string.h>
+#include <dirent.h>
+
+
+#include "wd_adapter.h"
+#include "./drv/hisi_qm_udrv.h"
+
+static struct wd_drv_dio_if hw_dio_tbl[] = { {
+ .hw_type = "hisi_qm_v1",
+ .open = hisi_qm_set_queue_dio,
+ .close = hisi_qm_unset_queue_dio,
+ .send = hisi_qm_add_to_dio_q,
+ .recv = hisi_qm_get_from_dio_q,
+ },
+ /* Add other drivers direct IO operations here */
+};
+
+/* todo: there should be some stable way to match the device and the driver */
+#define MAX_HW_TYPE (sizeof(hw_dio_tbl) / sizeof(hw_dio_tbl[0]))
+
+int drv_open(struct wd_queue *q)
+{
+ int i;
+
+ //todo: try to find another dev if the user driver is not avaliable
+ for (i = 0; i < MAX_HW_TYPE; i++) {
+ if (!strcmp(q->hw_type,
+ hw_dio_tbl[i].hw_type)) {
+ q->hw_type_id = i;
+ return hw_dio_tbl[q->hw_type_id].open(q);
+ }
+ }
+ WD_ERR("No matching driver to use!\n");
+ errno = ENODEV;
+ return -ENODEV;
+}
+
+void drv_close(struct wd_queue *q)
+{
+ hw_dio_tbl[q->hw_type_id].close(q);
+}
+
+int drv_send(struct wd_queue *q, void *req)
+{
+ return hw_dio_tbl[q->hw_type_id].send(q, req);
+}
+
+int drv_recv(struct wd_queue *q, void **req)
+{
+ return hw_dio_tbl[q->hw_type_id].recv(q, req);
+}
+
+int drv_share(struct wd_queue *q, const void *addr, size_t size, int flags)
+{
+ return hw_dio_tbl[q->hw_type_id].share(q, addr, size, flags);
+}
+
+void drv_unshare(struct wd_queue *q, const void *addr, size_t size)
+{
+ hw_dio_tbl[q->hw_type_id].unshare(q, addr, size);
+}
+
+bool drv_can_do_mem_share(struct wd_queue *q)
+{
+ return hw_dio_tbl[q->hw_type_id].share != NULL;
+}
+
+void drv_flush(struct wd_queue *q)
+{
+ if (hw_dio_tbl[q->hw_type_id].flush)
+ hw_dio_tbl[q->hw_type_id].flush(q);
+}
diff --git a/samples/warpdrive/wd_adapter.h b/samples/warpdrive/wd_adapter.h
new file mode 100644
index 000000000000..bb3ab3ec112a
--- /dev/null
+++ b/samples/warpdrive/wd_adapter.h
@@ -0,0 +1,43 @@
+// SPDX-License-Identifier: GPL-2.0
+/* the common drv header define the unified interface for wd */
+#ifndef __WD_ADAPTER_H__
+#define __WD_ADAPTER_H__
+
+#include <stdio.h>
+#include <string.h>
+#include <stdlib.h>
+#include <stdint.h>
+#include <unistd.h>
+#include <fcntl.h>
+#include <sys/stat.h>
+#include <sys/ioctl.h>
+#include <sys/mman.h>
+
+
+#include "wd.h"
+
+struct wd_drv_dio_if {
+ char *hw_type;
+ int (*open)(struct wd_queue *q);
+ void (*close)(struct wd_queue *q);
+ int (*set_pasid)(struct wd_queue *q);
+ int (*unset_pasid)(struct wd_queue *q);
+ int (*send)(struct wd_queue *q, void *req);
+ int (*recv)(struct wd_queue *q, void **req);
+ void (*flush)(struct wd_queue *q);
+ int (*share)(struct wd_queue *q, const void *addr,
+ size_t size, int flags);
+ int (*unshare)(struct wd_queue *q, const void *addr, size_t size);
+};
+
+extern int drv_open(struct wd_queue *q);
+extern void drv_close(struct wd_queue *q);
+extern int drv_send(struct wd_queue *q, void *req);
+extern int drv_recv(struct wd_queue *q, void **req);
+extern void drv_flush(struct wd_queue *q);
+extern int drv_share(struct wd_queue *q, const void *addr,
+ size_t size, int flags);
+extern void drv_unshare(struct wd_queue *q, const void *addr, size_t size);
+extern bool drv_can_do_mem_share(struct wd_queue *q);
+
+#endif
--
2.17.1

2018-08-01 16:23:52

by Randy Dunlap

[permalink] [raw]
Subject: Re: [RFC PATCH 3/7] vfio: add spimdev support

On 08/01/2018 03:22 AM, Kenneth Lee wrote:
> From: Kenneth Lee <[email protected]>
>
> SPIMDEV is "Share Parent IOMMU Mdev". It is a vfio-mdev. But differ from
> the general vfio-mdev:
>
> 1. It shares its parent's IOMMU.
> 2. There is no hardware resource attached to the mdev is created. The
> hardware resource (A `queue') is allocated only when the mdev is
> opened.
>
> Currently only the vfio type-1 driver is updated to make it to be aware
> of.
>
> Signed-off-by: Kenneth Lee <[email protected]>
> Signed-off-by: Zaibo Xu <[email protected]>
> Signed-off-by: Zhou Wang <[email protected]>
> ---
> drivers/vfio/Kconfig | 1 +
> drivers/vfio/Makefile | 1 +
> drivers/vfio/spimdev/Kconfig | 10 +
> drivers/vfio/spimdev/Makefile | 3 +
> drivers/vfio/spimdev/vfio_spimdev.c | 421 ++++++++++++++++++++++++++++
> drivers/vfio/vfio_iommu_type1.c | 136 ++++++++-
> include/linux/vfio_spimdev.h | 95 +++++++
> include/uapi/linux/vfio_spimdev.h | 28 ++
> 8 files changed, 689 insertions(+), 6 deletions(-)
> create mode 100644 drivers/vfio/spimdev/Kconfig
> create mode 100644 drivers/vfio/spimdev/Makefile
> create mode 100644 drivers/vfio/spimdev/vfio_spimdev.c
> create mode 100644 include/linux/vfio_spimdev.h
> create mode 100644 include/uapi/linux/vfio_spimdev.h
>
> diff --git a/drivers/vfio/spimdev/Kconfig b/drivers/vfio/spimdev/Kconfig
> new file mode 100644
> index 000000000000..1226301f9d0e
> --- /dev/null
> +++ b/drivers/vfio/spimdev/Kconfig
> @@ -0,0 +1,10 @@
> +# SPDX-License-Identifier: GPL-2.0
> +config VFIO_SPIMDEV
> + tristate "Support for Share Parent IOMMU MDEV"
> + depends on VFIO_MDEV_DEVICE
> + help
> + Support for VFIO Share Parent IOMMU MDEV, which enable the kernel to

enables

> + support for the light weight hardware accelerator framework, WrapDrive.

support the lightweight hardware accelerator framework, WrapDrive.

> +
> + To compile this as a module, choose M here: the module will be called
> + spimdev.


--
~Randy

2018-08-01 16:56:44

by Jerome Glisse

[permalink] [raw]
Subject: Re: [RFC PATCH 0/7] A General Accelerator Framework, WarpDrive

On Wed, Aug 01, 2018 at 06:22:14PM +0800, Kenneth Lee wrote:
> From: Kenneth Lee <liguozhu-C8/M+/[email protected]>
>
> WarpDrive is an accelerator framework to expose the hardware capabilities
> directly to the user space. It makes use of the exist vfio and vfio-mdev
> facilities. So the user application can send request and DMA to the
> hardware without interaction with the kernel. This remove the latency
> of syscall and context switch.
>
> The patchset contains documents for the detail. Please refer to it for more
> information.
>
> This patchset is intended to be used with Jean Philippe Brucker's SVA
> patch [1] (Which is also in RFC stage). But it is not mandatory. This
> patchset is tested in the latest mainline kernel without the SVA patches.
> So it support only one process for each accelerator.
>
> With SVA support, WarpDrive can support multi-process in the same
> accelerator device. We tested it in our SoC integrated Accelerator (board
> ID: D06, Chip ID: HIP08). A reference work tree can be found here: [2].

I have not fully inspected things, nor do I know enough about
this Hisilicon ZIP accelerator to be sure, but from glimpsing
at the code it seems that it is unsafe to use even with SVA,
due to the doorbell. There is a comment talking about safety
in patch 7.

Exposing things to userspace is always enticing, but if it is
a security risk then it should clearly say so, and maybe a
kernel boot flag should be required to allow such a device to
be used.


My more general question is: do we want to grow VFIO to become
a more generic device driver API? This patchset adds a command
queue concept to it (I don't think it exists today, but I have
not followed VFIO closely).

Why is that any better than the existing driver model, where a
device creates a device file (character device, block device,
...)? Such models also allow direct hardware access from
userspace. For instance, see the AMD KFD driver inside
drivers/gpu/drm/amd.

So you can already do what you are doing with the Hisilicon
driver today, without this new infrastructure. This only needs
hardware that has command-queue and doorbell-like mechanisms.
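
(For reference, here is a minimal sketch of that plain character-device
model: a misc device whose mmap() hands a doorbell page straight to user
space. All names (accel_*) are hypothetical and are not taken from KFD,
the Hisilicon driver or this patchset; a real driver would also need
probe code to fill in the doorbell address.)

#include <linux/module.h>
#include <linux/miscdevice.h>
#include <linux/fs.h>
#include <linux/mm.h>

/* Physical address of the device doorbell page; a real driver would
 * fill this in at probe time from a BAR. Left as a placeholder here. */
static phys_addr_t accel_doorbell_phys;

/* Map the doorbell page directly into the calling process. */
static int accel_mmap(struct file *file, struct vm_area_struct *vma)
{
	vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
	return remap_pfn_range(vma, vma->vm_start,
			       accel_doorbell_phys >> PAGE_SHIFT,
			       vma->vm_end - vma->vm_start,
			       vma->vm_page_prot);
}

static const struct file_operations accel_fops = {
	.owner = THIS_MODULE,
	.mmap  = accel_mmap,
};

static struct miscdevice accel_miscdev = {
	.minor = MISC_DYNAMIC_MINOR,
	.name  = "accel_queue",
	.fops  = &accel_fops,
};

module_misc_device(accel_miscdev);
MODULE_LICENSE("GPL");

User space would then open /dev/accel_queue, mmap() it and ring the
doorbell directly, which is essentially the access pattern the discussion
above contrasts with the VFIO route.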


Unlike mdev, which unifies a very high level concept, it seems
to me spimdev just introduces a low level concept (namely a
command queue), and I don't see the intrinsic value here.


Cheers,
Jérôme

2018-08-02 02:33:12

by Tian, Kevin

[permalink] [raw]
Subject: RE: [RFC PATCH 0/7] A General Accelerator Framework, WarpDrive

> From: Jerome Glisse
> Sent: Thursday, August 2, 2018 12:57 AM
>
> On Wed, Aug 01, 2018 at 06:22:14PM +0800, Kenneth Lee wrote:
> > From: Kenneth Lee <liguozhu-C8/M+/[email protected]>
> >
> > WarpDrive is an accelerator framework to expose the hardware
> capabilities
> > directly to the user space. It makes use of the exist vfio and vfio-mdev
> > facilities. So the user application can send request and DMA to the
> > hardware without interaction with the kernel. This remove the latency
> > of syscall and context switch.
> >
> > The patchset contains documents for the detail. Please refer to it for
> more
> > information.
> >
> > This patchset is intended to be used with Jean Philippe Brucker's SVA
> > patch [1] (Which is also in RFC stage). But it is not mandatory. This
> > patchset is tested in the latest mainline kernel without the SVA patches.
> > So it support only one process for each accelerator.
> >
> > With SVA support, WarpDrive can support multi-process in the same
> > accelerator device. We tested it in our SoC integrated Accelerator (board
> > ID: D06, Chip ID: HIP08). A reference work tree can be found here: [2].
>
> I have not fully inspected things nor do i know enough about
> this Hisilicon ZIP accelerator to ascertain, but from glimpsing
> at the code it seems that it is unsafe to use even with SVA due
> to the doorbell. There is a comment talking about safetyness
> in patch 7.
>
> Exposing thing to userspace is always enticing, but if it is
> a security risk then it should clearly say so and maybe a
> kernel boot flag should be necessary to allow such device to
> be use.
>
>
> My more general question is do we want to grow VFIO to become
> a more generic device driver API. This patchset adds a command
> queue concept to it (i don't think it exist today but i have
> not follow VFIO closely).
>
> Why is that any better that existing driver model ? Where a
> device create a device file (can be character device, block
> device, ...). Such models also allow for direct hardware
> access from userspace. For instance see the AMD KFD driver
> inside drivers/gpu/drm/amd

One motivation, I guess, is that most accelerators lack a
well-abstracted high-level API similar to the GPU side (e.g. OpenCL
clearly defines Shared Virtual Memory models). VFIO mdev
might be an alternative common interface to enable SVA usages
on various accelerators...

>
> So you can already do what you are doing with the Hisilicon
> driver today without this new infrastructure. This only need
> hardware that have command queue and doorbell like mechanisms.
>
>
> Unlike mdev which unify a very high level concept, it seems
> to me spimdev just introduce low level concept (namely command
> queue) and i don't see the intrinsic value here.
>
>
> Cheers,
> Jérôme
> _______________________________________________
> iommu mailing list
> iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
> https://lists.linuxfoundation.org/mailman/listinfo/iommu

2018-08-02 02:59:33

by Tian, Kevin

[permalink] [raw]
Subject: RE: [RFC PATCH 0/7] A General Accelerator Framework, WarpDrive

> From: Kenneth Lee
> Sent: Wednesday, August 1, 2018 6:22 PM
>
> From: Kenneth Lee <[email protected]>
>
> WarpDrive is an accelerator framework to expose the hardware capabilities
> directly to the user space. It makes use of the exist vfio and vfio-mdev
> facilities. So the user application can send request and DMA to the
> hardware without interaction with the kernel. This remove the latency
> of syscall and context switch.
>
> The patchset contains documents for the detail. Please refer to it for more
> information.
>
> This patchset is intended to be used with Jean Philippe Brucker's SVA
> patch [1] (Which is also in RFC stage). But it is not mandatory. This
> patchset is tested in the latest mainline kernel without the SVA patches.
> So it support only one process for each accelerator.

If there is no sharing, then why not just assign the whole parent device
to the process? IMO if SVA usage is the clear goal of your series, it
might be better to say so clearly; then Jean's series is a mandatory
dependency...

>
> With SVA support, WarpDrive can support multi-process in the same
> accelerator device. We tested it in our SoC integrated Accelerator (board
> ID: D06, Chip ID: HIP08). A reference work tree can be found here: [2].
>
> We have noticed the IOMMU aware mdev RFC announced recently [3].
>
> The IOMMU aware mdev has similar idea but different intention comparing
> to
> WarpDrive. It intends to dedicate part of the hardware resource to a VM.

Not just to a VM, though I/O Virtualization is in the name. You can assign
such an mdev to VMs, containers, or bare metal processes. It's just
a fully-isolated device from the user space p.o.v.

> And the design is supposed to be used with Scalable I/O Virtualization.
> While spimdev is intended to share the hardware resource with a big
> amount
> of processes. It just requires the hardware supporting address
> translation per process (PCIE's PASID or ARM SMMU's substream ID).
>
> But we don't see serious confliction on both design. We believe they can be
> normalized as one.

Yes, there are some things which can be shared, e.g. regarding
the interface to the IOMMU.

Conceptually I see them as different mindsets on device resource sharing:

WarpDrive aims more to provide a generic framework to enable SVA
usages on various accelerators, which lack a well-abstracted user
API like OpenCL. SVA is a hardware capability - sort of exposing the
resources composing ONE capability to user space through the mdev
framework. It is not like a VF, which naturally carries most
capabilities of the PF.

Intel Scalable I/O Virtualization is a thorough design to partition the
device into minimal sharable copies (queue, queue pair, context),
where each copy carries most PF capabilities (including SVA), similar to
a VF. Also, with IOMMU scalable mode support, a copy can be
independently assigned to any client (process, container, VM, etc.).

Thanks
Kevin

2018-08-02 03:07:01

by Kenneth Lee

[permalink] [raw]
Subject: Re: [RFC PATCH 3/7] vfio: add spimdev support

On Wed, Aug 01, 2018 at 09:23:52AM -0700, Randy Dunlap wrote:
> Date: Wed, 1 Aug 2018 09:23:52 -0700
> From: Randy Dunlap <[email protected]>
> To: Kenneth Lee <[email protected]>, Jonathan Corbet <[email protected]>,
> Herbert Xu <[email protected]>, "David S . Miller"
> <[email protected]>, Joerg Roedel <[email protected]>, Alex Williamson
> <[email protected]>, Kenneth Lee <[email protected]>, Hao
> Fang <[email protected]>, Zhou Wang <[email protected]>, Zaibo Xu
> <[email protected]>, Philippe Ombredanne <[email protected]>, Greg
> Kroah-Hartman <[email protected]>, Thomas Gleixner
> <[email protected]>, [email protected],
> [email protected], [email protected],
> [email protected], [email protected],
> [email protected], Lu Baolu <[email protected]>,
> Sanjay Kumar <[email protected]>
> CC: [email protected]
> Subject: Re: [RFC PATCH 3/7] vfio: add spimdev support
> User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101
> Thunderbird/52.9.1
> Message-ID: <[email protected]>
>
> On 08/01/2018 03:22 AM, Kenneth Lee wrote:
> > From: Kenneth Lee <[email protected]>
> >
> > SPIMDEV is "Share Parent IOMMU Mdev". It is a vfio-mdev. But differ from
> > the general vfio-mdev:
> >
> > 1. It shares its parent's IOMMU.
> > 2. There is no hardware resource attached to the mdev is created. The
> > hardware resource (A `queue') is allocated only when the mdev is
> > opened.
> >
> > Currently only the vfio type-1 driver is updated to make it to be aware
> > of.
> >
> > Signed-off-by: Kenneth Lee <[email protected]>
> > Signed-off-by: Zaibo Xu <[email protected]>
> > Signed-off-by: Zhou Wang <[email protected]>
> > ---
> > drivers/vfio/Kconfig | 1 +
> > drivers/vfio/Makefile | 1 +
> > drivers/vfio/spimdev/Kconfig | 10 +
> > drivers/vfio/spimdev/Makefile | 3 +
> > drivers/vfio/spimdev/vfio_spimdev.c | 421 ++++++++++++++++++++++++++++
> > drivers/vfio/vfio_iommu_type1.c | 136 ++++++++-
> > include/linux/vfio_spimdev.h | 95 +++++++
> > include/uapi/linux/vfio_spimdev.h | 28 ++
> > 8 files changed, 689 insertions(+), 6 deletions(-)
> > create mode 100644 drivers/vfio/spimdev/Kconfig
> > create mode 100644 drivers/vfio/spimdev/Makefile
> > create mode 100644 drivers/vfio/spimdev/vfio_spimdev.c
> > create mode 100644 include/linux/vfio_spimdev.h
> > create mode 100644 include/uapi/linux/vfio_spimdev.h
> >
> > diff --git a/drivers/vfio/spimdev/Kconfig b/drivers/vfio/spimdev/Kconfig
> > new file mode 100644
> > index 000000000000..1226301f9d0e
> > --- /dev/null
> > +++ b/drivers/vfio/spimdev/Kconfig
> > @@ -0,0 +1,10 @@
> > +# SPDX-License-Identifier: GPL-2.0
> > +config VFIO_SPIMDEV
> > + tristate "Support for Share Parent IOMMU MDEV"
> > + depends on VFIO_MDEV_DEVICE
> > + help
> > + Support for VFIO Share Parent IOMMU MDEV, which enable the kernel to
>
> enables
>
> > + support for the light weight hardware accelerator framework, WrapDrive.
>
> support the lightweight hardware accelerator framework, WrapDrive.
>
> > +
> > + To compile this as a module, choose M here: the module will be called
> > + spimdev.
>
>
> --
> ~Randy

Thanks, I will update it in the next version.

--
-Kenneth(Hisilicon)



2018-08-02 03:14:38

by Tian, Kevin

[permalink] [raw]
Subject: RE: [RFC PATCH 1/7] vfio/spimdev: Add documents for WarpDrive framework

> From: Kenneth Lee
> Sent: Wednesday, August 1, 2018 6:22 PM
>
> From: Kenneth Lee <liguozhu-C8/M+/[email protected]>
>
> WarpDrive is a common user space accelerator framework. Its main
> component
> in Kernel is called spimdev, Share Parent IOMMU Mediated Device. It

Not sure whether "share parent IOMMU" is a good term here. Better to
stick to what capability you bring to user space, instead of describing
the internal trick...

> exposes
> the hardware capabilities to the user space via vfio-mdev. So processes in
> user land can obtain a "queue" by open the device and direct access the
> hardware MMIO space or do DMA operation via VFIO interface.
>
> WarpDrive is intended to be used with Jean Philippe Brucker's SVA patchset
> (it is still in RFC stage) to support multi-process. But This is not a must.
> Without the SVA patches, WarpDrive can still work for one process for
> every
> hardware device.
>
> This patch add detail documents for the framework.
>
> Signed-off-by: Kenneth Lee <liguozhu-C8/M+/[email protected]>
> ---
> Documentation/00-INDEX | 2 +
> Documentation/warpdrive/warpdrive.rst | 153 ++++++
> Documentation/warpdrive/wd-arch.svg | 732
> ++++++++++++++++++++++++++
> Documentation/warpdrive/wd.svg | 526 ++++++++++++++++++
> 4 files changed, 1413 insertions(+)
> create mode 100644 Documentation/warpdrive/warpdrive.rst
> create mode 100644 Documentation/warpdrive/wd-arch.svg
> create mode 100644 Documentation/warpdrive/wd.svg
>
> diff --git a/Documentation/00-INDEX b/Documentation/00-INDEX
> index 2754fe83f0d4..9959affab599 100644
> --- a/Documentation/00-INDEX
> +++ b/Documentation/00-INDEX
> @@ -410,6 +410,8 @@ vm/
> - directory with info on the Linux vm code.
> w1/
> - directory with documents regarding the 1-wire (w1) subsystem.
> +warpdrive/
> + - directory with documents about WarpDrive accelerator
> framework.
> watchdog/
> - how to auto-reboot Linux if it has "fallen and can't get up". ;-)
> wimax/
> diff --git a/Documentation/warpdrive/warpdrive.rst
> b/Documentation/warpdrive/warpdrive.rst
> new file mode 100644
> index 000000000000..3792b2780ea6
> --- /dev/null
> +++ b/Documentation/warpdrive/warpdrive.rst
> @@ -0,0 +1,153 @@
> +Introduction of WarpDrive
> +=========================
> +
> +*WarpDrive* is a general accelerator framework built on top of vfio.
> +It can be taken as a light weight virtual function, which you can use
> without
> +*SR-IOV* like facility and can be shared among multiple processes.
> +
> +It can be used as the quick channel for accelerators, network adaptors or
> +other hardware in user space. It can make some implementation simpler.
> E.g.
> +you can reuse most of the *netdev* driver and just share some ring buffer
> to
> +the user space driver for *DPDK* or *ODP*. Or you can combine the RSA
> +accelerator with the *netdev* in the user space as a Web reversed proxy,
> etc.
> +
> +The name *WarpDrive* is simply a cool and general name meaning the
> framework
> +makes the application faster. In kernel, the framework is called SPIMDEV,
> +namely "Share Parent IOMMU Mediated Device".
> +
> +
> +How does it work
> +================
> +
> +*WarpDrive* takes the Hardware Accelerator as a heterogeneous
> processor which
> +can share some load for the CPU:
> +
> +.. image:: wd.svg
> + :alt: This is a .svg image, if your browser cannot show it,
> + try to download and view it locally
> +
> +So it provides the capability to the user application to:
> +
> +1. Send request to the hardware
> +2. Share memory with the application and other accelerators
> +
> +These requirements can be fulfilled by VFIO if the accelerator can serve
> each
> +application with a separated Virtual Function. But a *SR-IOV* like VF (we
> will
> +call it *HVF* hereinafter) design is too heavy for the accelerator which
> +service thousands of processes.
> +
> +And the *HVF* is not good for the scenario that a device keep most of its
> +resource but share partial of the function to the user space. E.g. a *NIC*
> +works as a *netdev* but share some hardware queues to the user
> application to
> +send packets direct to the hardware.
> +
> +*VFIO-mdev* can solve some of the problem here. But *VFIO-mdev* has
> two problem:
> +
> +1. it cannot make use of its parent device's IOMMU.
> +2. it is assumed to be openned only once.
> +
> +So it will need some add-on for better resource control and let the VFIO
> +driver be aware of this.
> +
> +
> +Architecture
> +------------
> +
> +The full *WarpDrive* architecture is represented in the following class
> +diagram:
> +
> +.. image:: wd-arch.svg
> + :alt: This is a .svg image, if your browser cannot show it,
> + try to download and view it locally
> +
> +The idea is: when a device is probed, it can be registered to the general
> +framework, e.g. *netdev* or *crypto*, and the *SPIMDEV* at the same
> time.
> +
> +If *SPIMDEV* is registered. A *mdev* creation interface is created. Then
> the
> +system administrator can create a *mdev* in the user space and set its
> +parameters via its sysfs interfacev. But not like the other mdev
> +implementation, hardware resource will not be allocated until it is opened
> by
> +an application.
> +
> +With this strategy, the hardware resource can be easily scheduled among
> +multiple processes.
> +
> +
> +The user API
> +------------
> +
> +We adopt a polling style interface in the user space: ::
> +
> + int wd_request_queue(int container, struct wd_queue *q,
> + const char *mdev)
> + void wd_release_queue(struct wd_queue *q);
> +
> + int wd_send(struct wd_queue *q, void *req);
> + int wd_recv(struct wd_queue *q, void **req);
> + int wd_recv_sync(struct wd_queue *q, void **req);
> +
> +the ..._sync() interface is a wrapper to the non sync version. They wait on
> the
> +device until the queue become available.
> +
> +Memory can be done by VFIO DMA API. Or the following helper function
> can be
> +adopted: ::
> +
> + int wd_mem_share(struct wd_queue *q, const void *addr,
> + size_t size, int flags);
> + void wd_mem_unshare(struct wd_queue *q, const void *addr, size_t
> size);
> +
> +Todo: if the IOMMU support *ATS* or *SMMU* stall mode. mem share is
> not
> +necessary. This can be check with SPImdev sysfs interface.
> +
> +The user API is not mandatory. It is simply a suggestion and hint what the
> +kernel interface is supposed to support.
> +
> +
> +The user driver
> +---------------
> +
> +*WarpDrive* expose the hardware IO space to the user process (via
> *mmap*). So
> +it will require user driver for implementing the user API. The following API
> +is suggested for a user driver: ::
> +
> + int open(struct wd_queue *q);
> + int close(struct wd_queue *q);
> + int send(struct wd_queue *q, void *req);
> + int recv(struct wd_queue *q, void **req);
> +
> +These callback enable the communication between the user application
> and the
> +device. You will still need the hardware-depend algorithm driver to access
> the
> +algorithm functionality of the accelerator itself.
> +
> +
> +Multiple processes support
> +==========================
> +
> +In the latest mainline kernel (4.18) when this document is written.
> +Multi-process is not supported in VFIO yet.
> +
> +*JPB* has a patchset to enable this[2]_. We have tested it with our
> hardware
> +(which is known as *D06*). It works well. *WarpDrive* rely on them to
> support
> +multiple processes. If it is not enabled, *WarpDrive* can still work, but it
> +support only one process, which will share the same io map table with
> kernel
> +(but the user application cannot access the kernel address, So it is not
> going
> +to be a security problem)
> +
> +
> +Legacy Mode Support
> +===================
> +For the hardware on which IOMMU is not support, WarpDrive can run on
> *NOIOMMU*
> +mode.
> +
> +
> +References
> +==========
> +.. [1] Accroding to the comment in in mm/gup.c, The *gup* is only safe
> within
> + a syscall. Because it can only keep the physical memory in place
> + without making sure the VMA will always point to it. Maybe we should
> + raise the VM_PINNED patchset (see
> + https://lists.gt.net/linux/kernel/1931993) again to solve this problem.
> +.. [2] https://patchwork.kernel.org/patch/10394851/
> +.. [3] https://zhuanlan.zhihu.com/p/35489035
> +
> +.. vim: tw=78
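
(As an aside, the polling-style calls proposed in the document above
could be strung together roughly as follows. This is only a sketch:
the layout of struct wd_queue, the request structure, the "-1" container
value and the flags are assumptions made to keep it self-contained, not
definitions from this patchset.)

#include <stddef.h>

struct wd_queue { int fd; void *mmio; };                 /* placeholder layout */
struct my_req { void *src; void *dst; size_t len; };     /* hypothetical request */

int wd_request_queue(int container, struct wd_queue *q, const char *mdev);
void wd_release_queue(struct wd_queue *q);
int wd_mem_share(struct wd_queue *q, const void *addr, size_t size, int flags);
void wd_mem_unshare(struct wd_queue *q, const void *addr, size_t size);
int wd_send(struct wd_queue *q, void *req);
int wd_recv_sync(struct wd_queue *q, void **req);

int run_one_request(const char *mdev, struct my_req *req)
{
	struct wd_queue q;
	void *resp;
	int ret;

	/* -1: assume the library creates a new VFIO container for us */
	ret = wd_request_queue(-1, &q, mdev);
	if (ret)
		return ret;

	/* Make both buffers visible to the device (not needed with SVA). */
	ret = wd_mem_share(&q, req->src, req->len, 0);
	if (ret)
		goto out_queue;
	ret = wd_mem_share(&q, req->dst, req->len, 0);
	if (ret)
		goto out_src;

	ret = wd_send(&q, req);
	if (!ret)
		ret = wd_recv_sync(&q, &resp);	/* wait until the hardware answers */

	wd_mem_unshare(&q, req->dst, req->len);
out_src:
	wd_mem_unshare(&q, req->src, req->len);
out_queue:
	wd_release_queue(&q);
	return ret;
}

Whether the container argument, the flags or the cleanup order really
behave this way is not specified by the document; the sketch only
illustrates the intended call flow.
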
> diff --git a/Documentation/warpdrive/wd-arch.svg
> b/Documentation/warpdrive/wd-arch.svg
> new file mode 100644
> index 000000000000..1b3d1817c4ba
> --- /dev/null
> +++ b/Documentation/warpdrive/wd-arch.svg
> @@ -0,0 +1,732 @@
> +<?xml version="1.0" encoding="UTF-8" standalone="no"?>
> +<!-- Created with Inkscape (http://www.inkscape.org/) -->
> +
> +<svg
> + xmlns:dc="http://purl.org/dc/elements/1.1/"
> + xmlns:cc="http://creativecommons.org/ns#"
> + xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
> + xmlns:svg="http://www.w3.org/2000/svg"
> + xmlns="http://www.w3.org/2000/svg"
> + xmlns:xlink="http://www.w3.org/1999/xlink"
> + xmlns:sodipodi="http://sodipodi.sourceforge.net/DTD/sodipodi-0.dtd"
> + xmlns:inkscape="http://www.inkscape.org/namespaces/inkscape"
> + width="210mm"
> + height="193mm"
> + viewBox="0 0 744.09449 683.85823"
> + id="svg2"
> + version="1.1"
> + inkscape:version="0.92.3 (2405546, 2018-03-11)"
> + sodipodi:docname="wd-arch.svg">
> + <defs
> + id="defs4">
> + <linearGradient
> + inkscape:collect="always"
> + id="linearGradient6830">
> + <stop
> + style="stop-color:#000000;stop-opacity:1;"
> + offset="0"
> + id="stop6832" />
> + <stop
> + style="stop-color:#000000;stop-opacity:0;"
> + offset="1"
> + id="stop6834" />
> + </linearGradient>
> + <linearGradient
> + inkscape:collect="always"
> + xlink:href="#linearGradient5026"
> + id="linearGradient5032"
> + x1="353"
> + y1="211.3622"
> + x2="565.5"
> + y2="174.8622"
> + gradientUnits="userSpaceOnUse"
> + gradientTransform="translate(-89.949614,405.94594)" />
> + <linearGradient
> + inkscape:collect="always"
> + id="linearGradient5026">
> + <stop
> + style="stop-color:#f2f2f2;stop-opacity:1;"
> + offset="0"
> + id="stop5028" />
> + <stop
> + style="stop-color:#f2f2f2;stop-opacity:0;"
> + offset="1"
> + id="stop5030" />
> + </linearGradient>
> + <filter
> + inkscape:collect="always"
> + style="color-interpolation-filters:sRGB"
> + id="filter4169-3"
> + x="-0.031597666"
> + width="1.0631953"
> + y="-0.099812768"
> + height="1.1996255">
> + <feGaussianBlur
> + inkscape:collect="always"
> + stdDeviation="1.3307599"
> + id="feGaussianBlur4171-6" />
> + </filter>
> + <linearGradient
> + inkscape:collect="always"
> + xlink:href="#linearGradient5026"
> + id="linearGradient5032-1"
> + x1="353"
> + y1="211.3622"
> + x2="565.5"
> + y2="174.8622"
> + gradientUnits="userSpaceOnUse"
> + gradientTransform="translate(175.77842,400.29111)" />
> + <filter
> + inkscape:collect="always"
> + style="color-interpolation-filters:sRGB"
> + id="filter4169-3-0"
> + x="-0.031597666"
> + width="1.0631953"
> + y="-0.099812768"
> + height="1.1996255">
> + <feGaussianBlur
> + inkscape:collect="always"
> + stdDeviation="1.3307599"
> + id="feGaussianBlur4171-6-9" />
> + </filter>
> + <marker
> + markerWidth="18.960653"
> + markerHeight="11.194658"
> + refX="9.4803267"
> + refY="5.5973287"
> + orient="auto"
> + id="marker4613">
> + <rect
> + y="-5.1589785"
> + x="5.8504119"
> + height="10.317957"
> + width="10.317957"
> + id="rect4212"
> + style="fill:#ffffff;stroke:#000000;stroke-width:0.69143367;stroke-
> miterlimit:4;stroke-dasharray:none"
> + transform="matrix(0.86111274,0.50841405,-
> 0.86111274,0.50841405,0,0)">
> + <title
> + id="title4262">generation</title>
> + </rect>
> + </marker>
> + <marker
> + markerWidth="11.227358"
> + markerHeight="12.355258"
> + refX="10"
> + refY="6.177629"
> + orient="auto"
> + id="marker4825">
> + <path
> + inkscape:connector-curvature="0"
> + id="path4757"
> + d="M 0.42024733,0.42806444 10.231357,6.3500844
> 0.24347733,11.918544"
> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-
> width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
> + </marker>
> + <marker
> + markerWidth="11.227358"
> + markerHeight="12.355258"
> + refX="10"
> + refY="6.177629"
> + orient="auto"
> + id="marker4825-6">
> + <path
> + inkscape:connector-curvature="0"
> + id="path4757-1"
> + d="M 0.42024733,0.42806444 10.231357,6.3500844
> 0.24347733,11.918544"
> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-
> width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
> + </marker>
> + <linearGradient
> + inkscape:collect="always"
> + xlink:href="#linearGradient5026"
> + id="linearGradient5032-3-9"
> + x1="353"
> + y1="211.3622"
> + x2="565.5"
> + y2="174.8622"
> + gradientUnits="userSpaceOnUse"
> + gradientTransform="matrix(1.2452511,0,0,0.98513016,-
> 190.95632,540.33156)" />
> + <filter
> + inkscape:collect="always"
> + style="color-interpolation-filters:sRGB"
> + id="filter4169-3-5-8"
> + x="-0.031597666"
> + width="1.0631953"
> + y="-0.099812768"
> + height="1.1996255">
> + <feGaussianBlur
> + inkscape:collect="always"
> + stdDeviation="1.3307599"
> + id="feGaussianBlur4171-6-3-9" />
> + </filter>
> + <marker
> + markerWidth="11.227358"
> + markerHeight="12.355258"
> + refX="10"
> + refY="6.177629"
> + orient="auto"
> + id="marker4825-6-2">
> + <path
> + inkscape:connector-curvature="0"
> + id="path4757-1-9"
> + d="M 0.42024733,0.42806444 10.231357,6.3500844
> 0.24347733,11.918544"
> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-
> width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
> + </marker>
> + <marker
> + markerWidth="11.227358"
> + markerHeight="12.355258"
> + refX="10"
> + refY="6.177629"
> + orient="auto"
> + id="marker4825-6-2-1">
> + <path
> + inkscape:connector-curvature="0"
> + id="path4757-1-9-9"
> + d="M 0.42024733,0.42806444 10.231357,6.3500844
> 0.24347733,11.918544"
> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-
> width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
> + </marker>
> + <linearGradient
> + inkscape:collect="always"
> + xlink:href="#linearGradient5026"
> + id="linearGradient5032-3-9-7"
> + x1="353"
> + y1="211.3622"
> + x2="565.5"
> + y2="174.8622"
> + gradientUnits="userSpaceOnUse"
> + gradientTransform="matrix(1.3742742,0,0,0.97786398,-
> 234.52617,654.63367)" />
> + <filter
> + inkscape:collect="always"
> + style="color-interpolation-filters:sRGB"
> + id="filter4169-3-5-8-5"
> + x="-0.031597666"
> + width="1.0631953"
> + y="-0.099812768"
> + height="1.1996255">
> + <feGaussianBlur
> + inkscape:collect="always"
> + stdDeviation="1.3307599"
> + id="feGaussianBlur4171-6-3-9-0" />
> + </filter>
> + <marker
> + markerWidth="11.227358"
> + markerHeight="12.355258"
> + refX="10"
> + refY="6.177629"
> + orient="auto"
> + id="marker4825-6-2-6">
> + <path
> + inkscape:connector-curvature="0"
> + id="path4757-1-9-1"
> + d="M 0.42024733,0.42806444 10.231357,6.3500844
> 0.24347733,11.918544"
> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-
> width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
> + </marker>
> + <linearGradient
> + inkscape:collect="always"
> + xlink:href="#linearGradient5026"
> + id="linearGradient5032-3-9-4"
> + x1="353"
> + y1="211.3622"
> + x2="565.5"
> + y2="174.8622"
> + gradientUnits="userSpaceOnUse"
> + gradientTransform="matrix(1.3742912,0,0,2.0035845,-
> 468.34428,342.56603)" />
> + <filter
> + inkscape:collect="always"
> + style="color-interpolation-filters:sRGB"
> + id="filter4169-3-5-8-54"
> + x="-0.031597666"
> + width="1.0631953"
> + y="-0.099812768"
> + height="1.1996255">
> + <feGaussianBlur
> + inkscape:collect="always"
> + stdDeviation="1.3307599"
> + id="feGaussianBlur4171-6-3-9-7" />
> + </filter>
> + <marker
> + markerWidth="11.227358"
> + markerHeight="12.355258"
> + refX="10"
> + refY="6.177629"
> + orient="auto"
> + id="marker4825-6-2-1-8">
> + <path
> + inkscape:connector-curvature="0"
> + id="path4757-1-9-9-6"
> + d="M 0.42024733,0.42806444 10.231357,6.3500844
> 0.24347733,11.918544"
> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-
> width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
> + </marker>
> + <marker
> + markerWidth="11.227358"
> + markerHeight="12.355258"
> + refX="10"
> + refY="6.177629"
> + orient="auto"
> + id="marker4825-6-2-1-8-8">
> + <path
> + inkscape:connector-curvature="0"
> + id="path4757-1-9-9-6-9"
> + d="M 0.42024733,0.42806444 10.231357,6.3500844
> 0.24347733,11.918544"
> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-
> width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
> + </marker>
> + <marker
> + markerWidth="11.227358"
> + markerHeight="12.355258"
> + refX="10"
> + refY="6.177629"
> + orient="auto"
> + id="marker4825-6-0">
> + <path
> + inkscape:connector-curvature="0"
> + id="path4757-1-93"
> + d="M 0.42024733,0.42806444 10.231357,6.3500844
> 0.24347733,11.918544"
> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-
> width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
> + </marker>
> + <marker
> + markerWidth="11.227358"
> + markerHeight="12.355258"
> + refX="10"
> + refY="6.177629"
> + orient="auto"
> + id="marker4825-6-0-2">
> + <path
> + inkscape:connector-curvature="0"
> + id="path4757-1-93-6"
> + d="M 0.42024733,0.42806444 10.231357,6.3500844
> 0.24347733,11.918544"
> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-
> width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
> + </marker>
> + <filter
> + inkscape:collect="always"
> + style="color-interpolation-filters:sRGB"
> + id="filter5382"
> + x="-0.089695387"
> + width="1.1793908"
> + y="-0.10052069"
> + height="1.2010413">
> + <feGaussianBlur
> + inkscape:collect="always"
> + stdDeviation="0.86758925"
> + id="feGaussianBlur5384" />
> + </filter>
> + <linearGradient
> + inkscape:collect="always"
> + xlink:href="#linearGradient6830"
> + id="linearGradient6836"
> + x1="362.73923"
> + y1="700.04059"
> + x2="340.4751"
> + y2="678.25488"
> + gradientUnits="userSpaceOnUse"
> + gradientTransform="translate(-23.771026,-135.76835)" />
> + <marker
> + markerWidth="11.227358"
> + markerHeight="12.355258"
> + refX="10"
> + refY="6.177629"
> + orient="auto"
> + id="marker4825-6-2-6-2">
> + <path
> + inkscape:connector-curvature="0"
> + id="path4757-1-9-1-9"
> + d="M 0.42024733,0.42806444 10.231357,6.3500844
> 0.24347733,11.918544"
> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-
> width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
> + </marker>
> + </defs>
> + <sodipodi:namedview
> + id="base"
> + pagecolor="#ffffff"
> + bordercolor="#666666"
> + borderopacity="1.0"
> + inkscape:pageopacity="0.0"
> + inkscape:pageshadow="2"
> + inkscape:zoom="0.98994949"
> + inkscape:cx="222.32868"
> + inkscape:cy="370.44492"
> + inkscape:document-units="px"
> + inkscape:current-layer="layer1"
> + showgrid="false"
> + inkscape:window-width="1916"
> + inkscape:window-height="1033"
> + inkscape:window-x="0"
> + inkscape:window-y="22"
> + inkscape:window-maximized="0"
> + fit-margin-right="0.3"
> + inkscape:snap-global="false" />
> + <metadata
> + id="metadata7">
> + <rdf:RDF>
> + <cc:Work
> + rdf:about="">
> + <dc:format>image/svg+xml</dc:format>
> + <dc:type
> + rdf:resource="http://purl.org/dc/dcmitype/StillImage" />
> + <dc:title />
> + </cc:Work>
> + </rdf:RDF>
> + </metadata>
> + <g
> + inkscape:label="Layer 1"
> + inkscape:groupmode="layer"
> + id="layer1"
> + transform="translate(0,-368.50374)">
> + <rect
> + style="fill:#000000;stroke:#000000;stroke-
> width:0.6465112;filter:url(#filter4169-3)"
> + id="rect4136-3-6"
> + width="101.07784"
> + height="31.998148"
> + x="283.01144"
> + y="588.80896" />
> + <rect
> + style="fill:url(#linearGradient5032);fill-
> opacity:1;stroke:#000000;stroke-width:0.6465112"
> + id="rect4136-2"
> + width="101.07784"
> + height="31.998148"
> + x="281.63498"
> + y="586.75739" />
> + <text
> + xml:space="preserve"
> + style="font-style:normal;font-weight:normal;line-height:0%;font-
> family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-
> opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-
> linejoin:miter;stroke-opacity:1"
> + x="294.21747"
> + y="612.50073"
> + id="text4138-6"><tspan
> + sodipodi:role="line"
> + id="tspan4140-1"
> + x="294.21747"
> + y="612.50073"
> + style="font-size:15px;line-height:1.25">WarpDrive</tspan></text>
> + <rect
> + style="fill:#000000;stroke:#000000;stroke-
> width:0.6465112;filter:url(#filter4169-3-0)"
> + id="rect4136-3-6-3"
> + width="101.07784"
> + height="31.998148"
> + x="548.7395"
> + y="583.15417" />
> + <rect
> + style="fill:url(#linearGradient5032-1);fill-
> opacity:1;stroke:#000000;stroke-width:0.6465112"
> + id="rect4136-2-60"
> + width="101.07784"
> + height="31.998148"
> + x="547.36304"
> + y="581.1026" />
> + <text
> + xml:space="preserve"
> + style="font-style:normal;font-weight:normal;line-height:0%;font-
> family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-
> opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-
> linejoin:miter;stroke-opacity:1"
> + x="557.83484"
> + y="602.32745"
> + id="text4138-6-6"><tspan
> + sodipodi:role="line"
> + id="tspan4140-1-2"
> + x="557.83484"
> + y="602.32745"
> + style="font-size:15px;line-height:1.25">user_driver</tspan></text>
> + <path
> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-
> width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-
> opacity:1;marker-end:url(#marker4613)"
> + d="m 547.36304,600.78954 -156.58203,0.0691"
> + id="path4855"
> + inkscape:connector-curvature="0"
> + sodipodi:nodetypes="cc" />
> + <rect
> + style="fill:#000000;stroke:#000000;stroke-
> width:0.6465112;filter:url(#filter4169-3-5-8)"
> + id="rect4136-3-6-5-7"
> + width="101.07784"
> + height="31.998148"
> + x="128.74678"
> + y="80.648842"
> + transform="matrix(1.2452511,0,0,0.98513016,113.15182,641.02594)"
> />
> + <rect
> + style="fill:url(#linearGradient5032-3-9);fill-
> opacity:1;stroke:#000000;stroke-width:0.71606314"
> + id="rect4136-2-6-3"
> + width="125.86729"
> + height="31.522341"
> + x="271.75983"
> + y="718.45435" />
> + <text
> + xml:space="preserve"
> + style="font-style:normal;font-weight:normal;line-height:0%;font-
> family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-
> opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-
> linejoin:miter;stroke-opacity:1"
> + x="306.29599"
> + y="746.50073"
> + id="text4138-6-2-6"><tspan
> + sodipodi:role="line"
> + id="tspan4140-1-9-1"
> + x="306.29599"
> + y="746.50073"
> + style="font-size:15px;line-height:1.25">spimdev</tspan></text>
> + <path
> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-
> width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-
> opacity:1;marker-end:url(#marker4825-6-2)"
> + d="m 329.57309,619.72453 5.0373,97.14447"
> + id="path4661-3"
> + inkscape:connector-curvature="0"
> + sodipodi:nodetypes="cc" />
> + <path
> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-
> width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-
> opacity:1;marker-end:url(#marker4825-6-2-1)"
> + d="m 342.57219,830.63108 -5.67699,-79.2841"
> + id="path4661-3-4"
> + inkscape:connector-curvature="0"
> + sodipodi:nodetypes="cc" />
> + <rect
> + style="fill:#000000;stroke:#000000;stroke-
> width:0.6465112;filter:url(#filter4169-3-5-8-5)"
> + id="rect4136-3-6-5-7-3"
> + width="101.07784"
> + height="31.998148"
> + x="128.74678"
> + y="80.648842"
> + transform="matrix(1.3742742,0,0,0.97786398,101.09126,754.58534)"
> />
> + <rect
> + style="fill:url(#linearGradient5032-3-9-7);fill-
> opacity:1;stroke:#000000;stroke-width:0.74946606"
> + id="rect4136-2-6-3-6"
> + width="138.90866"
> + height="31.289837"
> + x="276.13297"
> + y="831.44263" />
> + <text
> + xml:space="preserve"
> + style="font-style:normal;font-weight:normal;line-height:0%;font-
> family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-
> opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-
> linejoin:miter;stroke-opacity:1"
> + x="295.67819"
> + y="852.98224"
> + id="text4138-6-2-6-1"><tspan
> + sodipodi:role="line"
> + id="tspan4140-1-9-1-0"
> + x="295.67819"
> + y="852.98224"
> + style="font-size:15px;line-height:1.25">Device Driver</tspan></text>
> + <text
> + xml:space="preserve"
> + style="font-style:normal;font-weight:normal;line-height:0%;font-
> family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-
> opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-
> linejoin:miter;stroke-opacity:1"
> + x="349.31198"
> + y="829.46118"
> + id="text4138-6-2-6-1-6"><tspan
> + sodipodi:role="line"
> + id="tspan4140-1-9-1-0-3"
> + x="349.31198"
> + y="829.46118"
> + style="font-size:15px;line-height:1.25">*</tspan></text>
> + <text
> + xml:space="preserve"
> + style="font-style:normal;font-weight:normal;line-height:0%;font-
> family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-
> opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-
> linejoin:miter;stroke-opacity:1"
> + x="349.98282"
> + y="768.698"
> + id="text4138-6-2-6-1-6-2"><tspan
> + sodipodi:role="line"
> + id="tspan4140-1-9-1-0-3-0"
> + x="349.98282"
> + y="768.698"
> + style="font-size:15px;line-height:1.25">1</tspan></text>
> + <path
> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-
> width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-
> opacity:1;marker-end:url(#marker4825-6-2-6)"
> + d="m 568.1238,614.05402 0.51369,333.80219"
> + id="path4661-3-5"
> + inkscape:connector-curvature="0"
> + sodipodi:nodetypes="cc" />
> + <text
> + xml:space="preserve"
> + style="font-style:normal;font-weight:normal;line-height:0%;font-
> family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-
> anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-
> width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
> + x="371.8013"
> + y="664.62476"
> + id="text4138-6-2-6-1-6-2-5"><tspan
> + sodipodi:role="line"
> + x="371.8013"
> + y="664.62476"
> + id="tspan4274"
> + style="font-size:15px;line-
> height:1.25">&lt;&lt;vfio&gt;&gt;</tspan><tspan
> + sodipodi:role="line"
> + x="371.8013"
> + y="683.37476"
> + id="tspan4305"
> + style="font-size:15px;line-height:1.25">resource
> management</tspan></text>
> + <text
> + xml:space="preserve"
> + style="font-style:normal;font-weight:normal;line-height:0%;font-
> family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-
> opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-
> linejoin:miter;stroke-opacity:1"
> + x="389.92969"
> + y="587.44836"
> + id="text4138-6-2-6-1-6-2-56"><tspan
> + sodipodi:role="line"
> + id="tspan4140-1-9-1-0-3-0-9"
> + x="389.92969"
> + y="587.44836"
> + style="font-size:15px;line-height:1.25">1</tspan></text>
> + <text
> + xml:space="preserve"
> + style="font-style:normal;font-weight:normal;line-height:0%;font-
> family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-
> opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-
> linejoin:miter;stroke-opacity:1"
> + x="528.64813"
> + y="600.08429"
> + id="text4138-6-2-6-1-6-3"><tspan
> + sodipodi:role="line"
> + id="tspan4140-1-9-1-0-3-7"
> + x="528.64813"
> + y="600.08429"
> + style="font-size:15px;line-height:1.25">*</tspan></text>
> + <rect
> + style="fill:#000000;stroke:#000000;stroke-
> width:0.6465112;filter:url(#filter4169-3-5-8-54)"
> + id="rect4136-3-6-5-7-4"
> + width="101.07784"
> + height="31.998148"
> + x="128.74678"
> + y="80.648842"
> + transform="matrix(1.3745874,0,0,1.8929066,-132.7754,556.04505)" />
> + <rect
> + style="fill:url(#linearGradient5032-3-9-4);fill-
> opacity:1;stroke:#000000;stroke-width:1.07280123"
> + id="rect4136-2-6-3-4"
> + width="138.91039"
> + height="64.111"
> + x="42.321312"
> + y="704.8371" />
> + <text
> + xml:space="preserve"
> + style="font-style:normal;font-weight:normal;line-height:0%;font-
> family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-
> opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-
> linejoin:miter;stroke-opacity:1"
> + x="110.30745"
> + y="722.94025"
> + id="text4138-6-2-6-3"><tspan
> + sodipodi:role="line"
> + x="111.99202"
> + y="722.94025"
> + id="tspan4366"
> + style="font-size:15px;line-height:1.25;text-align:center;text-
> anchor:middle">other standard </tspan><tspan
> + sodipodi:role="line"
> + x="110.30745"
> + y="741.69025"
> + id="tspan4368"
> + style="font-size:15px;line-height:1.25;text-align:center;text-
> anchor:middle">framework</tspan><tspan
> + sodipodi:role="line"
> + x="110.30745"
> + y="760.44025"
> + style="font-size:15px;line-height:1.25;text-align:center;text-
> anchor:middle"
> + id="tspan6840">(crypto/nic/others)</tspan></text>
> + <path
> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-
> width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-
> opacity:1;marker-end:url(#marker4825-6-2-1-8)"
> + d="M 276.29661,849.04109 134.04449,771.90853"
> + id="path4661-3-4-8"
> + inkscape:connector-curvature="0"
> + sodipodi:nodetypes="cc" />
> + <text
> + xml:space="preserve"
> + style="font-style:normal;font-weight:normal;line-height:0%;font-
> family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-
> opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-
> linejoin:miter;stroke-opacity:1"
> + x="313.70813"
> + y="730.06366"
> + id="text4138-6-2-6-36"><tspan
> + sodipodi:role="line"
> + id="tspan4140-1-9-1-7"
> + x="313.70813"
> + y="730.06366"
> + style="font-size:10px;line-
> height:1.25">&lt;&lt;lkm&gt;&gt;</tspan></text>
> + <text
> + xml:space="preserve"
> + style="font-style:normal;font-weight:normal;line-height:0%;font-
> family:sans-serif;text-align:start;letter-spacing:0px;word-spacing:0px;text-
> anchor:start;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-
> linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
> + x="343.81625"
> + y="786.44141"
> + id="text4138-6-2-6-1-6-2-5-7-5"><tspan
> + sodipodi:role="line"
> + x="343.81625"
> + y="786.44141"
> + style="font-size:15px;line-height:1.25;text-align:start;text-
> anchor:start"
> + id="tspan2278">regist<tspan
> + style="text-align:start;text-anchor:start"
> + id="tspan2280">er as mdev with &quot;share </tspan></tspan><tspan
> + sodipodi:role="line"
> + x="343.81625"
> + y="805.19141"
> + style="font-size:15px;line-height:1.25;text-align:start;text-
> anchor:start"
> + id="tspan2357"><tspan
> + style="text-align:start;text-anchor:start"
> + id="tspan2359">parent iommu&quot; attribu</tspan>te</tspan></text>
> + <text
> + xml:space="preserve"
> + style="font-style:normal;font-weight:normal;line-height:0%;font-
> family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-
> opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-
> linejoin:miter;stroke-opacity:1"
> + x="29.145819"
> + y="833.44244"
> + id="text4138-6-2-6-1-6-2-5-7-5-2"><tspan
> + sodipodi:role="line"
> + x="29.145819"
> + y="833.44244"
> + id="tspan4301"
> + style="font-size:15px;line-height:1.25">register to other
> subsystem</tspan></text>
> + <text
> + xml:space="preserve"
> + style="font-style:normal;font-weight:normal;line-height:0%;font-
> family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-
> opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-
> linejoin:miter;stroke-opacity:1"
> + x="301.20813"
> + y="597.29437"
> + id="text4138-6-2-6-36-1"><tspan
> + sodipodi:role="line"
> + id="tspan4140-1-9-1-7-2"
> + x="301.20813"
> + y="597.29437"
> + style="font-size:10px;line-
> height:1.25">&lt;&lt;user_lib&gt;&gt;</tspan></text>
> + <text
> + xml:space="preserve"
> + style="font-style:normal;font-weight:normal;line-height:0%;font-
> family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-
> anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-
> width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
> + x="649.09613"
> + y="774.4798"
> + id="text4138-6-2-6-1-6-2-5-3"><tspan
> + sodipodi:role="line"
> + id="tspan4140-1-9-1-0-3-0-4-6"
> + x="649.09613"
> + y="774.4798"
> + style="font-size:15px;line-
> height:1.25">&lt;&lt;vfio&gt;&gt;</tspan><tspan
> + sodipodi:role="line"
> + x="649.09613"
> + y="793.2298"
> + id="tspan4274-7"
> + style="font-size:15px;line-height:1.25">Hardware
> Accessing</tspan></text>
> + <text
> + xml:space="preserve"
> + style="font-style:normal;font-weight:normal;line-height:0%;font-
> family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-
> anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-
> width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
> + x="371.01291"
> + y="529.23682"
> + id="text4138-6-2-6-1-6-2-5-36"><tspan
> + sodipodi:role="line"
> + x="371.01291"
> + y="529.23682"
> + id="tspan4305-3"
> + style="font-size:15px;line-height:1.25">wd user api</tspan></text>
> + <path
> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-
> width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
> + d="m 328.19325,585.87943 0,-23.57142"
> + id="path4348"
> + inkscape:connector-curvature="0" />
> + <ellipse
> + style="opacity:1;fill:#ffffff;fill-opacity:1;fill-
> rule:evenodd;stroke:#000000;stroke-width:1;stroke-miterlimit:4;stroke-
> dasharray:none;stroke-dashoffset:0"
> + id="path4350"
> + cx="328.01468"
> + cy="551.95081"
> + rx="11.607142"
> + ry="10.357142" />
> + <path
> + style="opacity:0.444;fill:url(#linearGradient6836);fill-opacity:1;fill-
> rule:evenodd;stroke:none;stroke-width:1;stroke-miterlimit:4;stroke-
> dasharray:none;stroke-dashoffset:0;filter:url(#filter5382)"
> + id="path4350-2"
> + sodipodi:type="arc"
> + sodipodi:cx="329.44327"
> + sodipodi:cy="553.37933"
> + sodipodi:rx="11.607142"
> + sodipodi:ry="10.357142"
> + sodipodi:start="0"
> + sodipodi:end="6.2509098"
> + d="m 341.05041,553.37933 a 11.607142,10.357142 0 0 1 -
> 11.51349,10.35681 11.607142,10.357142 0 0 1 -11.69928,-10.18967
> 11.607142,10.357142 0 0 1 11.32469,-10.52124 11.607142,10.357142 0 0 1
> 11.88204,10.01988"
> + sodipodi:open="true" />
> + <text
> + xml:space="preserve"
> + style="font-style:normal;font-weight:normal;line-height:0%;font-
> family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-
> anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-
> width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
> + x="543.91455"
> + y="978.22363"
> + id="text4138-6-2-6-1-6-2-5-36-3"><tspan
> + sodipodi:role="line"
> + x="543.91455"
> + y="978.22363"
> + id="tspan4305-3-67"
> + style="font-size:15px;line-
> height:1.25">Device(Hardware)</tspan></text>
> + <path
> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-
> width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-
> opacity:1;marker-end:url(#marker4825-6-2-6-2)"
> + d="m 347.51164,865.4527 153.19752,91.52439"
> + id="path4661-3-5-1"
> + inkscape:connector-curvature="0"
> + sodipodi:nodetypes="cc" />
> + <text
> + xml:space="preserve"
> + style="font-style:normal;font-weight:normal;font-size:12px;line-
> height:0%;font-family:sans-serif;letter-spacing:0px;word-
> spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-
> linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
> + x="343.6398"
> + y="716.47754"
> + id="text4138-6-2-6-1-6-2-5-7-5-2-6"><tspan
> + sodipodi:role="line"
> + x="343.6398"
> + y="716.47754"
> + id="tspan4301-4"
> + style="font-style:italic;font-variant:normal;font-weight:normal;font-
> stretch:normal;font-size:15px;line-height:1.25;font-family:sans-serif;-
> inkscape-font-specification:'sans-serif Italic';stroke-width:1px">Share
> Parent's IOMMU mdev</tspan></text>
> + </g>
> +</svg>
> diff --git a/Documentation/warpdrive/wd.svg
> b/Documentation/warpdrive/wd.svg
> new file mode 100644
> index 000000000000..87ab92ebfbc6
> --- /dev/null
> +++ b/Documentation/warpdrive/wd.svg
> @@ -0,0 +1,526 @@
> +<?xml version="1.0" encoding="UTF-8" standalone="no"?>
> +<!-- Created with Inkscape (http://www.inkscape.org/) -->
> +
> +<svg
> + xmlns:dc="http://purl.org/dc/elements/1.1/"
> + xmlns:cc="http://creativecommons.org/ns#"
> + xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
> + xmlns:svg="http://www.w3.org/2000/svg"
> + xmlns="http://www.w3.org/2000/svg"
> + xmlns:xlink="http://www.w3.org/1999/xlink"
> + xmlns:sodipodi="http://sodipodi.sourceforge.net/DTD/sodipodi-0.dtd"
> + xmlns:inkscape="http://www.inkscape.org/namespaces/inkscape"
> + width="210mm"
> + height="116mm"
> + viewBox="0 0 744.09449 411.02338"
> + id="svg2"
> + version="1.1"
> + inkscape:version="0.92.3 (2405546, 2018-03-11)"
> + sodipodi:docname="wd.svg">
> + <defs
> + id="defs4">
> + <linearGradient
> + inkscape:collect="always"
> + id="linearGradient5026">
> + <stop
> + style="stop-color:#f2f2f2;stop-opacity:1;"
> + offset="0"
> + id="stop5028" />
> + <stop
> + style="stop-color:#f2f2f2;stop-opacity:0;"
> + offset="1"
> + id="stop5030" />
> + </linearGradient>
> + <linearGradient
> + inkscape:collect="always"
> + xlink:href="#linearGradient5026"
> + id="linearGradient5032-3"
> + x1="353"
> + y1="211.3622"
> + x2="565.5"
> + y2="174.8622"
> + gradientUnits="userSpaceOnUse"
> + gradientTransform="matrix(2.7384117,0,0,0.91666329,-
> 952.8283,571.10143)" />
> + <filter
> + inkscape:collect="always"
> + style="color-interpolation-filters:sRGB"
> + id="filter4169-3-5"
> + x="-0.031597666"
> + width="1.0631953"
> + y="-0.099812768"
> + height="1.1996255">
> + <feGaussianBlur
> + inkscape:collect="always"
> + stdDeviation="1.3307599"
> + id="feGaussianBlur4171-6-3" />
> + </filter>
> + <marker
> + markerWidth="18.960653"
> + markerHeight="11.194658"
> + refX="9.4803267"
> + refY="5.5973287"
> + orient="auto"
> + id="marker4613">
> + <rect
> + y="-5.1589785"
> + x="5.8504119"
> + height="10.317957"
> + width="10.317957"
> + id="rect4212"
> + style="fill:#ffffff;stroke:#000000;stroke-width:0.69143367;stroke-
> miterlimit:4;stroke-dasharray:none"
> + transform="matrix(0.86111274,0.50841405,-
> 0.86111274,0.50841405,0,0)">
> + <title
> + id="title4262">generation</title>
> + </rect>
> + </marker>
> + <marker
> + markerWidth="11.227358"
> + markerHeight="12.355258"
> + refX="10"
> + refY="6.177629"
> + orient="auto"
> + id="marker4825">
> + <path
> + inkscape:connector-curvature="0"
> + id="path4757"
> + d="M 0.42024733,0.42806444 10.231357,6.3500844
> 0.24347733,11.918544"
> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-
> width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
> + </marker>
> + <marker
> + markerWidth="11.227358"
> + markerHeight="12.355258"
> + refX="10"
> + refY="6.177629"
> + orient="auto"
> + id="marker4825-6">
> + <path
> + inkscape:connector-curvature="0"
> + id="path4757-1"
> + d="M 0.42024733,0.42806444 10.231357,6.3500844
> 0.24347733,11.918544"
> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-
> width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
> + </marker>
> + <linearGradient
> + inkscape:collect="always"
> + xlink:href="#linearGradient5026"
> + id="linearGradient5032-3-9"
> + x1="353"
> + y1="211.3622"
> + x2="565.5"
> + y2="174.8622"
> + gradientUnits="userSpaceOnUse"
> + gradientTransform="matrix(1.2452511,0,0,0.98513016,-
> 190.95632,540.33156)" />
> + <marker
> + markerWidth="11.227358"
> + markerHeight="12.355258"
> + refX="10"
> + refY="6.177629"
> + orient="auto"
> + id="marker4825-6-2">
> + <path
> + inkscape:connector-curvature="0"
> + id="path4757-1-9"
> + d="M 0.42024733,0.42806444 10.231357,6.3500844
> 0.24347733,11.918544"
> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-
> width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
> + </marker>
> + <marker
> + markerWidth="11.227358"
> + markerHeight="12.355258"
> + refX="10"
> + refY="6.177629"
> + orient="auto"
> + id="marker4825-6-2-1">
> + <path
> + inkscape:connector-curvature="0"
> + id="path4757-1-9-9"
> + d="M 0.42024733,0.42806444 10.231357,6.3500844
> 0.24347733,11.918544"
> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-
> width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
> + </marker>
> + <marker
> + markerWidth="11.227358"
> + markerHeight="12.355258"
> + refX="10"
> + refY="6.177629"
> + orient="auto"
> + id="marker4825-6-2-6">
> + <path
> + inkscape:connector-curvature="0"
> + id="path4757-1-9-1"
> + d="M 0.42024733,0.42806444 10.231357,6.3500844
> 0.24347733,11.918544"
> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-
> width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
> + </marker>
> + <marker
> + markerWidth="11.227358"
> + markerHeight="12.355258"
> + refX="10"
> + refY="6.177629"
> + orient="auto"
> + id="marker4825-6-2-1-8">
> + <path
> + inkscape:connector-curvature="0"
> + id="path4757-1-9-9-6"
> + d="M 0.42024733,0.42806444 10.231357,6.3500844
> 0.24347733,11.918544"
> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-
> width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
> + </marker>
> + <marker
> + markerWidth="11.227358"
> + markerHeight="12.355258"
> + refX="10"
> + refY="6.177629"
> + orient="auto"
> + id="marker4825-6-2-1-8-8">
> + <path
> + inkscape:connector-curvature="0"
> + id="path4757-1-9-9-6-9"
> + d="M 0.42024733,0.42806444 10.231357,6.3500844
> 0.24347733,11.918544"
> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-
> width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
> + </marker>
> + <marker
> + markerWidth="11.227358"
> + markerHeight="12.355258"
> + refX="10"
> + refY="6.177629"
> + orient="auto"
> + id="marker4825-6-0">
> + <path
> + inkscape:connector-curvature="0"
> + id="path4757-1-93"
> + d="M 0.42024733,0.42806444 10.231357,6.3500844
> 0.24347733,11.918544"
> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-
> width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
> + </marker>
> + <marker
> + markerWidth="11.227358"
> + markerHeight="12.355258"
> + refX="10"
> + refY="6.177629"
> + orient="auto"
> + id="marker4825-6-0-2">
> + <path
> + inkscape:connector-curvature="0"
> + id="path4757-1-93-6"
> + d="M 0.42024733,0.42806444 10.231357,6.3500844
> 0.24347733,11.918544"
> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-
> width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
> + </marker>
> + <marker
> + markerWidth="11.227358"
> + markerHeight="12.355258"
> + refX="10"
> + refY="6.177629"
> + orient="auto"
> + id="marker4825-6-2-6-2">
> + <path
> + inkscape:connector-curvature="0"
> + id="path4757-1-9-1-9"
> + d="M 0.42024733,0.42806444 10.231357,6.3500844
> 0.24347733,11.918544"
> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-
> width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
> + </marker>
> + <linearGradient
> + inkscape:collect="always"
> + xlink:href="#linearGradient5026"
> + id="linearGradient5032-3-8"
> + x1="353"
> + y1="211.3622"
> + x2="565.5"
> + y2="174.8622"
> + gradientUnits="userSpaceOnUse"
> + gradientTransform="matrix(1.0104674,0,0,1.0052679,-
> 218.642,661.15448)" />
> + <filter
> + inkscape:collect="always"
> + style="color-interpolation-filters:sRGB"
> + id="filter4169-3-5-8"
> + x="-0.031597666"
> + width="1.0631953"
> + y="-0.099812768"
> + height="1.1996255">
> + <feGaussianBlur
> + inkscape:collect="always"
> + stdDeviation="1.3307599"
> + id="feGaussianBlur4171-6-3-9" />
> + </filter>
> + <linearGradient
> + inkscape:collect="always"
> + xlink:href="#linearGradient5026"
> + id="linearGradient5032-3-8-2"
> + x1="353"
> + y1="211.3622"
> + x2="565.5"
> + y2="174.8622"
> + gradientUnits="userSpaceOnUse"
> + gradientTransform="matrix(2.1450559,0,0,1.0052679,-
> 521.97704,740.76422)" />
> + <filter
> + inkscape:collect="always"
> + style="color-interpolation-filters:sRGB"
> + id="filter4169-3-5-8-5"
> + x="-0.031597666"
> + width="1.0631953"
> + y="-0.099812768"
> + height="1.1996255">
> + <feGaussianBlur
> + inkscape:collect="always"
> + stdDeviation="1.3307599"
> + id="feGaussianBlur4171-6-3-9-1" />
> + </filter>
> + <linearGradient
> + inkscape:collect="always"
> + xlink:href="#linearGradient5026"
> + id="linearGradient5032-3-8-0"
> + x1="353"
> + y1="211.3622"
> + x2="565.5"
> + y2="174.8622"
> + gradientUnits="userSpaceOnUse"
> +
> gradientTransform="matrix(1.0104674,0,0,1.0052679,83.456748,660.20747
> )" />
> + <filter
> + inkscape:collect="always"
> + style="color-interpolation-filters:sRGB"
> + id="filter4169-3-5-8-6"
> + x="-0.031597666"
> + width="1.0631953"
> + y="-0.099812768"
> + height="1.1996255">
> + <feGaussianBlur
> + inkscape:collect="always"
> + stdDeviation="1.3307599"
> + id="feGaussianBlur4171-6-3-9-2" />
> + </filter>
> + <linearGradient
> + inkscape:collect="always"
> + xlink:href="#linearGradient5026"
> + id="linearGradient5032-3-84"
> + x1="353"
> + y1="211.3622"
> + x2="565.5"
> + y2="174.8622"
> + gradientUnits="userSpaceOnUse"
> + gradientTransform="matrix(1.9884948,0,0,0.94903536,-
> 318.42665,564.37696)" />
> + <filter
> + inkscape:collect="always"
> + style="color-interpolation-filters:sRGB"
> + id="filter4169-3-5-4"
> + x="-0.031597666"
> + width="1.0631953"
> + y="-0.099812768"
> + height="1.1996255">
> + <feGaussianBlur
> + inkscape:collect="always"
> + stdDeviation="1.3307599"
> + id="feGaussianBlur4171-6-3-0" />
> + </filter>
> + <marker
> + markerWidth="11.227358"
> + markerHeight="12.355258"
> + refX="10"
> + refY="6.177629"
> + orient="auto"
> + id="marker4825-6-0-0">
> + <path
> + inkscape:connector-curvature="0"
> + id="path4757-1-93-8"
> + d="M 0.42024733,0.42806444 10.231357,6.3500844
> 0.24347733,11.918544"
> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-
> width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
> + </marker>
> + <marker
> + markerWidth="11.227358"
> + markerHeight="12.355258"
> + refX="10"
> + refY="6.177629"
> + orient="auto"
> + id="marker4825-6-3">
> + <path
> + inkscape:connector-curvature="0"
> + id="path4757-1-1"
> + d="M 0.42024733,0.42806444 10.231357,6.3500844
> 0.24347733,11.918544"
> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-
> width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
> + </marker>
> + </defs>
> + <sodipodi:namedview
> + id="base"
> + pagecolor="#ffffff"
> + bordercolor="#666666"
> + borderopacity="1.0"
> + inkscape:pageopacity="0.0"
> + inkscape:pageshadow="2"
> + inkscape:zoom="0.98994949"
> + inkscape:cx="457.47339"
> + inkscape:cy="250.14781"
> + inkscape:document-units="px"
> + inkscape:current-layer="layer1"
> + showgrid="false"
> + inkscape:window-width="1916"
> + inkscape:window-height="1033"
> + inkscape:window-x="0"
> + inkscape:window-y="22"
> + inkscape:window-maximized="0"
> + fit-margin-right="0.3" />
> + <metadata
> + id="metadata7">
> + <rdf:RDF>
> + <cc:Work
> + rdf:about="">
> + <dc:format>image/svg+xml</dc:format>
> + <dc:type
> + rdf:resource="http://purl.org/dc/dcmitype/StillImage" />
> + <dc:title></dc:title>
> + </cc:Work>
> + </rdf:RDF>
> + </metadata>
> + <g
> + inkscape:label="Layer 1"
> + inkscape:groupmode="layer"
> + id="layer1"
> + transform="translate(0,-641.33861)">
> + <rect
> + style="fill:#000000;stroke:#000000;stroke-
> width:0.6465112;filter:url(#filter4169-3-5)"
> + id="rect4136-3-6-5"
> + width="101.07784"
> + height="31.998148"
> + x="128.74678"
> + y="80.648842"
> + transform="matrix(2.7384116,0,0,0.91666328,-284.06895,664.79751)"
> />
> + <rect
> + style="fill:url(#linearGradient5032-3);fill-
> opacity:1;stroke:#000000;stroke-width:1.02430749"
> + id="rect4136-2-6"
> + width="276.79272"
> + height="29.331528"
> + x="64.723419"
> + y="736.84473" />
> + <text
> + xml:space="preserve"
> + style="font-style:normal;font-weight:normal;line-height:0%;font-
> family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-
> opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-
> linejoin:miter;stroke-opacity:1"
> + x="78.223282"
> + y="756.79803"
> + id="text4138-6-2"><tspan
> + sodipodi:role="line"
> + id="tspan4140-1-9"
> + x="78.223282"
> + y="756.79803"
> + style="font-size:15px;line-height:1.25">user application (running by
> the CPU</tspan></text>
> + <path
> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-
> width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-
> opacity:1;marker-end:url(#marker4825-6)"
> + d="m 217.67507,876.6738 113.40331,45.0758"
> + id="path4661"
> + inkscape:connector-curvature="0"
> + sodipodi:nodetypes="cc" />
> + <path
> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-
> width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-
> opacity:1;marker-end:url(#marker4825-6-0)"
> + d="m 208.10197,767.69811 0.29362,76.03656"
> + id="path4661-6"
> + inkscape:connector-curvature="0"
> + sodipodi:nodetypes="cc" />
> + <rect
> + style="fill:#000000;stroke:#000000;stroke-
> width:0.6465112;filter:url(#filter4169-3-5-8)"
> + id="rect4136-3-6-5-3"
> + width="101.07784"
> + height="31.998148"
> + x="128.74678"
> + y="80.648842"
> + transform="matrix(1.0104673,0,0,1.0052679,28.128628,763.90722)" />
> + <rect
> + style="fill:url(#linearGradient5032-3-8);fill-
> opacity:1;stroke:#000000;stroke-width:0.65159565"
> + id="rect4136-2-6-6"
> + width="102.13586"
> + height="32.16671"
> + x="156.83217"
> + y="842.91852" />
> + <text
> + xml:space="preserve"
> + style="font-style:normal;font-weight:normal;font-size:12px;line-
> height:0%;font-family:sans-serif;letter-spacing:0px;word-
> spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-
> linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
> + x="188.58519"
> + y="864.47125"
> + id="text4138-6-2-8"><tspan
> + sodipodi:role="line"
> + id="tspan4140-1-9-0"
> + x="188.58519"
> + y="864.47125"
> + style="font-size:15px;line-height:1.25;stroke-
> width:1px">MMU</tspan></text>
> + <rect
> + style="fill:#000000;stroke:#000000;stroke-
> width:0.6465112;filter:url(#filter4169-3-5-8-5)"
> + id="rect4136-3-6-5-3-1"
> + width="101.07784"
> + height="31.998148"
> + x="128.74678"
> + y="80.648842"
> + transform="matrix(2.1450556,0,0,1.0052679,1.87637,843.51696)" />
> + <rect
> + style="fill:url(#linearGradient5032-3-8-2);fill-
> opacity:1;stroke:#000000;stroke-width:0.94937181"
> + id="rect4136-2-6-6-0"
> + width="216.8176"
> + height="32.16671"
> + x="275.09283"
> + y="922.5282" />
> + <text
> + xml:space="preserve"
> + style="font-style:normal;font-weight:normal;font-size:12px;line-
> height:0%;font-family:sans-serif;letter-spacing:0px;word-
> spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-
> linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
> + x="347.81482"
> + y="943.23291"
> + id="text4138-6-2-8-8"><tspan
> + sodipodi:role="line"
> + id="tspan4140-1-9-0-5"
> + x="347.81482"
> + y="943.23291"
> + style="font-size:15px;line-height:1.25;stroke-
> width:1px">Memory</tspan></text>
> + <rect
> + style="fill:#000000;stroke:#000000;stroke-
> width:0.6465112;filter:url(#filter4169-3-5-8-6)"
> + id="rect4136-3-6-5-3-5"
> + width="101.07784"
> + height="31.998148"
> + x="128.74678"
> + y="80.648842"
> + transform="matrix(1.0104673,0,0,1.0052679,330.22737,762.9602)" />
> + <rect
> + style="fill:url(#linearGradient5032-3-8-0);fill-
> opacity:1;stroke:#000000;stroke-width:0.65159565"
> + id="rect4136-2-6-6-8"
> + width="102.13586"
> + height="32.16671"
> + x="458.93091"
> + y="841.9715" />
> + <text
> + xml:space="preserve"
> + style="font-style:normal;font-weight:normal;font-size:12px;line-
> height:0%;font-family:sans-serif;letter-spacing:0px;word-
> spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-
> linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
> + x="490.68393"
> + y="863.52423"
> + id="text4138-6-2-8-6"><tspan
> + sodipodi:role="line"
> + id="tspan4140-1-9-0-2"
> + x="490.68393"
> + y="863.52423"
> + style="font-size:15px;line-height:1.25;stroke-
> width:1px">IOMMU</tspan></text>
> + <rect
> + style="fill:#000000;stroke:#000000;stroke-
> width:0.6465112;filter:url(#filter4169-3-5-4)"
> + id="rect4136-3-6-5-6"
> + width="101.07784"
> + height="31.998148"
> + x="128.74678"
> + y="80.648842"
> + transform="matrix(1.9884947,0,0,0.94903537,167.19229,661.38193)"
> />
> + <rect
> + style="fill:url(#linearGradient5032-3-84);fill-
> opacity:1;stroke:#000000;stroke-width:0.88813609"
> + id="rect4136-2-6-2"
> + width="200.99274"
> + height="30.367374"
> + x="420.4675"
> + y="735.97351" />
> + <text
> + xml:space="preserve"
> + style="font-style:normal;font-weight:normal;font-size:12px;line-
> height:0%;font-family:sans-serif;letter-spacing:0px;word-
> spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-
> linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
> + x="441.95297"
> + y="755.9068"
> + id="text4138-6-2-9"><tspan
> + sodipodi:role="line"
> + id="tspan4140-1-9-9"
> + x="441.95297"
> + y="755.9068"
> + style="font-size:15px;line-height:1.25;stroke-width:1px">Hardware
> Accelerator</tspan></text>
> + <path
> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-
> width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-
> opacity:1;marker-end:url(#marker4825-6-0-0)"
> + d="m 508.2914,766.55885 0.29362,76.03656"
> + id="path4661-6-1"
> + inkscape:connector-curvature="0"
> + sodipodi:nodetypes="cc" />
> + <path
> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-
> width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-
> opacity:1;marker-end:url(#marker4825-6-3)"
> + d="M 499.70201,876.47297 361.38296,920.80258"
> + id="path4661-1"
> + inkscape:connector-curvature="0"
> + sodipodi:nodetypes="cc" />
> + </g>
> +</svg>
> --
> 2.17.1

2018-08-02 03:17:03

by Tian, Kevin

[permalink] [raw]
Subject: RE: [RFC PATCH 2/7] iommu: Add share domain interface in iommu for spimdev

> From: Kenneth Lee
> Sent: Wednesday, August 1, 2018 6:22 PM
>
> From: Kenneth Lee <liguozhu-C8/M+/[email protected]>
>
> This patch adds a sharing interface for an iommu_group. The new interfaces:
>
> iommu_group_share_domain()
> iommu_group_unshare_domain()
>
> can be used by a virtual iommu_group (such as the iommu_group of a
> spimdev) to share its parent's iommu domain.
>
> While the domain of a group is shared, it cannot be changed until it is
> unshared. In the future, a notification mechanism can be added if updates
> are required.

Is this necessary, or is it just a matter of asking VFIO to use the parent's domain directly?
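
For reference, a minimal sketch of how a consumer is expected to use the new
interface, based on the description above and on the spimdev attach path in
patch 3; the example_* names are hypothetical:

#include <linux/err.h>
#include <linux/iommu.h>

/* Borrow the parent group's existing domain instead of attaching a new one. */
static int example_adopt_parent_domain(struct iommu_group *parent_group,
                                       struct iommu_domain **domain)
{
        *domain = iommu_group_share_domain(parent_group);
        if (IS_ERR(*domain))
                return PTR_ERR(*domain);

        /* ... perform iommu_map()/iommu_unmap() on *domain as usual ... */
        return 0;
}

static void example_drop_parent_domain(struct iommu_group *parent_group)
{
        /* Must be balanced with example_adopt_parent_domain(). */
        iommu_group_unshare_domain(parent_group);
}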

>
> Signed-off-by: Kenneth Lee <liguozhu-C8/M+/[email protected]>
> ---
> drivers/iommu/iommu.c | 28 +++++++++++++++++++++++++++-
> include/linux/iommu.h | 2 ++
> 2 files changed, 29 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 63b37563db7e..a832aafe660d 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -54,6 +54,9 @@ struct iommu_group {
> int id;
> struct iommu_domain *default_domain;
> struct iommu_domain *domain;
> + atomic_t domain_shared_ref; /* Number of user of current domain.
> + * The domain cannot be modified if ref >
> 0
> + */
> };
>
> struct group_device {
> @@ -353,6 +356,7 @@ struct iommu_group *iommu_group_alloc(void)
> return ERR_PTR(ret);
> }
> group->id = ret;
> + atomic_set(&group->domain_shared_ref, 0);
>
> ret = kobject_init_and_add(&group->kobj, &iommu_group_ktype,
> NULL, "%d", group->id);
> @@ -482,6 +486,25 @@ int iommu_group_set_name(struct iommu_group
> *group, const char *name)
> }
> EXPORT_SYMBOL_GPL(iommu_group_set_name);
>
> +struct iommu_domain *iommu_group_share_domain(struct
> iommu_group *group)
> +{
> + /* the domain can be shared only when the default domain is used
> */
> + /* todo: more shareable check */
> + if (group->domain != group->default_domain)
> + return ERR_PTR(-EINVAL);
> +
> + atomic_inc(&group->domain_shared_ref);
> + return group->domain;
> +}
> +EXPORT_SYMBOL_GPL(iommu_group_share_domain);
> +
> +void iommu_group_unshare_domain(struct iommu_group *group)
> +{
> + atomic_dec(&group->domain_shared_ref);
> + WARN_ON(atomic_read(&group->domain_shared_ref) < 0);
> +}
> +EXPORT_SYMBOL_GPL(iommu_group_unshare_domain);
> +
> static int iommu_group_create_direct_mappings(struct iommu_group
> *group,
> struct device *dev)
> {
> @@ -1401,7 +1424,8 @@ static int __iommu_attach_group(struct
> iommu_domain *domain,
> {
> int ret;
>
> - if (group->default_domain && group->domain != group-
> >default_domain)
> + if ((group->default_domain && group->domain != group-
> >default_domain) ||
> + atomic_read(&group->domain_shared_ref) > 0)
> return -EBUSY;
>
> ret = __iommu_group_for_each_dev(group, domain,
> @@ -1438,6 +1462,8 @@ static void __iommu_detach_group(struct
> iommu_domain *domain,
> {
> int ret;
>
> + WARN_ON(atomic_read(&group->domain_shared_ref) > 0);
> +
> if (!group->default_domain) {
> __iommu_group_for_each_dev(group, domain,
> iommu_group_do_detach_device);
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 19938ee6eb31..278d60e3ec39 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -349,6 +349,8 @@ extern int iommu_domain_get_attr(struct
> iommu_domain *domain, enum iommu_attr,
> void *data);
> extern int iommu_domain_set_attr(struct iommu_domain *domain, enum
> iommu_attr,
> void *data);
> +extern struct iommu_domain *iommu_group_share_domain(struct
> iommu_group *group);
> +extern void iommu_group_unshare_domain(struct iommu_group *group);
>
> /* Window handling function prototypes */
> extern int iommu_domain_window_enable(struct iommu_domain
> *domain, u32 wnd_nr,
> --
> 2.17.1

2018-08-02 03:21:25

by Tian, Kevin

[permalink] [raw]
Subject: RE: [RFC PATCH 3/7] vfio: add spimdev support

> From: Kenneth Lee
> Sent: Wednesday, August 1, 2018 6:22 PM
>
> From: Kenneth Lee <[email protected]>
>
> SPIMDEV stands for "Share Parent IOMMU Mdev". It is a vfio-mdev, but it
> differs from a general vfio-mdev in two ways:
>
> 1. It shares its parent's IOMMU.
> 2. No hardware resource is attached when the mdev is created. The
> hardware resource (a `queue') is allocated only when the mdev is
> opened.

Alex has a concern about doing so, as pointed out in:

https://www.spinics.net/lists/kvm/msg172652.html

the resources should be reserved at creation time.
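
(For context, a rough sketch of the split this patch proposes. The
vfio_spimdev_ops layout is taken from the header added in this patch; the
example_* helpers are hypothetical and this is not the actual Hisilicon
driver.)

#include <linux/vfio_spimdev.h>

/* Hypothetical hardware helpers provided by the parent driver. */
int example_hw_alloc_queue(void *hw, struct vfio_spimdev_queue **q);
int example_hw_free_queue(struct vfio_spimdev_queue *q);

/*
 * Nothing is reserved when the mdev is created; the hardware queue is
 * allocated only here, on the VFIO_SPIMDEV_CMD_GET_Q ioctl path.
 */
static int example_get_queue(struct vfio_spimdev *spimdev, unsigned long arg,
                             struct vfio_spimdev_queue **q)
{
        return example_hw_alloc_queue(spimdev->priv, q);
}

static int example_put_queue(struct vfio_spimdev_queue *q)
{
        return example_hw_free_queue(q);
}

static const struct vfio_spimdev_ops example_spimdev_ops = {
        .get_queue = example_get_queue,
        .put_queue = example_put_queue,
};

Whether that reservation should instead happen at mdev creation time is
exactly the question raised above.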

>
> Currently only the vfio type-1 driver is updated to make it aware of
> spimdev.
>
> Signed-off-by: Kenneth Lee <[email protected]>
> Signed-off-by: Zaibo Xu <[email protected]>
> Signed-off-by: Zhou Wang <[email protected]>
> ---
> drivers/vfio/Kconfig | 1 +
> drivers/vfio/Makefile | 1 +
> drivers/vfio/spimdev/Kconfig | 10 +
> drivers/vfio/spimdev/Makefile | 3 +
> drivers/vfio/spimdev/vfio_spimdev.c | 421
> ++++++++++++++++++++++++++++
> drivers/vfio/vfio_iommu_type1.c | 136 ++++++++-
> include/linux/vfio_spimdev.h | 95 +++++++
> include/uapi/linux/vfio_spimdev.h | 28 ++
> 8 files changed, 689 insertions(+), 6 deletions(-)
> create mode 100644 drivers/vfio/spimdev/Kconfig
> create mode 100644 drivers/vfio/spimdev/Makefile
> create mode 100644 drivers/vfio/spimdev/vfio_spimdev.c
> create mode 100644 include/linux/vfio_spimdev.h
> create mode 100644 include/uapi/linux/vfio_spimdev.h
>
> diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
> index c84333eb5eb5..3719eba72ef1 100644
> --- a/drivers/vfio/Kconfig
> +++ b/drivers/vfio/Kconfig
> @@ -47,4 +47,5 @@ menuconfig VFIO_NOIOMMU
> source "drivers/vfio/pci/Kconfig"
> source "drivers/vfio/platform/Kconfig"
> source "drivers/vfio/mdev/Kconfig"
> +source "drivers/vfio/spimdev/Kconfig"
> source "virt/lib/Kconfig"
> diff --git a/drivers/vfio/Makefile b/drivers/vfio/Makefile
> index de67c4725cce..28f3ef0cdce1 100644
> --- a/drivers/vfio/Makefile
> +++ b/drivers/vfio/Makefile
> @@ -9,3 +9,4 @@ obj-$(CONFIG_VFIO_SPAPR_EEH) += vfio_spapr_eeh.o
> obj-$(CONFIG_VFIO_PCI) += pci/
> obj-$(CONFIG_VFIO_PLATFORM) += platform/
> obj-$(CONFIG_VFIO_MDEV) += mdev/
> +obj-$(CONFIG_VFIO_SPIMDEV) += spimdev/
> diff --git a/drivers/vfio/spimdev/Kconfig b/drivers/vfio/spimdev/Kconfig
> new file mode 100644
> index 000000000000..1226301f9d0e
> --- /dev/null
> +++ b/drivers/vfio/spimdev/Kconfig
> @@ -0,0 +1,10 @@
> +# SPDX-License-Identifier: GPL-2.0
> +config VFIO_SPIMDEV
> + tristate "Support for Share Parent IOMMU MDEV"
> + depends on VFIO_MDEV_DEVICE
> + help
> + Support for VFIO Share Parent IOMMU MDEV, which enables the
> kernel to
> + support the lightweight hardware accelerator framework,
> WarpDrive.
> +
> + To compile this as a module, choose M here: the module will be
> called
> + spimdev.
> diff --git a/drivers/vfio/spimdev/Makefile b/drivers/vfio/spimdev/Makefile
> new file mode 100644
> index 000000000000..d02fb69c37e4
> --- /dev/null
> +++ b/drivers/vfio/spimdev/Makefile
> @@ -0,0 +1,3 @@
> +# SPDX-License-Identifier: GPL-2.0
> +spimdev-y := spimdev.o
> +obj-$(CONFIG_VFIO_SPIMDEV) += vfio_spimdev.o
> diff --git a/drivers/vfio/spimdev/vfio_spimdev.c
> b/drivers/vfio/spimdev/vfio_spimdev.c
> new file mode 100644
> index 000000000000..1b6910c9d27d
> --- /dev/null
> +++ b/drivers/vfio/spimdev/vfio_spimdev.c
> @@ -0,0 +1,421 @@
> +// SPDX-License-Identifier: GPL-2.0+
> +#include <linux/anon_inodes.h>
> +#include <linux/idr.h>
> +#include <linux/module.h>
> +#include <linux/poll.h>
> +#include <linux/vfio_spimdev.h>
> +
> +struct spimdev_mdev_state {
> + struct vfio_spimdev *spimdev;
> +};
> +
> +static struct class *spimdev_class;
> +static DEFINE_IDR(spimdev_idr);
> +
> +static int vfio_spimdev_dev_exist(struct device *dev, void *data)
> +{
> + return !strcmp(dev_name(dev), dev_name((struct device *)data));
> +}
> +
> +#ifdef CONFIG_IOMMU_SVA
> +static bool vfio_spimdev_is_valid_pasid(int pasid)
> +{
> + struct mm_struct *mm;
> +
> + mm = iommu_sva_find(pasid);
> + if (mm) {
> + mmput(mm);
> + return mm == current->mm;
> + }
> +
> + return false;
> +}
> +#endif
> +
> +/* Check if the device is a mediated device belongs to vfio_spimdev */
> +int vfio_spimdev_is_spimdev(struct device *dev)
> +{
> + struct mdev_device *mdev;
> + struct device *pdev;
> +
> + mdev = mdev_from_dev(dev);
> + if (!mdev)
> + return 0;
> +
> + pdev = mdev_parent_dev(mdev);
> + if (!pdev)
> + return 0;
> +
> + return class_for_each_device(spimdev_class, NULL, pdev,
> + vfio_spimdev_dev_exist);
> +}
> +EXPORT_SYMBOL_GPL(vfio_spimdev_is_spimdev);
> +
> +struct vfio_spimdev *vfio_spimdev_pdev_spimdev(struct device *dev)
> +{
> + struct device *class_dev;
> +
> + if (!dev)
> + return ERR_PTR(-EINVAL);
> +
> + class_dev = class_find_device(spimdev_class, NULL, dev,
> + (int(*)(struct device *, const void
> *))vfio_spimdev_dev_exist);
> + if (!class_dev)
> + return ERR_PTR(-ENODEV);
> +
> + return container_of(class_dev, struct vfio_spimdev, cls_dev);
> +}
> +EXPORT_SYMBOL_GPL(vfio_spimdev_pdev_spimdev);
> +
> +struct vfio_spimdev *mdev_spimdev(struct mdev_device *mdev)
> +{
> + struct device *pdev = mdev_parent_dev(mdev);
> +
> + return vfio_spimdev_pdev_spimdev(pdev);
> +}
> +EXPORT_SYMBOL_GPL(mdev_spimdev);
> +
> +static ssize_t iommu_type_show(struct device *dev,
> + struct device_attribute *attr, char *buf)
> +{
> + struct vfio_spimdev *spimdev = vfio_spimdev_pdev_spimdev(dev);
> +
> + if (!spimdev)
> + return -ENODEV;
> +
> + return sprintf(buf, "%d\n", spimdev->iommu_type);
> +}
> +
> +static DEVICE_ATTR_RO(iommu_type);
> +
> +static ssize_t dma_flag_show(struct device *dev,
> + struct device_attribute *attr, char *buf)
> +{
> + struct vfio_spimdev *spimdev = vfio_spimdev_pdev_spimdev(dev);
> +
> + if (!spimdev)
> + return -ENODEV;
> +
> + return sprintf(buf, "%d\n", spimdev->dma_flag);
> +}
> +
> +static DEVICE_ATTR_RO(dma_flag);
> +
> +/* mdev->dev_attr_groups */
> +static struct attribute *vfio_spimdev_attrs[] = {
> + &dev_attr_iommu_type.attr,
> + &dev_attr_dma_flag.attr,
> + NULL,
> +};
> +static const struct attribute_group vfio_spimdev_group = {
> + .name = VFIO_SPIMDEV_PDEV_ATTRS_GRP_NAME,
> + .attrs = vfio_spimdev_attrs,
> +};
> +const struct attribute_group *vfio_spimdev_groups[] = {
> + &vfio_spimdev_group,
> + NULL,
> +};
> +
> +/* default attributes for mdev->supported_type_groups, used by
> registerer*/
> +#define MDEV_TYPE_ATTR_RO_EXPORT(name) \
> + MDEV_TYPE_ATTR_RO(name); \
> + EXPORT_SYMBOL_GPL(mdev_type_attr_##name);
> +
> +#define DEF_SIMPLE_SPIMDEV_ATTR(_name, spimdev_member, format)
> \
> +static ssize_t _name##_show(struct kobject *kobj, struct device *dev, \
> + char *buf) \
> +{ \
> + struct vfio_spimdev *spimdev = vfio_spimdev_pdev_spimdev(dev);
> \
> + if (!spimdev) \
> + return -ENODEV; \
> + return sprintf(buf, format, spimdev->spimdev_member); \
> +} \
> +MDEV_TYPE_ATTR_RO_EXPORT(_name)
> +
> +DEF_SIMPLE_SPIMDEV_ATTR(flags, flags, "%d");
> +DEF_SIMPLE_SPIMDEV_ATTR(name, name, "%s"); /* this should be
> algorithm name, */
> + /* but you would not care if you have only one algorithm */
> +DEF_SIMPLE_SPIMDEV_ATTR(device_api, api_ver, "%s");
> +
> +/* this return total queue left, not mdev left */
> +static ssize_t
> +available_instances_show(struct kobject *kobj, struct device *dev, char
> *buf)
> +{
> + struct vfio_spimdev *spimdev = vfio_spimdev_pdev_spimdev(dev);
> +
> + return sprintf(buf, "%d",
> + spimdev->ops->get_available_instances(spimdev));
> +}
> +MDEV_TYPE_ATTR_RO_EXPORT(available_instances);
> +
> +static int vfio_spimdev_mdev_create(struct kobject *kobj,
> + struct mdev_device *mdev)
> +{
> + struct device *dev = mdev_dev(mdev);
> + struct device *pdev = mdev_parent_dev(mdev);
> + struct spimdev_mdev_state *mdev_state;
> + struct vfio_spimdev *spimdev = mdev_spimdev(mdev);
> +
> + if (!spimdev->ops->get_queue)
> + return -ENODEV;
> +
> + mdev_state = devm_kzalloc(dev, sizeof(struct
> spimdev_mdev_state),
> + GFP_KERNEL);
> + if (!mdev_state)
> + return -ENOMEM;
> + mdev_set_drvdata(mdev, mdev_state);
> + mdev_state->spimdev = spimdev;
> + dev->iommu_fwspec = pdev->iommu_fwspec;
> + get_device(pdev);
> + __module_get(spimdev->owner);
> +
> + return 0;
> +}
> +
> +static int vfio_spimdev_mdev_remove(struct mdev_device *mdev)
> +{
> + struct device *dev = mdev_dev(mdev);
> + struct device *pdev = mdev_parent_dev(mdev);
> + struct vfio_spimdev *spimdev = mdev_spimdev(mdev);
> +
> + put_device(pdev);
> + module_put(spimdev->owner);
> + dev->iommu_fwspec = NULL;
> + mdev_set_drvdata(mdev, NULL);
> +
> + return 0;
> +}
> +
> +/* Wake up the process who is waiting this queue */
> +void vfio_spimdev_wake_up(struct vfio_spimdev_queue *q)
> +{
> + wake_up(&q->wait);
> +}
> +EXPORT_SYMBOL_GPL(vfio_spimdev_wake_up);
> +
> +static int vfio_spimdev_q_file_open(struct inode *inode, struct file *file)
> +{
> + return 0;
> +}
> +
> +static int vfio_spimdev_q_file_release(struct inode *inode, struct file *file)
> +{
> + struct vfio_spimdev_queue *q =
> + (struct vfio_spimdev_queue *)file->private_data;
> + struct vfio_spimdev *spimdev = q->spimdev;
> + int ret;
> +
> + ret = spimdev->ops->put_queue(q);
> + if (ret) {
> + dev_err(spimdev->dev, "drv put queue fail (%d)!\n", ret);
> + return ret;
> + }
> +
> + put_device(mdev_dev(q->mdev));
> +
> + return 0;
> +}
> +
> +static long vfio_spimdev_q_file_ioctl(struct file *file, unsigned int cmd,
> + unsigned long arg)
> +{
> + struct vfio_spimdev_queue *q =
> + (struct vfio_spimdev_queue *)file->private_data;
> + struct vfio_spimdev *spimdev = q->spimdev;
> +
> + if (spimdev->ops->ioctl)
> + return spimdev->ops->ioctl(q, cmd, arg);
> +
> + dev_err(spimdev->dev, "ioctl cmd (%d) is not supported!\n", cmd);
> +
> + return -EINVAL;
> +}
> +
> +static int vfio_spimdev_q_file_mmap(struct file *file,
> + struct vm_area_struct *vma)
> +{
> + struct vfio_spimdev_queue *q =
> + (struct vfio_spimdev_queue *)file->private_data;
> + struct vfio_spimdev *spimdev = q->spimdev;
> +
> + if (spimdev->ops->mmap)
> + return spimdev->ops->mmap(q, vma);
> +
> + dev_err(spimdev->dev, "no driver mmap!\n");
> + return -EINVAL;
> +}
> +
> +static __poll_t vfio_spimdev_q_file_poll(struct file *file, poll_table *wait)
> +{
> + struct vfio_spimdev_queue *q =
> + (struct vfio_spimdev_queue *)file->private_data;
> + struct vfio_spimdev *spimdev = q->spimdev;
> +
> + poll_wait(file, &q->wait, wait);
> + if (spimdev->ops->is_q_updated(q))
> + return EPOLLIN | EPOLLRDNORM;
> +
> + return 0;
> +}
> +
> +static const struct file_operations spimdev_q_file_ops = {
> + .owner = THIS_MODULE,
> + .open = vfio_spimdev_q_file_open,
> + .unlocked_ioctl = vfio_spimdev_q_file_ioctl,
> + .release = vfio_spimdev_q_file_release,
> + .poll = vfio_spimdev_q_file_poll,
> + .mmap = vfio_spimdev_q_file_mmap,
> +};
> +
> +static long vfio_spimdev_mdev_get_queue(struct mdev_device *mdev,
> + struct vfio_spimdev *spimdev, unsigned long arg)
> +{
> + struct vfio_spimdev_queue *q;
> + int ret;
> +
> +#ifdef CONFIG_IOMMU_SVA
> + int pasid = arg;
> +
> + if (!vfio_spimdev_is_valid_pasid(pasid))
> + return -EINVAL;
> +#endif
> +
> + if (!spimdev->ops->get_queue)
> + return -EINVAL;
> +
> + ret = spimdev->ops->get_queue(spimdev, arg, &q);
> + if (ret < 0) {
> + dev_err(spimdev->dev, "get_queue failed\n");
> + return -ENODEV;
> + }
> +
> + ret = anon_inode_getfd("spimdev_q", &spimdev_q_file_ops,
> + q, O_CLOEXEC | O_RDWR);
> + if (ret < 0) {
> + dev_err(spimdev->dev, "getfd fail %d\n", ret);
> + goto err_with_queue;
> + }
> +
> + q->fd = ret;
> + q->spimdev = spimdev;
> + q->mdev = mdev;
> + q->container = arg;
> + init_waitqueue_head(&q->wait);
> + get_device(mdev_dev(mdev));
> +
> + return ret;
> +
> +err_with_queue:
> + spimdev->ops->put_queue(q);
> + return ret;
> +}
> +
> +static long vfio_spimdev_mdev_ioctl(struct mdev_device *mdev, unsigned
> int cmd,
> + unsigned long arg)
> +{
> + struct spimdev_mdev_state *mdev_state;
> + struct vfio_spimdev *spimdev;
> +
> + if (!mdev)
> + return -ENODEV;
> +
> + mdev_state = mdev_get_drvdata(mdev);
> + if (!mdev_state)
> + return -ENODEV;
> +
> + spimdev = mdev_state->spimdev;
> + if (!spimdev)
> + return -ENODEV;
> +
> + if (cmd == VFIO_SPIMDEV_CMD_GET_Q)
> + return vfio_spimdev_mdev_get_queue(mdev, spimdev, arg);
> +
> + dev_err(spimdev->dev,
> + "%s, ioctl cmd (0x%x) is not supported!\n", __func__, cmd);
> + return -EINVAL;
> +}
> +
> +static void vfio_spimdev_release(struct device *dev) { }
> +static void vfio_spimdev_mdev_release(struct mdev_device *mdev) { }
> +static int vfio_spimdev_mdev_open(struct mdev_device *mdev) { return
> 0; }
> +
> +/**
> + * vfio_spimdev_register - register a spimdev
> + * @spimdev: device structure
> + */
> +int vfio_spimdev_register(struct vfio_spimdev *spimdev)
> +{
> + int ret;
> + const char *drv_name;
> +
> + if (!spimdev->dev)
> + return -ENODEV;
> +
> + drv_name = dev_driver_string(spimdev->dev);
> + if (strstr(drv_name, "-")) {
> + pr_err("spimdev: parent driver name cannot include '-'!\n");
> + return -EINVAL;
> + }
> +
> + spimdev->dev_id = idr_alloc(&spimdev_idr, spimdev, 0, 0,
> GFP_KERNEL);
> + if (spimdev->dev_id < 0)
> + return spimdev->dev_id;
> +
> + atomic_set(&spimdev->ref, 0);
> + spimdev->cls_dev.parent = spimdev->dev;
> + spimdev->cls_dev.class = spimdev_class;
> + spimdev->cls_dev.release = vfio_spimdev_release;
> + dev_set_name(&spimdev->cls_dev, "%s", dev_name(spimdev-
> >dev));
> + ret = device_register(&spimdev->cls_dev);
> + if (ret)
> + return ret;
> +
> + spimdev->mdev_fops.owner = spimdev->owner;
> + spimdev->mdev_fops.dev_attr_groups =
> vfio_spimdev_groups;
> + WARN_ON(!spimdev->mdev_fops.supported_type_groups);
> + spimdev->mdev_fops.create =
> vfio_spimdev_mdev_create;
> + spimdev->mdev_fops.remove =
> vfio_spimdev_mdev_remove;
> + spimdev->mdev_fops.ioctl = vfio_spimdev_mdev_ioctl;
> + spimdev->mdev_fops.open =
> vfio_spimdev_mdev_open;
> + spimdev->mdev_fops.release =
> vfio_spimdev_mdev_release;
> +
> + ret = mdev_register_device(spimdev->dev, &spimdev->mdev_fops);
> + if (ret)
> + device_unregister(&spimdev->cls_dev);
> +
> + return ret;
> +}
> +EXPORT_SYMBOL_GPL(vfio_spimdev_register);
> +
> +/**
> + * vfio_spimdev_unregister - unregisters a spimdev
> + * @spimdev: device to unregister
> + *
> + * Unregister a spimdev that was previously successfully
> registered
> + * with vfio_spimdev_register().
> + */
> +void vfio_spimdev_unregister(struct vfio_spimdev *spimdev)
> +{
> + mdev_unregister_device(spimdev->dev);
> + device_unregister(&spimdev->cls_dev);
> +}
> +EXPORT_SYMBOL_GPL(vfio_spimdev_unregister);
> +
> +static int __init vfio_spimdev_init(void)
> +{
> + spimdev_class = class_create(THIS_MODULE,
> VFIO_SPIMDEV_CLASS_NAME);
> + return PTR_ERR_OR_ZERO(spimdev_class);
> +}
> +
> +static __exit void vfio_spimdev_exit(void)
> +{
> + class_destroy(spimdev_class);
> + idr_destroy(&spimdev_idr);
> +}
> +
> +module_init(vfio_spimdev_init);
> +module_exit(vfio_spimdev_exit);
> +
> +MODULE_LICENSE("GPL");
> +MODULE_AUTHOR("Hisilicon Tech. Co., Ltd.");
> +MODULE_DESCRIPTION("VFIO Share Parent's IOMMU Mediated Device");
> diff --git a/drivers/vfio/vfio_iommu_type1.c
> b/drivers/vfio/vfio_iommu_type1.c
> index 3e5b17710a4f..0ec38a17c98c 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -41,6 +41,7 @@
> #include <linux/notifier.h>
> #include <linux/dma-iommu.h>
> #include <linux/irqdomain.h>
> +#include <linux/vfio_spimdev.h>
>
> #define DRIVER_VERSION "0.2"
> #define DRIVER_AUTHOR "Alex Williamson
> <[email protected]>"
> @@ -89,6 +90,8 @@ struct vfio_dma {
> };
>
> struct vfio_group {
> + /* iommu_group of mdev's parent device */
> + struct iommu_group *parent_group;
> struct iommu_group *iommu_group;
> struct list_head next;
> };
> @@ -1327,6 +1330,109 @@ static bool vfio_iommu_has_sw_msi(struct
> iommu_group *group, phys_addr_t *base)
> return ret;
> }
>
> +/* return 0 if the device is not spimdev.
> + * return 1 if the device is spimdev, the data will be updated with parent
> + * device's group.
> + * return -errno if other error.
> + */
> +static int vfio_spimdev_type(struct device *dev, void *data)
> +{
> + struct iommu_group **group = data;
> + struct iommu_group *pgroup;
> + int (*spimdev_mdev)(struct device *dev);
> + struct device *pdev;
> + int ret = 1;
> +
> + /* vfio_spimdev module is not configurated */
> + spimdev_mdev = symbol_get(vfio_spimdev_is_spimdev);
> + if (!spimdev_mdev)
> + return 0;
> +
> + /* check if it belongs to vfio_spimdev device */
> + if (!spimdev_mdev(dev)) {
> + ret = 0;
> + goto get_exit;
> + }
> +
> + pdev = dev->parent;
> + pgroup = iommu_group_get(pdev);
> + if (!pgroup) {
> + ret = -ENODEV;
> + goto get_exit;
> + }
> +
> + if (group) {
> + /* check if all parent devices is the same */
> + if (*group && *group != pgroup)
> + ret = -ENODEV;
> + else
> + *group = pgroup;
> + }
> +
> + iommu_group_put(pgroup);
> +
> +get_exit:
> + symbol_put(vfio_spimdev_is_spimdev);
> +
> + return ret;
> +}
> +
> +/* return 0 or -errno */
> +static int vfio_spimdev_bus(struct device *dev, void *data)
> +{
> + struct bus_type **bus = data;
> +
> + if (!dev->bus)
> + return -ENODEV;
> +
> + /* ensure all devices has the same bus_type */
> + if (*bus && *bus != dev->bus)
> + return -EINVAL;
> +
> + *bus = dev->bus;
> + return 0;
> +}
> +
> +/* return 0 means it is not spi group, 1 means it is, or -EXXX for error */
> +static int vfio_iommu_type1_attach_spigroup(struct vfio_domain *domain,
> + struct vfio_group *group,
> + struct iommu_group
> *iommu_group)
> +{
> + int ret;
> + struct bus_type *pbus = NULL;
> + struct iommu_group *pgroup = NULL;
> +
> + ret = iommu_group_for_each_dev(iommu_group, &pgroup,
> + vfio_spimdev_type);
> + if (ret < 0)
> + goto out;
> + else if (ret > 0) {
> + domain->domain = iommu_group_share_domain(pgroup);
> + if (IS_ERR(domain->domain))
> + goto out;
> + ret = iommu_group_for_each_dev(pgroup, &pbus,
> + vfio_spimdev_bus);
> + if (ret < 0)
> + goto err_with_share_domain;
> +
> + if (pbus && iommu_capable(pbus,
> IOMMU_CAP_CACHE_COHERENCY))
> + domain->prot |= IOMMU_CACHE;
> +
> + group->parent_group = pgroup;
> + INIT_LIST_HEAD(&domain->group_list);
> + list_add(&group->next, &domain->group_list);
> +
> + return 1;
> + }
> +
> + return 0;
> +
> +err_with_share_domain:
> + iommu_group_unshare_domain(pgroup);
> +out:
> + return ret;
> +}
> +
> static int vfio_iommu_type1_attach_group(void *iommu_data,
> struct iommu_group
> *iommu_group)
> {
> @@ -1335,8 +1441,8 @@ static int vfio_iommu_type1_attach_group(void
> *iommu_data,
> struct vfio_domain *domain, *d;
> struct bus_type *bus = NULL, *mdev_bus;
> int ret;
> - bool resv_msi, msi_remap;
> - phys_addr_t resv_msi_base;
> + bool resv_msi = false, msi_remap;
> + phys_addr_t resv_msi_base = 0;
>
> mutex_lock(&iommu->lock);
>
> @@ -1373,6 +1479,14 @@ static int vfio_iommu_type1_attach_group(void
> *iommu_data,
> if (mdev_bus) {
> if ((bus == mdev_bus) && !iommu_present(bus)) {
> symbol_put(mdev_bus_type);
> +
> + ret = vfio_iommu_type1_attach_spigroup(domain,
> group,
> + iommu_group);
> + if (ret < 0)
> + goto out_free;
> + else if (ret > 0)
> + goto replay_check;
> +
> if (!iommu->external_domain) {
> INIT_LIST_HEAD(&domain->group_list);
> iommu->external_domain = domain;
> @@ -1451,12 +1565,13 @@ static int
> vfio_iommu_type1_attach_group(void *iommu_data,
>
> vfio_test_domain_fgsp(domain);
>
> +replay_check:
> /* replay mappings on new domains */
> ret = vfio_iommu_replay(iommu, domain);
> if (ret)
> goto out_detach;
>
> - if (resv_msi) {
> + if (!group->parent_group && resv_msi) {
> ret = iommu_get_msi_cookie(domain->domain,
> resv_msi_base);
> if (ret)
> goto out_detach;
> @@ -1471,7 +1586,10 @@ static int vfio_iommu_type1_attach_group(void
> *iommu_data,
> out_detach:
> iommu_detach_group(domain->domain, iommu_group);
> out_domain:
> - iommu_domain_free(domain->domain);
> + if (group->parent_group)
> + iommu_group_unshare_domain(group->parent_group);
> + else
> + iommu_domain_free(domain->domain);
> out_free:
> kfree(domain);
> kfree(group);
> @@ -1533,6 +1651,7 @@ static void
> vfio_iommu_type1_detach_group(void *iommu_data,
> struct vfio_iommu *iommu = iommu_data;
> struct vfio_domain *domain;
> struct vfio_group *group;
> + int ret;
>
> mutex_lock(&iommu->lock);
>
> @@ -1560,7 +1679,11 @@ static void
> vfio_iommu_type1_detach_group(void *iommu_data,
> if (!group)
> continue;
>
> - iommu_detach_group(domain->domain, iommu_group);
> + if (group->parent_group)
> + iommu_group_unshare_domain(group-
> >parent_group);
> + else
> + iommu_detach_group(domain->domain,
> iommu_group);
> +
> list_del(&group->next);
> kfree(group);
> /*
> @@ -1577,7 +1700,8 @@ static void
> vfio_iommu_type1_detach_group(void *iommu_data,
> else
>
> vfio_iommu_unmap_unpin_reaccount(iommu);
> }
> - iommu_domain_free(domain->domain);
> + if (!ret)
> + iommu_domain_free(domain->domain);
> list_del(&domain->next);
> kfree(domain);
> }
> diff --git a/include/linux/vfio_spimdev.h b/include/linux/vfio_spimdev.h
> new file mode 100644
> index 000000000000..f7e7d90013e1
> --- /dev/null
> +++ b/include/linux/vfio_spimdev.h
> @@ -0,0 +1,95 @@
> +/* SPDX-License-Identifier: GPL-2.0+ */
> +#ifndef __VFIO_SPIMDEV_H
> +#define __VFIO_SPIMDEV_H
> +
> +#include <linux/device.h>
> +#include <linux/iommu.h>
> +#include <linux/mdev.h>
> +#include <linux/vfio.h>
> +#include <uapi/linux/vfio_spimdev.h>
> +
> +struct vfio_spimdev_queue;
> +struct vfio_spimdev;
> +
> +/**
> + * struct vfio_spimdev_ops - WD device operations
> + * @get_queue: get a queue from the device according to algorithm
> + * @put_queue: free a queue to the device
> + * @is_q_updated: check whether the task is finished
> + * @mask_notify: mask the task irq of queue
> + * @mmap: mmap addresses of queue to user space
> + * @reset: reset the WD device
> + * @reset_queue: reset the queue
> + * @ioctl: ioctl for user space users of the queue
> + * @get_available_instances: get numbers of the queue remained
> + */
> +struct vfio_spimdev_ops {
> + int (*get_queue)(struct vfio_spimdev *spimdev, unsigned long arg,
> + struct vfio_spimdev_queue **q);
> + int (*put_queue)(struct vfio_spimdev_queue *q);
> + int (*is_q_updated)(struct vfio_spimdev_queue *q);
> + void (*mask_notify)(struct vfio_spimdev_queue *q, int
> event_mask);
> + int (*mmap)(struct vfio_spimdev_queue *q, struct vm_area_struct
> *vma);
> + int (*reset)(struct vfio_spimdev *spimdev);
> + int (*reset_queue)(struct vfio_spimdev_queue *q);
> + long (*ioctl)(struct vfio_spimdev_queue *q, unsigned int cmd,
> + unsigned long arg);
> + int (*get_available_instances)(struct vfio_spimdev *spimdev);
> +};
> +
> +struct vfio_spimdev_queue {
> + struct mutex mutex;
> + struct vfio_spimdev *spimdev;
> + int qid;
> + __u32 flags;
> + void *priv;
> + wait_queue_head_t wait;
> + struct mdev_device *mdev;
> + int fd;
> + int container;
> +#ifdef CONFIG_IOMMU_SVA
> + int pasid;
> +#endif
> +};
> +
> +struct vfio_spimdev {
> + const char *name;
> + int status;
> + atomic_t ref;
> + struct module *owner;
> + const struct vfio_spimdev_ops *ops;
> + struct device *dev;
> + struct device cls_dev;
> + bool is_vf;
> + u32 iommu_type;
> + u32 dma_flag;
> + u32 dev_id;
> + void *priv;
> + int flags;
> + const char *api_ver;
> + struct mdev_parent_ops mdev_fops;
> +};
> +
> +int vfio_spimdev_register(struct vfio_spimdev *spimdev);
> +void vfio_spimdev_unregister(struct vfio_spimdev *spimdev);
> +void vfio_spimdev_wake_up(struct vfio_spimdev_queue *q);
> +int vfio_spimdev_is_spimdev(struct device *dev);
> +struct vfio_spimdev *vfio_spimdev_pdev_spimdev(struct device *dev);
> +int vfio_spimdev_pasid_pri_check(int pasid);
> +int vfio_spimdev_get(struct device *dev);
> +int vfio_spimdev_put(struct device *dev);
> +struct vfio_spimdev *mdev_spimdev(struct mdev_device *mdev);
> +
> +extern struct mdev_type_attribute mdev_type_attr_flags;
> +extern struct mdev_type_attribute mdev_type_attr_name;
> +extern struct mdev_type_attribute mdev_type_attr_device_api;
> +extern struct mdev_type_attribute mdev_type_attr_available_instances;
> +#define VFIO_SPIMDEV_DEFAULT_MDEV_TYPE_ATTRS \
> + &mdev_type_attr_name.attr, \
> + &mdev_type_attr_device_api.attr, \
> + &mdev_type_attr_available_instances.attr, \
> + &mdev_type_attr_flags.attr
> +
> +#define _VFIO_SPIMDEV_REGION(vm_pgoff) (vm_pgoff & 0xf)
> +
> +#endif
> diff --git a/include/uapi/linux/vfio_spimdev.h
> b/include/uapi/linux/vfio_spimdev.h
> new file mode 100644
> index 000000000000..3435e5c345b4
> --- /dev/null
> +++ b/include/uapi/linux/vfio_spimdev.h
> @@ -0,0 +1,28 @@
> +/* SPDX-License-Identifier: GPL-2.0+ */
> +#ifndef _UAPIVFIO_SPIMDEV_H
> +#define _UAPIVFIO_SPIMDEV_H
> +
> +#include <linux/ioctl.h>
> +
> +#define VFIO_SPIMDEV_CLASS_NAME "spimdev"
> +
> +/* Device ATTRs in parent dev SYSFS DIR */
> +#define VFIO_SPIMDEV_PDEV_ATTRS_GRP_NAME "params"
> +
> +/* Parent device attributes */
> +#define SPIMDEV_IOMMU_TYPE "iommu_type"
> +#define SPIMDEV_DMA_FLAG "dma_flag"
> +
> +/* Maximum length of algorithm name string */
> +#define VFIO_SPIMDEV_ALG_NAME_SIZE 64
> +
> +/* the bits used in SPIMDEV_DMA_FLAG attributes */
> +#define VFIO_SPIMDEV_DMA_INVALID 0
> +#define VFIO_SPIMDEV_DMA_SINGLE_PROC_MAP 1
> +#define VFIO_SPIMDEV_DMA_MULTI_PROC_MAP 2
> +#define VFIO_SPIMDEV_DMA_SVM 4
> +#define VFIO_SPIMDEV_DMA_SVM_NO_FAULT 8
> +#define VFIO_SPIMDEV_DMA_PHY 16
> +
> +#define VFIO_SPIMDEV_CMD_GET_Q _IO('W', 1)
> +#endif
> --
> 2.17.1

2018-08-02 03:40:06

by Kenneth Lee

[permalink] [raw]
Subject: Re: [RFC PATCH 0/7] A General Accelerator Framework, WarpDrive

On Thu, Aug 02, 2018 at 02:59:33AM +0000, Tian, Kevin wrote:
>
> > From: Kenneth Lee
> > Sent: Wednesday, August 1, 2018 6:22 PM
> >
> > From: Kenneth Lee <liguozhu-C8/M+/[email protected]>
> >
> > WarpDrive is an accelerator framework to expose the hardware capabilities
> > directly to the user space. It makes use of the exist vfio and vfio-mdev
> > facilities. So the user application can send request and DMA to the
> > hardware without interaction with the kernel. This remove the latency
> > of syscall and context switch.
> >
> > The patchset contains documents for the detail. Please refer to it for more
> > information.
> >
> > This patchset is intended to be used with Jean Philippe Brucker's SVA
> > patch [1] (Which is also in RFC stage). But it is not mandatory. This
> > patchset is tested in the latest mainline kernel without the SVA patches.
> > So it support only one process for each accelerator.
>
> If there is no sharing, then why not just assign the whole parent device to
> the process? IMO, if SVA usage is the clear goal of your series, that should
> be stated clearly, and then Jean's series becomes a mandatory dependency...
>

We don't know what form SVA will finally take. But the feature, "make use of
a per-PASID/substream-ID IOMMU page table", should eventually be available in
the kernel, so we don't want to hard-code a dependency on it here. Once this
series is ready, it can be hooked to whatever implementation lands.

Furthermore, even without a "per-PASID IOMMU page table", this series has its
own value. It does not simply dedicate the whole device to one process; it
"shares" the device with the kernel driver, so the kernel crypto subsystem and
a user application can use the hardware at the same time.
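
As an illustration of that sharing (a sketch only: apart from
crypto_register_alg() and vfio_spimdev_register(), the names are hypothetical;
the real wiring is in patches 4, 5 and 6):

#include <linux/crypto.h>
#include <linux/platform_device.h>
#include <linux/vfio_spimdev.h>

static struct crypto_alg example_zip_alg;      /* served to kernel users */
static struct vfio_spimdev example_spimdev;    /* served to user space */

static int example_accel_probe(struct platform_device *pdev)
{
        int ret;

        /* Some queues stay with the kernel crypto subsystem ... */
        ret = crypto_register_alg(&example_zip_alg);
        if (ret)
                return ret;

        /*
         * ... while the rest are exposed to user space via spimdev.
         * A real driver also fills in example_spimdev.ops, .owner and
         * the mdev supported_type_groups before registering.
         */
        example_spimdev.dev = &pdev->dev;
        ret = vfio_spimdev_register(&example_spimdev);
        if (ret)
                crypto_unregister_alg(&example_zip_alg);

        return ret;
}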

> >
> > With SVA support, WarpDrive can support multi-process in the same
> > accelerator device. We tested it in our SoC integrated Accelerator (board
> > ID: D06, Chip ID: HIP08). A reference work tree can be found here: [2].
> >
> > We have noticed the IOMMU aware mdev RFC announced recently [3].
> >
> > The IOMMU aware mdev has similar idea but different intention comparing
> > to
> > WarpDrive. It intends to dedicate part of the hardware resource to a VM.
>
> Not just to VM, though I/O Virtualization is in the name. You can assign
> such mdev to either VMs, containers, or bare metal processes. It's just
> a fully-isolated device from user space p.o.v.

Oh, yes. Thank you for the clarification.

>
> > And the design is supposed to be used with Scalable I/O Virtualization.
> > While spimdev is intended to share the hardware resource with a big
> > amount
> > of processes. It just requires the hardware supporting address
> > translation per process (PCIE's PASID or ARM SMMU's substream ID).
> >
> > But we don't see serious confliction on both design. We believe they can be
> > normalized as one.
>
> Yes, there are some things which can be shared, e.g. the interface to
> the IOMMU.
>
> Conceptually I see them as different mindsets on device resource sharing:
>
> WarpDrive aims more at providing a generic framework to enable SVA
> usage on various accelerators, which lack a well-abstracted user API
> like OpenCL. SVA is a hardware capability - this is sort of exposing the
> resources that compose ONE capability to user space through the mdev
> framework. It is not like a VF, which naturally carries most of the PF's
> capabilities.
>

Yes. But we believe a user-space abstraction layer will appear soon once this
channel is opened. WarpDrive gives the hardware the chance to serve
applications directly. For example, an AI engine can be called by many
processes for inference; the resource need not be dedicated to one particular
process.

> Intel Scalable I/O virtualization is a thorough design to partition the
> device into minimal sharable copies (queue, queue pair, context),
> while each copy carries most PF capabilities (including SVA) similar to
> VF. Also with IOMMU scalable mode support, the copy can be
> independently assigned to any client (process, container, VM, etc.)
>
Yes, we can see this intention.
> Thanks
> Kevin

Thank you.

--
-Kenneth(Hisilicon)

2018-08-02 03:47:27

by Kenneth Lee

[permalink] [raw]
Subject: Re: [RFC PATCH 3/7] vfio: add spimdev support

On Thu, Aug 02, 2018 at 03:21:25AM +0000, Tian, Kevin wrote:
>
> > From: Kenneth Lee
> > Sent: Wednesday, August 1, 2018 6:22 PM
> >
> > From: Kenneth Lee <[email protected]>
> >
> > SPIMDEV stands for "Share Parent IOMMU Mdev". It is a vfio-mdev, but it
> > differs from a general vfio-mdev in two ways:
> >
> > 1. It shares its parent's IOMMU.
> > 2. No hardware resource is attached when the mdev is created. The
> > hardware resource (a `queue') is allocated only when the mdev is
> > opened.
>
> Alex has a concern about doing so, as pointed out in:
>
> https://www.spinics.net/lists/kvm/msg172652.html
>
> the resources should be reserved at creation time.

Yes. That is why I keep saying that SPIMDEV is not for a "VM"; it is for "many
processes". It is just an access point for a process, not a device assigned to
a VM. I hope Alex can accept it :)
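
To show what that access point looks like from the process's side, here is a
rough user-space sketch based on this patch and the sample in patch 7. The
device path, group number, mdev UUID and mmap size are made up, and error
handling plus the usual VFIO status checks are omitted:

#include <fcntl.h>
#include <poll.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/vfio.h>
#include <linux/vfio_spimdev.h>

int main(void)
{
        int container = open("/dev/vfio/vfio", O_RDWR);
        int group = open("/dev/vfio/12", O_RDWR);   /* group of the mdev */
        int dev, qfd;
        struct pollfd pfd;
        void *regs;

        ioctl(group, VFIO_GROUP_SET_CONTAINER, &container);
        ioctl(container, VFIO_SET_IOMMU, VFIO_TYPE1_IOMMU);

        /* The mdev itself is only an access point; no queue exists yet. */
        dev = ioctl(group, VFIO_GROUP_GET_DEVICE_FD,
                    "c0ffee11-2222-3333-4444-555566667777");

        /*
         * Grab a hardware queue. With SVA the argument is the process's
         * PASID; without it, the value is passed through to the driver.
         */
        qfd = ioctl(dev, VFIO_SPIMDEV_CMD_GET_Q, 0);

        /* Map the queue region, kick the hardware, wait for completion. */
        regs = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, qfd, 0);
        pfd.fd = qfd;
        pfd.events = POLLIN;
        poll(&pfd, 1, -1);

        (void)regs;
        return 0;
}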

>
> >
> > Currently only the vfio type-1 driver is updated to make it aware of
> > spimdev.
> >
> > Signed-off-by: Kenneth Lee <[email protected]>
> > Signed-off-by: Zaibo Xu <[email protected]>
> > Signed-off-by: Zhou Wang <[email protected]>
> > ---
> > drivers/vfio/Kconfig | 1 +
> > drivers/vfio/Makefile | 1 +
> > drivers/vfio/spimdev/Kconfig | 10 +
> > drivers/vfio/spimdev/Makefile | 3 +
> > drivers/vfio/spimdev/vfio_spimdev.c | 421
> > ++++++++++++++++++++++++++++
> > drivers/vfio/vfio_iommu_type1.c | 136 ++++++++-
> > include/linux/vfio_spimdev.h | 95 +++++++
> > include/uapi/linux/vfio_spimdev.h | 28 ++
> > 8 files changed, 689 insertions(+), 6 deletions(-)
> > create mode 100644 drivers/vfio/spimdev/Kconfig
> > create mode 100644 drivers/vfio/spimdev/Makefile
> > create mode 100644 drivers/vfio/spimdev/vfio_spimdev.c
> > create mode 100644 include/linux/vfio_spimdev.h
> > create mode 100644 include/uapi/linux/vfio_spimdev.h
> >
> > diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
> > index c84333eb5eb5..3719eba72ef1 100644
> > --- a/drivers/vfio/Kconfig
> > +++ b/drivers/vfio/Kconfig
> > @@ -47,4 +47,5 @@ menuconfig VFIO_NOIOMMU
> > source "drivers/vfio/pci/Kconfig"
> > source "drivers/vfio/platform/Kconfig"
> > source "drivers/vfio/mdev/Kconfig"
> > +source "drivers/vfio/spimdev/Kconfig"
> > source "virt/lib/Kconfig"
> > diff --git a/drivers/vfio/Makefile b/drivers/vfio/Makefile
> > index de67c4725cce..28f3ef0cdce1 100644
> > --- a/drivers/vfio/Makefile
> > +++ b/drivers/vfio/Makefile
> > @@ -9,3 +9,4 @@ obj-$(CONFIG_VFIO_SPAPR_EEH) += vfio_spapr_eeh.o
> > obj-$(CONFIG_VFIO_PCI) += pci/
> > obj-$(CONFIG_VFIO_PLATFORM) += platform/
> > obj-$(CONFIG_VFIO_MDEV) += mdev/
> > +obj-$(CONFIG_VFIO_SPIMDEV) += spimdev/
> > diff --git a/drivers/vfio/spimdev/Kconfig b/drivers/vfio/spimdev/Kconfig
> > new file mode 100644
> > index 000000000000..1226301f9d0e
> > --- /dev/null
> > +++ b/drivers/vfio/spimdev/Kconfig
> > @@ -0,0 +1,10 @@
> > +# SPDX-License-Identifier: GPL-2.0
> > +config VFIO_SPIMDEV
> > + tristate "Support for Share Parent IOMMU MDEV"
> > + depends on VFIO_MDEV_DEVICE
> > + help
> > + Support for VFIO Share Parent IOMMU MDEV, which enables the
> > kernel to
> > + support the lightweight hardware accelerator framework,
> > WarpDrive.
> > +
> > + To compile this as a module, choose M here: the module will be
> > called
> > + spimdev.
> > diff --git a/drivers/vfio/spimdev/Makefile b/drivers/vfio/spimdev/Makefile
> > new file mode 100644
> > index 000000000000..d02fb69c37e4
> > --- /dev/null
> > +++ b/drivers/vfio/spimdev/Makefile
> > @@ -0,0 +1,3 @@
> > +# SPDX-License-Identifier: GPL-2.0
> > +spimdev-y := spimdev.o
> > +obj-$(CONFIG_VFIO_SPIMDEV) += vfio_spimdev.o
> > diff --git a/drivers/vfio/spimdev/vfio_spimdev.c
> > b/drivers/vfio/spimdev/vfio_spimdev.c
> > new file mode 100644
> > index 000000000000..1b6910c9d27d
> > --- /dev/null
> > +++ b/drivers/vfio/spimdev/vfio_spimdev.c
> > @@ -0,0 +1,421 @@
> > +// SPDX-License-Identifier: GPL-2.0+
> > +#include <linux/anon_inodes.h>
> > +#include <linux/idr.h>
> > +#include <linux/module.h>
> > +#include <linux/poll.h>
> > +#include <linux/vfio_spimdev.h>
> > +
> > +struct spimdev_mdev_state {
> > + struct vfio_spimdev *spimdev;
> > +};
> > +
> > +static struct class *spimdev_class;
> > +static DEFINE_IDR(spimdev_idr);
> > +
> > +static int vfio_spimdev_dev_exist(struct device *dev, void *data)
> > +{
> > + return !strcmp(dev_name(dev), dev_name((struct device *)data));
> > +}
> > +
> > +#ifdef CONFIG_IOMMU_SVA
> > +static bool vfio_spimdev_is_valid_pasid(int pasid)
> > +{
> > + struct mm_struct *mm;
> > +
> > + mm = iommu_sva_find(pasid);
> > + if (mm) {
> > + mmput(mm);
> > + return mm == current->mm;
> > + }
> > +
> > + return false;
> > +}
> > +#endif
> > +
> > +/* Check if the device is a mediated device belongs to vfio_spimdev */
> > +int vfio_spimdev_is_spimdev(struct device *dev)
> > +{
> > + struct mdev_device *mdev;
> > + struct device *pdev;
> > +
> > + mdev = mdev_from_dev(dev);
> > + if (!mdev)
> > + return 0;
> > +
> > + pdev = mdev_parent_dev(mdev);
> > + if (!pdev)
> > + return 0;
> > +
> > + return class_for_each_device(spimdev_class, NULL, pdev,
> > + vfio_spimdev_dev_exist);
> > +}
> > +EXPORT_SYMBOL_GPL(vfio_spimdev_is_spimdev);
> > +
> > +struct vfio_spimdev *vfio_spimdev_pdev_spimdev(struct device *dev)
> > +{
> > + struct device *class_dev;
> > +
> > + if (!dev)
> > + return ERR_PTR(-EINVAL);
> > +
> > + class_dev = class_find_device(spimdev_class, NULL, dev,
> > + (int(*)(struct device *, const void
> > *))vfio_spimdev_dev_exist);
> > + if (!class_dev)
> > + return ERR_PTR(-ENODEV);
> > +
> > + return container_of(class_dev, struct vfio_spimdev, cls_dev);
> > +}
> > +EXPORT_SYMBOL_GPL(vfio_spimdev_pdev_spimdev);
> > +
> > +struct vfio_spimdev *mdev_spimdev(struct mdev_device *mdev)
> > +{
> > + struct device *pdev = mdev_parent_dev(mdev);
> > +
> > + return vfio_spimdev_pdev_spimdev(pdev);
> > +}
> > +EXPORT_SYMBOL_GPL(mdev_spimdev);
> > +
> > +static ssize_t iommu_type_show(struct device *dev,
> > + struct device_attribute *attr, char *buf)
> > +{
> > + struct vfio_spimdev *spimdev = vfio_spimdev_pdev_spimdev(dev);
> > +
> > + if (!spimdev)
> > + return -ENODEV;
> > +
> > + return sprintf(buf, "%d\n", spimdev->iommu_type);
> > +}
> > +
> > +static DEVICE_ATTR_RO(iommu_type);
> > +
> > +static ssize_t dma_flag_show(struct device *dev,
> > + struct device_attribute *attr, char *buf)
> > +{
> > + struct vfio_spimdev *spimdev = vfio_spimdev_pdev_spimdev(dev);
> > +
> > + if (!spimdev)
> > + return -ENODEV;
> > +
> > + return sprintf(buf, "%d\n", spimdev->dma_flag);
> > +}
> > +
> > +static DEVICE_ATTR_RO(dma_flag);
> > +
> > +/* mdev->dev_attr_groups */
> > +static struct attribute *vfio_spimdev_attrs[] = {
> > + &dev_attr_iommu_type.attr,
> > + &dev_attr_dma_flag.attr,
> > + NULL,
> > +};
> > +static const struct attribute_group vfio_spimdev_group = {
> > + .name = VFIO_SPIMDEV_PDEV_ATTRS_GRP_NAME,
> > + .attrs = vfio_spimdev_attrs,
> > +};
> > +const struct attribute_group *vfio_spimdev_groups[] = {
> > + &vfio_spimdev_group,
> > + NULL,
> > +};
> > +
> > +/* default attributes for mdev->supported_type_groups, used by
> > registerer*/
> > +#define MDEV_TYPE_ATTR_RO_EXPORT(name) \
> > + MDEV_TYPE_ATTR_RO(name); \
> > + EXPORT_SYMBOL_GPL(mdev_type_attr_##name);
> > +
> > +#define DEF_SIMPLE_SPIMDEV_ATTR(_name, spimdev_member, format)
> > \
> > +static ssize_t _name##_show(struct kobject *kobj, struct device *dev, \
> > + char *buf) \
> > +{ \
> > + struct vfio_spimdev *spimdev = vfio_spimdev_pdev_spimdev(dev);
> > \
> > + if (!spimdev) \
> > + return -ENODEV; \
> > + return sprintf(buf, format, spimdev->spimdev_member); \
> > +} \
> > +MDEV_TYPE_ATTR_RO_EXPORT(_name)
> > +
> > +DEF_SIMPLE_SPIMDEV_ATTR(flags, flags, "%d");
> > +DEF_SIMPLE_SPIMDEV_ATTR(name, name, "%s"); /* this should be
> > algorithm name, */
> > + /* but you would not care if you have only one algorithm */
> > +DEF_SIMPLE_SPIMDEV_ATTR(device_api, api_ver, "%s");
> > +
> > +/* this return total queue left, not mdev left */
> > +static ssize_t
> > +available_instances_show(struct kobject *kobj, struct device *dev, char
> > *buf)
> > +{
> > + struct vfio_spimdev *spimdev = vfio_spimdev_pdev_spimdev(dev);
> > +
> > + return sprintf(buf, "%d",
> > + spimdev->ops->get_available_instances(spimdev));
> > +}
> > +MDEV_TYPE_ATTR_RO_EXPORT(available_instances);
> > +
> > +static int vfio_spimdev_mdev_create(struct kobject *kobj,
> > + struct mdev_device *mdev)
> > +{
> > + struct device *dev = mdev_dev(mdev);
> > + struct device *pdev = mdev_parent_dev(mdev);
> > + struct spimdev_mdev_state *mdev_state;
> > + struct vfio_spimdev *spimdev = mdev_spimdev(mdev);
> > +
> > + if (!spimdev->ops->get_queue)
> > + return -ENODEV;
> > +
> > + mdev_state = devm_kzalloc(dev, sizeof(struct
> > spimdev_mdev_state),
> > + GFP_KERNEL);
> > + if (!mdev_state)
> > + return -ENOMEM;
> > + mdev_set_drvdata(mdev, mdev_state);
> > + mdev_state->spimdev = spimdev;
> > + dev->iommu_fwspec = pdev->iommu_fwspec;
> > + get_device(pdev);
> > + __module_get(spimdev->owner);
> > +
> > + return 0;
> > +}
> > +
> > +static int vfio_spimdev_mdev_remove(struct mdev_device *mdev)
> > +{
> > + struct device *dev = mdev_dev(mdev);
> > + struct device *pdev = mdev_parent_dev(mdev);
> > + struct vfio_spimdev *spimdev = mdev_spimdev(mdev);
> > +
> > + put_device(pdev);
> > + module_put(spimdev->owner);
> > + dev->iommu_fwspec = NULL;
> > + mdev_set_drvdata(mdev, NULL);
> > +
> > + return 0;
> > +}
> > +
> > +/* Wake up the process who is waiting this queue */
> > +void vfio_spimdev_wake_up(struct vfio_spimdev_queue *q)
> > +{
> > + wake_up(&q->wait);
> > +}
> > +EXPORT_SYMBOL_GPL(vfio_spimdev_wake_up);
> > +
> > +static int vfio_spimdev_q_file_open(struct inode *inode, struct file *file)
> > +{
> > + return 0;
> > +}
> > +
> > +static int vfio_spimdev_q_file_release(struct inode *inode, struct file *file)
> > +{
> > + struct vfio_spimdev_queue *q =
> > + (struct vfio_spimdev_queue *)file->private_data;
> > + struct vfio_spimdev *spimdev = q->spimdev;
> > + int ret;
> > +
> > + ret = spimdev->ops->put_queue(q);
> > + if (ret) {
> > + dev_err(spimdev->dev, "drv put queue fail (%d)!\n", ret);
> > + return ret;
> > + }
> > +
> > + put_device(mdev_dev(q->mdev));
> > +
> > + return 0;
> > +}
> > +
> > +static long vfio_spimdev_q_file_ioctl(struct file *file, unsigned int cmd,
> > + unsigned long arg)
> > +{
> > + struct vfio_spimdev_queue *q =
> > + (struct vfio_spimdev_queue *)file->private_data;
> > + struct vfio_spimdev *spimdev = q->spimdev;
> > +
> > + if (spimdev->ops->ioctl)
> > + return spimdev->ops->ioctl(q, cmd, arg);
> > +
> > + dev_err(spimdev->dev, "ioctl cmd (%d) is not supported!\n", cmd);
> > +
> > + return -EINVAL;
> > +}
> > +
> > +static int vfio_spimdev_q_file_mmap(struct file *file,
> > + struct vm_area_struct *vma)
> > +{
> > + struct vfio_spimdev_queue *q =
> > + (struct vfio_spimdev_queue *)file->private_data;
> > + struct vfio_spimdev *spimdev = q->spimdev;
> > +
> > + if (spimdev->ops->mmap)
> > + return spimdev->ops->mmap(q, vma);
> > +
> > + dev_err(spimdev->dev, "no driver mmap!\n");
> > + return -EINVAL;
> > +}
> > +
> > +static __poll_t vfio_spimdev_q_file_poll(struct file *file, poll_table *wait)
> > +{
> > + struct vfio_spimdev_queue *q =
> > + (struct vfio_spimdev_queue *)file->private_data;
> > + struct vfio_spimdev *spimdev = q->spimdev;
> > +
> > + poll_wait(file, &q->wait, wait);
> > + if (spimdev->ops->is_q_updated(q))
> > + return EPOLLIN | EPOLLRDNORM;
> > +
> > + return 0;
> > +}
> > +
> > +static const struct file_operations spimdev_q_file_ops = {
> > + .owner = THIS_MODULE,
> > + .open = vfio_spimdev_q_file_open,
> > + .unlocked_ioctl = vfio_spimdev_q_file_ioctl,
> > + .release = vfio_spimdev_q_file_release,
> > + .poll = vfio_spimdev_q_file_poll,
> > + .mmap = vfio_spimdev_q_file_mmap,
> > +};
> > +
> > +static long vfio_spimdev_mdev_get_queue(struct mdev_device *mdev,
> > + struct vfio_spimdev *spimdev, unsigned long arg)
> > +{
> > + struct vfio_spimdev_queue *q;
> > + int ret;
> > +
> > +#ifdef CONFIG_IOMMU_SVA
> > + int pasid = arg;
> > +
> > + if (!vfio_spimdev_is_valid_pasid(pasid))
> > + return -EINVAL;
> > +#endif
> > +
> > + if (!spimdev->ops->get_queue)
> > + return -EINVAL;
> > +
> > + ret = spimdev->ops->get_queue(spimdev, arg, &q);
> > + if (ret < 0) {
> > + dev_err(spimdev->dev, "get_queue failed\n");
> > + return -ENODEV;
> > + }
> > +
> > + ret = anon_inode_getfd("spimdev_q", &spimdev_q_file_ops,
> > + q, O_CLOEXEC | O_RDWR);
> > + if (ret < 0) {
> > + dev_err(spimdev->dev, "getfd fail %d\n", ret);
> > + goto err_with_queue;
> > + }
> > +
> > + q->fd = ret;
> > + q->spimdev = spimdev;
> > + q->mdev = mdev;
> > + q->container = arg;
> > + init_waitqueue_head(&q->wait);
> > + get_device(mdev_dev(mdev));
> > +
> > + return ret;
> > +
> > +err_with_queue:
> > + spimdev->ops->put_queue(q);
> > + return ret;
> > +}
> > +
> > +static long vfio_spimdev_mdev_ioctl(struct mdev_device *mdev, unsigned int cmd,
> > + unsigned long arg)
> > +{
> > + struct spimdev_mdev_state *mdev_state;
> > + struct vfio_spimdev *spimdev;
> > +
> > + if (!mdev)
> > + return -ENODEV;
> > +
> > + mdev_state = mdev_get_drvdata(mdev);
> > + if (!mdev_state)
> > + return -ENODEV;
> > +
> > + spimdev = mdev_state->spimdev;
> > + if (!spimdev)
> > + return -ENODEV;
> > +
> > + if (cmd == VFIO_SPIMDEV_CMD_GET_Q)
> > + return vfio_spimdev_mdev_get_queue(mdev, spimdev, arg);
> > +
> > + dev_err(spimdev->dev,
> > + "%s, ioctl cmd (0x%x) is not supported!\n", __func__, cmd);
> > + return -EINVAL;
> > +}
> > +
> > +static void vfio_spimdev_release(struct device *dev) { }
> > +static void vfio_spimdev_mdev_release(struct mdev_device *mdev) { }
> > +static int vfio_spimdev_mdev_open(struct mdev_device *mdev) { return 0; }
> > +
> > +/**
> > + * vfio_spimdev_register - register a spimdev
> > + * @spimdev: device structure
> > + */
> > +int vfio_spimdev_register(struct vfio_spimdev *spimdev)
> > +{
> > + int ret;
> > + const char *drv_name;
> > +
> > + if (!spimdev->dev)
> > + return -ENODEV;
> > +
> > + drv_name = dev_driver_string(spimdev->dev);
> > + if (strstr(drv_name, "-")) {
> > + pr_err("spimdev: parent driver name cannot include '-'!\n");
> > + return -EINVAL;
> > + }
> > +
> > + spimdev->dev_id = idr_alloc(&spimdev_idr, spimdev, 0, 0, GFP_KERNEL);
> > + if (spimdev->dev_id < 0)
> > + return spimdev->dev_id;
> > +
> > + atomic_set(&spimdev->ref, 0);
> > + spimdev->cls_dev.parent = spimdev->dev;
> > + spimdev->cls_dev.class = spimdev_class;
> > + spimdev->cls_dev.release = vfio_spimdev_release;
> > + dev_set_name(&spimdev->cls_dev, "%s", dev_name(spimdev-
> > >dev));
> > + ret = device_register(&spimdev->cls_dev);
> > + if (ret)
> > + return ret;
> > +
> > + spimdev->mdev_fops.owner = spimdev->owner;
> > + spimdev->mdev_fops.dev_attr_groups = vfio_spimdev_groups;
> > + WARN_ON(!spimdev->mdev_fops.supported_type_groups);
> > + spimdev->mdev_fops.create = vfio_spimdev_mdev_create;
> > + spimdev->mdev_fops.remove = vfio_spimdev_mdev_remove;
> > + spimdev->mdev_fops.ioctl = vfio_spimdev_mdev_ioctl;
> > + spimdev->mdev_fops.open = vfio_spimdev_mdev_open;
> > + spimdev->mdev_fops.release = vfio_spimdev_mdev_release;
> > +
> > + ret = mdev_register_device(spimdev->dev, &spimdev->mdev_fops);
> > + if (ret)
> > + device_unregister(&spimdev->cls_dev);
> > +
> > + return ret;
> > +}
> > +EXPORT_SYMBOL_GPL(vfio_spimdev_register);
> > +
> > +/**
> > + * vfio_spimdev_unregister - unregisters a spimdev
> > + * @spimdev: device to unregister
> > + *
> > + * Unregister a spimdev that was previously successfully registered
> > + * with vfio_spimdev_register().
> > + */
> > +void vfio_spimdev_unregister(struct vfio_spimdev *spimdev)
> > +{
> > + mdev_unregister_device(spimdev->dev);
> > + device_unregister(&spimdev->cls_dev);
> > +}
> > +EXPORT_SYMBOL_GPL(vfio_spimdev_unregister);
> > +
> > +static int __init vfio_spimdev_init(void)
> > +{
> > + spimdev_class = class_create(THIS_MODULE, VFIO_SPIMDEV_CLASS_NAME);
> > + return PTR_ERR_OR_ZERO(spimdev_class);
> > +}
> > +
> > +static __exit void vfio_spimdev_exit(void)
> > +{
> > + class_destroy(spimdev_class);
> > + idr_destroy(&spimdev_idr);
> > +}
> > +
> > +module_init(vfio_spimdev_init);
> > +module_exit(vfio_spimdev_exit);
> > +
> > +MODULE_LICENSE("GPL");
> > +MODULE_AUTHOR("Hisilicon Tech. Co., Ltd.");
> > +MODULE_DESCRIPTION("VFIO Share Parent's IOMMU Mediated Device");
> > diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> > index 3e5b17710a4f..0ec38a17c98c 100644
> > --- a/drivers/vfio/vfio_iommu_type1.c
> > +++ b/drivers/vfio/vfio_iommu_type1.c
> > @@ -41,6 +41,7 @@
> > #include <linux/notifier.h>
> > #include <linux/dma-iommu.h>
> > #include <linux/irqdomain.h>
> > +#include <linux/vfio_spimdev.h>
> >
> > #define DRIVER_VERSION "0.2"
> > #define DRIVER_AUTHOR "Alex Williamson
> > <[email protected]>"
> > @@ -89,6 +90,8 @@ struct vfio_dma {
> > };
> >
> > struct vfio_group {
> > + /* iommu_group of mdev's parent device */
> > + struct iommu_group *parent_group;
> > struct iommu_group *iommu_group;
> > struct list_head next;
> > };
> > @@ -1327,6 +1330,109 @@ static bool vfio_iommu_has_sw_msi(struct iommu_group *group, phys_addr_t *base)
> > return ret;
> > }
> >
> > +/* return 0 if the device is not spimdev.
> > + * return 1 if the device is spimdev, the data will be updated with parent
> > + * device's group.
> > + * return -errno if other error.
> > + */
> > +static int vfio_spimdev_type(struct device *dev, void *data)
> > +{
> > + struct iommu_group **group = data;
> > + struct iommu_group *pgroup;
> > + int (*spimdev_mdev)(struct device *dev);
> > + struct device *pdev;
> > + int ret = 1;
> > +
> > + /* vfio_spimdev module is not configured */
> > + spimdev_mdev = symbol_get(vfio_spimdev_is_spimdev);
> > + if (!spimdev_mdev)
> > + return 0;
> > +
> > + /* check if it belongs to vfio_spimdev device */
> > + if (!spimdev_mdev(dev)) {
> > + ret = 0;
> > + goto get_exit;
> > + }
> > +
> > + pdev = dev->parent;
> > + pgroup = iommu_group_get(pdev);
> > + if (!pgroup) {
> > + ret = -ENODEV;
> > + goto get_exit;
> > + }
> > +
> > + if (group) {
> > + /* check if all parent devices are the same */
> > + if (*group && *group != pgroup)
> > + ret = -ENODEV;
> > + else
> > + *group = pgroup;
> > + }
> > +
> > + iommu_group_put(pgroup);
> > +
> > +get_exit:
> > + symbol_put(vfio_spimdev_is_spimdev);
> > +
> > + return ret;
> > +}
> > +
> > +/* return 0 or -errno */
> > +static int vfio_spimdev_bus(struct device *dev, void *data)
> > +{
> > + struct bus_type **bus = data;
> > +
> > + if (!dev->bus)
> > + return -ENODEV;
> > +
> > + /* ensure all devices have the same bus_type */
> > + if (*bus && *bus != dev->bus)
> > + return -EINVAL;
> > +
> > + *bus = dev->bus;
> > + return 0;
> > +}
> > +
> > +/* return 0 means it is not spi group, 1 means it is, or -EXXX for error */
> > +static int vfio_iommu_type1_attach_spigroup(struct vfio_domain *domain,
> > + struct vfio_group *group,
> > + struct iommu_group *iommu_group)
> > +{
> > + int ret;
> > + struct bus_type *pbus = NULL;
> > + struct iommu_group *pgroup = NULL;
> > +
> > + ret = iommu_group_for_each_dev(iommu_group, &pgroup,
> > + vfio_spimdev_type);
> > + if (ret < 0)
> > + goto out;
> > + else if (ret > 0) {
> > + domain->domain = iommu_group_share_domain(pgroup);
> > + if (IS_ERR(domain->domain))
> > + goto out;
> > + ret = iommu_group_for_each_dev(pgroup, &pbus,
> > + vfio_spimdev_bus);
> > + if (ret < 0)
> > + goto err_with_share_domain;
> > +
> > + if (pbus && iommu_capable(pbus, IOMMU_CAP_CACHE_COHERENCY))
> > + domain->prot |= IOMMU_CACHE;
> > +
> > + group->parent_group = pgroup;
> > + INIT_LIST_HEAD(&domain->group_list);
> > + list_add(&group->next, &domain->group_list);
> > +
> > + return 1;
> > + }
> > +
> > + return 0;
> > +
> > +err_with_share_domain:
> > + iommu_group_unshare_domain(pgroup);
> > +out:
> > + return ret;
> > +}
> > +
> > static int vfio_iommu_type1_attach_group(void *iommu_data,
> > struct iommu_group *iommu_group)
> > {
> > @@ -1335,8 +1441,8 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
> > struct vfio_domain *domain, *d;
> > struct bus_type *bus = NULL, *mdev_bus;
> > int ret;
> > - bool resv_msi, msi_remap;
> > - phys_addr_t resv_msi_base;
> > + bool resv_msi = false, msi_remap;
> > + phys_addr_t resv_msi_base = 0;
> >
> > mutex_lock(&iommu->lock);
> >
> > @@ -1373,6 +1479,14 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
> > if (mdev_bus) {
> > if ((bus == mdev_bus) && !iommu_present(bus)) {
> > symbol_put(mdev_bus_type);
> > +
> > + ret = vfio_iommu_type1_attach_spigroup(domain, group,
> > + iommu_group);
> > + if (ret < 0)
> > + goto out_free;
> > + else if (ret > 0)
> > + goto replay_check;
> > +
> > if (!iommu->external_domain) {
> > INIT_LIST_HEAD(&domain->group_list);
> > iommu->external_domain = domain;
> > @@ -1451,12 +1565,13 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
> >
> > vfio_test_domain_fgsp(domain);
> >
> > +replay_check:
> > /* replay mappings on new domains */
> > ret = vfio_iommu_replay(iommu, domain);
> > if (ret)
> > goto out_detach;
> >
> > - if (resv_msi) {
> > + if (!group->parent_group && resv_msi) {
> > ret = iommu_get_msi_cookie(domain->domain, resv_msi_base);
> > if (ret)
> > goto out_detach;
> > @@ -1471,7 +1586,10 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
> > out_detach:
> > iommu_detach_group(domain->domain, iommu_group);
> > out_domain:
> > - iommu_domain_free(domain->domain);
> > + if (group->parent_group)
> > + iommu_group_unshare_domain(group->parent_group);
> > + else
> > + iommu_domain_free(domain->domain);
> > out_free:
> > kfree(domain);
> > kfree(group);
> > @@ -1533,6 +1651,7 @@ static void vfio_iommu_type1_detach_group(void *iommu_data,
> > struct vfio_iommu *iommu = iommu_data;
> > struct vfio_domain *domain;
> > struct vfio_group *group;
> > + int ret;
> >
> > mutex_lock(&iommu->lock);
> >
> > @@ -1560,7 +1679,11 @@ static void vfio_iommu_type1_detach_group(void *iommu_data,
> > if (!group)
> > continue;
> >
> > - iommu_detach_group(domain->domain, iommu_group);
> > + if (group->parent_group)
> > + iommu_group_unshare_domain(group->parent_group);
> > + else
> > + iommu_detach_group(domain->domain, iommu_group);
> > +
> > list_del(&group->next);
> > kfree(group);
> > /*
> > @@ -1577,7 +1700,8 @@ static void vfio_iommu_type1_detach_group(void *iommu_data,
> > else
> > vfio_iommu_unmap_unpin_reaccount(iommu);
> > }
> > - iommu_domain_free(domain->domain);
> > + if (!ret)
> > + iommu_domain_free(domain->domain);
> > list_del(&domain->next);
> > kfree(domain);
> > }
> > diff --git a/include/linux/vfio_spimdev.h b/include/linux/vfio_spimdev.h
> > new file mode 100644
> > index 000000000000..f7e7d90013e1
> > --- /dev/null
> > +++ b/include/linux/vfio_spimdev.h
> > @@ -0,0 +1,95 @@
> > +/* SPDX-License-Identifier: GPL-2.0+ */
> > +#ifndef __VFIO_SPIMDEV_H
> > +#define __VFIO_SPIMDEV_H
> > +
> > +#include <linux/device.h>
> > +#include <linux/iommu.h>
> > +#include <linux/mdev.h>
> > +#include <linux/vfio.h>
> > +#include <uapi/linux/vfio_spimdev.h>
> > +
> > +struct vfio_spimdev_queue;
> > +struct vfio_spimdev;
> > +
> > +/**
> > + * struct vfio_spimdev_ops - WD device operations
> > + * @get_queue: get a queue from the device according to algorithm
> > + * @put_queue: free a queue to the device
> > + * @is_q_updated: check whether the task is finished
> > + * @mask_notify: mask the task irq of queue
> > + * @mmap: mmap addresses of queue to user space
> > + * @reset: reset the WD device
> > + * @reset_queue: reset the queue
> > + * @ioctl: ioctl for user space users of the queue
> > + * @get_available_instances: get the number of queues remaining
> > + */
> > +struct vfio_spimdev_ops {
> > + int (*get_queue)(struct vfio_spimdev *spimdev, unsigned long arg,
> > + struct vfio_spimdev_queue **q);
> > + int (*put_queue)(struct vfio_spimdev_queue *q);
> > + int (*is_q_updated)(struct vfio_spimdev_queue *q);
> > + void (*mask_notify)(struct vfio_spimdev_queue *q, int event_mask);
> > + int (*mmap)(struct vfio_spimdev_queue *q, struct vm_area_struct *vma);
> > + int (*reset)(struct vfio_spimdev *spimdev);
> > + int (*reset_queue)(struct vfio_spimdev_queue *q);
> > + long (*ioctl)(struct vfio_spimdev_queue *q, unsigned int cmd,
> > + unsigned long arg);
> > + int (*get_available_instances)(struct vfio_spimdev *spimdev);
> > +};
> > +
> > +struct vfio_spimdev_queue {
> > + struct mutex mutex;
> > + struct vfio_spimdev *spimdev;
> > + int qid;
> > + __u32 flags;
> > + void *priv;
> > + wait_queue_head_t wait;
> > + struct mdev_device *mdev;
> > + int fd;
> > + int container;
> > +#ifdef CONFIG_IOMMU_SVA
> > + int pasid;
> > +#endif
> > +};
> > +
> > +struct vfio_spimdev {
> > + const char *name;
> > + int status;
> > + atomic_t ref;
> > + struct module *owner;
> > + const struct vfio_spimdev_ops *ops;
> > + struct device *dev;
> > + struct device cls_dev;
> > + bool is_vf;
> > + u32 iommu_type;
> > + u32 dma_flag;
> > + u32 dev_id;
> > + void *priv;
> > + int flags;
> > + const char *api_ver;
> > + struct mdev_parent_ops mdev_fops;
> > +};
> > +
> > +int vfio_spimdev_register(struct vfio_spimdev *spimdev);
> > +void vfio_spimdev_unregister(struct vfio_spimdev *spimdev);
> > +void vfio_spimdev_wake_up(struct vfio_spimdev_queue *q);
> > +int vfio_spimdev_is_spimdev(struct device *dev);
> > +struct vfio_spimdev *vfio_spimdev_pdev_spimdev(struct device *dev);
> > +int vfio_spimdev_pasid_pri_check(int pasid);
> > +int vfio_spimdev_get(struct device *dev);
> > +int vfio_spimdev_put(struct device *dev);
> > +struct vfio_spimdev *mdev_spimdev(struct mdev_device *mdev);
> > +
> > +extern struct mdev_type_attribute mdev_type_attr_flags;
> > +extern struct mdev_type_attribute mdev_type_attr_name;
> > +extern struct mdev_type_attribute mdev_type_attr_device_api;
> > +extern struct mdev_type_attribute mdev_type_attr_available_instances;
> > +#define VFIO_SPIMDEV_DEFAULT_MDEV_TYPE_ATTRS \
> > + &mdev_type_attr_name.attr, \
> > + &mdev_type_attr_device_api.attr, \
> > + &mdev_type_attr_available_instances.attr, \
> > + &mdev_type_attr_flags.attr
> > +
> > +#define _VFIO_SPIMDEV_REGION(vm_pgoff) (vm_pgoff & 0xf)
> > +
> > +#endif
> > diff --git a/include/uapi/linux/vfio_spimdev.h b/include/uapi/linux/vfio_spimdev.h
> > new file mode 100644
> > index 000000000000..3435e5c345b4
> > --- /dev/null
> > +++ b/include/uapi/linux/vfio_spimdev.h
> > @@ -0,0 +1,28 @@
> > +/* SPDX-License-Identifier: GPL-2.0+ */
> > +#ifndef _UAPIVFIO_SPIMDEV_H
> > +#define _UAPIVFIO_SPIMDEV_H
> > +
> > +#include <linux/ioctl.h>
> > +
> > +#define VFIO_SPIMDEV_CLASS_NAME "spimdev"
> > +
> > +/* Device ATTRs in parent dev SYSFS DIR */
> > +#define VFIO_SPIMDEV_PDEV_ATTRS_GRP_NAME "params"
> > +
> > +/* Parent device attributes */
> > +#define SPIMDEV_IOMMU_TYPE "iommu_type"
> > +#define SPIMDEV_DMA_FLAG "dma_flag"
> > +
> > +/* Maximum length of algorithm name string */
> > +#define VFIO_SPIMDEV_ALG_NAME_SIZE 64
> > +
> > +/* the bits used in SPIMDEV_DMA_FLAG attributes */
> > +#define VFIO_SPIMDEV_DMA_INVALID 0
> > +#define VFIO_SPIMDEV_DMA_SINGLE_PROC_MAP 1
> > +#define VFIO_SPIMDEV_DMA_MULTI_PROC_MAP 2
> > +#define VFIO_SPIMDEV_DMA_SVM 4
> > +#define VFIO_SPIMDEV_DMA_SVM_NO_FAULT 8
> > +#define VFIO_SPIMDEV_DMA_PHY 16
> > +
> > +#define VFIO_SPIMDEV_CMD_GET_Q _IO('W', 1)
> > +#endif
> > --
> > 2.17.1
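For orientation, the user-space flow implied by the hooks above is roughly the following. This is a minimal sketch only: obtaining the VFIO device fd for the mdev (container/group setup) is the standard VFIO sequence and is assumed to have happened already, and the mmap offset/size are placeholders rather than values defined by this patchset.

```c
#include <poll.h>
#include <stdint.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/vfio_spimdev.h>

/*
 * device_fd: VFIO device fd of the spimdev mdev, obtained through the usual
 * container/group ioctls (not shown). 'arg' is the pasid/container cookie
 * that vfio_spimdev_mdev_get_queue() expects.
 */
static int use_queue(int device_fd, unsigned long arg)
{
	struct pollfd pfd;
	void *io;
	int qfd;

	/* Ask the parent driver for a queue; a new anon queue fd comes back. */
	qfd = ioctl(device_fd, VFIO_SPIMDEV_CMD_GET_Q, arg);
	if (qfd < 0)
		return -1;

	/* Map the queue's MMIO/shared region; the offset layout is driver
	 * defined (see _VFIO_SPIMDEV_REGION()), 0 is just a placeholder. */
	io = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, qfd, 0);
	if (io == MAP_FAILED)
		goto out;

	/* ... ring the doorbell in 'io', then wait for completion ... */
	pfd.fd = qfd;
	pfd.events = POLLIN;
	poll(&pfd, 1, -1);	/* woken via vfio_spimdev_wake_up() */

	munmap(io, 4096);
out:
	close(qfd);		/* release path calls ops->put_queue() */
	return 0;
}
```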

--
-Kenneth(Hisilicon)


2018-08-02 04:05:57

by Kenneth Lee

[permalink] [raw]
Subject: Re: [RFC PATCH 0/7] A General Accelerator Framework, WarpDrive

On Thu, Aug 02, 2018 at 02:33:12AM +0000, Tian, Kevin wrote:
> Date: Thu, 2 Aug 2018 02:33:12 +0000
> From: "Tian, Kevin" <[email protected]>
> To: Jerome Glisse <[email protected]>, Kenneth Lee <[email protected]>
> CC: Hao Fang <[email protected]>, Herbert Xu
> <[email protected]>, "[email protected]"
> <[email protected]>, Jonathan Corbet <[email protected]>, Greg
> Kroah-Hartman <[email protected]>, "[email protected]"
> <[email protected]>, "Kumar, Sanjay K" <[email protected]>,
> "[email protected]" <[email protected]>,
> "[email protected]" <[email protected]>,
> "[email protected]" <[email protected]>, Alex Williamson
> <[email protected]>, Thomas Gleixner <[email protected]>,
> "[email protected]" <[email protected]>, Philippe
> Ombredanne <[email protected]>, Zaibo Xu <[email protected]>, Kenneth
> Lee <[email protected]>, "David S . Miller" <[email protected]>,
> "[email protected]"
> <[email protected]>
> Subject: RE: [RFC PATCH 0/7] A General Accelerator Framework, WarpDrive
> Message-ID: <AADFC41AFE54684AB9EE6CBC0274A5D191290E1A@SHSMSX101.ccr.corp.intel.com>
>
> > From: Jerome Glisse
> > Sent: Thursday, August 2, 2018 12:57 AM
> >
> > On Wed, Aug 01, 2018 at 06:22:14PM +0800, Kenneth Lee wrote:
> > > From: Kenneth Lee <[email protected]>
> > >
> > > WarpDrive is an accelerator framework to expose the hardware
> > capabilities
> > > directly to the user space. It makes use of the exist vfio and vfio-mdev
> > > facilities. So the user application can send request and DMA to the
> > > hardware without interaction with the kernel. This remove the latency
> > > of syscall and context switch.
> > >
> > > The patchset contains documents for the detail. Please refer to it for
> > more
> > > information.
> > >
> > > This patchset is intended to be used with Jean Philippe Brucker's SVA
> > > patch [1] (Which is also in RFC stage). But it is not mandatory. This
> > > patchset is tested in the latest mainline kernel without the SVA patches.
> > > So it support only one process for each accelerator.
> > >
> > > With SVA support, WarpDrive can support multi-process in the same
> > > accelerator device. We tested it in our SoC integrated Accelerator (board
> > > ID: D06, Chip ID: HIP08). A reference work tree can be found here: [2].
> >
> > I have not fully inspected things nor do i know enough about
> > this Hisilicon ZIP accelerator to ascertain, but from glimpsing
> > at the code it seems that it is unsafe to use even with SVA due
> > to the doorbell. There is a comment talking about safetyness
> > in patch 7.
> >
> > Exposing thing to userspace is always enticing, but if it is
> > a security risk then it should clearly say so and maybe a
> > kernel boot flag should be necessary to allow such device to
> > be use.
> >

But the doorbell is just a notification. Except for DoS (keeping the hardware
busy), it cannot actually take or change anything in kernel space. And the DoS
problem can always be regarded as the general problem of a group of processes
sharing the same kernel entity.

In the coming HIP09 hardware, the doorbell will come with a random number, so
only the process that allocated the queue can knock it correctly.
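In concrete terms, "knocking" the doorbell from user space is just an MMIO store into the queue region that was mmap()ed from the queue fd. A purely illustrative sketch (the register layout and the token field are hypothetical, not taken from the Hisilicon driver):

```c
#include <stdint.h>

/* Hypothetical per-queue doorbell page layout. */
struct db_page {
	uint32_t token;	/* e.g. a HIP09-style random number bound to the queue */
	uint32_t tail;	/* producer index being published */
};

/* 'db' points into the mmap()ed queue region. A bogus write can only make
 * the device do useless work; it cannot reach into kernel state. */
static inline void knock(volatile struct db_page *db, uint32_t token,
			 uint32_t tail)
{
	db->tail = tail;
	db->token = token;
}
```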
> >
> > My more general question is do we want to grow VFIO to become
> > a more generic device driver API. This patchset adds a command
> > queue concept to it (i don't think it exist today but i have
> > not follow VFIO closely).
> >

The thing is, VFIO is currently the only place that supports DMA from user land.
If we don't put it here, we have to create another, similar facility to do the same.
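As a reminder of what that user-land DMA path looks like today, the core of it is the standard type1 map ioctl. A minimal sketch, assuming the container fd has already been opened and switched to VFIO_TYPE1_IOMMU and that 'buf' is page-aligned:

```c
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/* Make 'buf' visible to the device at 'iova' through a VFIO container. */
static int map_for_dma(int container_fd, void *buf, uint64_t size, uint64_t iova)
{
	struct vfio_iommu_type1_dma_map map = {
		.argsz = sizeof(map),
		.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE,
		.vaddr = (uintptr_t)buf,
		.iova  = iova,
		.size  = size,
	};

	/* Pins the pages and installs the IOVA->PA mapping in the IOMMU. */
	return ioctl(container_fd, VFIO_IOMMU_MAP_DMA, &map);
}
```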

> > Why is that any better that existing driver model ? Where a
> > device create a device file (can be character device, block
> > device, ...). Such models also allow for direct hardware
> > access from userspace. For instance see the AMD KFD driver
> > inside drivers/gpu/drm/amd
>
> One motivation I guess, is that most accelerators lack of a
> well-abstracted high level APIs similar to GPU side (e.g. OpenCL
> clearly defines Shared Virtual Memory models). VFIO mdev
> might be an alternative common interface to enable SVA usages
> on various accelerators...
>
Yes.
> >
> > So you can already do what you are doing with the Hisilicon
> > driver today without this new infrastructure. This only need
> > hardware that have command queue and doorbell like mechanisms.
> >
> >
> > Unlike mdev which unify a very high level concept, it seems
> > to me spimdev just introduce low level concept (namely command
> > queue) and i don't see the intrinsic value here.
> >
As I have said, VFIO is the only place that supports DMA from user space now.
> >
> > Cheers,
> > Jérôme

--
-Kenneth(Hisilicon)


2018-08-02 04:15:47

by Kenneth Lee

[permalink] [raw]
Subject: Re: [RFC PATCH 2/7] iommu: Add share domain interface in iommu for spimdev

On Thu, Aug 02, 2018 at 03:17:03AM +0000, Tian, Kevin wrote:
> Date: Thu, 2 Aug 2018 03:17:03 +0000
> From: "Tian, Kevin" <[email protected]>
> To: Kenneth Lee <[email protected]>, Jonathan Corbet <[email protected]>,
> Herbert Xu <[email protected]>, "David S . Miller"
> <[email protected]>, Joerg Roedel <[email protected]>, Alex Williamson
> <[email protected]>, Kenneth Lee <[email protected]>, Hao
> Fang <[email protected]>, Zhou Wang <[email protected]>, Zaibo Xu
> <[email protected]>, Philippe Ombredanne <[email protected]>, Greg
> Kroah-Hartman <[email protected]>, Thomas Gleixner
> <[email protected]>, "[email protected]"
> <[email protected]>, "[email protected]"
> <[email protected]>, "[email protected]"
> <[email protected]>, "[email protected]"
> <[email protected]>, "[email protected]"
> <[email protected]>, "[email protected]"
> <[email protected]>, Lu Baolu
> <[email protected]>, "Kumar, Sanjay K" <[email protected]>
> CC: "[email protected]" <[email protected]>
> Subject: RE: [RFC PATCH 2/7] iommu: Add share domain interface in iommu for
> spimdev
> Message-ID: <AADFC41AFE54684AB9EE6CBC0274A5D191290F49@SHSMSX101.ccr.corp.intel.com>
>
> > From: Kenneth Lee
> > Sent: Wednesday, August 1, 2018 6:22 PM
> >
> > From: Kenneth Lee <[email protected]>
> >
> > This patch adds sharing interfaces for an iommu_group. The new interfaces:
> >
> > iommu_group_share_domain()
> > iommu_group_unshare_domain()
> >
> > can be used by some virtual iommu_group (such as iommu_group for
> > spimdev)
> > to share their parent's iommu_group.
> >
> > When the domain of the group is shared, it cannot be changed before
> > unshared. In the future, notification can be added if update is required.
>
> Is it necessary or just asking VFIO to use parent domain directly?
>
Even if we add it to VFIO, the iommu code still needs to be changed. We can move
the type1 part into VFIO if we reach agreement during the RFC stage.
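For reference, the way the type1 code in this series pairs the two calls is roughly the following sketch (simplified from patch 3, with the error handling trimmed; the wrapper function names are illustrative only):

```c
#include <linux/err.h>
#include <linux/iommu.h>

/* Attach: instead of allocating a private domain for the mdev group, borrow
 * the parent group's default domain and pin it against later changes. */
static struct iommu_domain *borrow_parent_domain(struct iommu_group *parent_group)
{
	struct iommu_domain *domain;

	domain = iommu_group_share_domain(parent_group);
	if (IS_ERR(domain))
		return domain;	/* parent is not using its default domain */

	/* ... replay existing user DMA mappings onto the shared domain ... */
	return domain;
}

/* Detach: drop the reference; the parent's domain itself is left untouched. */
static void return_parent_domain(struct iommu_group *parent_group)
{
	iommu_group_unshare_domain(parent_group);
}
```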
> >
> > Signed-off-by: Kenneth Lee <[email protected]>
> > ---
> > drivers/iommu/iommu.c | 28 +++++++++++++++++++++++++++-
> > include/linux/iommu.h | 2 ++
> > 2 files changed, 29 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> > index 63b37563db7e..a832aafe660d 100644
> > --- a/drivers/iommu/iommu.c
> > +++ b/drivers/iommu/iommu.c
> > @@ -54,6 +54,9 @@ struct iommu_group {
> > int id;
> > struct iommu_domain *default_domain;
> > struct iommu_domain *domain;
> > + atomic_t domain_shared_ref; /* Number of users of the current domain.
> > + * The domain cannot be modified if ref > 0
> > + */
> > };
> >
> > struct group_device {
> > @@ -353,6 +356,7 @@ struct iommu_group *iommu_group_alloc(void)
> > return ERR_PTR(ret);
> > }
> > group->id = ret;
> > + atomic_set(&group->domain_shared_ref, 0);
> >
> > ret = kobject_init_and_add(&group->kobj, &iommu_group_ktype,
> > NULL, "%d", group->id);
> > @@ -482,6 +486,25 @@ int iommu_group_set_name(struct iommu_group *group, const char *name)
> > }
> > EXPORT_SYMBOL_GPL(iommu_group_set_name);
> >
> > +struct iommu_domain *iommu_group_share_domain(struct iommu_group *group)
> > +{
> > + /* the domain can be shared only when the default domain is used */
> > + /* todo: more shareable check */
> > + if (group->domain != group->default_domain)
> > + return ERR_PTR(-EINVAL);
> > +
> > + atomic_inc(&group->domain_shared_ref);
> > + return group->domain;
> > +}
> > +EXPORT_SYMBOL_GPL(iommu_group_share_domain);
> > +
> > +void iommu_group_unshare_domain(struct iommu_group *group)
> > +{
> > + atomic_dec(&group->domain_shared_ref);
> > + WARN_ON(atomic_read(&group->domain_shared_ref) < 0);
> > +}
> > +EXPORT_SYMBOL_GPL(iommu_group_unshare_domain);
> > +
> > static int iommu_group_create_direct_mappings(struct iommu_group *group,
> > struct device *dev)
> > {
> > @@ -1401,7 +1424,8 @@ static int __iommu_attach_group(struct iommu_domain *domain,
> > {
> > int ret;
> >
> > - if (group->default_domain && group->domain != group->default_domain)
> > + if ((group->default_domain && group->domain != group->default_domain) ||
> > + atomic_read(&group->domain_shared_ref) > 0)
> > return -EBUSY;
> >
> > ret = __iommu_group_for_each_dev(group, domain,
> > @@ -1438,6 +1462,8 @@ static void __iommu_detach_group(struct iommu_domain *domain,
> > {
> > int ret;
> >
> > + WARN_ON(atomic_read(&group->domain_shared_ref) > 0);
> > +
> > if (!group->default_domain) {
> > __iommu_group_for_each_dev(group, domain,
> > iommu_group_do_detach_device);
> > diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> > index 19938ee6eb31..278d60e3ec39 100644
> > --- a/include/linux/iommu.h
> > +++ b/include/linux/iommu.h
> > @@ -349,6 +349,8 @@ extern int iommu_domain_get_attr(struct iommu_domain *domain, enum iommu_attr,
> > void *data);
> > extern int iommu_domain_set_attr(struct iommu_domain *domain, enum iommu_attr,
> > void *data);
> > +extern struct iommu_domain *iommu_group_share_domain(struct iommu_group *group);
> > +extern void iommu_group_unshare_domain(struct iommu_group *group);
> >
> > /* Window handling function prototypes */
> > extern int iommu_domain_window_enable(struct iommu_domain *domain, u32 wnd_nr,
> > --
> > 2.17.1

--
-Kenneth(Hisilicon)


2018-08-02 04:22:35

by Kenneth Lee

[permalink] [raw]
Subject: Re: [RFC PATCH 1/7] vfio/spimdev: Add documents for WarpDrive framework

On Thu, Aug 02, 2018 at 03:14:38AM +0000, Tian, Kevin wrote:
> Date: Thu, 2 Aug 2018 03:14:38 +0000
> From: "Tian, Kevin" <[email protected]>
> To: Kenneth Lee <[email protected]>, Jonathan Corbet <[email protected]>,
> Herbert Xu <[email protected]>, "David S . Miller"
> <[email protected]>, Joerg Roedel <[email protected]>, Alex Williamson
> <[email protected]>, Kenneth Lee <[email protected]>, Hao
> Fang <[email protected]>, Zhou Wang <[email protected]>, Zaibo Xu
> <[email protected]>, Philippe Ombredanne <[email protected]>, Greg
> Kroah-Hartman <[email protected]>, Thomas Gleixner
> <[email protected]>, "[email protected]"
> <[email protected]>, "[email protected]"
> <[email protected]>, "[email protected]"
> <[email protected]>, "[email protected]"
> <[email protected]>, "[email protected]"
> <[email protected]>, "[email protected]"
> <[email protected]>, Lu Baolu
> <[email protected]>, "Kumar, Sanjay K" <[email protected]>
> CC: "[email protected]" <[email protected]>
> Subject: RE: [RFC PATCH 1/7] vfio/spimdev: Add documents for WarpDrive
> framework
> Message-ID: <AADFC41AFE54684AB9EE6CBC0274A5D191290F04@SHSMSX101.ccr.corp.intel.com>
>
> > From: Kenneth Lee
> > Sent: Wednesday, August 1, 2018 6:22 PM
> >
> > From: Kenneth Lee <[email protected]>
> >
> > WarpDrive is a common user space accelerator framework. Its main
> > component
> > in Kernel is called spimdev, Share Parent IOMMU Mediated Device. It
>
> Not sure whether "share parent IOMMU" is a good term here. better
> stick to what capabity you bring to user space, instead of describing
> internal trick...
Agree. The "SPI" is also strongly hint to "Share Peripheral Interrupt" or "Serial
Peripheral Interface". Personally I hate this name. But I cannot find a better
one. In the user space, we can use a impassible name (WarpDrive). But kernel
require a name point to its feature... maybe I should use "Process-Shared Mdev"
in next version?
>
> > exposes
> > the hardware capabilities to the user space via vfio-mdev. So processes in
> > user land can obtain a "queue" by open the device and direct access the
> > hardware MMIO space or do DMA operation via VFIO interface.
> >
> > WarpDrive is intended to be used with Jean Philippe Brucker's SVA patchset
> > (it is still in RFC stage) to support multi-process. But This is not a must.
> > Without the SVA patches, WarpDrive can still work for one process for
> > every
> > hardware device.
> >
> > This patch add detail documents for the framework.
> >
> > Signed-off-by: Kenneth Lee <[email protected]>
> > ---
> > Documentation/00-INDEX | 2 +
> > Documentation/warpdrive/warpdrive.rst | 153 ++++++
> > Documentation/warpdrive/wd-arch.svg | 732
> > ++++++++++++++++++++++++++
> > Documentation/warpdrive/wd.svg | 526 ++++++++++++++++++
> > 4 files changed, 1413 insertions(+)
> > create mode 100644 Documentation/warpdrive/warpdrive.rst
> > create mode 100644 Documentation/warpdrive/wd-arch.svg
> > create mode 100644 Documentation/warpdrive/wd.svg
> >
> > diff --git a/Documentation/00-INDEX b/Documentation/00-INDEX
> > index 2754fe83f0d4..9959affab599 100644
> > --- a/Documentation/00-INDEX
> > +++ b/Documentation/00-INDEX
> > @@ -410,6 +410,8 @@ vm/
> > - directory with info on the Linux vm code.
> > w1/
> > - directory with documents regarding the 1-wire (w1) subsystem.
> > +warpdrive/
> > + - directory with documents about WarpDrive accelerator
> > framework.
> > watchdog/
> > - how to auto-reboot Linux if it has "fallen and can't get up". ;-)
> > wimax/
> > diff --git a/Documentation/warpdrive/warpdrive.rst
> > b/Documentation/warpdrive/warpdrive.rst
> > new file mode 100644
> > index 000000000000..3792b2780ea6
> > --- /dev/null
> > +++ b/Documentation/warpdrive/warpdrive.rst
> > @@ -0,0 +1,153 @@
> > +Introduction of WarpDrive
> > +=========================
> > +
> > +*WarpDrive* is a general accelerator framework built on top of vfio.
> > +It can be taken as a lightweight virtual function, which you can use without
> > +an *SR-IOV*-like facility and which can be shared among multiple processes.
> > +
> > +It can be used as a quick channel for accelerators, network adaptors or
> > +other hardware in user space. It can make some implementations simpler. E.g.
> > +you can reuse most of the *netdev* driver and just share some ring buffers with
> > +the user space driver for *DPDK* or *ODP*. Or you can combine the RSA
> > +accelerator with the *netdev* in the user space as a Web reverse proxy, etc.
> > +
> > +The name *WarpDrive* is simply a cool and general name meaning the framework
> > +makes the application faster. In the kernel, the framework is called SPIMDEV,
> > +namely "Share Parent IOMMU Mediated Device".
> > +
> > +
> > +How does it work
> > +================
> > +
> > +*WarpDrive* treats the hardware accelerator as a heterogeneous processor which
> > +can take over some load from the CPU:
> > +
> > +.. image:: wd.svg
> > + :alt: This is a .svg image, if your browser cannot show it,
> > + try to download and view it locally
> > +
> > +So it provides the capability to the user application to:
> > +
> > +1. Send request to the hardware
> > +2. Share memory with the application and other accelerators
> > +
> > +These requirements can be fulfilled by VFIO if the accelerator can serve each
> > +application with a separate Virtual Function. But an *SR-IOV*-like VF (we will
> > +call it *HVF* hereinafter) design is too heavy for an accelerator which
> > +services thousands of processes.
> > +
> > +And the *HVF* is not good for the scenario in which a device keeps most of its
> > +resources but shares part of its function with the user space. E.g. a *NIC*
> > +works as a *netdev* but shares some hardware queues with the user application
> > +to send packets directly to the hardware.
> > +
> > +*VFIO-mdev* can solve some of the problems here. But *VFIO-mdev* has two
> > +problems:
> > +
> > +1. it cannot make use of its parent device's IOMMU.
> > +2. it is assumed to be opened only once.
> > +
> > +So it needs some add-ons for better resource control and to let the VFIO
> > +driver be aware of this.
> > +
> > +
> > +Architecture
> > +------------
> > +
> > +The full *WarpDrive* architecture is represented in the following class
> > +diagram:
> > +
> > +.. image:: wd-arch.svg
> > + :alt: This is a .svg image, if your browser cannot show it,
> > + try to download and view it locally
> > +
> > +The idea is: when a device is probed, it can be registered to the general
> > +framework, e.g. *netdev* or *crypto*, and to *SPIMDEV* at the same time.
> > +
> > +If *SPIMDEV* is registered, a *mdev* creation interface is created. Then the
> > +system administrator can create a *mdev* in the user space and set its
> > +parameters via its sysfs interface. But unlike other mdev implementations,
> > +hardware resources will not be allocated until it is opened by an application.
> > +
> > +With this strategy, the hardware resource can be easily scheduled among
> > +multiple processes.
> > +
> > +
> > +The user API
> > +------------
> > +
> > +We adopt a polling style interface in the user space: ::
> > +
> > + int wd_request_queue(int container, struct wd_queue *q,
> > + const char *mdev)
> > + void wd_release_queue(struct wd_queue *q);
> > +
> > + int wd_send(struct wd_queue *q, void *req);
> > + int wd_recv(struct wd_queue *q, void **req);
> > + int wd_recv_sync(struct wd_queue *q, void **req);
> > +
> > +The ..._sync() interfaces are wrappers around the non-sync versions. They wait
> > +on the device until the queue becomes available.
> > +
> > +Memory sharing can be done with the VFIO DMA API, or the following helper
> > +functions can be adopted: ::
> > +
> > + int wd_mem_share(struct wd_queue *q, const void *addr,
> > + size_t size, int flags);
> > + void wd_mem_unshare(struct wd_queue *q, const void *addr, size_t
> > size);
> > +
> > +Todo: if the IOMMU supports *ATS* or the *SMMU* stall mode, memory sharing is
> > +not necessary. This can be checked via the SPIMDEV sysfs interface.
> > +
> > +The user API is not mandatory. It is simply a suggestion and a hint of what
> > +the kernel interface is supposed to support. A short usage sketch follows.
> > +
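To show how these suggested calls are meant to compose, here is a minimal, hypothetical sketch. The wd_* declarations are assumed to come from a user-space header matching the API above, and the container fd, the mdev name and the request layout are placeholders:

```c
#include <stddef.h>

/* Hypothetical device-specific request descriptor. */
struct my_req {
	int opcode;
};

int run_one_request(int container, const char *mdev_name)
{
	struct wd_queue q;
	struct my_req req = { .opcode = 0 };
	void *resp;

	if (wd_request_queue(container, &q, mdev_name))
		return -1;

	/* Make the request buffer DMA-able for this queue. */
	if (wd_mem_share(&q, &req, sizeof(req), 0))
		goto out;

	wd_send(&q, &req);		/* post the request to the hardware */
	wd_recv_sync(&q, &resp);	/* wait until the device finishes it */

	wd_mem_unshare(&q, &req, sizeof(req));
out:
	wd_release_queue(&q);
	return 0;
}
```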
> > +
> > +The user driver
> > +---------------
> > +
> > +*WarpDrive* exposes the hardware IO space to the user process (via *mmap*). So
> > +it requires a user driver to implement the user API. The following API
> > +is suggested for a user driver: ::
> > +
> > + int open(struct wd_queue *q);
> > + int close(struct wd_queue *q);
> > + int send(struct wd_queue *q, void *req);
> > + int recv(struct wd_queue *q, void **req);
> > +
> > +These callbacks enable the communication between the user application and the
> > +device. You will still need a hardware-dependent algorithm driver to access
> > +the algorithm functionality of the accelerator itself.
> > +
> > +
> > +Multiple processes support
> > +==========================
> > +
> > +As of the latest mainline kernel (4.18) at the time this document is written,
> > +multi-process is not supported in VFIO yet.
> > +
> > +*JPB* has a patchset to enable this[2]_. We have tested it with our hardware
> > +(which is known as *D06*). It works well. *WarpDrive* relies on it to support
> > +multiple processes. If it is not enabled, *WarpDrive* can still work, but it
> > +supports only one process, which shares the same io map table with the kernel
> > +(the user application cannot access kernel addresses, so it is not going to be
> > +a security problem).
> > +
> > +
> > +Legacy Mode Support
> > +===================
> > +For hardware on which an IOMMU is not supported, WarpDrive can run in
> > +*NOIOMMU* mode.
> > +
> > +
> > +References
> > +==========
> > +.. [1] According to the comment in mm/gup.c, *gup* is only safe within
> > + a syscall. Because it can only keep the physical memory in place
> > + without making sure the VMA will always point to it. Maybe we should
> > + raise the VM_PINNED patchset (see
> > + https://lists.gt.net/linux/kernel/1931993) again to solve this problem.
> > +.. [2] https://patchwork.kernel.org/patch/10394851/
> > +.. [3] https://zhuanlan.zhihu.com/p/35489035
> > +
> > +.. vim: tw=78
> > diff --git a/Documentation/warpdrive/wd-arch.svg b/Documentation/warpdrive/wd-arch.svg
> > new file mode 100644
> > index 000000000000..1b3d1817c4ba
> > --- /dev/null
> > +++ b/Documentation/warpdrive/wd-arch.svg
> > @@ -0,0 +1,732 @@
> > +<?xml version="1.0" encoding="UTF-8" standalone="no"?>
> > +<!-- Created with Inkscape (http://www.inkscape.org/) -->
> > +
> > +<svg
> > + xmlns:dc="http://purl.org/dc/elements/1.1/"
> > + xmlns:cc="http://creativecommons.org/ns#"
> > + xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
> > + xmlns:svg="http://www.w3.org/2000/svg"
> > + xmlns="http://www.w3.org/2000/svg"
> > + xmlns:xlink="http://www.w3.org/1999/xlink"
> > + xmlns:sodipodi="http://sodipodi.sourceforge.net/DTD/sodipodi-0.dtd"
> > + xmlns:inkscape="http://www.inkscape.org/namespaces/inkscape"
> > + width="210mm"
> > + height="193mm"
> > + viewBox="0 0 744.09449 683.85823"
> > + id="svg2"
> > + version="1.1"
> > + inkscape:version="0.92.3 (2405546, 2018-03-11)"
> > + sodipodi:docname="wd-arch.svg">
> > + <defs
> > + id="defs4">
> > + <linearGradient
> > + inkscape:collect="always"
> > + id="linearGradient6830">
> > + <stop
> > + style="stop-color:#000000;stop-opacity:1;"
> > + offset="0"
> > + id="stop6832" />
> > + <stop
> > + style="stop-color:#000000;stop-opacity:0;"
> > + offset="1"
> > + id="stop6834" />
> > + </linearGradient>
> > + <linearGradient
> > + inkscape:collect="always"
> > + xlink:href="#linearGradient5026"
> > + id="linearGradient5032"
> > + x1="353"
> > + y1="211.3622"
> > + x2="565.5"
> > + y2="174.8622"
> > + gradientUnits="userSpaceOnUse"
> > + gradientTransform="translate(-89.949614,405.94594)" />
> > + <linearGradient
> > + inkscape:collect="always"
> > + id="linearGradient5026">
> > + <stop
> > + style="stop-color:#f2f2f2;stop-opacity:1;"
> > + offset="0"
> > + id="stop5028" />
> > + <stop
> > + style="stop-color:#f2f2f2;stop-opacity:0;"
> > + offset="1"
> > + id="stop5030" />
> > + </linearGradient>
> > + <filter
> > + inkscape:collect="always"
> > + style="color-interpolation-filters:sRGB"
> > + id="filter4169-3"
> > + x="-0.031597666"
> > + width="1.0631953"
> > + y="-0.099812768"
> > + height="1.1996255">
> > + <feGaussianBlur
> > + inkscape:collect="always"
> > + stdDeviation="1.3307599"
> > + id="feGaussianBlur4171-6" />
> > + </filter>
> > + <linearGradient
> > + inkscape:collect="always"
> > + xlink:href="#linearGradient5026"
> > + id="linearGradient5032-1"
> > + x1="353"
> > + y1="211.3622"
> > + x2="565.5"
> > + y2="174.8622"
> > + gradientUnits="userSpaceOnUse"
> > + gradientTransform="translate(175.77842,400.29111)" />
> > + <filter
> > + inkscape:collect="always"
> > + style="color-interpolation-filters:sRGB"
> > + id="filter4169-3-0"
> > + x="-0.031597666"
> > + width="1.0631953"
> > + y="-0.099812768"
> > + height="1.1996255">
> > + <feGaussianBlur
> > + inkscape:collect="always"
> > + stdDeviation="1.3307599"
> > + id="feGaussianBlur4171-6-9" />
> > + </filter>
> > + <marker
> > + markerWidth="18.960653"
> > + markerHeight="11.194658"
> > + refX="9.4803267"
> > + refY="5.5973287"
> > + orient="auto"
> > + id="marker4613">
> > + <rect
> > + y="-5.1589785"
> > + x="5.8504119"
> > + height="10.317957"
> > + width="10.317957"
> > + id="rect4212"
> > + style="fill:#ffffff;stroke:#000000;stroke-width:0.69143367;stroke-
> > miterlimit:4;stroke-dasharray:none"
> > + transform="matrix(0.86111274,0.50841405,-
> > 0.86111274,0.50841405,0,0)">
> > + <title
> > + id="title4262">generation</title>
> > + </rect>
> > + </marker>
> > + <marker
> > + markerWidth="11.227358"
> > + markerHeight="12.355258"
> > + refX="10"
> > + refY="6.177629"
> > + orient="auto"
> > + id="marker4825">
> > + <path
> > + inkscape:connector-curvature="0"
> > + id="path4757"
> > + d="M 0.42024733,0.42806444 10.231357,6.3500844
> > 0.24347733,11.918544"
> > + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-
> > width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
> > + </marker>
> > + <marker
> > + markerWidth="11.227358"
> > + markerHeight="12.355258"
> > + refX="10"
> > + refY="6.177629"
> > + orient="auto"
> > + id="marker4825-6">
> > + <path
> > + inkscape:connector-curvature="0"
> > + id="path4757-1"
> > + d="M 0.42024733,0.42806444 10.231357,6.3500844
> > 0.24347733,11.918544"
> > + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-
> > width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
> > + </marker>
> > + <linearGradient
> > + inkscape:collect="always"
> > + xlink:href="#linearGradient5026"
> > + id="linearGradient5032-3-9"
> > + x1="353"
> > + y1="211.3622"
> > + x2="565.5"
> > + y2="174.8622"
> > + gradientUnits="userSpaceOnUse"
> > + gradientTransform="matrix(1.2452511,0,0,0.98513016,-
> > 190.95632,540.33156)" />
> > + <filter
> > + inkscape:collect="always"
> > + style="color-interpolation-filters:sRGB"
> > + id="filter4169-3-5-8"
> > + x="-0.031597666"
> > + width="1.0631953"
> > + y="-0.099812768"
> > + height="1.1996255">
> > + <feGaussianBlur
> > + inkscape:collect="always"
> > + stdDeviation="1.3307599"
> > + id="feGaussianBlur4171-6-3-9" />
> > + </filter>
> > + <marker
> > + markerWidth="11.227358"
> > + markerHeight="12.355258"
> > + refX="10"
> > + refY="6.177629"
> > + orient="auto"
> > + id="marker4825-6-2">
> > + <path
> > + inkscape:connector-curvature="0"
> > + id="path4757-1-9"
> > + d="M 0.42024733,0.42806444 10.231357,6.3500844
> > 0.24347733,11.918544"
> > + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-
> > width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
> > + </marker>
> > + <marker
> > + markerWidth="11.227358"
> > + markerHeight="12.355258"
> > + refX="10"
> > + refY="6.177629"
> > + orient="auto"
> > + id="marker4825-6-2-1">
> > + <path
> > + inkscape:connector-curvature="0"
> > + id="path4757-1-9-9"
> > + d="M 0.42024733,0.42806444 10.231357,6.3500844
> > 0.24347733,11.918544"
> > + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-
> > width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
> > + </marker>
> > + <linearGradient
> > + inkscape:collect="always"
> > + xlink:href="#linearGradient5026"
> > + id="linearGradient5032-3-9-7"
> > + x1="353"
> > + y1="211.3622"
> > + x2="565.5"
> > + y2="174.8622"
> > + gradientUnits="userSpaceOnUse"
> > + gradientTransform="matrix(1.3742742,0,0,0.97786398,-
> > 234.52617,654.63367)" />
> > + <filter
> > + inkscape:collect="always"
> > + style="color-interpolation-filters:sRGB"
> > + id="filter4169-3-5-8-5"
> > + x="-0.031597666"
> > + width="1.0631953"
> > + y="-0.099812768"
> > + height="1.1996255">
> > + <feGaussianBlur
> > + inkscape:collect="always"
> > + stdDeviation="1.3307599"
> > + id="feGaussianBlur4171-6-3-9-0" />
> > + </filter>
> > + <marker
> > + markerWidth="11.227358"
> > + markerHeight="12.355258"
> > + refX="10"
> > + refY="6.177629"
> > + orient="auto"
> > + id="marker4825-6-2-6">
> > + <path
> > + inkscape:connector-curvature="0"
> > + id="path4757-1-9-1"
> > + d="M 0.42024733,0.42806444 10.231357,6.3500844
> > 0.24347733,11.918544"
> > + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-
> > width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
> > + </marker>
> > + <linearGradient
> > + inkscape:collect="always"
> > + xlink:href="#linearGradient5026"
> > + id="linearGradient5032-3-9-4"
> > + x1="353"
> > + y1="211.3622"
> > + x2="565.5"
> > + y2="174.8622"
> > + gradientUnits="userSpaceOnUse"
> > + gradientTransform="matrix(1.3742912,0,0,2.0035845,-
> > 468.34428,342.56603)" />
> > + <filter
> > + inkscape:collect="always"
> > + style="color-interpolation-filters:sRGB"
> > + id="filter4169-3-5-8-54"
> > + x="-0.031597666"
> > + width="1.0631953"
> > + y="-0.099812768"
> > + height="1.1996255">
> > + <feGaussianBlur
> > + inkscape:collect="always"
> > + stdDeviation="1.3307599"
> > + id="feGaussianBlur4171-6-3-9-7" />
> > + </filter>
> > + <marker
> > + markerWidth="11.227358"
> > + markerHeight="12.355258"
> > + refX="10"
> > + refY="6.177629"
> > + orient="auto"
> > + id="marker4825-6-2-1-8">
> > + <path
> > + inkscape:connector-curvature="0"
> > + id="path4757-1-9-9-6"
> > + d="M 0.42024733,0.42806444 10.231357,6.3500844
> > 0.24347733,11.918544"
> > + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-
> > width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
> > + </marker>
> > + <marker
> > + markerWidth="11.227358"
> > + markerHeight="12.355258"
> > + refX="10"
> > + refY="6.177629"
> > + orient="auto"
> > + id="marker4825-6-2-1-8-8">
> > + <path
> > + inkscape:connector-curvature="0"
> > + id="path4757-1-9-9-6-9"
> > + d="M 0.42024733,0.42806444 10.231357,6.3500844
> > 0.24347733,11.918544"
> > + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-
> > width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
> > + </marker>
> > + <marker
> > + markerWidth="11.227358"
> > + markerHeight="12.355258"
> > + refX="10"
> > + refY="6.177629"
> > + orient="auto"
> > + id="marker4825-6-0">
> > + <path
> > + inkscape:connector-curvature="0"
> > + id="path4757-1-93"
> > + d="M 0.42024733,0.42806444 10.231357,6.3500844
> > 0.24347733,11.918544"
> > + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-
> > width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
> > + </marker>
> > + <marker
> > + markerWidth="11.227358"
> > + markerHeight="12.355258"
> > + refX="10"
> > + refY="6.177629"
> > + orient="auto"
> > + id="marker4825-6-0-2">
> > + <path
> > + inkscape:connector-curvature="0"
> > + id="path4757-1-93-6"
> > + d="M 0.42024733,0.42806444 10.231357,6.3500844
> > 0.24347733,11.918544"
> > + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-
> > width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
> > + </marker>
> > + <filter
> > + inkscape:collect="always"
> > + style="color-interpolation-filters:sRGB"
> > + id="filter5382"
> > + x="-0.089695387"
> > + width="1.1793908"
> > + y="-0.10052069"
> > + height="1.2010413">
> > + <feGaussianBlur
> > + inkscape:collect="always"
> > + stdDeviation="0.86758925"
> > + id="feGaussianBlur5384" />
> > + </filter>
> > + <linearGradient
> > + inkscape:collect="always"
> > + xlink:href="#linearGradient6830"
> > + id="linearGradient6836"
> > + x1="362.73923"
> > + y1="700.04059"
> > + x2="340.4751"
> > + y2="678.25488"
> > + gradientUnits="userSpaceOnUse"
> > + gradientTransform="translate(-23.771026,-135.76835)" />
> > + <marker
> > + markerWidth="11.227358"
> > + markerHeight="12.355258"
> > + refX="10"
> > + refY="6.177629"
> > + orient="auto"
> > + id="marker4825-6-2-6-2">
> > + <path
> > + inkscape:connector-curvature="0"
> > + id="path4757-1-9-1-9"
> > + d="M 0.42024733,0.42806444 10.231357,6.3500844
> > 0.24347733,11.918544"
> > + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-
> > width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
> > + </marker>
> > + </defs>
> > + <sodipodi:namedview
> > + id="base"
> > + pagecolor="#ffffff"
> > + bordercolor="#666666"
> > + borderopacity="1.0"
> > + inkscape:pageopacity="0.0"
> > + inkscape:pageshadow="2"
> > + inkscape:zoom="0.98994949"
> > + inkscape:cx="222.32868"
> > + inkscape:cy="370.44492"
> > + inkscape:document-units="px"
> > + inkscape:current-layer="layer1"
> > + showgrid="false"
> > + inkscape:window-width="1916"
> > + inkscape:window-height="1033"
> > + inkscape:window-x="0"
> > + inkscape:window-y="22"
> > + inkscape:window-maximized="0"
> > + fit-margin-right="0.3"
> > + inkscape:snap-global="false" />
> > + <metadata
> > + id="metadata7">
> > + <rdf:RDF>
> > + <cc:Work
> > + rdf:about="">
> > + <dc:format>image/svg+xml</dc:format>
> > + <dc:type
> > + rdf:resource="http://purl.org/dc/dcmitype/StillImage" />
> > + <dc:title />
> > + </cc:Work>
> > + </rdf:RDF>
> > + </metadata>
> > + <g
> > + inkscape:label="Layer 1"
> > + inkscape:groupmode="layer"
> > + id="layer1"
> > + transform="translate(0,-368.50374)">
> > + <rect
> > + style="fill:#000000;stroke:#000000;stroke-
> > width:0.6465112;filter:url(#filter4169-3)"
> > + id="rect4136-3-6"
> > + width="101.07784"
> > + height="31.998148"
> > + x="283.01144"
> > + y="588.80896" />
> > + <rect
> > + style="fill:url(#linearGradient5032);fill-
> > opacity:1;stroke:#000000;stroke-width:0.6465112"
> > + id="rect4136-2"
> > + width="101.07784"
> > + height="31.998148"
> > + x="281.63498"
> > + y="586.75739" />
> > + <text
> > + xml:space="preserve"
> > + style="font-style:normal;font-weight:normal;line-height:0%;font-
> > family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-
> > opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-
> > linejoin:miter;stroke-opacity:1"
> > + x="294.21747"
> > + y="612.50073"
> > + id="text4138-6"><tspan
> > + sodipodi:role="line"
> > + id="tspan4140-1"
> > + x="294.21747"
> > + y="612.50073"
> > + style="font-size:15px;line-height:1.25">WarpDrive</tspan></text>
> > + <rect
> > + style="fill:#000000;stroke:#000000;stroke-
> > width:0.6465112;filter:url(#filter4169-3-0)"
> > + id="rect4136-3-6-3"
> > + width="101.07784"
> > + height="31.998148"
> > + x="548.7395"
> > + y="583.15417" />
> > + <rect
> > + style="fill:url(#linearGradient5032-1);fill-
> > opacity:1;stroke:#000000;stroke-width:0.6465112"
> > + id="rect4136-2-60"
> > + width="101.07784"
> > + height="31.998148"
> > + x="547.36304"
> > + y="581.1026" />
> > + <text
> > + xml:space="preserve"
> > + style="font-style:normal;font-weight:normal;line-height:0%;font-
> > family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-
> > opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-
> > linejoin:miter;stroke-opacity:1"
> > + x="557.83484"
> > + y="602.32745"
> > + id="text4138-6-6"><tspan
> > + sodipodi:role="line"
> > + id="tspan4140-1-2"
> > + x="557.83484"
> > + y="602.32745"
> > + style="font-size:15px;line-height:1.25">user_driver</tspan></text>
> > + <path
> > + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-
> > width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-
> > opacity:1;marker-end:url(#marker4613)"
> > + d="m 547.36304,600.78954 -156.58203,0.0691"
> > + id="path4855"
> > + inkscape:connector-curvature="0"
> > + sodipodi:nodetypes="cc" />
> > + <rect
> > + style="fill:#000000;stroke:#000000;stroke-
> > width:0.6465112;filter:url(#filter4169-3-5-8)"
> > + id="rect4136-3-6-5-7"
> > + width="101.07784"
> > + height="31.998148"
> > + x="128.74678"
> > + y="80.648842"
> > + transform="matrix(1.2452511,0,0,0.98513016,113.15182,641.02594)"
> > />
> > + <rect
> > + style="fill:url(#linearGradient5032-3-9);fill-
> > opacity:1;stroke:#000000;stroke-width:0.71606314"
> > + id="rect4136-2-6-3"
> > + width="125.86729"
> > + height="31.522341"
> > + x="271.75983"
> > + y="718.45435" />
> > + <text
> > + xml:space="preserve"
> > + style="font-style:normal;font-weight:normal;line-height:0%;font-
> > family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-
> > opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-
> > linejoin:miter;stroke-opacity:1"
> > + x="306.29599"
> > + y="746.50073"
> > + id="text4138-6-2-6"><tspan
> > + sodipodi:role="line"
> > + id="tspan4140-1-9-1"
> > + x="306.29599"
> > + y="746.50073"
> > + style="font-size:15px;line-height:1.25">spimdev</tspan></text>
> > + <path
> > + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-
> > width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-
> > opacity:1;marker-end:url(#marker4825-6-2)"
> > + d="m 329.57309,619.72453 5.0373,97.14447"
> > + id="path4661-3"
> > + inkscape:connector-curvature="0"
> > + sodipodi:nodetypes="cc" />
> > + <path
> > + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-
> > width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-
> > opacity:1;marker-end:url(#marker4825-6-2-1)"
> > + d="m 342.57219,830.63108 -5.67699,-79.2841"
> > + id="path4661-3-4"
> > + inkscape:connector-curvature="0"
> > + sodipodi:nodetypes="cc" />
> > + <rect
> > + style="fill:#000000;stroke:#000000;stroke-
> > width:0.6465112;filter:url(#filter4169-3-5-8-5)"
> > + id="rect4136-3-6-5-7-3"
> > + width="101.07784"
> > + height="31.998148"
> > + x="128.74678"
> > + y="80.648842"
> > + transform="matrix(1.3742742,0,0,0.97786398,101.09126,754.58534)"
> > />
> > + <rect
> > + style="fill:url(#linearGradient5032-3-9-7);fill-
> > opacity:1;stroke:#000000;stroke-width:0.74946606"
> > + id="rect4136-2-6-3-6"
> > + width="138.90866"
> > + height="31.289837"
> > + x="276.13297"
> > + y="831.44263" />
> > + <text
> > + xml:space="preserve"
> > + style="font-style:normal;font-weight:normal;line-height:0%;font-
> > family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-
> > opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-
> > linejoin:miter;stroke-opacity:1"
> > + x="295.67819"
> > + y="852.98224"
> > + id="text4138-6-2-6-1"><tspan
> > + sodipodi:role="line"
> > + id="tspan4140-1-9-1-0"
> > + x="295.67819"
> > + y="852.98224"
> > + style="font-size:15px;line-height:1.25">Device Driver</tspan></text>
> > + <text
> > + xml:space="preserve"
> > + style="font-style:normal;font-weight:normal;line-height:0%;font-
> > family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-
> > opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-
> > linejoin:miter;stroke-opacity:1"
> > + x="349.31198"
> > + y="829.46118"
> > + id="text4138-6-2-6-1-6"><tspan
> > + sodipodi:role="line"
> > + id="tspan4140-1-9-1-0-3"
> > + x="349.31198"
> > + y="829.46118"
> > + style="font-size:15px;line-height:1.25">*</tspan></text>
> > + <text
> > + xml:space="preserve"
> > + style="font-style:normal;font-weight:normal;line-height:0%;font-
> > family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-
> > opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-
> > linejoin:miter;stroke-opacity:1"
> > + x="349.98282"
> > + y="768.698"
> > + id="text4138-6-2-6-1-6-2"><tspan
> > + sodipodi:role="line"
> > + id="tspan4140-1-9-1-0-3-0"
> > + x="349.98282"
> > + y="768.698"
> > + style="font-size:15px;line-height:1.25">1</tspan></text>
> > + <path
> > + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-
> > width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-
> > opacity:1;marker-end:url(#marker4825-6-2-6)"
> > + d="m 568.1238,614.05402 0.51369,333.80219"
> > + id="path4661-3-5"
> > + inkscape:connector-curvature="0"
> > + sodipodi:nodetypes="cc" />
> > + <text
> > + xml:space="preserve"
> > + style="font-style:normal;font-weight:normal;line-height:0%;font-
> > family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-
> > anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-
> > width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
> > + x="371.8013"
> > + y="664.62476"
> > + id="text4138-6-2-6-1-6-2-5"><tspan
> > + sodipodi:role="line"
> > + x="371.8013"
> > + y="664.62476"
> > + id="tspan4274"
> > + style="font-size:15px;line-
> > height:1.25">&lt;&lt;vfio&gt;&gt;</tspan><tspan
> > + sodipodi:role="line"
> > + x="371.8013"
> > + y="683.37476"
> > + id="tspan4305"
> > + style="font-size:15px;line-height:1.25">resource
> > management</tspan></text>
> > + <text
> > + xml:space="preserve"
> > + style="font-style:normal;font-weight:normal;line-height:0%;font-
> > family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-
> > opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-
> > linejoin:miter;stroke-opacity:1"
> > + x="389.92969"
> > + y="587.44836"
> > + id="text4138-6-2-6-1-6-2-56"><tspan
> > + sodipodi:role="line"
> > + id="tspan4140-1-9-1-0-3-0-9"
> > + x="389.92969"
> > + y="587.44836"
> > + style="font-size:15px;line-height:1.25">1</tspan></text>
> > + <text
> > + xml:space="preserve"
> > + style="font-style:normal;font-weight:normal;line-height:0%;font-
> > family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-
> > opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-
> > linejoin:miter;stroke-opacity:1"
> > + x="528.64813"
> > + y="600.08429"
> > + id="text4138-6-2-6-1-6-3"><tspan
> > + sodipodi:role="line"
> > + id="tspan4140-1-9-1-0-3-7"
> > + x="528.64813"
> > + y="600.08429"
> > + style="font-size:15px;line-height:1.25">*</tspan></text>
> > + <rect
> > + style="fill:#000000;stroke:#000000;stroke-
> > width:0.6465112;filter:url(#filter4169-3-5-8-54)"
> > + id="rect4136-3-6-5-7-4"
> > + width="101.07784"
> > + height="31.998148"
> > + x="128.74678"
> > + y="80.648842"
> > + transform="matrix(1.3745874,0,0,1.8929066,-132.7754,556.04505)" />
> > + <rect
> > + style="fill:url(#linearGradient5032-3-9-4);fill-
> > opacity:1;stroke:#000000;stroke-width:1.07280123"
> > + id="rect4136-2-6-3-4"
> > + width="138.91039"
> > + height="64.111"
> > + x="42.321312"
> > + y="704.8371" />
> > + <text
> > + xml:space="preserve"
> > + style="font-style:normal;font-weight:normal;line-height:0%;font-
> > family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-
> > opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-
> > linejoin:miter;stroke-opacity:1"
> > + x="110.30745"
> > + y="722.94025"
> > + id="text4138-6-2-6-3"><tspan
> > + sodipodi:role="line"
> > + x="111.99202"
> > + y="722.94025"
> > + id="tspan4366"
> > + style="font-size:15px;line-height:1.25;text-align:center;text-
> > anchor:middle">other standard </tspan><tspan
> > + sodipodi:role="line"
> > + x="110.30745"
> > + y="741.69025"
> > + id="tspan4368"
> > + style="font-size:15px;line-height:1.25;text-align:center;text-
> > anchor:middle">framework</tspan><tspan
> > + sodipodi:role="line"
> > + x="110.30745"
> > + y="760.44025"
> > + style="font-size:15px;line-height:1.25;text-align:center;text-
> > anchor:middle"
> > + id="tspan6840">(crypto/nic/others)</tspan></text>
> > + <path
> > + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-
> > width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-
> > opacity:1;marker-end:url(#marker4825-6-2-1-8)"
> > + d="M 276.29661,849.04109 134.04449,771.90853"
> > + id="path4661-3-4-8"
> > + inkscape:connector-curvature="0"
> > + sodipodi:nodetypes="cc" />
> > + <text
> > + xml:space="preserve"
> > + style="font-style:normal;font-weight:normal;line-height:0%;font-
> > family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-
> > opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-
> > linejoin:miter;stroke-opacity:1"
> > + x="313.70813"
> > + y="730.06366"
> > + id="text4138-6-2-6-36"><tspan
> > + sodipodi:role="line"
> > + id="tspan4140-1-9-1-7"
> > + x="313.70813"
> > + y="730.06366"
> > + style="font-size:10px;line-
> > height:1.25">&lt;&lt;lkm&gt;&gt;</tspan></text>
> > + <text
> > + xml:space="preserve"
> > + style="font-style:normal;font-weight:normal;line-height:0%;font-
> > family:sans-serif;text-align:start;letter-spacing:0px;word-spacing:0px;text-
> > anchor:start;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-
> > linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
> > + x="343.81625"
> > + y="786.44141"
> > + id="text4138-6-2-6-1-6-2-5-7-5"><tspan
> > + sodipodi:role="line"
> > + x="343.81625"
> > + y="786.44141"
> > + style="font-size:15px;line-height:1.25;text-align:start;text-
> > anchor:start"
> > + id="tspan2278">regist<tspan
> > + style="text-align:start;text-anchor:start"
> > + id="tspan2280">er as mdev with &quot;share </tspan></tspan><tspan
> > + sodipodi:role="line"
> > + x="343.81625"
> > + y="805.19141"
> > + style="font-size:15px;line-height:1.25;text-align:start;text-
> > anchor:start"
> > + id="tspan2357"><tspan
> > + style="text-align:start;text-anchor:start"
> > + id="tspan2359">parent iommu&quot; attribu</tspan>te</tspan></text>
> > + <text
> > + xml:space="preserve"
> > + style="font-style:normal;font-weight:normal;line-height:0%;font-
> > family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-
> > opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-
> > linejoin:miter;stroke-opacity:1"
> > + x="29.145819"
> > + y="833.44244"
> > + id="text4138-6-2-6-1-6-2-5-7-5-2"><tspan
> > + sodipodi:role="line"
> > + x="29.145819"
> > + y="833.44244"
> > + id="tspan4301"
> > + style="font-size:15px;line-height:1.25">register to other
> > subsystem</tspan></text>
> > + <text
> > + xml:space="preserve"
> > + style="font-style:normal;font-weight:normal;line-height:0%;font-
> > family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-
> > opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-
> > linejoin:miter;stroke-opacity:1"
> > + x="301.20813"
> > + y="597.29437"
> > + id="text4138-6-2-6-36-1"><tspan
> > + sodipodi:role="line"
> > + id="tspan4140-1-9-1-7-2"
> > + x="301.20813"
> > + y="597.29437"
> > + style="font-size:10px;line-
> > height:1.25">&lt;&lt;user_lib&gt;&gt;</tspan></text>
> > + <text
> > + xml:space="preserve"
> > + style="font-style:normal;font-weight:normal;line-height:0%;font-
> > family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-
> > anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-
> > width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
> > + x="649.09613"
> > + y="774.4798"
> > + id="text4138-6-2-6-1-6-2-5-3"><tspan
> > + sodipodi:role="line"
> > + id="tspan4140-1-9-1-0-3-0-4-6"
> > + x="649.09613"
> > + y="774.4798"
> > + style="font-size:15px;line-
> > height:1.25">&lt;&lt;vfio&gt;&gt;</tspan><tspan
> > + sodipodi:role="line"
> > + x="649.09613"
> > + y="793.2298"
> > + id="tspan4274-7"
> > + style="font-size:15px;line-height:1.25">Hardware
> > Accessing</tspan></text>
> > + <text
> > + xml:space="preserve"
> > + style="font-style:normal;font-weight:normal;line-height:0%;font-
> > family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-
> > anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-
> > width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
> > + x="371.01291"
> > + y="529.23682"
> > + id="text4138-6-2-6-1-6-2-5-36"><tspan
> > + sodipodi:role="line"
> > + x="371.01291"
> > + y="529.23682"
> > + id="tspan4305-3"
> > + style="font-size:15px;line-height:1.25">wd user api</tspan></text>
> > + <path
> > + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-
> > width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
> > + d="m 328.19325,585.87943 0,-23.57142"
> > + id="path4348"
> > + inkscape:connector-curvature="0" />
> > + <ellipse
> > + style="opacity:1;fill:#ffffff;fill-opacity:1;fill-
> > rule:evenodd;stroke:#000000;stroke-width:1;stroke-miterlimit:4;stroke-
> > dasharray:none;stroke-dashoffset:0"
> > + id="path4350"
> > + cx="328.01468"
> > + cy="551.95081"
> > + rx="11.607142"
> > + ry="10.357142" />
> > + <path
> > + style="opacity:0.444;fill:url(#linearGradient6836);fill-opacity:1;fill-
> > rule:evenodd;stroke:none;stroke-width:1;stroke-miterlimit:4;stroke-
> > dasharray:none;stroke-dashoffset:0;filter:url(#filter5382)"
> > + id="path4350-2"
> > + sodipodi:type="arc"
> > + sodipodi:cx="329.44327"
> > + sodipodi:cy="553.37933"
> > + sodipodi:rx="11.607142"
> > + sodipodi:ry="10.357142"
> > + sodipodi:start="0"
> > + sodipodi:end="6.2509098"
> > + d="m 341.05041,553.37933 a 11.607142,10.357142 0 0 1 -
> > 11.51349,10.35681 11.607142,10.357142 0 0 1 -11.69928,-10.18967
> > 11.607142,10.357142 0 0 1 11.32469,-10.52124 11.607142,10.357142 0 0 1
> > 11.88204,10.01988"
> > + sodipodi:open="true" />
> > + <text
> > + xml:space="preserve"
> > + style="font-style:normal;font-weight:normal;line-height:0%;font-
> > family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-
> > anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-
> > width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
> > + x="543.91455"
> > + y="978.22363"
> > + id="text4138-6-2-6-1-6-2-5-36-3"><tspan
> > + sodipodi:role="line"
> > + x="543.91455"
> > + y="978.22363"
> > + id="tspan4305-3-67"
> > + style="font-size:15px;line-
> > height:1.25">Device(Hardware)</tspan></text>
> > + <path
> > + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-
> > width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-
> > opacity:1;marker-end:url(#marker4825-6-2-6-2)"
> > + d="m 347.51164,865.4527 153.19752,91.52439"
> > + id="path4661-3-5-1"
> > + inkscape:connector-curvature="0"
> > + sodipodi:nodetypes="cc" />
> > + <text
> > + xml:space="preserve"
> > + style="font-style:normal;font-weight:normal;font-size:12px;line-
> > height:0%;font-family:sans-serif;letter-spacing:0px;word-
> > spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-
> > linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
> > + x="343.6398"
> > + y="716.47754"
> > + id="text4138-6-2-6-1-6-2-5-7-5-2-6"><tspan
> > + sodipodi:role="line"
> > + x="343.6398"
> > + y="716.47754"
> > + id="tspan4301-4"
> > + style="font-style:italic;font-variant:normal;font-weight:normal;font-
> > stretch:normal;font-size:15px;line-height:1.25;font-family:sans-serif;-
> > inkscape-font-specification:'sans-serif Italic';stroke-width:1px">Share
> > Parent's IOMMU mdev</tspan></text>
> > + </g>
> > +</svg>
> > diff --git a/Documentation/warpdrive/wd.svg
> > b/Documentation/warpdrive/wd.svg
> > new file mode 100644
> > index 000000000000..87ab92ebfbc6
> > --- /dev/null
> > +++ b/Documentation/warpdrive/wd.svg
> > @@ -0,0 +1,526 @@
> > +<?xml version="1.0" encoding="UTF-8" standalone="no"?>
> > +<!-- Created with Inkscape (http://www.inkscape.org/) -->
> > +
> > +<svg
> > + xmlns:dc="http://purl.org/dc/elements/1.1/"
> > + xmlns:cc="http://creativecommons.org/ns#"
> > + xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
> > + xmlns:svg="http://www.w3.org/2000/svg"
> > + xmlns="http://www.w3.org/2000/svg"
> > + xmlns:xlink="http://www.w3.org/1999/xlink"
> > + xmlns:sodipodi="http://sodipodi.sourceforge.net/DTD/sodipodi-0.dtd"
> > + xmlns:inkscape="http://www.inkscape.org/namespaces/inkscape"
> > + width="210mm"
> > + height="116mm"
> > + viewBox="0 0 744.09449 411.02338"
> > + id="svg2"
> > + version="1.1"
> > + inkscape:version="0.92.3 (2405546, 2018-03-11)"
> > + sodipodi:docname="wd.svg">
> > + <defs
> > + id="defs4">
> > + <linearGradient
> > + inkscape:collect="always"
> > + id="linearGradient5026">
> > + <stop
> > + style="stop-color:#f2f2f2;stop-opacity:1;"
> > + offset="0"
> > + id="stop5028" />
> > + <stop
> > + style="stop-color:#f2f2f2;stop-opacity:0;"
> > + offset="1"
> > + id="stop5030" />
> > + </linearGradient>
> > + <linearGradient
> > + inkscape:collect="always"
> > + xlink:href="#linearGradient5026"
> > + id="linearGradient5032-3"
> > + x1="353"
> > + y1="211.3622"
> > + x2="565.5"
> > + y2="174.8622"
> > + gradientUnits="userSpaceOnUse"
> > + gradientTransform="matrix(2.7384117,0,0,0.91666329,-
> > 952.8283,571.10143)" />
> > + <filter
> > + inkscape:collect="always"
> > + style="color-interpolation-filters:sRGB"
> > + id="filter4169-3-5"
> > + x="-0.031597666"
> > + width="1.0631953"
> > + y="-0.099812768"
> > + height="1.1996255">
> > + <feGaussianBlur
> > + inkscape:collect="always"
> > + stdDeviation="1.3307599"
> > + id="feGaussianBlur4171-6-3" />
> > + </filter>
> > + <marker
> > + markerWidth="18.960653"
> > + markerHeight="11.194658"
> > + refX="9.4803267"
> > + refY="5.5973287"
> > + orient="auto"
> > + id="marker4613">
> > + <rect
> > + y="-5.1589785"
> > + x="5.8504119"
> > + height="10.317957"
> > + width="10.317957"
> > + id="rect4212"
> > + style="fill:#ffffff;stroke:#000000;stroke-width:0.69143367;stroke-
> > miterlimit:4;stroke-dasharray:none"
> > + transform="matrix(0.86111274,0.50841405,-
> > 0.86111274,0.50841405,0,0)">
> > + <title
> > + id="title4262">generation</title>
> > + </rect>
> > + </marker>
> > + <marker
> > + markerWidth="11.227358"
> > + markerHeight="12.355258"
> > + refX="10"
> > + refY="6.177629"
> > + orient="auto"
> > + id="marker4825">
> > + <path
> > + inkscape:connector-curvature="0"
> > + id="path4757"
> > + d="M 0.42024733,0.42806444 10.231357,6.3500844
> > 0.24347733,11.918544"
> > + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-
> > width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
> > + </marker>
> > + <marker
> > + markerWidth="11.227358"
> > + markerHeight="12.355258"
> > + refX="10"
> > + refY="6.177629"
> > + orient="auto"
> > + id="marker4825-6">
> > + <path
> > + inkscape:connector-curvature="0"
> > + id="path4757-1"
> > + d="M 0.42024733,0.42806444 10.231357,6.3500844
> > 0.24347733,11.918544"
> > + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-
> > width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
> > + </marker>
> > + <linearGradient
> > + inkscape:collect="always"
> > + xlink:href="#linearGradient5026"
> > + id="linearGradient5032-3-9"
> > + x1="353"
> > + y1="211.3622"
> > + x2="565.5"
> > + y2="174.8622"
> > + gradientUnits="userSpaceOnUse"
> > + gradientTransform="matrix(1.2452511,0,0,0.98513016,-
> > 190.95632,540.33156)" />
> > + <marker
> > + markerWidth="11.227358"
> > + markerHeight="12.355258"
> > + refX="10"
> > + refY="6.177629"
> > + orient="auto"
> > + id="marker4825-6-2">
> > + <path
> > + inkscape:connector-curvature="0"
> > + id="path4757-1-9"
> > + d="M 0.42024733,0.42806444 10.231357,6.3500844
> > 0.24347733,11.918544"
> > + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-
> > width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
> > + </marker>
> > + <marker
> > + markerWidth="11.227358"
> > + markerHeight="12.355258"
> > + refX="10"
> > + refY="6.177629"
> > + orient="auto"
> > + id="marker4825-6-2-1">
> > + <path
> > + inkscape:connector-curvature="0"
> > + id="path4757-1-9-9"
> > + d="M 0.42024733,0.42806444 10.231357,6.3500844
> > 0.24347733,11.918544"
> > + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-
> > width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
> > + </marker>
> > + <marker
> > + markerWidth="11.227358"
> > + markerHeight="12.355258"
> > + refX="10"
> > + refY="6.177629"
> > + orient="auto"
> > + id="marker4825-6-2-6">
> > + <path
> > + inkscape:connector-curvature="0"
> > + id="path4757-1-9-1"
> > + d="M 0.42024733,0.42806444 10.231357,6.3500844
> > 0.24347733,11.918544"
> > + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-
> > width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
> > + </marker>
> > + <marker
> > + markerWidth="11.227358"
> > + markerHeight="12.355258"
> > + refX="10"
> > + refY="6.177629"
> > + orient="auto"
> > + id="marker4825-6-2-1-8">
> > + <path
> > + inkscape:connector-curvature="0"
> > + id="path4757-1-9-9-6"
> > + d="M 0.42024733,0.42806444 10.231357,6.3500844
> > 0.24347733,11.918544"
> > + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-
> > width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
> > + </marker>
> > + <marker
> > + markerWidth="11.227358"
> > + markerHeight="12.355258"
> > + refX="10"
> > + refY="6.177629"
> > + orient="auto"
> > + id="marker4825-6-2-1-8-8">
> > + <path
> > + inkscape:connector-curvature="0"
> > + id="path4757-1-9-9-6-9"
> > + d="M 0.42024733,0.42806444 10.231357,6.3500844
> > 0.24347733,11.918544"
> > + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-
> > width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
> > + </marker>
> > + <marker
> > + markerWidth="11.227358"
> > + markerHeight="12.355258"
> > + refX="10"
> > + refY="6.177629"
> > + orient="auto"
> > + id="marker4825-6-0">
> > + <path
> > + inkscape:connector-curvature="0"
> > + id="path4757-1-93"
> > + d="M 0.42024733,0.42806444 10.231357,6.3500844
> > 0.24347733,11.918544"
> > + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-
> > width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
> > + </marker>
> > + <marker
> > + markerWidth="11.227358"
> > + markerHeight="12.355258"
> > + refX="10"
> > + refY="6.177629"
> > + orient="auto"
> > + id="marker4825-6-0-2">
> > + <path
> > + inkscape:connector-curvature="0"
> > + id="path4757-1-93-6"
> > + d="M 0.42024733,0.42806444 10.231357,6.3500844
> > 0.24347733,11.918544"
> > + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-
> > width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
> > + </marker>
> > + <marker
> > + markerWidth="11.227358"
> > + markerHeight="12.355258"
> > + refX="10"
> > + refY="6.177629"
> > + orient="auto"
> > + id="marker4825-6-2-6-2">
> > + <path
> > + inkscape:connector-curvature="0"
> > + id="path4757-1-9-1-9"
> > + d="M 0.42024733,0.42806444 10.231357,6.3500844
> > 0.24347733,11.918544"
> > + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-
> > width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
> > + </marker>
> > + <linearGradient
> > + inkscape:collect="always"
> > + xlink:href="#linearGradient5026"
> > + id="linearGradient5032-3-8"
> > + x1="353"
> > + y1="211.3622"
> > + x2="565.5"
> > + y2="174.8622"
> > + gradientUnits="userSpaceOnUse"
> > + gradientTransform="matrix(1.0104674,0,0,1.0052679,-
> > 218.642,661.15448)" />
> > + <filter
> > + inkscape:collect="always"
> > + style="color-interpolation-filters:sRGB"
> > + id="filter4169-3-5-8"
> > + x="-0.031597666"
> > + width="1.0631953"
> > + y="-0.099812768"
> > + height="1.1996255">
> > + <feGaussianBlur
> > + inkscape:collect="always"
> > + stdDeviation="1.3307599"
> > + id="feGaussianBlur4171-6-3-9" />
> > + </filter>
> > + <linearGradient
> > + inkscape:collect="always"
> > + xlink:href="#linearGradient5026"
> > + id="linearGradient5032-3-8-2"
> > + x1="353"
> > + y1="211.3622"
> > + x2="565.5"
> > + y2="174.8622"
> > + gradientUnits="userSpaceOnUse"
> > + gradientTransform="matrix(2.1450559,0,0,1.0052679,-
> > 521.97704,740.76422)" />
> > + <filter
> > + inkscape:collect="always"
> > + style="color-interpolation-filters:sRGB"
> > + id="filter4169-3-5-8-5"
> > + x="-0.031597666"
> > + width="1.0631953"
> > + y="-0.099812768"
> > + height="1.1996255">
> > + <feGaussianBlur
> > + inkscape:collect="always"
> > + stdDeviation="1.3307599"
> > + id="feGaussianBlur4171-6-3-9-1" />
> > + </filter>
> > + <linearGradient
> > + inkscape:collect="always"
> > + xlink:href="#linearGradient5026"
> > + id="linearGradient5032-3-8-0"
> > + x1="353"
> > + y1="211.3622"
> > + x2="565.5"
> > + y2="174.8622"
> > + gradientUnits="userSpaceOnUse"
> > +
> > gradientTransform="matrix(1.0104674,0,0,1.0052679,83.456748,660.20747
> > )" />
> > + <filter
> > + inkscape:collect="always"
> > + style="color-interpolation-filters:sRGB"
> > + id="filter4169-3-5-8-6"
> > + x="-0.031597666"
> > + width="1.0631953"
> > + y="-0.099812768"
> > + height="1.1996255">
> > + <feGaussianBlur
> > + inkscape:collect="always"
> > + stdDeviation="1.3307599"
> > + id="feGaussianBlur4171-6-3-9-2" />
> > + </filter>
> > + <linearGradient
> > + inkscape:collect="always"
> > + xlink:href="#linearGradient5026"
> > + id="linearGradient5032-3-84"
> > + x1="353"
> > + y1="211.3622"
> > + x2="565.5"
> > + y2="174.8622"
> > + gradientUnits="userSpaceOnUse"
> > + gradientTransform="matrix(1.9884948,0,0,0.94903536,-
> > 318.42665,564.37696)" />
> > + <filter
> > + inkscape:collect="always"
> > + style="color-interpolation-filters:sRGB"
> > + id="filter4169-3-5-4"
> > + x="-0.031597666"
> > + width="1.0631953"
> > + y="-0.099812768"
> > + height="1.1996255">
> > + <feGaussianBlur
> > + inkscape:collect="always"
> > + stdDeviation="1.3307599"
> > + id="feGaussianBlur4171-6-3-0" />
> > + </filter>
> > + <marker
> > + markerWidth="11.227358"
> > + markerHeight="12.355258"
> > + refX="10"
> > + refY="6.177629"
> > + orient="auto"
> > + id="marker4825-6-0-0">
> > + <path
> > + inkscape:connector-curvature="0"
> > + id="path4757-1-93-8"
> > + d="M 0.42024733,0.42806444 10.231357,6.3500844
> > 0.24347733,11.918544"
> > + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-
> > width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
> > + </marker>
> > + <marker
> > + markerWidth="11.227358"
> > + markerHeight="12.355258"
> > + refX="10"
> > + refY="6.177629"
> > + orient="auto"
> > + id="marker4825-6-3">
> > + <path
> > + inkscape:connector-curvature="0"
> > + id="path4757-1-1"
> > + d="M 0.42024733,0.42806444 10.231357,6.3500844
> > 0.24347733,11.918544"
> > + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-
> > width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
> > + </marker>
> > + </defs>
> > + <sodipodi:namedview
> > + id="base"
> > + pagecolor="#ffffff"
> > + bordercolor="#666666"
> > + borderopacity="1.0"
> > + inkscape:pageopacity="0.0"
> > + inkscape:pageshadow="2"
> > + inkscape:zoom="0.98994949"
> > + inkscape:cx="457.47339"
> > + inkscape:cy="250.14781"
> > + inkscape:document-units="px"
> > + inkscape:current-layer="layer1"
> > + showgrid="false"
> > + inkscape:window-width="1916"
> > + inkscape:window-height="1033"
> > + inkscape:window-x="0"
> > + inkscape:window-y="22"
> > + inkscape:window-maximized="0"
> > + fit-margin-right="0.3" />
> > + <metadata
> > + id="metadata7">
> > + <rdf:RDF>
> > + <cc:Work
> > + rdf:about="">
> > + <dc:format>image/svg+xml</dc:format>
> > + <dc:type
> > + rdf:resource="http://purl.org/dc/dcmitype/StillImage" />
> > + <dc:title></dc:title>
> > + </cc:Work>
> > + </rdf:RDF>
> > + </metadata>
> > + <g
> > + inkscape:label="Layer 1"
> > + inkscape:groupmode="layer"
> > + id="layer1"
> > + transform="translate(0,-641.33861)">
> > + <rect
> > + style="fill:#000000;stroke:#000000;stroke-
> > width:0.6465112;filter:url(#filter4169-3-5)"
> > + id="rect4136-3-6-5"
> > + width="101.07784"
> > + height="31.998148"
> > + x="128.74678"
> > + y="80.648842"
> > + transform="matrix(2.7384116,0,0,0.91666328,-284.06895,664.79751)"
> > />
> > + <rect
> > + style="fill:url(#linearGradient5032-3);fill-
> > opacity:1;stroke:#000000;stroke-width:1.02430749"
> > + id="rect4136-2-6"
> > + width="276.79272"
> > + height="29.331528"
> > + x="64.723419"
> > + y="736.84473" />
> > + <text
> > + xml:space="preserve"
> > + style="font-style:normal;font-weight:normal;line-height:0%;font-
> > family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-
> > opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-
> > linejoin:miter;stroke-opacity:1"
> > + x="78.223282"
> > + y="756.79803"
> > + id="text4138-6-2"><tspan
> > + sodipodi:role="line"
> > + id="tspan4140-1-9"
> > + x="78.223282"
> > + y="756.79803"
> > + style="font-size:15px;line-height:1.25">user application (running by
> > the CPU</tspan></text>
> > + <path
> > + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-
> > width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-
> > opacity:1;marker-end:url(#marker4825-6)"
> > + d="m 217.67507,876.6738 113.40331,45.0758"
> > + id="path4661"
> > + inkscape:connector-curvature="0"
> > + sodipodi:nodetypes="cc" />
> > + <path
> > + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-
> > width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-
> > opacity:1;marker-end:url(#marker4825-6-0)"
> > + d="m 208.10197,767.69811 0.29362,76.03656"
> > + id="path4661-6"
> > + inkscape:connector-curvature="0"
> > + sodipodi:nodetypes="cc" />
> > + <rect
> > + style="fill:#000000;stroke:#000000;stroke-
> > width:0.6465112;filter:url(#filter4169-3-5-8)"
> > + id="rect4136-3-6-5-3"
> > + width="101.07784"
> > + height="31.998148"
> > + x="128.74678"
> > + y="80.648842"
> > + transform="matrix(1.0104673,0,0,1.0052679,28.128628,763.90722)" />
> > + <rect
> > + style="fill:url(#linearGradient5032-3-8);fill-
> > opacity:1;stroke:#000000;stroke-width:0.65159565"
> > + id="rect4136-2-6-6"
> > + width="102.13586"
> > + height="32.16671"
> > + x="156.83217"
> > + y="842.91852" />
> > + <text
> > + xml:space="preserve"
> > + style="font-style:normal;font-weight:normal;font-size:12px;line-
> > height:0%;font-family:sans-serif;letter-spacing:0px;word-
> > spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-
> > linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
> > + x="188.58519"
> > + y="864.47125"
> > + id="text4138-6-2-8"><tspan
> > + sodipodi:role="line"
> > + id="tspan4140-1-9-0"
> > + x="188.58519"
> > + y="864.47125"
> > + style="font-size:15px;line-height:1.25;stroke-
> > width:1px">MMU</tspan></text>
> > + <rect
> > + style="fill:#000000;stroke:#000000;stroke-
> > width:0.6465112;filter:url(#filter4169-3-5-8-5)"
> > + id="rect4136-3-6-5-3-1"
> > + width="101.07784"
> > + height="31.998148"
> > + x="128.74678"
> > + y="80.648842"
> > + transform="matrix(2.1450556,0,0,1.0052679,1.87637,843.51696)" />
> > + <rect
> > + style="fill:url(#linearGradient5032-3-8-2);fill-
> > opacity:1;stroke:#000000;stroke-width:0.94937181"
> > + id="rect4136-2-6-6-0"
> > + width="216.8176"
> > + height="32.16671"
> > + x="275.09283"
> > + y="922.5282" />
> > + <text
> > + xml:space="preserve"
> > + style="font-style:normal;font-weight:normal;font-size:12px;line-
> > height:0%;font-family:sans-serif;letter-spacing:0px;word-
> > spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-
> > linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
> > + x="347.81482"
> > + y="943.23291"
> > + id="text4138-6-2-8-8"><tspan
> > + sodipodi:role="line"
> > + id="tspan4140-1-9-0-5"
> > + x="347.81482"
> > + y="943.23291"
> > + style="font-size:15px;line-height:1.25;stroke-
> > width:1px">Memory</tspan></text>
> > + <rect
> > + style="fill:#000000;stroke:#000000;stroke-
> > width:0.6465112;filter:url(#filter4169-3-5-8-6)"
> > + id="rect4136-3-6-5-3-5"
> > + width="101.07784"
> > + height="31.998148"
> > + x="128.74678"
> > + y="80.648842"
> > + transform="matrix(1.0104673,0,0,1.0052679,330.22737,762.9602)" />
> > + <rect
> > + style="fill:url(#linearGradient5032-3-8-0);fill-
> > opacity:1;stroke:#000000;stroke-width:0.65159565"
> > + id="rect4136-2-6-6-8"
> > + width="102.13586"
> > + height="32.16671"
> > + x="458.93091"
> > + y="841.9715" />
> > + <text
> > + xml:space="preserve"
> > + style="font-style:normal;font-weight:normal;font-size:12px;line-
> > height:0%;font-family:sans-serif;letter-spacing:0px;word-
> > spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-
> > linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
> > + x="490.68393"
> > + y="863.52423"
> > + id="text4138-6-2-8-6"><tspan
> > + sodipodi:role="line"
> > + id="tspan4140-1-9-0-2"
> > + x="490.68393"
> > + y="863.52423"
> > + style="font-size:15px;line-height:1.25;stroke-
> > width:1px">IOMMU</tspan></text>
> > + <rect
> > + style="fill:#000000;stroke:#000000;stroke-
> > width:0.6465112;filter:url(#filter4169-3-5-4)"
> > + id="rect4136-3-6-5-6"
> > + width="101.07784"
> > + height="31.998148"
> > + x="128.74678"
> > + y="80.648842"
> > + transform="matrix(1.9884947,0,0,0.94903537,167.19229,661.38193)"
> > />
> > + <rect
> > + style="fill:url(#linearGradient5032-3-84);fill-
> > opacity:1;stroke:#000000;stroke-width:0.88813609"
> > + id="rect4136-2-6-2"
> > + width="200.99274"
> > + height="30.367374"
> > + x="420.4675"
> > + y="735.97351" />
> > + <text
> > + xml:space="preserve"
> > + style="font-style:normal;font-weight:normal;font-size:12px;line-
> > height:0%;font-family:sans-serif;letter-spacing:0px;word-
> > spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-
> > linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
> > + x="441.95297"
> > + y="755.9068"
> > + id="text4138-6-2-9"><tspan
> > + sodipodi:role="line"
> > + id="tspan4140-1-9-9"
> > + x="441.95297"
> > + y="755.9068"
> > + style="font-size:15px;line-height:1.25;stroke-width:1px">Hardware
> > Accelerator</tspan></text>
> > + <path
> > + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-
> > width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-
> > opacity:1;marker-end:url(#marker4825-6-0-0)"
> > + d="m 508.2914,766.55885 0.29362,76.03656"
> > + id="path4661-6-1"
> > + inkscape:connector-curvature="0"
> > + sodipodi:nodetypes="cc" />
> > + <path
> > + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-
> > width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-
> > opacity:1;marker-end:url(#marker4825-6-3)"
> > + d="M 499.70201,876.47297 361.38296,920.80258"
> > + id="path4661-1"
> > + inkscape:connector-curvature="0"
> > + sodipodi:nodetypes="cc" />
> > + </g>
> > +</svg>
> > --
> > 2.17.1

--
-Kenneth(Hisilicon)


2018-08-02 04:24:22

by Tian, Kevin

[permalink] [raw]
Subject: RE: [RFC PATCH 3/7] vfio: add spimdev support

> From: Kenneth Lee [mailto:[email protected]]
> Sent: Thursday, August 2, 2018 11:47 AM
>
> >
> > > From: Kenneth Lee
> > > Sent: Wednesday, August 1, 2018 6:22 PM
> > >
> > > From: Kenneth Lee <[email protected]>
> > >
> > > SPIMDEV is "Share Parent IOMMU Mdev". It is a vfio-mdev. But differ
> from
> > > the general vfio-mdev:
> > >
> > > 1. It shares its parent's IOMMU.
> > > 2. There is no hardware resource attached to the mdev is created. The
> > > hardware resource (A `queue') is allocated only when the mdev is
> > > opened.
> >
> > Alex has concern on doing so, as pointed out in:
> >
> > https://www.spinics.net/lists/kvm/msg172652.html
> >
> > resource allocation should be reserved at creation time.
>
> Yes. That is why I keep telling that SPIMDEV is not for "VM", it is for "many
> processes", it is just an access point to the process. Not a device to VM. I
> hope
> Alex can accept it:)
>

VFIO is just about assigning device resources to user space. So far it doesn't
care whether it's native processes or a VM using the device. Along the direction
you described, it looks like VFIO needs to support a configuration where
some mdevs are used for native processes only, while others can be used
for both native processes and VMs. I'm not sure whether there is a clean way to
enforce it...

Let's hear Alex's thoughts.

Thanks
Kevin

2018-08-02 04:36:52

by Tian, Kevin

[permalink] [raw]
Subject: RE: [RFC PATCH 0/7] A General Accelerator Framework, WarpDrive

> From: Kenneth Lee
> Sent: Thursday, August 2, 2018 11:40 AM
>
> On Thu, Aug 02, 2018 at 02:59:33AM +0000, Tian, Kevin wrote:
> > > From: Kenneth Lee
> > > Sent: Wednesday, August 1, 2018 6:22 PM
> > >
> > > From: Kenneth Lee <[email protected]>
> > >
> > > WarpDrive is an accelerator framework to expose the hardware
> capabilities
> > > directly to the user space. It makes use of the exist vfio and vfio-mdev
> > > facilities. So the user application can send request and DMA to the
> > > hardware without interaction with the kernel. This remove the latency
> > > of syscall and context switch.
> > >
> > > The patchset contains documents for the detail. Please refer to it for
> more
> > > information.
> > >
> > > This patchset is intended to be used with Jean Philippe Brucker's SVA
> > > patch [1] (Which is also in RFC stage). But it is not mandatory. This
> > > patchset is tested in the latest mainline kernel without the SVA patches.
> > > So it support only one process for each accelerator.
> >
> > If no sharing, then why not just assigning the whole parent device to
> > the process? IMO if SVA usage is the clear goal of your series, it
> > might be made clearly so then Jean's series is mandatory dependency...
> >
>
> We don't know how SVA will be finally. But the feature, "make use of
> per-PASID/substream ID IOMMU page table", should be able to be enabled
> in the
> kernel. So we don't want to enforce it here. After we have this serial ready,
> it
> can be hooked to any implementation.

"any" or "only queue-based" implementation? some devices may not
have queue concept, e.g. GPU.

>
> Further more, even without "per-PASID IOMMU page table", this series has
> its
> value. It is not simply dedicate the whole device to the process. It "shares"
> the device with the kernel driver. So you can support crypto and a user
> application at the same time.

OK.

Thanks
Kevin

2018-08-02 04:39:37

by Tian, Kevin

[permalink] [raw]
Subject: RE: [RFC PATCH 2/7] iommu: Add share domain interface in iommu for spimdev

> From: Kenneth Lee
> Sent: Thursday, August 2, 2018 12:16 PM
>
> On Thu, Aug 02, 2018 at 03:17:03AM +0000, Tian, Kevin wrote:
> > > From: Kenneth Lee
> > > Sent: Wednesday, August 1, 2018 6:22 PM
> > >
> > > From: Kenneth Lee <liguozhu-C8/M+/[email protected]>
> > >
> > > This patch add sharing interface for a iommu_group. The new interface:
> > >
> > > iommu_group_share_domain()
> > > iommu_group_unshare_domain()
> > >
> > > can be used by some virtual iommu_group (such as iommu_group for
> > > spimdev)
> > > to share their parent's iommu_group.
> > >
> > > When the domain of the group is shared, it cannot be changed before
> > > unshared. In the future, notification can be added if update is required.
> >
> > Is it necessary or just asking VFIO to use parent domain directly?
> >
> Even we add to VFIO, the iommu still need to be changed. We can move
> the type1
> part to VFIO if we have agreement in RFC stage.

We had a similar consideration in the IOMMU-aware mdev series. Now
we are inclined to avoid a faked entity within the IOMMU layer - leave it to
the caller to figure out the right physical entity for any IOMMU ops. That
might be leveraged here too after we get a new version.

Thanks
Kevin

2018-08-02 04:41:32

by Tian, Kevin

[permalink] [raw]
Subject: RE: [RFC PATCH 1/7] vfio/spimdev: Add documents for WarpDrive framework

> From: Kenneth Lee
> Sent: Thursday, August 2, 2018 12:23 PM
>
> On Thu, Aug 02, 2018 at 03:14:38AM +0000, Tian, Kevin wrote:
> >
> > > From: Kenneth Lee
> > > Sent: Wednesday, August 1, 2018 6:22 PM
> > >
> > > From: Kenneth Lee <[email protected]>
> > >
> > > WarpDrive is a common user space accelerator framework. Its main
> > > component
> > > in Kernel is called spimdev, Share Parent IOMMU Mediated Device. It
> >
> > Not sure whether "share parent IOMMU" is a good term here. better
> > stick to what capabity you bring to user space, instead of describing
> > internal trick...
> Agree. The "SPI" is also strongly hint to "Share Peripheral Interrupt" or
> "Serial
> Peripheral Interface". Personally I hate this name. But I cannot find a better
> one. In the user space, we can use a impassible name (WarpDrive). But
> kernel
> require a name point to its feature... maybe I should use "Process-Shared
> Mdev"
> in next version?

Or maybe no new name. Just stick to mdev, but introduce some
new attribute/capability to indicate such a purpose.

Thanks
Kevin

2018-08-02 05:35:58

by Kenneth Lee

[permalink] [raw]
Subject: Re: [RFC PATCH 0/7] A General Accelerator Framework, WarpDrive

On Thu, Aug 02, 2018 at 04:36:52AM +0000, Tian, Kevin wrote:
>
> > From: Kenneth Lee
> > Sent: Thursday, August 2, 2018 11:40 AM
> >
> > On Thu, Aug 02, 2018 at 02:59:33AM +0000, Tian, Kevin wrote:
> > > > From: Kenneth Lee
> > > > Sent: Wednesday, August 1, 2018 6:22 PM
> > > >
> > > > From: Kenneth Lee <[email protected]>
> > > >
> > > > WarpDrive is an accelerator framework to expose the hardware
> > capabilities
> > > > directly to the user space. It makes use of the exist vfio and vfio-mdev
> > > > facilities. So the user application can send request and DMA to the
> > > > hardware without interaction with the kernel. This remove the latency
> > > > of syscall and context switch.
> > > >
> > > > The patchset contains documents for the detail. Please refer to it for
> > more
> > > > information.
> > > >
> > > > This patchset is intended to be used with Jean Philippe Brucker's SVA
> > > > patch [1] (Which is also in RFC stage). But it is not mandatory. This
> > > > patchset is tested in the latest mainline kernel without the SVA patches.
> > > > So it support only one process for each accelerator.
> > >
> > > If no sharing, then why not just assigning the whole parent device to
> > > the process? IMO if SVA usage is the clear goal of your series, it
> > > might be made clearly so then Jean's series is mandatory dependency...
> > >
> >
> > We don't know how SVA will be finally. But the feature, "make use of
> > per-PASID/substream ID IOMMU page table", should be able to be enabled
> > in the
> > kernel. So we don't want to enforce it here. After we have this serial ready,
> > it
> > can be hooked to any implementation.
>
> "any" or "only queue-based" implementation? some devices may not
> have queue concept, e.g. GPU.

Here, "any implementation" refer to the "per PASID page table" implementatin:)

>
> >
> > Further more, even without "per-PASID IOMMU page table", this series has
> > its
> > value. It is not simply dedicate the whole device to the process. It "shares"
> > the device with the kernel driver. So you can support crypto and a user
> > application at the same time.
>
> OK.
>
> Thanks
> Kevin

--
-Kenneth(Hisilicon)


2018-08-02 07:34:40

by Kenneth Lee

[permalink] [raw]
Subject: Re: [RFC PATCH 3/7] vfio: add spimdev support

On Thu, Aug 02, 2018 at 04:24:22AM +0000, Tian, Kevin wrote:
>
> > From: Kenneth Lee [mailto:[email protected]]
> > Sent: Thursday, August 2, 2018 11:47 AM
> >
> > >
> > > > From: Kenneth Lee
> > > > Sent: Wednesday, August 1, 2018 6:22 PM
> > > >
> > > > From: Kenneth Lee <[email protected]>
> > > >
> > > > SPIMDEV is "Share Parent IOMMU Mdev". It is a vfio-mdev. But differ
> > from
> > > > the general vfio-mdev:
> > > >
> > > > 1. It shares its parent's IOMMU.
> > > > 2. There is no hardware resource attached to the mdev is created. The
> > > > hardware resource (A `queue') is allocated only when the mdev is
> > > > opened.
> > >
> > > Alex has concern on doing so, as pointed out in:
> > >
> > > https://www.spinics.net/lists/kvm/msg172652.html
> > >
> > > resource allocation should be reserved at creation time.
> >
> > Yes. That is why I keep telling that SPIMDEV is not for "VM", it is for "many
> > processes", it is just an access point to the process. Not a device to VM. I
> > hope
> > Alex can accept it:)
> >
>
> VFIO is just about assigning device resource to user space. It doesn't care
> whether it's native processes or VM using the device so far. Along the direction
> which you described, looks VFIO needs to support the configuration that
> some mdevs are used for native process only, while others can be used
> for both native and VM. I'm not sure whether there is a clean way to
> enforce it...

I had the same idea at the beginning. But finally I found that the life cycles
of the virtual device for a VM and for a process are different. Consider
creating some mdevs for VM use: you give all those mdevs to libvirt, which
distributes them to VMs or containers. If the VM or container exits, the
mdev is returned to libvirt and used for the next allocation. It is the
administrator who controls every mdev's allocation.

But for a process, it is different. There is no libvirt in control. The
administrator's intention is to grant some type of application access to the
hardware. The application can get a handle to the hardware, send requests and
get the results. That's all. He/she does not care which mdev is allocated to
that application. If it crashes, it should be the kernel's responsibility to
withdraw the resource; the system administrator does not want to do it by hand.
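
To make that concrete, here is a minimal user space sketch of how a process
would grab its handle through the normal VFIO path (the group number and the
mdev UUID below are made up; the real flow is what the sample in patch 7
demonstrates):

#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

int main(void)
{
	/* group number and mdev UUID are examples only */
	int container = open("/dev/vfio/vfio", O_RDWR);
	int group = open("/dev/vfio/12", O_RDWR);
	int device;

	ioctl(group, VFIO_GROUP_SET_CONTAINER, &container);
	ioctl(container, VFIO_SET_IOMMU, VFIO_TYPE1_IOMMU);

	/*
	 * The returned fd is the process's handle.  When the process
	 * exits or crashes, the fd is closed and the kernel can take
	 * the queue back without any administrator action.
	 */
	device = ioctl(group, VFIO_GROUP_GET_DEVICE_FD,
		       "c3f2ca36-0000-0000-0000-000000000001");
	if (device < 0)
		perror("VFIO_GROUP_GET_DEVICE_FD");
	return 0;
}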

>
> Let's hear from Alex's thought.

Sure:)

>
> Thanks
> Kevin

--
-Kenneth(Hisilicon)


2018-08-02 10:10:00

by Alan Cox

[permalink] [raw]
Subject: Re: [RFC PATCH 0/7] A General Accelerator Framework, WarpDrive

> One motivation I guess, is that most accelerators lack of a
> well-abstracted high level APIs similar to GPU side (e.g. OpenCL
> clearly defines Shared Virtual Memory models). VFIO mdev
> might be an alternative common interface to enable SVA usages
> on various accelerators...

SVA is not IMHO the hard bit from a user level API perspective. The hard
bit is describing what you have and enumerating the contents of the device
especially when those can be quite dynamic and in the FPGA case can
change on the fly.

Right now we've got
- FPGA manager
- Google's recently posted ASIC patches
- WarpDrive

all trying to be bits of the same thing, and really there needs to be a
single solution that handles all of this stuff properly.

If we are going to have any kind of general purpose accelerator API then
it has to be able to implement things like

'find me an accelerator with function X that is nearest my memory'
'find me accelerator functions X and Y that share HBM'
'find me accelerator functions X and Y than can be chained'

If instead we have three API's depending upon whose accelerator you are
using and whether it's FPGA or ASIC this is going to be a mess on a grand
scale.

Alan

2018-08-02 08:35:28

by Cornelia Huck

[permalink] [raw]
Subject: Re: [RFC PATCH 3/7] vfio: add spimdev support

On Thu, 2 Aug 2018 15:34:40 +0800
Kenneth Lee <liguozhu-C8/M+/[email protected]> wrote:

> On Thu, Aug 02, 2018 at 04:24:22AM +0000, Tian, Kevin wrote:

> > > From: Kenneth Lee [mailto:liguozhu-C8/M+/[email protected]]
> > > Sent: Thursday, August 2, 2018 11:47 AM
> > >
> > > >
> > > > > From: Kenneth Lee
> > > > > Sent: Wednesday, August 1, 2018 6:22 PM
> > > > >
> > > > > From: Kenneth Lee <liguozhu-C8/M+/[email protected]>
> > > > >
> > > > > SPIMDEV is "Share Parent IOMMU Mdev". It is a vfio-mdev. But differ
> > > from
> > > > > the general vfio-mdev:
> > > > >
> > > > > 1. It shares its parent's IOMMU.
> > > > > 2. There is no hardware resource attached to the mdev is created. The
> > > > > hardware resource (A `queue') is allocated only when the mdev is
> > > > > opened.
> > > >
> > > > Alex has concern on doing so, as pointed out in:
> > > >
> > > > https://www.spinics.net/lists/kvm/msg172652.html
> > > >
> > > > resource allocation should be reserved at creation time.
> > >
> > > Yes. That is why I keep telling that SPIMDEV is not for "VM", it is for "many
> > > processes", it is just an access point to the process. Not a device to VM. I
> > > hope
> > > Alex can accept it:)
> > >
> >
> > VFIO is just about assigning device resource to user space. It doesn't care
> > whether it's native processes or VM using the device so far. Along the direction
> > which you described, looks VFIO needs to support the configuration that
> > some mdevs are used for native process only, while others can be used
> > for both native and VM. I'm not sure whether there is a clean way to
> > enforce it...
>
> I had the same idea at the beginning. But finally I found that the life cycle
> of the virtual device for VM and process were different. Consider you create
> some mdevs for VM use, you will give all those mdevs to lib-virt, which
> distribute those mdev to VMs or containers. If the VM or container exits, the
> mdev is returned to the lib-virt and used for next allocation. It is the
> administrator who controlled every mdev's allocation.
>
> But for process, it is different. There is no lib-virt in control. The
> administrator's intension is to grant some type of application to access the
> hardware. The application can get a handle of the hardware, send request and get
> the result. That's all. He/She dose not care which mdev is allocated to that
> application. If it crashes, it should be the kernel's responsibility to withdraw
> the resource, the system administrator does not want to do it by hand.

I don't think that you should distinguish the cases by the presence of
a management application. How can the mdev driver know what the
intention behind using the device is?

Would it make more sense to use a different mechanism to enforce that
applications only use those handles they are supposed to use? Maybe
cgroups? I don't think it's a good idea to push usage policy into the
kernel.
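
(Just to sketch what I mean - with a made-up cgroup name and made-up device
numbers - the existing devices controller can already restrict which char
devices a task may open, without the mdev driver having to guess intentions:)

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* small helper: write one string into a cgroup control file */
static void cg_write(const char *path, const char *val)
{
	FILE *f = fopen(path, "w");

	if (!f) {
		perror(path);
		exit(1);
	}
	fputs(val, f);
	fclose(f);
}

int main(void)
{
	char pid[16];

	/* the "wd" cgroup and the c 243:0 device numbers are examples only */
	cg_write("/sys/fs/cgroup/devices/wd/devices.deny", "a");
	cg_write("/sys/fs/cgroup/devices/wd/devices.allow", "c 243:0 rw");

	snprintf(pid, sizeof(pid), "%d", getpid());
	cg_write("/sys/fs/cgroup/devices/wd/cgroup.procs", pid);
	return 0;
}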

[I have not read the documentation, so I may be totally off on the
intended use of this.]

2018-08-02 12:24:39

by Xu Zaibo

[permalink] [raw]
Subject: Re: [RFC PATCH 0/7] A General Accelerator Framework, WarpDrive

Hi,

On 2018/8/2 18:10, Alan Cox wrote:
>> One motivation I guess, is that most accelerators lack of a
>> well-abstracted high level APIs similar to GPU side (e.g. OpenCL
>> clearly defines Shared Virtual Memory models). VFIO mdev
>> might be an alternative common interface to enable SVA usages
>> on various accelerators...
> SVA is not IMHO the hard bit from a user level API perspective. The hard
> bit is describing what you have and enumerating the contents of the device
> especially when those can be quite dynamic and in the FPGA case can
> change on the fly.
>
> Right now we've got
> - FPGA manager
> - Google's recently posted ASIC patches
> - WarpDrive
>
> all trying to be bits of the same thing, and really there needs to be a
> single solution that handles all of this stuff properly.
>
> If we are going to have any kind of general purpose accelerator API then
> it has to be able to implement things like
>
> 'find me an accelerator with function X that is nearest my memory'
> 'find me accelerator functions X and Y that share HBM'
> 'find me accelerator functions X and Y than can be chained'
>
> If instead we have three API's depending upon whose accelerator you are
> using and whether it's FPGA or ASIC this is going to be a mess on a grand
> scale.
>
Agreed. At the beginning we tried to bring in a notion of 'capability' which
describes algorithms, memory access methods, etc.,
but then we came to realize that the first thing is to come
to a single solution on things such as
memory/device access, the IOMMU, etc.

Thanks,
Zaibo


2018-08-02 14:22:43

by Jerome Glisse

[permalink] [raw]
Subject: Re: [RFC PATCH 0/7] A General Accelerator Framework, WarpDrive

On Thu, Aug 02, 2018 at 12:05:57PM +0800, Kenneth Lee wrote:
> On Thu, Aug 02, 2018 at 02:33:12AM +0000, Tian, Kevin wrote:
> > Date: Thu, 2 Aug 2018 02:33:12 +0000
> > > From: Jerome Glisse
> > > On Wed, Aug 01, 2018 at 06:22:14PM +0800, Kenneth Lee wrote:
> > > > From: Kenneth Lee <liguozhu-C8/M+/[email protected]>
> > > >
> > > > WarpDrive is an accelerator framework to expose the hardware
> > > capabilities
> > > > directly to the user space. It makes use of the exist vfio and vfio-mdev
> > > > facilities. So the user application can send request and DMA to the
> > > > hardware without interaction with the kernel. This remove the latency
> > > > of syscall and context switch.
> > > >
> > > > The patchset contains documents for the detail. Please refer to it for
> > > more
> > > > information.
> > > >
> > > > This patchset is intended to be used with Jean Philippe Brucker's SVA
> > > > patch [1] (Which is also in RFC stage). But it is not mandatory. This
> > > > patchset is tested in the latest mainline kernel without the SVA patches.
> > > > So it support only one process for each accelerator.
> > > >
> > > > With SVA support, WarpDrive can support multi-process in the same
> > > > accelerator device. We tested it in our SoC integrated Accelerator (board
> > > > ID: D06, Chip ID: HIP08). A reference work tree can be found here: [2].
> > >
> > > I have not fully inspected things nor do i know enough about
> > > this Hisilicon ZIP accelerator to ascertain, but from glimpsing
> > > at the code it seems that it is unsafe to use even with SVA due
> > > to the doorbell. There is a comment talking about safetyness
> > > in patch 7.
> > >
> > > Exposing thing to userspace is always enticing, but if it is
> > > a security risk then it should clearly say so and maybe a
> > > kernel boot flag should be necessary to allow such device to
> > > be use.
> > >
>
> But doorbell is just a notification. Except for DOS (to make hardware busy) it
> cannot actually take or change anything from the kernel space. And the DOS
> problem can be always taken as the problem that a group of processes share the
> same kernel entity.
>
> In the coming HIP09 hardware, the doorbell will come with a random number so
> only the process who allocated the queue can knock it correctly.

When the doorbell is rung, does the hardware start fetching commands from
the queue and executing them? If so, then a rogue process B might
ring the doorbell of process A, which would start execution of
random commands (i.e. whatever random memory values are left
inside the command buffer memory, possibly old commands I guess).

If this is not how this doorbell works then, yes, it can only do
a denial of service I guess. The issue I have with doorbells is that
I have seen 10 different implementations in 10 different pieces of hardware,
and each is different as to what ringing, or the value written to the
doorbell, does. It is painful to track what is what for each hw.


> > > My more general question is do we want to grow VFIO to become
> > > a more generic device driver API. This patchset adds a command
> > > queue concept to it (i don't think it exist today but i have
> > > not follow VFIO closely).
> > >
>
> The thing is, VFIO is the only place to support DMA from user land. If we don't
> put it here, we have to create another similar facility to support the same.

No it is not: network devices, GPUs, block devices, ... they all
support DMA. The point I am trying to make here is that even in
your mechanism userspace must have a specific userspace
driver for each piece of hardware, and thus there is virtually no
difference between having this userspace driver open a device
file in vfio or somewhere else in the device filesystem. This is
just a different path.

So this is why I do not see any benefit to having all drivers with
SVM under VFIO (can we please use SVM and not SVA, as SVM is what has
been used in more places so far).


Cheers,
Jérôme

2018-08-02 14:46:27

by Jerome Glisse

[permalink] [raw]
Subject: Re: [RFC PATCH 0/7] A General Accelerator Framework, WarpDrive

On Thu, Aug 02, 2018 at 11:10:00AM +0100, Alan Cox wrote:
> > One motivation I guess, is that most accelerators lack of a
> > well-abstracted high level APIs similar to GPU side (e.g. OpenCL
> > clearly defines Shared Virtual Memory models). VFIO mdev
> > might be an alternative common interface to enable SVA usages
> > on various accelerators...
>
> SVA is not IMHO the hard bit from a user level API perspective. The hard
> bit is describing what you have and enumerating the contents of the device
> especially when those can be quite dynamic and in the FPGA case can
> change on the fly.
>
> Right now we've got
> - FPGA manager
> - Google's recently posted ASIC patches
> - WarpDrive
>
> all trying to be bits of the same thing, and really there needs to be a
> single solution that handles all of this stuff properly.
>
> If we are going to have any kind of general purpose accelerator API then
> it has to be able to implement things like

Why is the existing driver model not good enough? If you want
a device with function X, you look into /dev/X (for instance
for a GPU you look in /dev/dri).

Each of those devices needs a userspace driver, and thus this
userspace driver easily knows where to look. I do not
expect that every application will reimplement those drivers,
but instead use some kind of library that provides a high-level
API for each of those devices.
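
(For illustration - the render node below is just the conventional first one
on a typical system, nothing accelerator-framework specific:)

#include <fcntl.h>
#include <unistd.h>

int main(void)
{
	/*
	 * A GPU userspace driver simply knows its class of device node;
	 * an accelerator library would open its own node the same way
	 * and then speak its device-specific command format over the fd.
	 */
	int fd = open("/dev/dri/renderD128", O_RDWR);

	if (fd >= 0)
		close(fd);
	return 0;
}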

> 'find me an accelerator with function X that is nearest my memory'
> 'find me accelerator functions X and Y that share HBM'
> 'find me accelerator functions X and Y than can be chained'
>
> If instead we have three API's depending upon whose accelerator you are
> using and whether it's FPGA or ASIC this is going to be a mess on a grand
> scale.

I see the enumeration as an orthogonal problem. There have
been talks within various circles about it, because new system
buses (CAPI, CCIX, ...) imply that something like the NUMA
topology you have in sysfs is not up to the task.

Now you have a hierarchy of memory for the CPU (HBM, local
node main memory aka your DDR DIMMs, persistent memory), each
with different properties (bandwidth, latency, ...). But the
same goes for device memory. Some devices can have many types
of memory too. For instance a GPU might have HBM, GDDR,
persistent memory, and more. Those devices sit on a given CPU
node but can also have interconnects of their own (AMD Infinity
Fabric, NVLink, ...).


You have HMAT, which I was hoping would provide a new sysfs
representation that supersedes the old NUMA one, but it seems it is
going back to NUMA nodes for compatibility reasons I guess (ccing Ross).


Anyway, I think finding devices and finding the relation between
devices and memory are two separate problems, and as such they should
be handled separately.


Cheers,
Jérôme

2018-08-02 18:43:27

by Alex Williamson

[permalink] [raw]
Subject: Re: [RFC PATCH 3/7] vfio: add spimdev support

On Thu, 2 Aug 2018 10:35:28 +0200
Cornelia Huck <[email protected]> wrote:

> On Thu, 2 Aug 2018 15:34:40 +0800
> Kenneth Lee <liguozhu-C8/M+/[email protected]> wrote:
>
> > On Thu, Aug 02, 2018 at 04:24:22AM +0000, Tian, Kevin wrote:
>
> > > > From: Kenneth Lee [mailto:liguozhu-C8/M+/[email protected]]
> > > > Sent: Thursday, August 2, 2018 11:47 AM
> > > >
> > > > >
> > > > > > From: Kenneth Lee
> > > > > > Sent: Wednesday, August 1, 2018 6:22 PM
> > > > > >
> > > > > > From: Kenneth Lee <liguozhu-C8/M+/[email protected]>
> > > > > >
> > > > > > SPIMDEV is "Share Parent IOMMU Mdev". It is a vfio-mdev. But differ
> > > > from
> > > > > > the general vfio-mdev:
> > > > > >
> > > > > > 1. It shares its parent's IOMMU.
> > > > > > 2. There is no hardware resource attached to the mdev is created. The
> > > > > > hardware resource (A `queue') is allocated only when the mdev is
> > > > > > opened.
> > > > >
> > > > > Alex has concern on doing so, as pointed out in:
> > > > >
> > > > > https://www.spinics.net/lists/kvm/msg172652.html
> > > > >
> > > > > resource allocation should be reserved at creation time.
> > > >
> > > > Yes. That is why I keep telling that SPIMDEV is not for "VM", it is for "many
> > > > processes", it is just an access point to the process. Not a device to VM. I
> > > > hope
> > > > Alex can accept it:)
> > > >
> > >
> > > VFIO is just about assigning device resource to user space. It doesn't care
> > > whether it's native processes or VM using the device so far. Along the direction
> > > which you described, looks VFIO needs to support the configuration that
> > > some mdevs are used for native process only, while others can be used
> > > for both native and VM. I'm not sure whether there is a clean way to
> > > enforce it...
> >
> > I had the same idea at the beginning. But finally I found that the life cycle
> > of the virtual device for VM and process were different. Consider you create
> > some mdevs for VM use, you will give all those mdevs to lib-virt, which
> > distribute those mdev to VMs or containers. If the VM or container exits, the
> > mdev is returned to the lib-virt and used for next allocation. It is the
> > administrator who controlled every mdev's allocation.

Libvirt currently does no management of mdev devices, so I believe
this example is fictitious. The extent of libvirt's interaction with
mdev is that XML may specify an mdev UUID as the source for a hostdev
and set the permissions on the device files appropriately. Whether
mdevs are created in advance and re-used or created and destroyed
around a VM instance (for example via qemu hooks scripts) is not a
policy that libvirt imposes.

> > But for process, it is different. There is no lib-virt in control. The
> > administrator's intension is to grant some type of application to access the
> > hardware. The application can get a handle of the hardware, send request and get
> > the result. That's all. He/She dose not care which mdev is allocated to that
> > application. If it crashes, it should be the kernel's responsibility to withdraw
> > the resource, the system administrator does not want to do it by hand.

Libvirt is also not a required component for VM lifecycles, it's an
optional management interface, but there are also VM lifecycles exactly
as you describe. A VM may want a given type of vGPU, there might be
multiple sources of that type and any instance is fungible to any
other. Such an mdev can be dynamically created, assigned to the VM,
and destroyed later. Why do we need to support "empty" mdevs that do
not reserve resources until opened? The concept of available
instances is entirely lost with that approach and it creates an
environment that's difficult to support, resources may not be available
at the time the user attempts to access them.

> I don't think that you should distinguish the cases by the presence of
> a management application. How can the mdev driver know what the
> intention behind using the device is?

Absolutely, vfio is a userspace driver interface, it's not tailored to
VM usage and we cannot know the intentions of the user.

> Would it make more sense to use a different mechanism to enforce that
> applications only use those handles they are supposed to use? Maybe
> cgroups? I don't think it's a good idea to push usage policy into the
> kernel.

I agree, this sounds like a userspace problem, mdev supports dynamic
creation and removal of mdev devices, if there's an issue with
maintaining a set of standby devices that a user has access to, this
sounds like a userspace broker problem. It makes more sense to me to
have a model where a userspace application can make a request to a
broker and the broker can reply with "none available" rather than
having a set of devices on standby that may or may not work depending
on the system load and other users. Thanks,

Alex

2018-08-03 03:47:21

by Kenneth Lee

[permalink] [raw]
Subject: Re: [RFC PATCH 0/7] A General Accelerator Framework, WarpDrive

On Thu, Aug 02, 2018 at 10:22:43AM -0400, Jerome Glisse wrote:
>
> On Thu, Aug 02, 2018 at 12:05:57PM +0800, Kenneth Lee wrote:
> > On Thu, Aug 02, 2018 at 02:33:12AM +0000, Tian, Kevin wrote:
> > > Date: Thu, 2 Aug 2018 02:33:12 +0000
> > > > From: Jerome Glisse
> > > > On Wed, Aug 01, 2018 at 06:22:14PM +0800, Kenneth Lee wrote:
> > > > > From: Kenneth Lee <[email protected]>
> > > > >
> > > > > WarpDrive is an accelerator framework to expose the hardware
> > > > capabilities
> > > > > directly to the user space. It makes use of the exist vfio and vfio-mdev
> > > > > facilities. So the user application can send request and DMA to the
> > > > > hardware without interaction with the kernel. This remove the latency
> > > > > of syscall and context switch.
> > > > >
> > > > > The patchset contains documents for the detail. Please refer to it for
> > > > more
> > > > > information.
> > > > >
> > > > > This patchset is intended to be used with Jean Philippe Brucker's SVA
> > > > > patch [1] (Which is also in RFC stage). But it is not mandatory. This
> > > > > patchset is tested in the latest mainline kernel without the SVA patches.
> > > > > So it support only one process for each accelerator.
> > > > >
> > > > > With SVA support, WarpDrive can support multi-process in the same
> > > > > accelerator device. We tested it in our SoC integrated Accelerator (board
> > > > > ID: D06, Chip ID: HIP08). A reference work tree can be found here: [2].
> > > >
> > > > I have not fully inspected things nor do i know enough about
> > > > this Hisilicon ZIP accelerator to ascertain, but from glimpsing
> > > > at the code it seems that it is unsafe to use even with SVA due
> > > > to the doorbell. There is a comment talking about safetyness
> > > > in patch 7.
> > > >
> > > > Exposing thing to userspace is always enticing, but if it is
> > > > a security risk then it should clearly say so and maybe a
> > > > kernel boot flag should be necessary to allow such device to
> > > > be use.
> > > >
> >
> > But doorbell is just a notification. Except for DOS (to make hardware busy) it
> > cannot actually take or change anything from the kernel space. And the DOS
> > problem can be always taken as the problem that a group of processes share the
> > same kernel entity.
> >
> > In the coming HIP09 hardware, the doorbell will come with a random number so
> > only the process who allocated the queue can knock it correctly.
>
> When doorbell is ring the hardware start fetching commands from
> the queue and execute them ? If so than a rogue process B might
> ring the doorbell of process A which would starts execution of
> random commands (ie whatever random memory value there is left
> inside the command buffer memory, could be old commands i guess).
>
> If this is not how this doorbell works then, yes it can only do
> a denial of service i guess. Issue i have with doorbell is that
> i have seen 10 differents implementations in 10 differents hw
> and each are different as to what ringing or value written to the
> doorbell does. It is painfull to track what is what for each hw.
>

In our implementation, the doorbell is simply a notification, just like an
interrupt to the accelerator. The commands are all about what's in the queue.
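
Roughly, the user space side of a submission looks like the sketch below.
The descriptor layout, the mmap offsets and the queue depth are made up
for illustration; they are not the real HIP08 layout:

#include <stdint.h>
#include <sys/mman.h>

/* Hypothetical descriptor layout, for illustration only. */
struct wd_desc {
	uint64_t src;		/* user virtual address of the input  */
	uint64_t dst;		/* user virtual address of the output */
	uint32_t len;
	uint32_t opcode;
};

#define QUEUE_DEPTH	1024

/*
 * Sketch: submit one request on an mmap()ed queue, then ring the
 * doorbell.  The two mmap offsets (0 for the ring, 4096 for the
 * doorbell page) are made up; only the doorbell page is device MMIO.
 */
int wd_submit(int qfd, const struct wd_desc *req)
{
	static struct wd_desc *ring;
	static volatile uint32_t *doorbell;
	static uint32_t tail;

	if (!ring) {
		ring = mmap(NULL, QUEUE_DEPTH * sizeof(*ring),
			    PROT_READ | PROT_WRITE, MAP_SHARED, qfd, 0);
		doorbell = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
				MAP_SHARED, qfd, 4096);
		if (ring == MAP_FAILED || doorbell == MAP_FAILED) {
			ring = NULL;
			return -1;
		}
	}

	ring[tail++ % QUEUE_DEPTH] = *req;	/* the work is in the queue */
	__sync_synchronize();			/* make the slot visible    */
	*doorbell = 1;				/* pure notification: the value
						   is not interpreted       */
	return 0;
}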

I agree that there is no simple and standard way to track the shared IO space.
But I think we have to trust the driver in some way. If the driver is malicious,
even a simple ioctl can become an attack.

>
> > > > My more general question is do we want to grow VFIO to become
> > > > a more generic device driver API. This patchset adds a command
> > > > queue concept to it (i don't think it exist today but i have
> > > > not follow VFIO closely).
> > > >
> >
> > The thing is, VFIO is the only place to support DMA from user land. If we don't
> > put it here, we have to create another similar facility to support the same.
>
> No it is not, network device, GPU, block device, ... they all do
> support DMA. The point i am trying to make here is that even in

Sorry, wait a minute, are we talking about the same thing? I meant "DMA from
user land", not "DMA from kernel driver". To do that we have to program the
IOMMU (the I/O memory management unit). I think it can only be done via the
default_domain or a vfio domain; otherwise the user space would have to drive
the IOMMU directly.
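
Just to make "DMA from user land" concrete: with the existing VFIO type1
interface it looks along the lines of the sketch below (error handling
omitted; the function name is mine):

#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/*
 * Sketch: make a user buffer visible to the device through the VFIO
 * type1 IOMMU backend.  Assumes @container already has an IOMMU group
 * attached and VFIO_TYPE1_IOMMU selected.
 */
int map_for_dma(int container, void *buf, size_t size, uint64_t iova)
{
	struct vfio_iommu_type1_dma_map map;

	memset(&map, 0, sizeof(map));
	map.argsz = sizeof(map);
	map.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;
	map.vaddr = (uintptr_t)buf;	/* user virtual address           */
	map.iova  = iova;		/* address the device will DMA to */
	map.size  = size;

	/*
	 * This pins the pages and programs the IOMMU.  From here on the
	 * device can DMA to/from the buffer at @iova with no further
	 * syscall on the data path.
	 */
	return ioctl(container, VFIO_IOMMU_MAP_DMA, &map);
}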

> your mechanisms the userspace must have a specific userspace
> drivers for each hardware and thus there are virtually no
> differences between having this userspace driver open a device
> file in vfio or somewhere else in the device filesystem. This is
> just a different path.
>

The basic problem WarpDrive wants to solve is avoiding syscalls. This is
important for accelerators. We have some data here:
https://www.slideshare.net/linaroorg/progress-and-demonstration-of-wrapdrive-a-accelerator-framework-sfo17317

(see page 3)

The performance differs between using the kernel driver and the user driver.

And we also believe the hardware interface can become a standard after some
time. Some companies have started to do this (such as ARM's Revere). But
before that, we should have a software channel for it.

> So this is why i do not see any benefit to having all drivers with
> SVM (can we please use SVM and not SVA as SVM is what have been use
> in more places so far).
>

Personally, we don't care which name is used. I used SVM when I started this
work. Then Jean said SVM had already been used by AMD for Secure Virtual
Machine, so he called it SVA. And now... who should I follow? :)

>
> Cheers,
> Jérôme

--
-Kenneth(Hisilicon)


2018-08-03 14:20:43

by Alan Cox

[permalink] [raw]
Subject: Re: [RFC PATCH 0/7] A General Accelerator Framework, WarpDrive

> If we are going to have any kind of general purpose accelerator API then
> > it has to be able to implement things like
>
> Why is the existing driver model not good enough ? So you want
> a device with function X you look into /dev/X (for instance
> for GPU you look in /dev/dri)

Except when my GPU is in an FPGA in which case it might be somewhere else
or it's a general purpose accelerator that happens to be usable as a GPU.
Unusual today in big computer space but you'll find it in
microcontrollers.

> Each of those device need a userspace driver and thus this
> user space driver can easily knows where to look. I do not
> expect that every application will reimplement those drivers
> but instead use some kind of library that provide a high
> level API for each of those devices.

Think about it from the user level. You have a pipeline of things you
wish to execute, you need to get the right accelerator combinations and
they need to fit together to meet system constraints like number of
IOMMU ids the accelerator supports, where they are connected.

> Now you have a hierarchy of memory for the CPU (HBM, local
> node main memory aka you DDR dimm, persistent memory) each

It's not a hierarchy, it's a graph. There's no fundamental reason two
accelerators can't be close to two different CPU cores but have shared
HBM that is far from each processor. There are physical reasons it tends
to look more like a hierarchy today.

> Anyway i think finding devices and finding relation between
> devices and memory is 2 separate problems and as such should
> be handled separatly.

At a certain level they are deeply intertwined because you need a common
API. It's not good if I want a particular accelerator and then need to
work out which API it's under on this machine and which interface I have to
use, and maybe deal with a mix of FPGA, WarpDrive and Google ASIC interfaces,
all different.

The job of the kernel is to impose some kind of sanity and unity on this
lot.

All of it in the end comes down to

'Somehow glue some chunk of memory into my address space and find any
supporting driver I need'

plus virtualization of the above.

That bit's easy - but making it usable is a different story.

Alan

2018-08-03 14:39:44

by Jerome Glisse

[permalink] [raw]
Subject: Re: [RFC PATCH 0/7] A General Accelerator Framework, WarpDrive

On Fri, Aug 03, 2018 at 11:47:21AM +0800, Kenneth Lee wrote:
> On Thu, Aug 02, 2018 at 10:22:43AM -0400, Jerome Glisse wrote:
> >
> > On Thu, Aug 02, 2018 at 12:05:57PM +0800, Kenneth Lee wrote:
> > > On Thu, Aug 02, 2018 at 02:33:12AM +0000, Tian, Kevin wrote:
> > > > Date: Thu, 2 Aug 2018 02:33:12 +0000
> > > > > From: Jerome Glisse
> > > > > On Wed, Aug 01, 2018 at 06:22:14PM +0800, Kenneth Lee wrote:
> > > > > > From: Kenneth Lee <[email protected]>
> > > > > >
> > > > > > WarpDrive is an accelerator framework to expose the hardware
> > > > > capabilities
> > > > > > directly to the user space. It makes use of the exist vfio and vfio-mdev
> > > > > > facilities. So the user application can send request and DMA to the
> > > > > > hardware without interaction with the kernel. This remove the latency
> > > > > > of syscall and context switch.
> > > > > >
> > > > > > The patchset contains documents for the detail. Please refer to it for
> > > > > more
> > > > > > information.
> > > > > >
> > > > > > This patchset is intended to be used with Jean Philippe Brucker's SVA
> > > > > > patch [1] (Which is also in RFC stage). But it is not mandatory. This
> > > > > > patchset is tested in the latest mainline kernel without the SVA patches.
> > > > > > So it support only one process for each accelerator.
> > > > > >
> > > > > > With SVA support, WarpDrive can support multi-process in the same
> > > > > > accelerator device. We tested it in our SoC integrated Accelerator (board
> > > > > > ID: D06, Chip ID: HIP08). A reference work tree can be found here: [2].
> > > > >
> > > > > I have not fully inspected things nor do i know enough about
> > > > > this Hisilicon ZIP accelerator to ascertain, but from glimpsing
> > > > > at the code it seems that it is unsafe to use even with SVA due
> > > > > to the doorbell. There is a comment talking about safetyness
> > > > > in patch 7.
> > > > >
> > > > > Exposing thing to userspace is always enticing, but if it is
> > > > > a security risk then it should clearly say so and maybe a
> > > > > kernel boot flag should be necessary to allow such device to
> > > > > be use.
> > > > >
> > >
> > > But doorbell is just a notification. Except for DOS (to make hardware busy) it
> > > cannot actually take or change anything from the kernel space. And the DOS
> > > problem can be always taken as the problem that a group of processes share the
> > > same kernel entity.
> > >
> > > In the coming HIP09 hardware, the doorbell will come with a random number so
> > > only the process who allocated the queue can knock it correctly.
> >
> > When doorbell is ring the hardware start fetching commands from
> > the queue and execute them ? If so than a rogue process B might
> > ring the doorbell of process A which would starts execution of
> > random commands (ie whatever random memory value there is left
> > inside the command buffer memory, could be old commands i guess).
> >
> > If this is not how this doorbell works then, yes it can only do
> > a denial of service i guess. Issue i have with doorbell is that
> > i have seen 10 differents implementations in 10 differents hw
> > and each are different as to what ringing or value written to the
> > doorbell does. It is painfull to track what is what for each hw.
> >
>
> In our implementation, doorbell is simply a notification, just like an interrupt
> to the accelerator. The command is all about what's in the queue.
>
> I agree that there is no simple and standard way to track the shared IO space.
> But I think we have to trust the driver in some way. If the driver is malicious,
> even a simple ioctl can become an attack.

Trusting a kernel space driver is fine; trusting a user space driver is
not, in my view. AFAICT every driver developer so far has always made
sure that someone could not abuse their device to do harmful things to
other processes.


> > > > > My more general question is do we want to grow VFIO to become
> > > > > a more generic device driver API. This patchset adds a command
> > > > > queue concept to it (i don't think it exist today but i have
> > > > > not follow VFIO closely).
> > > > >
> > >
> > > The thing is, VFIO is the only place to support DMA from user land. If we don't
> > > put it here, we have to create another similar facility to support the same.
> >
> > No it is not, network device, GPU, block device, ... they all do
> > support DMA. The point i am trying to make here is that even in
>
> Sorry, wait a minute, are we talking the same thing? I meant "DMA from user
> land", not "DMA from kernel driver". To do that we have to manipulate the
> IOMMU(Unit). I think it can only be done by default_domain or vfio domain. Or
> the user space have to directly access the IOMMU.

GPUs do DMA in the sense that you pass the kernel a valid
virtual address (the kernel driver does all the proper checks) and
then you can use the GPU to copy from or to that range of
virtual addresses. That is exactly how you want to use this compression
engine. It does not rely on SVM, but SVM would still be
the preferred option going forward.


> > your mechanisms the userspace must have a specific userspace
> > drivers for each hardware and thus there are virtually no
> > differences between having this userspace driver open a device
> > file in vfio or somewhere else in the device filesystem. This is
> > just a different path.
> >
>
> The basic problem WarpDrive want to solve it to avoid syscall. This is important
> to accelerators. We have some data here:
> https://www.slideshare.net/linaroorg/progress-and-demonstration-of-wrapdrive-a-accelerator-framework-sfo17317
>
> (see page 3)
>
> The performance is different on using kernel and user drivers.

Yes, and the example I point to is exactly that. You have a one-time setup
cost (creating the command buffer, binding the PASID with the command
buffer, and a couple of other setup steps). Then userspace no longer has to
do any ioctl to schedule work on the GPU. It is all done from userspace,
and it uses a doorbell to notify the hardware when it should go look at the
command buffer for new things to execute.

My point stands on that. You have existing drivers already doing so
with no new framework, and in your scheme you need a userspace driver anyway.
So I do not see the value added; using one path or the other in the
userspace driver is literally one line to change.


> And we also believe the hardware interface can become standard after sometime.
> Some companies have started to do this (such ARM's Revere). But before that, we
> should have a software channel for it.

I hope it does, but right now for every single piece of hardware you
will need a specific driver (I am ignoring backward compatible hardware
evolution, as that is a thing that does exist).

Even if down the road, for every class of hardware, you can use the same
driver, I am not sure what the value added is of doing it inside VFIO versus
a class of device drivers (like USB, PCIe, DRM aka GPU, ...), i.e. you would
have a compression class (/dev/compress/*), an encryption one, ...


> > So this is why i do not see any benefit to having all drivers with
> > SVM (can we please use SVM and not SVA as SVM is what have been use
> > in more places so far).
> >
>
> Personally, we don't care what name to be used. I used SVM when I start this
> work. And then Jean said SVM had been used by AMD as Secure Virtual Machine. So
> he called it SVA. And now... who should I follow? :)

I think Intel calls it SVM too; I do not have any strong preference
beside having only one name to remember :)

Cheers,
Jérôme

2018-08-03 14:55:26

by Jerome Glisse

[permalink] [raw]
Subject: Re: [RFC PATCH 0/7] A General Accelerator Framework, WarpDrive

On Fri, Aug 03, 2018 at 03:20:43PM +0100, Alan Cox wrote:
> > If we are going to have any kind of general purpose accelerator API then
> > > it has to be able to implement things like
> >
> > Why is the existing driver model not good enough ? So you want
> > a device with function X you look into /dev/X (for instance
> > for GPU you look in /dev/dri)
>
> Except when my GPU is in an FPGA in which case it might be somewhere else
> or it's a general purpose accelerator that happens to be usable as a GPU.
> Unusual today in big computer space but you'll find it in
> microcontrollers.

You do need a specific userspace driver for each of those devices,
correct?

You definitely do for GPUs, and I do not see that going away any time
soon. I doubt Xilinx and Altera will ever compile down to the same bit-
stream format either.

So a userspace application is bound to use some kind of library that
implements the userspace side of the driver, and that library can
easily provide helpers to enumerate all the devices it supports.

For instance, that is what OpenCL allows for both GPU and FPGA.
One single API and multiple different pieces of hardware you can
target from it.
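
For instance, enumerating every OpenCL device, GPU or FPGA alike, is just
this from the application's point of view (a minimal sketch, no error
reporting):

#include <stdio.h>
#include <CL/cl.h>

/*
 * Sketch: list all OpenCL devices; GPUs and FPGAs show up through the
 * same API, each behind its vendor's platform.
 */
int main(void)
{
	cl_platform_id platforms[8];
	cl_uint nplat, ndev, i, j;

	if (clGetPlatformIDs(8, platforms, &nplat) != CL_SUCCESS)
		return 1;

	for (i = 0; i < nplat; i++) {
		cl_device_id devs[16];
		char name[128];

		if (clGetDeviceIDs(platforms[i], CL_DEVICE_TYPE_ALL,
				   16, devs, &ndev) != CL_SUCCESS)
			continue;

		for (j = 0; j < ndev; j++) {
			clGetDeviceInfo(devs[j], CL_DEVICE_NAME,
					sizeof(name), name, NULL);
			printf("%s\n", name);
		}
	}
	return 0;
}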


> > Each of those device need a userspace driver and thus this
> > user space driver can easily knows where to look. I do not
> > expect that every application will reimplement those drivers
> > but instead use some kind of library that provide a high
> > level API for each of those devices.
>
> Think about it from the user level. You have a pipeline of things you
> wish to execute, you need to get the right accelerator combinations and
> they need to fit together to meet system constraints like number of
> IOMMU ids the accelerator supports, where they are connected.

Creating a pipe of devices, i.e. one device consuming the work of
the previous one, is a problem of its own and it should be solved
separately, not inside VFIO.

GPUs (on ARM) already have this pipe thing, because the IP block
that does overlay, the IP block that pushes pixels to the screen and
the IP block that does 3D rendering all come from different
companies.

I do not see the value of having all the devices enumerated through
VFIO to address this problem. I can definitely understand having a
specific kernel mechanism to expose to userspace what is do-able,
but I believe this should be its own thing that allows any device
(a VFIO one, a network one, a GPU, an FPGA, ...) to be used in "pipe"
mode.


> > Now you have a hierarchy of memory for the CPU (HBM, local
> > node main memory aka you DDR dimm, persistent memory) each
>
> It's not a heirarchy, it's a graph. There's no fundamental reason two
> accelerators can't be close to two different CPU cores but have shared
> HBM that is far from each processor. There are physical reasons it tends
> to look more like a heirarchy today.

Yes, you are right, I used the wrong word.

>
> > Anyway i think finding devices and finding relation between
> > devices and memory is 2 separate problems and as such should
> > be handled separatly.
>
> At a certain level they are deeply intertwined because you need a common
> API. It's not good if I want a particular accelerator and need to then
> see which API its under on this machine and which interface I have to
> use, and maybe have a mix of FPGA, WarpDrive and Google ASIC interfaces
> all different.
>
> The job of the kernel is to impose some kind of sanity and unity on this
> lot.
>
> All of it in the end comes down to
>
> 'Somehow glue some chunk of memory into my address space and find any
> supporting driver I need'
>
> plus virtualization of the above.
>
> That bit's easy - but making it usable is a different story.

My point is that for all those devices you will have userspace drivers,
and thus all the complexity of finding the best memory for a given
combination of hardware can be pushed to userspace. The kernel only has to
expose the topology of the different memories and their relation to each
of the devices. Very much like you have NUMA nodes for the CPU.


I would rather see the kernel have one API to expose topology and one API
to expose device mixing capabilities (i.e. how you can pipe devices
together, if at all) than have to update every single existing
upstream driver that wants to participate in the above so that it becomes
a VFIO driver.


I have nothing against VFIO ;) It is just that, to me, it seems this is all
reinventing a new device driver infrastructure under it, while the
existing one is good enough and can evolve to support all the cases
discussed here.

Cheers,
Jérôme

2018-08-06 01:26:22

by Kenneth Lee

[permalink] [raw]
Subject: Re: [RFC PATCH 0/7] A General Accelerator Framework, WarpDrive

On Fri, Aug 03, 2018 at 03:20:43PM +0100, Alan Cox wrote:
>
> > If we are going to have any kind of general purpose accelerator API then
> > > it has to be able to implement things like
> >
> > Why is the existing driver model not good enough ? So you want
> > a device with function X you look into /dev/X (for instance
> > for GPU you look in /dev/dri)
>
> Except when my GPU is in an FPGA in which case it might be somewhere else
> or it's a general purpose accelerator that happens to be usable as a GPU.
> Unusual today in big computer space but you'll find it in
> microcontrollers.
>
> > Each of those device need a userspace driver and thus this
> > user space driver can easily knows where to look. I do not
> > expect that every application will reimplement those drivers
> > but instead use some kind of library that provide a high
> > level API for each of those devices.
>
> Think about it from the user level. You have a pipeline of things you
> wish to execute, you need to get the right accelerator combinations and
> they need to fit together to meet system constraints like number of
> IOMMU ids the accelerator supports, where they are connected.
>
> > Now you have a hierarchy of memory for the CPU (HBM, local
> > node main memory aka you DDR dimm, persistent memory) each
>
> It's not a heirarchy, it's a graph. There's no fundamental reason two
> accelerators can't be close to two different CPU cores but have shared
> HBM that is far from each processor. There are physical reasons it tends
> to look more like a heirarchy today.
>
> > Anyway i think finding devices and finding relation between
> > devices and memory is 2 separate problems and as such should
> > be handled separatly.
>
> At a certain level they are deeply intertwined because you need a common
> API. It's not good if I want a particular accelerator and need to then
> see which API its under on this machine and which interface I have to
> use, and maybe have a mix of FPGA, WarpDrive and Google ASIC interfaces
> all different.
>
> The job of the kernel is to impose some kind of sanity and unity on this
> lot.
>
> All of it in the end comes down to
>
> 'Somehow glue some chunk of memory into my address space and find any
> supporting driver I need'
>

Agreed. This is also our intention with WarpDrive. And it looks like VFIO is
the best place to fulfill this requirement.

> plus virtualization of the above.
>
> That bit's easy - but making it usable is a different story.
>
> Alan

--
-Kenneth(Hisilicon)


2018-08-06 01:40:04

by Kenneth Lee

[permalink] [raw]
Subject: Re: [RFC PATCH 3/7] vfio: add spimdev support

On Thu, Aug 02, 2018 at 12:43:27PM -0600, Alex Williamson wrote:
>
> On Thu, 2 Aug 2018 10:35:28 +0200
> Cornelia Huck <[email protected]> wrote:
>
> > On Thu, 2 Aug 2018 15:34:40 +0800
> > Kenneth Lee <[email protected]> wrote:
> >
> > > On Thu, Aug 02, 2018 at 04:24:22AM +0000, Tian, Kevin wrote:
> >
> > > > > From: Kenneth Lee [mailto:[email protected]]
> > > > > Sent: Thursday, August 2, 2018 11:47 AM
> > > > >
> > > > > >
> > > > > > > From: Kenneth Lee
> > > > > > > Sent: Wednesday, August 1, 2018 6:22 PM
> > > > > > >
> > > > > > > From: Kenneth Lee <[email protected]>
> > > > > > >
> > > > > > > SPIMDEV is "Share Parent IOMMU Mdev". It is a vfio-mdev. But differ
> > > > > from
> > > > > > > the general vfio-mdev:
> > > > > > >
> > > > > > > 1. It shares its parent's IOMMU.
> > > > > > > 2. There is no hardware resource attached to the mdev is created. The
> > > > > > > hardware resource (A `queue') is allocated only when the mdev is
> > > > > > > opened.
> > > > > >
> > > > > > Alex has concern on doing so, as pointed out in:
> > > > > >
> > > > > > https://www.spinics.net/lists/kvm/msg172652.html
> > > > > >
> > > > > > resource allocation should be reserved at creation time.
> > > > >
> > > > > Yes. That is why I keep telling that SPIMDEV is not for "VM", it is for "many
> > > > > processes", it is just an access point to the process. Not a device to VM. I
> > > > > hope
> > > > > Alex can accept it:)
> > > > >
> > > >
> > > > VFIO is just about assigning device resource to user space. It doesn't care
> > > > whether it's native processes or VM using the device so far. Along the direction
> > > > which you described, looks VFIO needs to support the configuration that
> > > > some mdevs are used for native process only, while others can be used
> > > > for both native and VM. I'm not sure whether there is a clean way to
> > > > enforce it...
> > >
> > > I had the same idea at the beginning. But finally I found that the life cycle
> > > of the virtual device for VM and process were different. Consider you create
> > > some mdevs for VM use, you will give all those mdevs to lib-virt, which
> > > distribute those mdev to VMs or containers. If the VM or container exits, the
> > > mdev is returned to the lib-virt and used for next allocation. It is the
> > > administrator who controlled every mdev's allocation.
>
> Libvirt currently does no management of mdev devices, so I believe
> this example is fictitious. The extent of libvirt's interaction with
> mdev is that XML may specify an mdev UUID as the source for a hostdev
> and set the permissions on the device files appropriately. Whether
> mdevs are created in advance and re-used or created and destroyed
> around a VM instance (for example via qemu hooks scripts) is not a
> policy that libvirt imposes.
>
> > > But for process, it is different. There is no lib-virt in control. The
> > > administrator's intension is to grant some type of application to access the
> > > hardware. The application can get a handle of the hardware, send request and get
> > > the result. That's all. He/She dose not care which mdev is allocated to that
> > > application. If it crashes, it should be the kernel's responsibility to withdraw
> > > the resource, the system administrator does not want to do it by hand.
>
> Libvirt is also not a required component for VM lifecycles, it's an
> optional management interface, but there are also VM lifecycles exactly
> as you describe. A VM may want a given type of vGPU, there might be
> multiple sources of that type and any instance is fungible to any
> other. Such an mdev can be dynamically created, assigned to the VM,
> and destroyed later. Why do we need to support "empty" mdevs that do
> not reserve reserve resources until opened? The concept of available
> instances is entirely lost with that approach and it creates an
> environment that's difficult to support, resources may not be available
> at the time the user attempts to access them.
>
> > I don't think that you should distinguish the cases by the presence of
> > a management application. How can the mdev driver know what the
> > intention behind using the device is?
>
> Absolutely, vfio is a userspace driver interface, it's not tailored to
> VM usage and we cannot know the intentions of the user.
>
> > Would it make more sense to use a different mechanism to enforce that
> > applications only use those handles they are supposed to use? Maybe
> > cgroups? I don't think it's a good idea to push usage policy into the
> > kernel.
>
> I agree, this sounds like a userspace problem, mdev supports dynamic
> creation and removal of mdev devices, if there's an issue with
> maintaining a set of standby devices that a user has access to, this
> sounds like a userspace broker problem. It makes more sense to me to
> have a model where a userspace application can make a request to a
> broker and the broker can reply with "none available" rather than
> having a set of devices on standby that may or may not work depending
> on the system load and other users. Thanks,
>
> Alex

I am sorry, I used a wrong mutt command when replying to Cornelia's last
mail, so that reply did not stay within this thread. Please let me repeat
my point here.

I should not have used libvirt as the example. But WarpDrive works in the
following scenario:

1. It supports thousands of processes. Take the zip accelerator as an
example: any application that needs data compression/decompression will
need to interact with the accelerator. To support that, you would have to
create tens of thousands of mdevs for their usage. I don't think it is a
good idea to have so many devices in the system.

2. The application does not want to own the mdev for long. It just needs an
access point for the hardware service. If it has to interact with a
management agent for allocation and release, that makes the problem complex.

3. The service is bound to the process. When the process exits, the resource
should be released automatically. The kernel is the best place to monitor
the state of the process.

I agree this extends the concept of mdev. But again, it is cleaner than
creating another facility for user land DMA. We just need to take the mdev
as an access point to the device: when it is opened, the resource is given.
It is not a device dedicated to a particular entity or instance, but it is
still a device which can provide the service of the hardware.

Cornelia is worried about resource starvation. I think that can be solved by
setting restrictions on the mdev itself. An mdev management agent does not
help much here; management of the mdevs themselves can still end up in a
state of running out of resources.

Thanks


--
-Kenneth(Hisilicon)


2018-08-06 03:12:52

by Kenneth Lee

[permalink] [raw]
Subject: Re: [RFC PATCH 0/7] A General Accelerator Framework, WarpDrive

On Fri, Aug 03, 2018 at 10:39:44AM -0400, Jerome Glisse wrote:
>
> On Fri, Aug 03, 2018 at 11:47:21AM +0800, Kenneth Lee wrote:
> > On Thu, Aug 02, 2018 at 10:22:43AM -0400, Jerome Glisse wrote:
> > >
> > > On Thu, Aug 02, 2018 at 12:05:57PM +0800, Kenneth Lee wrote:
> > > > On Thu, Aug 02, 2018 at 02:33:12AM +0000, Tian, Kevin wrote:
> > > > > Date: Thu, 2 Aug 2018 02:33:12 +0000
> > > > > > From: Jerome Glisse
> > > > > > On Wed, Aug 01, 2018 at 06:22:14PM +0800, Kenneth Lee wrote:
> > > > > > > From: Kenneth Lee <[email protected]>
> > > > > > >
> > > > > > > WarpDrive is an accelerator framework to expose the hardware
> > > > > > capabilities
> > > > > > > directly to the user space. It makes use of the exist vfio and vfio-mdev
> > > > > > > facilities. So the user application can send request and DMA to the
> > > > > > > hardware without interaction with the kernel. This remove the latency
> > > > > > > of syscall and context switch.
> > > > > > >
> > > > > > > The patchset contains documents for the detail. Please refer to it for
> > > > > > more
> > > > > > > information.
> > > > > > >
> > > > > > > This patchset is intended to be used with Jean Philippe Brucker's SVA
> > > > > > > patch [1] (Which is also in RFC stage). But it is not mandatory. This
> > > > > > > patchset is tested in the latest mainline kernel without the SVA patches.
> > > > > > > So it support only one process for each accelerator.
> > > > > > >
> > > > > > > With SVA support, WarpDrive can support multi-process in the same
> > > > > > > accelerator device. We tested it in our SoC integrated Accelerator (board
> > > > > > > ID: D06, Chip ID: HIP08). A reference work tree can be found here: [2].
> > > > > >
> > > > > > I have not fully inspected things nor do i know enough about
> > > > > > this Hisilicon ZIP accelerator to ascertain, but from glimpsing
> > > > > > at the code it seems that it is unsafe to use even with SVA due
> > > > > > to the doorbell. There is a comment talking about safetyness
> > > > > > in patch 7.
> > > > > >
> > > > > > Exposing thing to userspace is always enticing, but if it is
> > > > > > a security risk then it should clearly say so and maybe a
> > > > > > kernel boot flag should be necessary to allow such device to
> > > > > > be use.
> > > > > >
> > > >
> > > > But doorbell is just a notification. Except for DOS (to make hardware busy) it
> > > > cannot actually take or change anything from the kernel space. And the DOS
> > > > problem can be always taken as the problem that a group of processes share the
> > > > same kernel entity.
> > > >
> > > > In the coming HIP09 hardware, the doorbell will come with a random number so
> > > > only the process who allocated the queue can knock it correctly.
> > >
> > > When doorbell is ring the hardware start fetching commands from
> > > the queue and execute them ? If so than a rogue process B might
> > > ring the doorbell of process A which would starts execution of
> > > random commands (ie whatever random memory value there is left
> > > inside the command buffer memory, could be old commands i guess).
> > >
> > > If this is not how this doorbell works then, yes it can only do
> > > a denial of service i guess. Issue i have with doorbell is that
> > > i have seen 10 differents implementations in 10 differents hw
> > > and each are different as to what ringing or value written to the
> > > doorbell does. It is painfull to track what is what for each hw.
> > >
> >
> > In our implementation, doorbell is simply a notification, just like an interrupt
> > to the accelerator. The command is all about what's in the queue.
> >
> > I agree that there is no simple and standard way to track the shared IO space.
> > But I think we have to trust the driver in some way. If the driver is malicious,
> > even a simple ioctl can become an attack.
>
> Trusting kernel space driver is fine, trusting user space driver is
> not in my view. AFAICT every driver developer so far always made
> sure that someone could not abuse its device to do harmfull thing to
> other process.
>

Fully agree. That is why this driver shares only the doorbell space: only
the doorbell is shared in the whole page, nothing else.

Maybe you are concerned that the user driver will give malicious commands
to the hardware? But these commands cannot influence other processes. If we
can trust the hardware design, the process cannot do any harm.
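
In other words, the queue's mmap handler only ever hands out the single
page that holds the doorbell, roughly like the sketch below (not the actual
spimdev code; the structure and names are illustrative):

#include <linux/fs.h>
#include <linux/mm.h>

/* Hypothetical per-queue state; only the doorbell address matters here. */
struct wd_queue {
	unsigned long doorbell_phys;	/* page-aligned MMIO address */
};

/*
 * Sketch of a queue mmap handler that exposes nothing but the single
 * 4K page holding the doorbell.
 */
static int wd_queue_mmap(struct file *file, struct vm_area_struct *vma)
{
	struct wd_queue *q = file->private_data;

	if (vma->vm_end - vma->vm_start != PAGE_SIZE)
		return -EINVAL;

	vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);

	/* Map exactly one page of MMIO: the doorbell.  The rest of the
	 * device register space stays invisible to the process.
	 */
	return remap_pfn_range(vma, vma->vm_start,
			       q->doorbell_phys >> PAGE_SHIFT,
			       PAGE_SIZE, vma->vm_page_prot);
}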

>
> > > > > > My more general question is do we want to grow VFIO to become
> > > > > > a more generic device driver API. This patchset adds a command
> > > > > > queue concept to it (i don't think it exist today but i have
> > > > > > not follow VFIO closely).
> > > > > >
> > > >
> > > > The thing is, VFIO is the only place to support DMA from user land. If we don't
> > > > put it here, we have to create another similar facility to support the same.
> > >
> > > No it is not, network device, GPU, block device, ... they all do
> > > support DMA. The point i am trying to make here is that even in
> >
> > Sorry, wait a minute, are we talking the same thing? I meant "DMA from user
> > land", not "DMA from kernel driver". To do that we have to manipulate the
> > IOMMU(Unit). I think it can only be done by default_domain or vfio domain. Or
> > the user space have to directly access the IOMMU.
>
> GPU do DMA in the sense that you pass to the kernel a valid
> virtual address (kernel driver do all the proper check) and
> then you can use the GPU to copy from or to that range of
> virtual address. Exactly how you want to use this compression
> engine. It does not rely on SVM but SVM going forward would
> still be the prefered option.
>

No, SVM is not the reason why we rely on Jean's SVM (SVA) series. We rely on
Jean's series because of its multi-process (PASID or substream ID) support.

But of course, WarpDrive can still benefit from the SVM feature.

>
> > > your mechanisms the userspace must have a specific userspace
> > > drivers for each hardware and thus there are virtually no
> > > differences between having this userspace driver open a device
> > > file in vfio or somewhere else in the device filesystem. This is
> > > just a different path.
> > >
> >
> > The basic problem WarpDrive want to solve it to avoid syscall. This is important
> > to accelerators. We have some data here:
> > https://www.slideshare.net/linaroorg/progress-and-demonstration-of-wrapdrive-a-accelerator-framework-sfo17317
> >
> > (see page 3)
> >
> > The performance is different on using kernel and user drivers.
>
> Yes and example i point to is exactly that. You have a one time setup
> cost (creating command buffer binding PASID with command buffer and
> couple other setup steps). Then userspace no longer have to do any
> ioctl to schedule work on the GPU. It is all down from userspace and
> it use a doorbell to notify hardware when it should go look at command
> buffer for new thing to execute.
>
> My point stands on that. You have existing driver already doing so
> with no new framework and in your scheme you need a userspace driver.
> So i do not see the value add, using one path or the other in the
> userspace driver is litteraly one line to change.
>

Sorry, I got confused here. I partially agree that the user driver is
redundant with the kernel driver. (But for WarpDrive, the kernel driver is a
full driver that includes all the preparation and setup for the hardware;
the user driver simply sends requests and receives answers.) Yes, it is just
a choice of path. But the user path is faster if the request comes from user
space, and to do that we need user land DMA support. Then why is it not
valuable to get VFIO involved?

>
> > And we also believe the hardware interface can become standard after sometime.
> > Some companies have started to do this (such ARM's Revere). But before that, we
> > should have a software channel for it.
>
> I hope it does, but right now for every single piece of hardware you
> will need a specific driver (i am ignoring backward compatible hardware
> evolution as this is a thing that do exist).
>
> Even if down the road for every class of hardware you can use the same
> driver, i am not sure what the value add is to do it inside VFIO versus
> a class of device driver (like USB, PCIE, DRM aka GPU, ...) ie you would
> have a compression class (/dev/compress/*) a encryption one, ...
>
>
> > > So this is why i do not see any benefit to having all drivers with
> > > SVM (can we please use SVM and not SVA as SVM is what have been use
> > > in more places so far).
> > >
> >
> > Personally, we don't care what name to be used. I used SVM when I start this
> > work. And then Jean said SVM had been used by AMD as Secure Virtual Machine. So
> > he called it SVA. And now... who should I follow? :)
>
> I think Intel call it SVM too, i do not have any strong preference
> beside have only one to remember :)

OK. Then let's call it SVM for now.

>
> Cheers,
> Jérôme

--
-Kenneth(Hisilicon)


2018-08-06 12:27:33

by Pavel Machek

[permalink] [raw]
Subject: Re: [RFC PATCH 1/7] vfio/spimdev: Add documents for WarpDrive framework

Hi!

> WarpDrive is a common user space accelerator framework. Its main component
> in Kernel is called spimdev, Share Parent IOMMU Mediated Device. It exposes

spimdev is a really unfortunate name. It looks like it has something to do
with SPI, but it does not.

> +++ b/Documentation/warpdrive/warpdrive.rst
> @@ -0,0 +1,153 @@
> +Introduction of WarpDrive
> +=========================
> +
> +*WarpDrive* is a general accelerator framework built on top of vfio.
> +It can be taken as a light weight virtual function, which you can use without
> +*SR-IOV* like facility and can be shared among multiple processes.
> +
> +It can be used as the quick channel for accelerators, network adaptors or
> +other hardware in user space. It can make some implementation simpler. E.g.
> +you can reuse most of the *netdev* driver and just share some ring buffer to
> +the user space driver for *DPDK* or *ODP*. Or you can combine the RSA
> +accelerator with the *netdev* in the user space as a Web reversed proxy, etc.

What is DPDK? ODP?

> +How does it work
> +================
> +
> +*WarpDrive* takes the Hardware Accelerator as a heterogeneous processor which
> +can share some load for the CPU:
> +
> +.. image:: wd.svg
> + :alt: This is a .svg image, if your browser cannot show it,
> + try to download and view it locally
> +
> +So it provides the capability to the user application to:
> +
> +1. Send request to the hardware
> +2. Share memory with the application and other accelerators
> +
> +These requirements can be fulfilled by VFIO if the accelerator can serve each
> +application with a separated Virtual Function. But a *SR-IOV* like VF (we will
> +call it *HVF* hereinafter) design is too heavy for the accelerator which
> +service thousands of processes.

VFIO? VF? HVF?

Also "gup" might be worth spelling out.

> +References
> +==========
> +.. [1] Accroding to the comment in in mm/gup.c, The *gup* is only safe within
> + a syscall. Because it can only keep the physical memory in place
> + without making sure the VMA will always point to it. Maybe we should
> + raise the VM_PINNED patchset (see
> + https://lists.gt.net/linux/kernel/1931993) again to solve this probl


I went through the docs, but I still don't know what it does.
Pavel

2018-08-06 15:32:58

by Jerome Glisse

[permalink] [raw]
Subject: Re: [RFC PATCH 0/7] A General Accelerator Framework, WarpDrive

On Mon, Aug 06, 2018 at 11:12:52AM +0800, Kenneth Lee wrote:
> On Fri, Aug 03, 2018 at 10:39:44AM -0400, Jerome Glisse wrote:
> > On Fri, Aug 03, 2018 at 11:47:21AM +0800, Kenneth Lee wrote:
> > > On Thu, Aug 02, 2018 at 10:22:43AM -0400, Jerome Glisse wrote:
> > > > On Thu, Aug 02, 2018 at 12:05:57PM +0800, Kenneth Lee wrote:
> > > > > On Thu, Aug 02, 2018 at 02:33:12AM +0000, Tian, Kevin wrote:
> > > > > > > On Wed, Aug 01, 2018 at 06:22:14PM +0800, Kenneth Lee wrote:
> > > > > > > >

[...]

> > > > > But doorbell is just a notification. Except for DOS (to make hardware busy) it
> > > > > cannot actually take or change anything from the kernel space. And the DOS
> > > > > problem can be always taken as the problem that a group of processes share the
> > > > > same kernel entity.
> > > > >
> > > > > In the coming HIP09 hardware, the doorbell will come with a random number so
> > > > > only the process who allocated the queue can knock it correctly.
> > > >
> > > > When doorbell is ring the hardware start fetching commands from
> > > > the queue and execute them ? If so than a rogue process B might
> > > > ring the doorbell of process A which would starts execution of
> > > > random commands (ie whatever random memory value there is left
> > > > inside the command buffer memory, could be old commands i guess).
> > > >
> > > > If this is not how this doorbell works then, yes it can only do
> > > > a denial of service i guess. Issue i have with doorbell is that
> > > > i have seen 10 differents implementations in 10 differents hw
> > > > and each are different as to what ringing or value written to the
> > > > doorbell does. It is painfull to track what is what for each hw.
> > > >
> > >
> > > In our implementation, doorbell is simply a notification, just like an interrupt
> > > to the accelerator. The command is all about what's in the queue.
> > >
> > > I agree that there is no simple and standard way to track the shared IO space.
> > > But I think we have to trust the driver in some way. If the driver is malicious,
> > > even a simple ioctl can become an attack.
> >
> > Trusting kernel space driver is fine, trusting user space driver is
> > not in my view. AFAICT every driver developer so far always made
> > sure that someone could not abuse its device to do harmfull thing to
> > other process.
> >
>
> Fully agree. That is why this driver shares only the doorbell space. There is
> only the doorbell is shared in the whole page, nothing else.
>
> Maybe you are concerning the user driver will give malicious command to the
> hardware? But these commands cannot influence the other process. If we can trust
> the hardware design, the process cannot do any harm.

My question was what happens if a process B rings the doorbell of
process A.

On some hardware the value written to the doorbell is used as an
index into the command buffer. On other hardware it just wakes the
hardware up to go look at a structure private to the process. There are
other variations on those themes.

If it is the former, i.e. the value is used to advance in the command
buffer, then a rogue process can force another process to advance its
command buffer, and what is in the command buffer can be some random
old memory values, which can be more harmful than just Denial Of
Service.
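
To make the distinction concrete, here is a tiny sketch of the two doorbell
models described above. It is purely illustrative: neither function corresponds
to a real device, and the register and variable names are made up.

#include <stdint.h>

/* Model 1: the written value advances the command-buffer index.
 * A rogue writer can make the hardware consume stale ring contents. */
static void ring_doorbell_as_index(volatile uint32_t *doorbell, uint32_t new_tail)
{
	*doorbell = new_tail;	/* hardware fetches commands up to new_tail */
}

/* Model 2: the write is a pure wake-up; the queue state lives in memory
 * only the owning process can reach, so a stray write is at worst a DoS. */
static void ring_doorbell_as_notify(volatile uint32_t *doorbell)
{
	*doorbell = 1;		/* value is ignored, hardware re-reads the queue state */
}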


> > > > > > > My more general question is do we want to grow VFIO to become
> > > > > > > a more generic device driver API. This patchset adds a command
> > > > > > > queue concept to it (i don't think it exist today but i have
> > > > > > > not follow VFIO closely).
> > > > > > >
> > > > >
> > > > > The thing is, VFIO is the only place to support DMA from user land. If we don't
> > > > > put it here, we have to create another similar facility to support the same.
> > > >
> > > > No it is not, network device, GPU, block device, ... they all do
> > > > support DMA. The point i am trying to make here is that even in
> > >
> > > Sorry, wait a minute, are we talking the same thing? I meant "DMA from user
> > > land", not "DMA from kernel driver". To do that we have to manipulate the
> > > IOMMU(Unit). I think it can only be done by default_domain or vfio domain. Or
> > > the user space have to directly access the IOMMU.
> >
> > GPU do DMA in the sense that you pass to the kernel a valid
> > virtual address (kernel driver do all the proper check) and
> > then you can use the GPU to copy from or to that range of
> > virtual address. Exactly how you want to use this compression
> > engine. It does not rely on SVM but SVM going forward would
> > still be the prefered option.
> >
>
> No, SVM is not the reason why we rely on Jean's SVM(SVA) series. We rely on
> Jean's series because of multi-process (PASID or substream ID) support.
>
> But of couse, WarpDrive can still benefit from the SVM feature.

We are getting side tracked here. PASID/ID do not require VFIO.


> > > > your mechanisms the userspace must have a specific userspace
> > > > drivers for each hardware and thus there are virtually no
> > > > differences between having this userspace driver open a device
> > > > file in vfio or somewhere else in the device filesystem. This is
> > > > just a different path.
> > > >
> > >
> > > The basic problem WarpDrive want to solve it to avoid syscall. This is important
> > > to accelerators. We have some data here:
> > > https://www.slideshare.net/linaroorg/progress-and-demonstration-of-wrapdrive-a-accelerator-framework-sfo17317
> > >
> > > (see page 3)
> > >
> > > The performance is different on using kernel and user drivers.
> >
> > Yes and example i point to is exactly that. You have a one time setup
> > cost (creating command buffer binding PASID with command buffer and
> > couple other setup steps). Then userspace no longer have to do any
> > ioctl to schedule work on the GPU. It is all down from userspace and
> > it use a doorbell to notify hardware when it should go look at command
> > buffer for new thing to execute.
> >
> > My point stands on that. You have existing driver already doing so
> > with no new framework and in your scheme you need a userspace driver.
> > So i do not see the value add, using one path or the other in the
> > userspace driver is litteraly one line to change.
> >
>
> Sorry, I'd got confuse here. I partially agree that the user driver is
> redundance of kernel driver. (But for WarpDrive, the kernel driver is a full
> driver include all preparation and setup stuff for the hardware, the user driver
> is simply to send request and receive answer). Yes, it is just a choice of path.
> But the user path is faster if the request come from use space. And to do that,
> we need user land DMA support. Then why is it invaluable to let VFIO involved?

Some drivers in the kernel already do exactly what you said. The user
space emits commands without ever going into the kernel by directly
scheduling commands and ringing a doorbell. They do not need VFIO either,
and they can map userspace addresses into the DMA address space of the
device, and again they do not need VFIO for that.

My point is that you do not need VFIO for DMA in user land, nor do you need
it to allow a device to consume user space commands without an IOCTL.

Moreover, as you already need a device-specific driver in both kernel and
user space, there is no added value in trying to have all kinds of
devices under the same devfs hierarchy.

Cheers,
Jérôme

2018-08-06 15:49:40

by Alex Williamson

[permalink] [raw]
Subject: Re: [RFC PATCH 3/7] vfio: add spimdev support

On Mon, 6 Aug 2018 09:40:04 +0800
Kenneth Lee <[email protected]> wrote:

> On Thu, Aug 02, 2018 at 12:43:27PM -0600, Alex Williamson wrote:
> >
> > On Thu, 2 Aug 2018 10:35:28 +0200
> > Cornelia Huck <[email protected]> wrote:
> >
> > > On Thu, 2 Aug 2018 15:34:40 +0800
> > > Kenneth Lee <[email protected]> wrote:
> > >
> > > > On Thu, Aug 02, 2018 at 04:24:22AM +0000, Tian, Kevin wrote:
> > >
> > > > > > From: Kenneth Lee [mailto:[email protected]]
> > > > > > Sent: Thursday, August 2, 2018 11:47 AM
> > > > > >
> > > > > > >
> > > > > > > > From: Kenneth Lee
> > > > > > > > Sent: Wednesday, August 1, 2018 6:22 PM
> > > > > > > >
> > > > > > > > From: Kenneth Lee <[email protected]>
> > > > > > > >
> > > > > > > > SPIMDEV is "Share Parent IOMMU Mdev". It is a vfio-mdev. But differ
> > > > > > from
> > > > > > > > the general vfio-mdev:
> > > > > > > >
> > > > > > > > 1. It shares its parent's IOMMU.
> > > > > > > > 2. There is no hardware resource attached to the mdev is created. The
> > > > > > > > hardware resource (A `queue') is allocated only when the mdev is
> > > > > > > > opened.
> > > > > > >
> > > > > > > Alex has concern on doing so, as pointed out in:
> > > > > > >
> > > > > > > https://www.spinics.net/lists/kvm/msg172652.html
> > > > > > >
> > > > > > > resource allocation should be reserved at creation time.
> > > > > >
> > > > > > Yes. That is why I keep telling that SPIMDEV is not for "VM", it is for "many
> > > > > > processes", it is just an access point to the process. Not a device to VM. I
> > > > > > hope
> > > > > > Alex can accept it:)
> > > > > >
> > > > >
> > > > > VFIO is just about assigning device resource to user space. It doesn't care
> > > > > whether it's native processes or VM using the device so far. Along the direction
> > > > > which you described, looks VFIO needs to support the configuration that
> > > > > some mdevs are used for native process only, while others can be used
> > > > > for both native and VM. I'm not sure whether there is a clean way to
> > > > > enforce it...
> > > >
> > > > I had the same idea at the beginning. But finally I found that the life cycle
> > > > of the virtual device for VM and process were different. Consider you create
> > > > some mdevs for VM use, you will give all those mdevs to lib-virt, which
> > > > distribute those mdev to VMs or containers. If the VM or container exits, the
> > > > mdev is returned to the lib-virt and used for next allocation. It is the
> > > > administrator who controlled every mdev's allocation.
> >
> > Libvirt currently does no management of mdev devices, so I believe
> > this example is fictitious. The extent of libvirt's interaction with
> > mdev is that XML may specify an mdev UUID as the source for a hostdev
> > and set the permissions on the device files appropriately. Whether
> > mdevs are created in advance and re-used or created and destroyed
> > around a VM instance (for example via qemu hooks scripts) is not a
> > policy that libvirt imposes.
> >
> > > > But for process, it is different. There is no lib-virt in control. The
> > > > administrator's intension is to grant some type of application to access the
> > > > hardware. The application can get a handle of the hardware, send request and get
> > > > the result. That's all. He/She dose not care which mdev is allocated to that
> > > > application. If it crashes, it should be the kernel's responsibility to withdraw
> > > > the resource, the system administrator does not want to do it by hand.
> >
> > Libvirt is also not a required component for VM lifecycles, it's an
> > optional management interface, but there are also VM lifecycles exactly
> > as you describe. A VM may want a given type of vGPU, there might be
> > multiple sources of that type and any instance is fungible to any
> > other. Such an mdev can be dynamically created, assigned to the VM,
> > and destroyed later. Why do we need to support "empty" mdevs that do
> > not reserve reserve resources until opened? The concept of available
> > instances is entirely lost with that approach and it creates an
> > environment that's difficult to support, resources may not be available
> > at the time the user attempts to access them.
> >
> > > I don't think that you should distinguish the cases by the presence of
> > > a management application. How can the mdev driver know what the
> > > intention behind using the device is?
> >
> > Absolutely, vfio is a userspace driver interface, it's not tailored to
> > VM usage and we cannot know the intentions of the user.
> >
> > > Would it make more sense to use a different mechanism to enforce that
> > > applications only use those handles they are supposed to use? Maybe
> > > cgroups? I don't think it's a good idea to push usage policy into the
> > > kernel.
> >
> > I agree, this sounds like a userspace problem, mdev supports dynamic
> > creation and removal of mdev devices, if there's an issue with
> > maintaining a set of standby devices that a user has access to, this
> > sounds like a userspace broker problem. It makes more sense to me to
> > have a model where a userspace application can make a request to a
> > broker and the broker can reply with "none available" rather than
> > having a set of devices on standby that may or may not work depending
> > on the system load and other users. Thanks,
> >
> > Alex
>
> I am sorry, I used a wrong mutt command when reply to Cornelia's last mail. The
> last reply dose not stay within this thread. So please let me repeat my point
> here.
>
> I should not have use libvirt as the example. But WarpDrive works in such
> scenario:
>
> 1. It supports thousands of processes. Take zip accelerator as an example, any
> application need data compression/decompression will need to interact with the
> accelerator. To support that, you have to create tens of thousands of mdev for
> their usage. I don't think it is a good idea to have so many devices in the
> system.

Each mdev is a device, regardless of whether there are hardware
resources committed to the device, so I don't understand this argument.

> 2. The application does not want to own the mdev for long. It just need an
> access point for the hardware service. If it has to interact with an management
> agent for allocation and release, this makes the problem complex.

I don't see how the length of the usage plays a role here either. Are
you concerned that the time it takes to create and remove an mdev is
significant compared to the usage time? Userspace is certainly welcome
to create a pool of devices, but why should it be the kernel's
responsibility to dynamically assign resources to an mdev? What's the
usage model when resources are unavailable? It seems there's
complexity in either case, but it's generally userspace's responsibility
to impose a policy.

> 3. The service is bound with the process. When the process exit, the resource
> should be released automatically. Kernel is the best place to monitor the state
> of the process.

Mdev already provides that when an mdev is removed, the hardware
resources attached to it are released back to the mdev parent device.
A process closing the device simply indicates the end of a usage
context of the device. It seems like the request here is simply that
allocating resources on open allows userspace to be lazy and overcommit
physical resources without considering what happens when those
resources are unavailable.

> I agree this extending the concept of mdev. But again, it is cleaner than
> creating another facility for user land DMA. We just need to take mdev as an
> access point of the device: when it is open, the resource is given. It is not a
> device for a particular entity or instance. But it is still a device which can
> provide service of the hardware.

Cleaner for who? It's asking the kernel to impose a policy for
delegating resources when we effectively already have a policy that
userspace is responsible for allocating and delegating resources.

> Cornelia is worrying about resource starving. I think that can be solved by set
> restriction on the mdev itself. Mdev management agent dose not help much here.
> Management on the mdev itself can still lead to the status of running out of
> resource.

The restriction on the mdev is that the mdev itself represents allocated
resources. Of course we can always run out, but the current model is
that a user granted access to a vfio device, mdev or otherwise, has
ownership of the hardware provided through that interface. I don't see
how not committing resources to an mdev is anything more than an
attempt to push policy and error handling from one place to another.
Thanks,

Alex

2018-08-06 16:34:28

by Ashok Raj

[permalink] [raw]
Subject: Re: [RFC PATCH 3/7] vfio: add spimdev support

On Mon, Aug 06, 2018 at 09:49:40AM -0600, Alex Williamson wrote:
> On Mon, 6 Aug 2018 09:40:04 +0800
> Kenneth Lee <[email protected]> wrote:
> >
> > 1. It supports thousands of processes. Take zip accelerator as an example, any
> > application need data compression/decompression will need to interact with the
> > accelerator. To support that, you have to create tens of thousands of mdev for
> > their usage. I don't think it is a good idea to have so many devices in the
> > system.
>
> Each mdev is a device, regardless of whether there are hardware
> resources committed to the device, so I don't understand this argument.
>
> > 2. The application does not want to own the mdev for long. It just need an
> > access point for the hardware service. If it has to interact with an management
> > agent for allocation and release, this makes the problem complex.
>
> I don't see how the length of the usage plays a role here either. Are
> you concerned that the time it takes to create and remove an mdev is
> significant compared to the usage time? Userspace is certainly welcome
> to create a pool of devices, but why should it be the kernel's
> responsibility to dynamically assign resources to an mdev? What's the
> usage model when resources are unavailable? It seems there's
> complexity in either case, but it's generally userspace's responsibility
> to impose a policy.
>

Can vfio devs created to represent an mdev be shared between several
processes? It doesn't need to be exclusive.

The path to hardware is established by the processes binding to SVM and
the IOMMU ensuring that the PASID is plumbed properly. One can think of the
same hardware as shared between several processes; the hardware knows the
isolation is via the PASID.

For these cases it isn't required to create a dev per process.

Cheers,
Ashok

2018-08-06 17:05:21

by Alex Williamson

[permalink] [raw]
Subject: Re: [RFC PATCH 3/7] vfio: add spimdev support

On Mon, 6 Aug 2018 09:34:28 -0700
"Raj, Ashok" <[email protected]> wrote:

> On Mon, Aug 06, 2018 at 09:49:40AM -0600, Alex Williamson wrote:
> > On Mon, 6 Aug 2018 09:40:04 +0800
> > Kenneth Lee <[email protected]> wrote:
> > >
> > > 1. It supports thousands of processes. Take zip accelerator as an example, any
> > > application need data compression/decompression will need to interact with the
> > > accelerator. To support that, you have to create tens of thousands of mdev for
> > > their usage. I don't think it is a good idea to have so many devices in the
> > > system.
> >
> > Each mdev is a device, regardless of whether there are hardware
> > resources committed to the device, so I don't understand this argument.
> >
> > > 2. The application does not want to own the mdev for long. It just need an
> > > access point for the hardware service. If it has to interact with an management
> > > agent for allocation and release, this makes the problem complex.
> >
> > I don't see how the length of the usage plays a role here either. Are
> > you concerned that the time it takes to create and remove an mdev is
> > significant compared to the usage time? Userspace is certainly welcome
> > to create a pool of devices, but why should it be the kernel's
> > responsibility to dynamically assign resources to an mdev? What's the
> > usage model when resources are unavailable? It seems there's
> > complexity in either case, but it's generally userspace's responsibility
> > to impose a policy.
> >
>
> Can vfio dev's created representing an mdev be shared between several
> processes? It doesn't need to be exclusive.
>
> The path to hardware is established by the processes binding to SVM and
> IOMMU ensuring that the PASID is plummed properly. One can think the
> same hardware is shared between several processes, hardware knows the
> isolation is via the PASID.
>
> For these cases it isn't required to create a dev per process.

The iommu group is the unit of ownership, a vfio group mirrors an iommu
group, therefore a vfio group only allows a single open(2). A group
also represents the minimum isolation set of devices, therefore devices
within a group are not considered isolated and must share the same
address space represented by the vfio container. Beyond that, it is
possible to share devices among processes, but (I think) it generally
implies a hierarchical rather than peer relationship between
processes. Thanks,

Alex

2018-08-08 01:08:42

by Kenneth Lee

[permalink] [raw]
Subject: Re: [RFC PATCH 0/7] A General Accelerator Framework, WarpDrive



On Monday, August 6, 2018 at 11:32 PM, Jerome Glisse wrote:
> On Mon, Aug 06, 2018 at 11:12:52AM +0800, Kenneth Lee wrote:
>> On Fri, Aug 03, 2018 at 10:39:44AM -0400, Jerome Glisse wrote:
>>> On Fri, Aug 03, 2018 at 11:47:21AM +0800, Kenneth Lee wrote:
>>>> On Thu, Aug 02, 2018 at 10:22:43AM -0400, Jerome Glisse wrote:
>>>>> On Thu, Aug 02, 2018 at 12:05:57PM +0800, Kenneth Lee wrote:
>>>>>> On Thu, Aug 02, 2018 at 02:33:12AM +0000, Tian, Kevin wrote:
>>>>>>>> On Wed, Aug 01, 2018 at 06:22:14PM +0800, Kenneth Lee wrote:
> [...]
>
>>>>>> But doorbell is just a notification. Except for DOS (to make hardware busy) it
>>>>>> cannot actually take or change anything from the kernel space. And the DOS
>>>>>> problem can be always taken as the problem that a group of processes share the
>>>>>> same kernel entity.
>>>>>>
>>>>>> In the coming HIP09 hardware, the doorbell will come with a random number so
>>>>>> only the process who allocated the queue can knock it correctly.
>>>>> When doorbell is ring the hardware start fetching commands from
>>>>> the queue and execute them ? If so than a rogue process B might
>>>>> ring the doorbell of process A which would starts execution of
>>>>> random commands (ie whatever random memory value there is left
>>>>> inside the command buffer memory, could be old commands i guess).
>>>>>
>>>>> If this is not how this doorbell works then, yes it can only do
>>>>> a denial of service i guess. Issue i have with doorbell is that
>>>>> i have seen 10 differents implementations in 10 differents hw
>>>>> and each are different as to what ringing or value written to the
>>>>> doorbell does. It is painfull to track what is what for each hw.
>>>>>
>>>> In our implementation, doorbell is simply a notification, just like an interrupt
>>>> to the accelerator. The command is all about what's in the queue.
>>>>
>>>> I agree that there is no simple and standard way to track the shared IO space.
>>>> But I think we have to trust the driver in some way. If the driver is malicious,
>>>> even a simple ioctl can become an attack.
>>> Trusting kernel space driver is fine, trusting user space driver is
>>> not in my view. AFAICT every driver developer so far always made
>>> sure that someone could not abuse its device to do harmfull thing to
>>> other process.
>>>
>> Fully agree. That is why this driver shares only the doorbell space. There is
>> only the doorbell is shared in the whole page, nothing else.
>>
>> Maybe you are concerning the user driver will give malicious command to the
>> hardware? But these commands cannot influence the other process. If we can trust
>> the hardware design, the process cannot do any harm.
> My questions was what happens if a process B ring the doorbell of
> process A.
>
> On some hardware the value written in the doorbell is use as an
> index in command buffer. On other it just wakeup the hardware to go
> look at a structure private to the process. They are other variations
> of those themes.
>
> If it is the former ie the value is use to advance in the command
> buffer then a rogue process can force another process to advance its
> command buffer and what is in the command buffer can be some random
> old memory values which can be more harmfull than just Denial Of
> Service.

Yes. We have considered that. There is no other information in the
doorbell. The indexes, such as the head and tail pointers, are all in the
shared memory between the hardware and the user process, which other
processes cannot touch.
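
A minimal sketch of what such a layout could look like, purely as an
illustration of the argument above; the structure and field names are
assumptions for this discussion, not the actual Hisilicon QM definitions:

#include <stdint.h>

struct wd_cmd {				/* hypothetical request descriptor */
	uint64_t src_dma;
	uint64_t dst_dma;
	uint32_t len;
	uint32_t flags;
};

/* Lives in memory shared only between the owning process and the device. */
struct wd_queue_shm {
	uint32_t head;			/* advanced by the hardware */
	uint32_t tail;			/* advanced by the owning process */
	struct wd_cmd ring[256];
};

static void wd_submit(struct wd_queue_shm *q, volatile uint32_t *doorbell,
		      const struct wd_cmd *cmd)
{
	q->ring[q->tail % 256] = *cmd;
	q->tail++;			/* queue state is invisible to other processes */
	*doorbell = 1;			/* pure notification, carries no index */
}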
>
>>>>>>>> My more general question is do we want to grow VFIO to become
>>>>>>>> a more generic device driver API. This patchset adds a command
>>>>>>>> queue concept to it (i don't think it exist today but i have
>>>>>>>> not follow VFIO closely).
>>>>>>>>
>>>>>> The thing is, VFIO is the only place to support DMA from user land. If we don't
>>>>>> put it here, we have to create another similar facility to support the same.
>>>>> No it is not, network device, GPU, block device, ... they all do
>>>>> support DMA. The point i am trying to make here is that even in
>>>> Sorry, wait a minute, are we talking the same thing? I meant "DMA from user
>>>> land", not "DMA from kernel driver". To do that we have to manipulate the
>>>> IOMMU(Unit). I think it can only be done by default_domain or vfio domain. Or
>>>> the user space have to directly access the IOMMU.
>>> GPU do DMA in the sense that you pass to the kernel a valid
>>> virtual address (kernel driver do all the proper check) and
>>> then you can use the GPU to copy from or to that range of
>>> virtual address. Exactly how you want to use this compression
>>> engine. It does not rely on SVM but SVM going forward would
>>> still be the prefered option.
>>>
>> No, SVM is not the reason why we rely on Jean's SVM(SVA) series. We rely on
>> Jean's series because of multi-process (PASID or substream ID) support.
>>
>> But of couse, WarpDrive can still benefit from the SVM feature.
> We are getting side tracked here. PASID/ID do not require VFIO.
>
Yes, PASID itself does not require VFIO. But what if we need to:

1. Support DMA from user space.
2. Have the hardware use the standard IOMMU/SMMU for IO address translation.
3. Share the IOMMU facility between both kernel and user drivers.
4. Support PASID with the current IOMMU facility.
>>>>> your mechanisms the userspace must have a specific userspace
>>>>> drivers for each hardware and thus there are virtually no
>>>>> differences between having this userspace driver open a device
>>>>> file in vfio or somewhere else in the device filesystem. This is
>>>>> just a different path.
>>>>>
>>>> The basic problem WarpDrive want to solve it to avoid syscall. This is important
>>>> to accelerators. We have some data here:
>>>> https://www.slideshare.net/linaroorg/progress-and-demonstration-of-wrapdrive-a-accelerator-framework-sfo17317
>>>>
>>>> (see page 3)
>>>>
>>>> The performance is different on using kernel and user drivers.
>>> Yes and example i point to is exactly that. You have a one time setup
>>> cost (creating command buffer binding PASID with command buffer and
>>> couple other setup steps). Then userspace no longer have to do any
>>> ioctl to schedule work on the GPU. It is all down from userspace and
>>> it use a doorbell to notify hardware when it should go look at command
>>> buffer for new thing to execute.
>>>
>>> My point stands on that. You have existing driver already doing so
>>> with no new framework and in your scheme you need a userspace driver.
>>> So i do not see the value add, using one path or the other in the
>>> userspace driver is litteraly one line to change.
>>>
>> Sorry, I'd got confuse here. I partially agree that the user driver is
>> redundance of kernel driver. (But for WarpDrive, the kernel driver is a full
>> driver include all preparation and setup stuff for the hardware, the user driver
>> is simply to send request and receive answer). Yes, it is just a choice of path.
>> But the user path is faster if the request come from use space. And to do that,
>> we need user land DMA support. Then why is it invaluable to let VFIO involved?
> Some drivers in the kernel already do exactly what you said. The user
> space emit commands without ever going into kernel by directly scheduling
> commands and ringing a doorbell. They do not need VFIO either and they
> can map userspace address into the DMA address space of the device and
> again they do not need VFIO for that.
Could you please directly point out which driver you refer to here?
Thank you.
>
> My point is the you do not need VFIO for DMA in user land, nor do you need
> it to allow a device to consume user space commands without IOCTL.
>
> Moreover as you already need a device specific driver in both kernel and
> user space then there is not added value in trying to have all kind of
> devices under the same devfs hierarchy.
>
> Cheers,
> Jérôme
>

Cheers
Kenneth(Hisilicon)

2018-08-08 01:32:51

by Kenneth Lee

[permalink] [raw]
Subject: Re: [RFC PATCH 3/7] vfio: add spimdev support



On Tuesday, August 7, 2018 at 01:05 AM, Alex Williamson wrote:
> On Mon, 6 Aug 2018 09:34:28 -0700
> "Raj, Ashok" <[email protected]> wrote:
>
>> On Mon, Aug 06, 2018 at 09:49:40AM -0600, Alex Williamson wrote:
>>> On Mon, 6 Aug 2018 09:40:04 +0800
>>> Kenneth Lee <[email protected]> wrote:
>>>> 1. It supports thousands of processes. Take zip accelerator as an example, any
>>>> application need data compression/decompression will need to interact with the
>>>> accelerator. To support that, you have to create tens of thousands of mdev for
>>>> their usage. I don't think it is a good idea to have so many devices in the
>>>> system.
>>> Each mdev is a device, regardless of whether there are hardware
>>> resources committed to the device, so I don't understand this argument.
>>>
>>>> 2. The application does not want to own the mdev for long. It just need an
>>>> access point for the hardware service. If it has to interact with an management
>>>> agent for allocation and release, this makes the problem complex.
>>> I don't see how the length of the usage plays a role here either. Are
>>> you concerned that the time it takes to create and remove an mdev is
>>> significant compared to the usage time? Userspace is certainly welcome
>>> to create a pool of devices, but why should it be the kernel's
>>> responsibility to dynamically assign resources to an mdev? What's the
>>> usage model when resources are unavailable? It seems there's
>>> complexity in either case, but it's generally userspace's responsibility
>>> to impose a policy.
>>>
>> Can vfio dev's created representing an mdev be shared between several
>> processes? It doesn't need to be exclusive.
>>
>> The path to hardware is established by the processes binding to SVM and
>> IOMMU ensuring that the PASID is plummed properly. One can think the
>> same hardware is shared between several processes, hardware knows the
>> isolation is via the PASID.
>>
>> For these cases it isn't required to create a dev per process.
> The iommu group is the unit of ownership, a vfio group mirrors an iommu
> group, therefore a vfio group only allows a single open(2). A group
> also represents the minimum isolation set of devices, therefore devices
> within a group are not considered isolated and must share the same
> address space represented by the vfio container. Beyond that, it is
> possible to share devices among processes, but (I think) it generally
> implies a hierarchical rather than peer relationship between
> processes. Thanks,
Actually, this is the key problem we are concerned about. Our logic was: the
PASID refers to the connection between the device and the process, so the
resource should be allocated only when the process actually makes use of the
device. This strategy also brings another advantage: the kernel driver can
make use of the resource if no user application has opened it.

We do have another branch that allocates resources to the mdev directly. It
looks less nice (many mdevs and a user agent are required for resource
management). If the conclusion here is to keep mdev's original semantics, we
will send that branch for discussion in the next RFC.

Cheers
Kenneth
>
> Alex
>


2018-08-08 01:43:05

by Kenneth Lee

[permalink] [raw]
Subject: Re: [RFC PATCH 1/7] vfio/spimdev: Add documents for WarpDrive framework



On Monday, August 6, 2018 at 08:27 PM, Pavel Machek wrote:
> Hi!
>
>> WarpDrive is a common user space accelerator framework. Its main component
>> in Kernel is called spimdev, Share Parent IOMMU Mediated Device. It exposes
> spimdev is really unfortunate name. It looks like it has something to do with SPI, but
> it does not.
>
Yes. Let me change it to Share (IOMMU) Domain MDev, SDMdev:)
>> +++ b/Documentation/warpdrive/warpdrive.rst
>> @@ -0,0 +1,153 @@
>> +Introduction of WarpDrive
>> +=========================
>> +
>> +*WarpDrive* is a general accelerator framework built on top of vfio.
>> +It can be taken as a light weight virtual function, which you can use without
>> +*SR-IOV* like facility and can be shared among multiple processes.
>> +
>> +It can be used as the quick channel for accelerators, network adaptors or
>> +other hardware in user space. It can make some implementation simpler. E.g.
>> +you can reuse most of the *netdev* driver and just share some ring buffer to
>> +the user space driver for *DPDK* or *ODP*. Or you can combine the RSA
>> +accelerator with the *netdev* in the user space as a Web reversed proxy, etc.
> What is DPDK? ODP?
DPDK: https://www.dpdk.org/about/
ODP: https://www.opendataplane.org/

Will add the references in the next RFC.
>
>> +How does it work
>> +================
>> +
>> +*WarpDrive* takes the Hardware Accelerator as a heterogeneous processor which
>> +can share some load for the CPU:
>> +
>> +.. image:: wd.svg
>> + :alt: This is a .svg image, if your browser cannot show it,
>> + try to download and view it locally
>> +
>> +So it provides the capability to the user application to:
>> +
>> +1. Send request to the hardware
>> +2. Share memory with the application and other accelerators
>> +
>> +These requirements can be fulfilled by VFIO if the accelerator can serve each
>> +application with a separated Virtual Function. But a *SR-IOV* like VF (we will
>> +call it *HVF* hereinafter) design is too heavy for the accelerator which
>> +service thousands of processes.
> VFIO? VF? HVF?
>
> Also "gup" might be worth spelling out.
But I think the reference [1] has explained this.
>
>> +References
>> +==========
>> +.. [1] Accroding to the comment in in mm/gup.c, The *gup* is only safe within
>> + a syscall. Because it can only keep the physical memory in place
>> + without making sure the VMA will always point to it. Maybe we should
>> + raise the VM_PINNED patchset (see
>> + https://lists.gt.net/linux/kernel/1931993) again to solve this probl
>
> I went through the docs, but I still don't know what it does.
Will refine the doc in next RFC, hope it will help.
> Pavel
>


2018-08-08 09:13:54

by Joerg Roedel

[permalink] [raw]
Subject: Re: [RFC PATCH 2/7] iommu: Add share domain interface in iommu for spimdev

On Wed, Aug 01, 2018 at 06:22:16PM +0800, Kenneth Lee wrote:
> From: Kenneth Lee <[email protected]>
>
> This patch add sharing interface for a iommu_group. The new interface:
>
> iommu_group_share_domain()
> iommu_group_unshare_domain()
>
> can be used by some virtual iommu_group (such as iommu_group for spimdev)
> to share their parent's iommu_group.
>
> When the domain of the group is shared, it cannot be changed before
> unshared. In the future, notification can be added if update is required.

From the description or the patch I don't really get why this is needed.
Please update the description and make a case for this API addition.


Joerg

2018-08-08 15:18:35

by Jerome Glisse

[permalink] [raw]
Subject: Re: [RFC PATCH 0/7] A General Accelerator Framework, WarpDrive

On Wed, Aug 08, 2018 at 09:08:42AM +0800, Kenneth Lee wrote:
>
>
> On Monday, August 6, 2018 at 11:32 PM, Jerome Glisse wrote:
> > On Mon, Aug 06, 2018 at 11:12:52AM +0800, Kenneth Lee wrote:
> > > On Fri, Aug 03, 2018 at 10:39:44AM -0400, Jerome Glisse wrote:
> > > > On Fri, Aug 03, 2018 at 11:47:21AM +0800, Kenneth Lee wrote:
> > > > > On Thu, Aug 02, 2018 at 10:22:43AM -0400, Jerome Glisse wrote:
> > > > > > On Thu, Aug 02, 2018 at 12:05:57PM +0800, Kenneth Lee wrote:
> > > > > > > On Thu, Aug 02, 2018 at 02:33:12AM +0000, Tian, Kevin wrote:
> > > > > > > > > On Wed, Aug 01, 2018 at 06:22:14PM +0800, Kenneth Lee wrote:

[...]

> > > > > > > > > My more general question is do we want to grow VFIO to become
> > > > > > > > > a more generic device driver API. This patchset adds a command
> > > > > > > > > queue concept to it (i don't think it exist today but i have
> > > > > > > > > not follow VFIO closely).
> > > > > > > > >
> > > > > > > The thing is, VFIO is the only place to support DMA from user land. If we don't
> > > > > > > put it here, we have to create another similar facility to support the same.
> > > > > > No it is not, network device, GPU, block device, ... they all do
> > > > > > support DMA. The point i am trying to make here is that even in
> > > > > Sorry, wait a minute, are we talking the same thing? I meant "DMA from user
> > > > > land", not "DMA from kernel driver". To do that we have to manipulate the
> > > > > IOMMU(Unit). I think it can only be done by default_domain or vfio domain. Or
> > > > > the user space have to directly access the IOMMU.
> > > > GPU do DMA in the sense that you pass to the kernel a valid
> > > > virtual address (kernel driver do all the proper check) and
> > > > then you can use the GPU to copy from or to that range of
> > > > virtual address. Exactly how you want to use this compression
> > > > engine. It does not rely on SVM but SVM going forward would
> > > > still be the prefered option.
> > > >
> > > No, SVM is not the reason why we rely on Jean's SVM(SVA) series. We rely on
> > > Jean's series because of multi-process (PASID or substream ID) support.
> > >
> > > But of couse, WarpDrive can still benefit from the SVM feature.
> > We are getting side tracked here. PASID/ID do not require VFIO.
> >
> Yes, PASID itself do not require VFIO. But what if:
>
> 1. Support DMA from user space.
> 2. The hardware makes use of standard IOMMU/SMMU for IO address translation.
> 3. The IOMMU facility is shared by both kernel and user drivers.
> 4. Support PASID with the current IOMMU facility

I do not see how any of this means it has to be in VFIO.
Other devices do just that. GPU drivers, for instance, share the
DMA engine (that copies data around) between kernel and user
space. Sometimes the kernel uses it to move things around; evicting
some memory to make room for a new process is the common
example. The same DMA engine is often used by userspace itself
during rendering or compute (programs moving things on their
own). So there are already kernel drivers that do all 4 of
the above and are not in VFIO.


> > > > > > your mechanisms the userspace must have a specific userspace
> > > > > > drivers for each hardware and thus there are virtually no
> > > > > > differences between having this userspace driver open a device
> > > > > > file in vfio or somewhere else in the device filesystem. This is
> > > > > > just a different path.
> > > > > >
> > > > > The basic problem WarpDrive want to solve it to avoid syscall. This is important
> > > > > to accelerators. We have some data here:
> > > > > https://www.slideshare.net/linaroorg/progress-and-demonstration-of-wrapdrive-a-accelerator-framework-sfo17317
> > > > >
> > > > > (see page 3)
> > > > >
> > > > > The performance is different on using kernel and user drivers.
> > > > Yes and example i point to is exactly that. You have a one time setup
> > > > cost (creating command buffer binding PASID with command buffer and
> > > > couple other setup steps). Then userspace no longer have to do any
> > > > ioctl to schedule work on the GPU. It is all down from userspace and
> > > > it use a doorbell to notify hardware when it should go look at command
> > > > buffer for new thing to execute.
> > > >
> > > > My point stands on that. You have existing driver already doing so
> > > > with no new framework and in your scheme you need a userspace driver.
> > > > So i do not see the value add, using one path or the other in the
> > > > userspace driver is litteraly one line to change.
> > > >
> > > Sorry, I'd got confuse here. I partially agree that the user driver is
> > > redundance of kernel driver. (But for WarpDrive, the kernel driver is a full
> > > driver include all preparation and setup stuff for the hardware, the user driver
> > > is simply to send request and receive answer). Yes, it is just a choice of path.
> > > But the user path is faster if the request come from use space. And to do that,
> > > we need user land DMA support. Then why is it invaluable to let VFIO involved?
> > Some drivers in the kernel already do exactly what you said. The user
> > space emit commands without ever going into kernel by directly scheduling
> > commands and ringing a doorbell. They do not need VFIO either and they
> > can map userspace address into the DMA address space of the device and
> > again they do not need VFIO for that.
> Could you please directly point out which driver you refer to here? Thank
> you.

drivers/gpu/drm/amd/

The sub-directory of interest is amdkfd.

Because it is a big driver, here is a high-level overview of how it works
(this is a simplification; a rough userspace-side sketch follows the list):
- A process can allocate GPU buffers (through ioctl) and map them into
  its address space (through mmap of the device file at a buffer-object
  specific offset).
- A process can map any valid range of its virtual address space into the
  device address space (IOMMU mapping). This must be regular memory, i.e.
  not an mmap of a device file or any special file (this is the non-PASID
  path).
- A process can create a command queue and bind itself (its PASID) to it;
  this is done through an ioctl.
- A process can schedule commands onto queues it created from userspace
  without an ioctl. For that it just writes commands into a ring buffer
  that it mapped during the command queue creation process, and it
  rings a doorbell when commands are ready to be consumed by the
  hardware.
- Commands can reference (access) all 3 types of object above, i.e.
  full GPU buffers, process regular memory mapped as an object
  (non-PASID) and PASID memory, all at the same time, i.e. you can
  mix all of the above in the same command queue.
- The kernel can evict or unbind any process command queue; an unbound
  command queue is still valid from the process point of view, but commands
  the process schedules on it will not be executed until the kernel
  re-binds the queue.
- The kernel can schedule commands itself onto its dedicated command
  queues (the kernel driver creates its own command queues).
- The kernel can control priorities between all the queues, i.e. it can
  decide which queues the hardware should execute next.
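
As a rough userspace-side sketch of that flow: the ioctl number, argument
struct and mmap offsets below are placeholders rather than the real amdkfd
uAPI; the point is only the shape of the flow (one-time ioctl setup, then
ioctl-free submission through a mapped ring and doorbell).

#include <stdint.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <sys/mman.h>

struct create_queue_args {		/* placeholder for the real uAPI struct */
	uint64_t ring_size;
	uint64_t ring_mmap_offset;	/* returned by the kernel */
	uint64_t doorbell_mmap_offset;	/* returned by the kernel */
};

int main(void)
{
	int fd = open("/dev/kfd", O_RDWR);
	struct create_queue_args args = { .ring_size = 4096 };

	/* One-time setup: create a queue and bind this process (PASID) to it. */
	ioctl(fd, /* CREATE_QUEUE, placeholder */ 0, &args);

	uint64_t *ring = mmap(NULL, args.ring_size, PROT_READ | PROT_WRITE,
			      MAP_SHARED, fd, args.ring_mmap_offset);
	volatile uint32_t *doorbell = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
					   MAP_SHARED, fd, args.doorbell_mmap_offset);

	/* Steady state: no syscall, just write packets and ring the bell. */
	ring[0] = 0xdeadbeef;		/* a command packet, format is device specific */
	*doorbell = 1;
	return 0;
}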

I believe all of the above are the aspects that matter to you. The main
reason I don't like creating a new driver infrastructure is that a lot
of existing drivers will want to use some of the new features that are
coming (memory topology, where to place process memory, pipeline devices,
...), and those existing drivers are big (GPU drivers are the biggest of
all the kernel drivers).

So rewriting those existing drivers into VFIO, or into any new
infrastructure, so that they can leverage new features is a no-go from my
point of view.

I would rather see a set of helpers so that each feature can be used
either by new drivers or existing drivers. For instance a new way to
expose memory topology. A new way to expose how you can pipe devices
from one to another ...


Hence I do not see any value in a whole new infrastructure that
drivers must be part of to leverage new features.

Cheers,
Jérôme

2018-08-09 01:09:38

by Kenneth Lee

[permalink] [raw]
Subject: Re: [RFC PATCH 2/7] iommu: Add share domain interface in iommu for spimdev

On Wed, Aug 08, 2018 at 11:13:54AM +0200, Joerg Roedel wrote:
>
> On Wed, Aug 01, 2018 at 06:22:16PM +0800, Kenneth Lee wrote:
> > From: Kenneth Lee <liguozhu-C8/M+/[email protected]>
> >
> > This patch add sharing interface for a iommu_group. The new interface:
> >
> > iommu_group_share_domain()
> > iommu_group_unshare_domain()
> >
> > can be used by some virtual iommu_group (such as iommu_group for spimdev)
> > to share their parent's iommu_group.
> >
> > When the domain of the group is shared, it cannot be changed before
> > unshared. In the future, notification can be added if update is required.
>
> From the description or the patch I don't really get why this is needed.
> Please update the description and make a case for this API addition.
>

Yes, I will add more description in next version.

The idea here is:

1. iommu_domain is the setting of the IOMMU (the unit) for an iommu_group.

2. The iommu_group originally uses a default domain or a newly created one
(e.g. by vfio) as its effective domain. There is only one user for a domain:
if a device is bound to the vfio driver, it uses the newly created domain, and
if it is unbound from the vfio driver, it is switched back to the default
domain.

3. But for spimdev, we create an mdev which shares its parent device's IOMMU.
That means the parent device (along with its iommu group) will continue to use
its iommu_domain, and the mdev will use the same domain at the same time.

4. This API tells the iommu_group that the domain is shared by the spimdev,
and that it must not be changed before it is unshared. (A minimal usage sketch
follows below.)
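
As a minimal sketch of how a spimdev-style parent driver might use this; the
exact signatures are assumed from the description above, not copied from the
patch:

#include <linux/err.h>
#include <linux/iommu.h>

static struct iommu_domain *parent_domain;

static int spimdev_like_open(struct device *parent_dev)
{
	struct iommu_group *group = iommu_group_get(parent_dev);
	int ret = 0;

	/* Pin the parent's effective domain so it cannot be switched
	 * (e.g. by (un)binding the parent to vfio) while the mdev uses it.
	 * DMA for the user process is then mapped in parent_domain. */
	parent_domain = iommu_group_share_domain(group);
	if (IS_ERR(parent_domain))
		ret = PTR_ERR(parent_domain);

	iommu_group_put(group);
	return ret;
}

static void spimdev_like_release(struct device *parent_dev)
{
	struct iommu_group *group = iommu_group_get(parent_dev);

	iommu_group_unshare_domain(group);
	iommu_group_put(group);
}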

Cheers

>
> Joerg

--
-Kenneth(Hisilicon)

2018-08-09 08:03:52

by Kenneth Lee

[permalink] [raw]
Subject: Re: [RFC PATCH 0/7] A General Accelerator Framework, WarpDrive

On Wed, Aug 08, 2018 at 11:18:35AM -0400, Jerome Glisse wrote:
>
> On Wed, Aug 08, 2018 at 09:08:42AM +0800, Kenneth Lee wrote:
> >
> >
> > On Monday, August 6, 2018 at 11:32 PM, Jerome Glisse wrote:
> > > On Mon, Aug 06, 2018 at 11:12:52AM +0800, Kenneth Lee wrote:
> > > > On Fri, Aug 03, 2018 at 10:39:44AM -0400, Jerome Glisse wrote:
> > > > > On Fri, Aug 03, 2018 at 11:47:21AM +0800, Kenneth Lee wrote:
> > > > > > On Thu, Aug 02, 2018 at 10:22:43AM -0400, Jerome Glisse wrote:
> > > > > > > On Thu, Aug 02, 2018 at 12:05:57PM +0800, Kenneth Lee wrote:
> > > > > > > > On Thu, Aug 02, 2018 at 02:33:12AM +0000, Tian, Kevin wrote:
> > > > > > > > > > On Wed, Aug 01, 2018 at 06:22:14PM +0800, Kenneth Lee wrote:
>
> [...]
>
> > > > > > > > > > My more general question is do we want to grow VFIO to become
> > > > > > > > > > a more generic device driver API. This patchset adds a command
> > > > > > > > > > queue concept to it (i don't think it exist today but i have
> > > > > > > > > > not follow VFIO closely).
> > > > > > > > > >
> > > > > > > > The thing is, VFIO is the only place to support DMA from user land. If we don't
> > > > > > > > put it here, we have to create another similar facility to support the same.
> > > > > > > No it is not, network device, GPU, block device, ... they all do
> > > > > > > support DMA. The point i am trying to make here is that even in
> > > > > > Sorry, wait a minute, are we talking the same thing? I meant "DMA from user
> > > > > > land", not "DMA from kernel driver". To do that we have to manipulate the
> > > > > > IOMMU(Unit). I think it can only be done by default_domain or vfio domain. Or
> > > > > > the user space have to directly access the IOMMU.
> > > > > GPU do DMA in the sense that you pass to the kernel a valid
> > > > > virtual address (kernel driver do all the proper check) and
> > > > > then you can use the GPU to copy from or to that range of
> > > > > virtual address. Exactly how you want to use this compression
> > > > > engine. It does not rely on SVM but SVM going forward would
> > > > > still be the prefered option.
> > > > >
> > > > No, SVM is not the reason why we rely on Jean's SVM(SVA) series. We rely on
> > > > Jean's series because of multi-process (PASID or substream ID) support.
> > > >
> > > > But of couse, WarpDrive can still benefit from the SVM feature.
> > > We are getting side tracked here. PASID/ID do not require VFIO.
> > >
> > Yes, PASID itself do not require VFIO. But what if:
> >
> > 1. Support DMA from user space.
> > 2. The hardware makes use of standard IOMMU/SMMU for IO address translation.
> > 3. The IOMMU facility is shared by both kernel and user drivers.
> > 4. Support PASID with the current IOMMU facility
>
> I do not see how any of this means it has to be in VFIO.
> Other devices do just that. GPUs driver for instance share
> DMA engine (that copy data around) between kernel and user
> space. Sometime kernel use it to move things around. Evict
> some memory to make room for a new process is the common
> example. Same DMA engines is often use by userspace itself
> during rendering or compute (program moving things on there
> own). So they are already kernel driver that do all 4 of
> the above and are not in VFIO.
>

I think our divergence is on whether "it is common for device drivers to use the
IOMMU for user-land DMA operations". Let us dive into this with the AMD case.

>
> > > > > > > your mechanisms the userspace must have a specific userspace
> > > > > > > drivers for each hardware and thus there are virtually no
> > > > > > > differences between having this userspace driver open a device
> > > > > > > file in vfio or somewhere else in the device filesystem. This is
> > > > > > > just a different path.
> > > > > > >
> > > > > > The basic problem WarpDrive want to solve it to avoid syscall. This is important
> > > > > > to accelerators. We have some data here:
> > > > > > https://www.slideshare.net/linaroorg/progress-and-demonstration-of-wrapdrive-a-accelerator-framework-sfo17317
> > > > > >
> > > > > > (see page 3)
> > > > > >
> > > > > > The performance is different on using kernel and user drivers.
> > > > > Yes and example i point to is exactly that. You have a one time setup
> > > > > cost (creating command buffer binding PASID with command buffer and
> > > > > couple other setup steps). Then userspace no longer have to do any
> > > > > ioctl to schedule work on the GPU. It is all down from userspace and
> > > > > it use a doorbell to notify hardware when it should go look at command
> > > > > buffer for new thing to execute.
> > > > >
> > > > > My point stands on that. You have existing driver already doing so
> > > > > with no new framework and in your scheme you need a userspace driver.
> > > > > So i do not see the value add, using one path or the other in the
> > > > > userspace driver is litteraly one line to change.
> > > > >
> > > > Sorry, I'd got confuse here. I partially agree that the user driver is
> > > > redundance of kernel driver. (But for WarpDrive, the kernel driver is a full
> > > > driver include all preparation and setup stuff for the hardware, the user driver
> > > > is simply to send request and receive answer). Yes, it is just a choice of path.
> > > > But the user path is faster if the request come from use space. And to do that,
> > > > we need user land DMA support. Then why is it invaluable to let VFIO involved?
> > > Some drivers in the kernel already do exactly what you said. The user
> > > space emit commands without ever going into kernel by directly scheduling
> > > commands and ringing a doorbell. They do not need VFIO either and they
> > > can map userspace address into the DMA address space of the device and
> > > again they do not need VFIO for that.
> > Could you please directly point out which driver you refer to here? Thank
> > you.
>
> drivers/gpu/drm/amd/
>
> Sub-directory of interest is amdkfd
>
> Because it is a big driver here is a highlevel overview of how it works
> (this is a simplification):
> - Process can allocate GPUs buffer (through ioclt) and map them into
> its address space (through mmap of device file at buffer object
> specific offset).
> - Process can map any valid range of virtual address space into device
> address space (IOMMU mapping). This must be regular memory ie not an
> mmap of a device file or any special file (this is the non PASID
> path)
> - Process can create a command queue and bind its process to it aka
> PASID, this is done through an ioctl.
> - Process can schedule commands onto queues it created from userspace
> without ioctl. For that it just write command into a ring buffer
> that it mapped during the command queue creation process and it
> rings a doorbell when commands are ready to be consume by the
> hardware.
> - Commands can reference (access) all 3 types of object above ie
> either full GPUs buffer, process regular memory maped as object
> (non PASID) and PASID memory all at the same time ie you can
> mix all of the above in same commands queue.
> - Kernel can evict, unbind any process command queues, unbind commands
> queue are still valid from process point of view but commands
> process schedules on them will not be executed until kernel re-bind
> the queue.
> - Kernel can schedule commands itself onto its dedicated command
> queues (kernel driver create its own command queues).
> - Kernel can control priorities between all the queues ie it can
> decides which queues should the hardware executed first next.
>

Thank you. Now I think I understand the point. Indeed, I can see that some drivers,
such as GPU and IB, attach their own iommu_domain to their iommu_group and do their
own iommu_map().

But we have another requirement, which is to combine several devices so that they
share the same address space. This is a little like this kind of solution:

http://tce.technion.ac.il/wp-content/uploads/sites/8/2015/06/SC-7.2-M.-Silberstein.pdf

With that, the application can directly pass the NIC packet pointer to the
decryption accelerator and get the decrypted data in place. This is the feature that
the VFIO container can provide.
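
To make this concrete, a minimal user space sketch of that container path (the
group numbers and the mapping size are made up for illustration, and error
handling is omitted):

#include <fcntl.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/vfio.h>

int main(void)
{
    int container = open("/dev/vfio/vfio", O_RDWR);
    /* hypothetical groups: NIC in group 26, accelerator in group 27 */
    int grp_nic = open("/dev/vfio/26", O_RDWR);
    int grp_acc = open("/dev/vfio/27", O_RDWR);

    /* both groups share one container, hence one IOVA space */
    ioctl(grp_nic, VFIO_GROUP_SET_CONTAINER, &container);
    ioctl(grp_acc, VFIO_GROUP_SET_CONTAINER, &container);
    ioctl(container, VFIO_SET_IOMMU, VFIO_TYPE1_IOMMU);

    /* one map call makes the buffer visible to every device in the
     * container, including devices added afterwards */
    void *buf = mmap(NULL, 1 << 20, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    struct vfio_iommu_type1_dma_map map = {
        .argsz = sizeof(map),
        .flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE,
        .vaddr = (unsigned long)buf,
        .iova  = 0x100000,
        .size  = 1 << 20,
    };
    ioctl(container, VFIO_IOMMU_MAP_DMA, &map);
    return 0;
}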

> I believe all of the above are the aspects that matters to you. The main
> reason i don't like creating a new driver infrastructure is that a lot
> of existing drivers will want to use some of the new features that are
> coming (memory topology, where to place process memory, pipeline devices,
> ...) and thus existing drivers are big (GPU drivers are the biggest of
> all the kernel drivers).
>

I don't think it is necessary to rewrite the GPU drivers if they don't need to
share their address space with others. But if they do, they will have to create
some facility similar to the VFIO container anyway. Then why not just create it
in VFIO?

Actually, some GPUs already use mdev to manage the resources for different users,
and mdev is already part of VFIO.

> So rewritting those existing drivers into VFIO or into any new infra-
> structure so that they can leverage new features is a no go from my
> point of view.
>
> I would rather see a set of helpers so that each features can be use
> either by new drivers or existing drivers. For instance a new way to
> expose memory topology. A new way to expose how you can pipe devices
> from one to another ...
>
>
> Hence i do not see any value in a whole new infra-structure in which
> drivers must be part of to leverage new features.
>
> Cheers,
> Jérôme

--
-Kenneth(Hisilicon)


2018-08-09 08:31:31

by Tian, Kevin

[permalink] [raw]
Subject: RE: [RFC PATCH 0/7] A General Accelerator Framework, WarpDrive

> From: Kenneth Lee [mailto:[email protected]]
> Sent: Thursday, August 9, 2018 4:04 PM
>
> But we have another requirement which is to combine some device
> together to
> share the same address space. This is a little like these kinds of solution:
>
> http://tce.technion.ac.il/wp-content/uploads/sites/8/2015/06/SC-7.2-M.-
> Silberstein.pdf
>
> With that, the application can directly pass the NiC packet pointer to the
> decryption accelerator, and get the bare data in place. This is the feature
> that
> the VFIO container can provide.

The above is not a good argument, at least in the context of this discussion.
If each device has its own interface (similar to a GPU) for a process to bind to,
then by having the process bind to multiple devices one by one you still get the
same address space shared across them...

Thanks
Kevin

2018-08-09 14:46:13

by Jerome Glisse

[permalink] [raw]
Subject: Re: [RFC PATCH 0/7] A General Accelerator Framework, WarpDrive

On Thu, Aug 09, 2018 at 04:03:52PM +0800, Kenneth Lee wrote:
> On Wed, Aug 08, 2018 at 11:18:35AM -0400, Jerome Glisse wrote:
> > On Wed, Aug 08, 2018 at 09:08:42AM +0800, Kenneth Lee wrote:
> > > 在 2018年08月06日 星期一 11:32 下午, Jerome Glisse 写道:
> > > > On Mon, Aug 06, 2018 at 11:12:52AM +0800, Kenneth Lee wrote:
> > > > > On Fri, Aug 03, 2018 at 10:39:44AM -0400, Jerome Glisse wrote:
> > > > > > On Fri, Aug 03, 2018 at 11:47:21AM +0800, Kenneth Lee wrote:
> > > > > > > On Thu, Aug 02, 2018 at 10:22:43AM -0400, Jerome Glisse wrote:
> > > > > > > > On Thu, Aug 02, 2018 at 12:05:57PM +0800, Kenneth Lee wrote:

[...]

> > > > > > > > your mechanisms the userspace must have a specific userspace
> > > > > > > > drivers for each hardware and thus there are virtually no
> > > > > > > > differences between having this userspace driver open a device
> > > > > > > > file in vfio or somewhere else in the device filesystem. This is
> > > > > > > > just a different path.
> > > > > > > >
> > > > > > > The basic problem WarpDrive want to solve it to avoid syscall. This is important
> > > > > > > to accelerators. We have some data here:
> > > > > > > https://www.slideshare.net/linaroorg/progress-and-demonstration-of-wrapdrive-a-accelerator-framework-sfo17317
> > > > > > >
> > > > > > > (see page 3)
> > > > > > >
> > > > > > > The performance is different on using kernel and user drivers.
> > > > > > Yes and example i point to is exactly that. You have a one time setup
> > > > > > cost (creating command buffer binding PASID with command buffer and
> > > > > > couple other setup steps). Then userspace no longer have to do any
> > > > > > ioctl to schedule work on the GPU. It is all down from userspace and
> > > > > > it use a doorbell to notify hardware when it should go look at command
> > > > > > buffer for new thing to execute.
> > > > > >
> > > > > > My point stands on that. You have existing driver already doing so
> > > > > > with no new framework and in your scheme you need a userspace driver.
> > > > > > So i do not see the value add, using one path or the other in the
> > > > > > userspace driver is litteraly one line to change.
> > > > > >
> > > > > Sorry, I'd got confuse here. I partially agree that the user driver is
> > > > > redundance of kernel driver. (But for WarpDrive, the kernel driver is a full
> > > > > driver include all preparation and setup stuff for the hardware, the user driver
> > > > > is simply to send request and receive answer). Yes, it is just a choice of path.
> > > > > But the user path is faster if the request come from use space. And to do that,
> > > > > we need user land DMA support. Then why is it invaluable to let VFIO involved?
> > > > Some drivers in the kernel already do exactly what you said. The user
> > > > space emit commands without ever going into kernel by directly scheduling
> > > > commands and ringing a doorbell. They do not need VFIO either and they
> > > > can map userspace address into the DMA address space of the device and
> > > > again they do not need VFIO for that.
> > > Could you please directly point out which driver you refer to here? Thank
> > > you.
> >
> > drivers/gpu/drm/amd/
> >
> > Sub-directory of interest is amdkfd
> >
> > Because it is a big driver here is a highlevel overview of how it works
> > (this is a simplification):
> > - Process can allocate GPUs buffer (through ioclt) and map them into
> > its address space (through mmap of device file at buffer object
> > specific offset).
> > - Process can map any valid range of virtual address space into device
> > address space (IOMMU mapping). This must be regular memory ie not an
> > mmap of a device file or any special file (this is the non PASID
> > path)
> > - Process can create a command queue and bind its process to it aka
> > PASID, this is done through an ioctl.
> > - Process can schedule commands onto queues it created from userspace
> > without ioctl. For that it just write command into a ring buffer
> > that it mapped during the command queue creation process and it
> > rings a doorbell when commands are ready to be consume by the
> > hardware.
> > - Commands can reference (access) all 3 types of object above ie
> > either full GPUs buffer, process regular memory maped as object
> > (non PASID) and PASID memory all at the same time ie you can
> > mix all of the above in same commands queue.
> > - Kernel can evict, unbind any process command queues, unbind commands
> > queue are still valid from process point of view but commands
> > process schedules on them will not be executed until kernel re-bind
> > the queue.
> > - Kernel can schedule commands itself onto its dedicated command
> > queues (kernel driver create its own command queues).
> > - Kernel can control priorities between all the queues ie it can
> > decides which queues should the hardware executed first next.
> >
>
> Thank you. Now I think I understand the point. Indeed, I can see some drivers,
> such GPU and IB, attach their own iommu_domain to their iommu_group and do their
> own iommu_map().
>
> But we have another requirement which is to combine some device together to
> share the same address space. This is a little like these kinds of solution:
>
> http://tce.technion.ac.il/wp-content/uploads/sites/8/2015/06/SC-7.2-M.-Silberstein.pdf
>
> With that, the application can directly pass the NiC packet pointer to the
> decryption accelerator, and get the bare data in place. This is the feature that
> the VFIO container can provide.

Yes, and GPUs would very much like to do the same. There are already out-of-tree
solutions that allow a NIC to stream into GPU memory or the GPU to stream its
memory to a NIC. I am sure we will want to use more accelerators in conjunction
with GPUs in the future.

>
> > I believe all of the above are the aspects that matters to you. The main
> > reason i don't like creating a new driver infrastructure is that a lot
> > of existing drivers will want to use some of the new features that are
> > coming (memory topology, where to place process memory, pipeline devices,
> > ...) and thus existing drivers are big (GPU drivers are the biggest of
> > all the kernel drivers).
> >
>
> I think it is not necessarily to rewrite the GPU driver if they don't need to
> share their space with others. But if they do, no matter how, they have to create
> some facility similar to VFIO container. Then why not just create them in VFIO?

No they do not, nor does anyone need to. We already have that. If you want
devices to share a memory object you have either:
    - PASID: everything is easy, there is no need to create anything, as
      any valid virtual address will work
    - no PASID, or one of the devices does not support PASID: use the
      existing kernel infrastructure, aka dma-buf, see Documentation/
      driver-api/dma-buf.rst

Everything you want to do is already happening upstream and there are already
working examples.
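
For reference, a minimal kernel-side sketch of that dma-buf import path (the
acc_* importer names are hypothetical; an exporter is assumed to have handed
over a dma-buf fd already):

#include <linux/dma-buf.h>
#include <linux/dma-direction.h>
#include <linux/scatterlist.h>

/* hypothetical importer: make a buffer exported by another device (for
 * example the NIC) usable by this accelerator's DMA engine */
static int acc_import_dmabuf(struct device *dev, int fd,
                             struct sg_table **sgt_out)
{
    struct dma_buf *dmabuf = dma_buf_get(fd);
    struct dma_buf_attachment *attach;
    struct sg_table *sgt;

    if (IS_ERR(dmabuf))
        return PTR_ERR(dmabuf);

    attach = dma_buf_attach(dmabuf, dev);
    if (IS_ERR(attach)) {
        dma_buf_put(dmabuf);
        return PTR_ERR(attach);
    }

    /* the returned sg_table carries the device addresses the
     * accelerator can put in its descriptors */
    sgt = dma_buf_map_attachment(attach, DMA_BIDIRECTIONAL);
    if (IS_ERR(sgt)) {
        dma_buf_detach(dmabuf, attach);
        dma_buf_put(dmabuf);
        return PTR_ERR(sgt);
    }

    *sgt_out = sgt;
    return 0;
}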

> Actually, some GPUs have already used mdev to manage the resource by different
> users, it is already part of VFIO.

The current use of mdev with GPUs is to "emulate" the SR-IOV of PCIe in
software so that a single device can be shared between multiple guests.
For this, using VFIO makes sense, as we want to expose the device as a single
entity that can be managed without the userspace (QEMU) having to know
or learn about each individual device.

QEMU just has a well-defined API to probe and attach devices to the guest.
It is the guest that has dedicated drivers for each of those mdev
devices.


What you are trying to do is recreate a whole driver API inside the
VFIO subsystem and I do not see any valid reason for that. Moreover,
you want to restrict future use to only drivers that are part of this
new driver subsystem, and again I do not see any good reason to
mandate that any of the existing drivers be rewritten inside a new VFIO
infrastructure.


You can achieve everything you want to achieve with existing upstream
solutions. Re-inventing a whole new driver infrastructure should really
be motivated by strong and obvious reasons.

Cheers,
Jérôme

2018-08-10 01:37:40

by Kenneth Lee

[permalink] [raw]
Subject: Re: [RFC PATCH 0/7] A General Accelerator Framework, WarpDrive

On Thu, Aug 09, 2018 at 08:31:31AM +0000, Tian, Kevin wrote:
> Date: Thu, 9 Aug 2018 08:31:31 +0000
> From: "Tian, Kevin" <[email protected]>
> To: Kenneth Lee <[email protected]>, Jerome Glisse <[email protected]>
> CC: Kenneth Lee <[email protected]>, Alex Williamson
> <[email protected]>, Herbert Xu <[email protected]>,
> "[email protected]" <[email protected]>, Jonathan Corbet
> <[email protected]>, Greg Kroah-Hartman <[email protected]>, Zaibo
> Xu <[email protected]>, "[email protected]"
> <[email protected]>, "Kumar, Sanjay K" <[email protected]>,
> Hao Fang <[email protected]>, "[email protected]"
> <[email protected]>, "[email protected]"
> <[email protected]>, "[email protected]"
> <[email protected]>, "[email protected]"
> <[email protected]>, Philippe Ombredanne
> <[email protected]>, Thomas Gleixner <[email protected]>, "David S .
> Miller" <[email protected]>, "[email protected]"
> <[email protected]>
> Subject: RE: [RFC PATCH 0/7] A General Accelerator Framework, WarpDrive
> Message-ID: <AADFC41AFE54684AB9EE6CBC0274A5D1912B39B3@SHSMSX101.ccr.corp.intel.com>
>
> > From: Kenneth Lee [mailto:[email protected]]
> > Sent: Thursday, August 9, 2018 4:04 PM
> >
> > But we have another requirement which is to combine some device
> > together to
> > share the same address space. This is a little like these kinds of solution:
> >
> > http://tce.technion.ac.il/wp-content/uploads/sites/8/2015/06/SC-7.2-M.-
> > Silberstein.pdf
> >
> > With that, the application can directly pass the NiC packet pointer to the
> > decryption accelerator, and get the bare data in place. This is the feature
> > that
> > the VFIO container can provide.
>
> above is not a good argument, at least in the context of your discussion.
> If each device has their own interface (similar to GPU) for process to bind
> with, then having the process binding to multiple devices one-by-one then
> you still get same address space shared cross them...

If we consider this from the VFIO container perspective: with a container, you
can do DMA to the container and it applies to all devices in it, even a device
added after the DMA operation.

So your argument holds only when SVM is enabled and the whole process address
space is devoted to the devices.

Yes, the process can do the same all by itself. But if we agree with that, it
makes no sense to keep the container concept in VFIO ;)

>
> Thanks
> Kevin

--
-Kenneth(Hisilicon)


2018-08-10 03:39:13

by Kenneth Lee

[permalink] [raw]
Subject: Re: [RFC PATCH 0/7] A General Accelerator Framework, WarpDrive

On Thu, Aug 09, 2018 at 10:46:13AM -0400, Jerome Glisse wrote:
> Date: Thu, 9 Aug 2018 10:46:13 -0400
> From: Jerome Glisse <[email protected]>
> To: Kenneth Lee <[email protected]>
> CC: Kenneth Lee <[email protected]>, "Tian, Kevin"
> <[email protected]>, Alex Williamson <[email protected]>,
> Herbert Xu <[email protected]>, "[email protected]"
> <[email protected]>, Jonathan Corbet <[email protected]>, Greg
> Kroah-Hartman <[email protected]>, Zaibo Xu <[email protected]>,
> "[email protected]" <[email protected]>, "Kumar, Sanjay K"
> <[email protected]>, Hao Fang <[email protected]>,
> "[email protected]" <[email protected]>,
> "[email protected]" <[email protected]>,
> "[email protected]" <[email protected]>,
> "[email protected]" <[email protected]>, Philippe
> Ombredanne <[email protected]>, Thomas Gleixner <[email protected]>,
> "David S . Miller" <[email protected]>,
> "[email protected]"
> <[email protected]>
> Subject: Re: [RFC PATCH 0/7] A General Accelerator Framework, WarpDrive
> User-Agent: Mutt/1.10.0 (2018-05-17)
> Message-ID: <[email protected]>
>
> On Thu, Aug 09, 2018 at 04:03:52PM +0800, Kenneth Lee wrote:
> > On Wed, Aug 08, 2018 at 11:18:35AM -0400, Jerome Glisse wrote:
> > > On Wed, Aug 08, 2018 at 09:08:42AM +0800, Kenneth Lee wrote:
> > > > 在 2018年08月06日 星期一 11:32 下午, Jerome Glisse 写道:
> > > > > On Mon, Aug 06, 2018 at 11:12:52AM +0800, Kenneth Lee wrote:
> > > > > > On Fri, Aug 03, 2018 at 10:39:44AM -0400, Jerome Glisse wrote:
> > > > > > > On Fri, Aug 03, 2018 at 11:47:21AM +0800, Kenneth Lee wrote:
> > > > > > > > On Thu, Aug 02, 2018 at 10:22:43AM -0400, Jerome Glisse wrote:
> > > > > > > > > On Thu, Aug 02, 2018 at 12:05:57PM +0800, Kenneth Lee wrote:
>
> [...]
>
> > > > > > > > > your mechanisms the userspace must have a specific userspace
> > > > > > > > > drivers for each hardware and thus there are virtually no
> > > > > > > > > differences between having this userspace driver open a device
> > > > > > > > > file in vfio or somewhere else in the device filesystem. This is
> > > > > > > > > just a different path.
> > > > > > > > >
> > > > > > > > The basic problem WarpDrive want to solve it to avoid syscall. This is important
> > > > > > > > to accelerators. We have some data here:
> > > > > > > > https://www.slideshare.net/linaroorg/progress-and-demonstration-of-wrapdrive-a-accelerator-framework-sfo17317
> > > > > > > >
> > > > > > > > (see page 3)
> > > > > > > >
> > > > > > > > The performance is different on using kernel and user drivers.
> > > > > > > Yes and example i point to is exactly that. You have a one time setup
> > > > > > > cost (creating command buffer binding PASID with command buffer and
> > > > > > > couple other setup steps). Then userspace no longer have to do any
> > > > > > > ioctl to schedule work on the GPU. It is all down from userspace and
> > > > > > > it use a doorbell to notify hardware when it should go look at command
> > > > > > > buffer for new thing to execute.
> > > > > > >
> > > > > > > My point stands on that. You have existing driver already doing so
> > > > > > > with no new framework and in your scheme you need a userspace driver.
> > > > > > > So i do not see the value add, using one path or the other in the
> > > > > > > userspace driver is litteraly one line to change.
> > > > > > >
> > > > > > Sorry, I'd got confuse here. I partially agree that the user driver is
> > > > > > redundance of kernel driver. (But for WarpDrive, the kernel driver is a full
> > > > > > driver include all preparation and setup stuff for the hardware, the user driver
> > > > > > is simply to send request and receive answer). Yes, it is just a choice of path.
> > > > > > But the user path is faster if the request come from use space. And to do that,
> > > > > > we need user land DMA support. Then why is it invaluable to let VFIO involved?
> > > > > Some drivers in the kernel already do exactly what you said. The user
> > > > > space emit commands without ever going into kernel by directly scheduling
> > > > > commands and ringing a doorbell. They do not need VFIO either and they
> > > > > can map userspace address into the DMA address space of the device and
> > > > > again they do not need VFIO for that.
> > > > Could you please directly point out which driver you refer to here? Thank
> > > > you.
> > >
> > > drivers/gpu/drm/amd/
> > >
> > > Sub-directory of interest is amdkfd
> > >
> > > Because it is a big driver here is a highlevel overview of how it works
> > > (this is a simplification):
> > > - Process can allocate GPUs buffer (through ioclt) and map them into
> > > its address space (through mmap of device file at buffer object
> > > specific offset).
> > > - Process can map any valid range of virtual address space into device
> > > address space (IOMMU mapping). This must be regular memory ie not an
> > > mmap of a device file or any special file (this is the non PASID
> > > path)
> > > - Process can create a command queue and bind its process to it aka
> > > PASID, this is done through an ioctl.
> > > - Process can schedule commands onto queues it created from userspace
> > > without ioctl. For that it just write command into a ring buffer
> > > that it mapped during the command queue creation process and it
> > > rings a doorbell when commands are ready to be consume by the
> > > hardware.
> > > - Commands can reference (access) all 3 types of object above ie
> > > either full GPUs buffer, process regular memory maped as object
> > > (non PASID) and PASID memory all at the same time ie you can
> > > mix all of the above in same commands queue.
> > > - Kernel can evict, unbind any process command queues, unbind commands
> > > queue are still valid from process point of view but commands
> > > process schedules on them will not be executed until kernel re-bind
> > > the queue.
> > > - Kernel can schedule commands itself onto its dedicated command
> > > queues (kernel driver create its own command queues).
> > > - Kernel can control priorities between all the queues ie it can
> > > decides which queues should the hardware executed first next.
> > >
> >
> > Thank you. Now I think I understand the point. Indeed, I can see some drivers,
> > such GPU and IB, attach their own iommu_domain to their iommu_group and do their
> > own iommu_map().
> >
> > But we have another requirement which is to combine some device together to
> > share the same address space. This is a little like these kinds of solution:
> >
> > http://tce.technion.ac.il/wp-content/uploads/sites/8/2015/06/SC-7.2-M.-Silberstein.pdf
> >
> > With that, the application can directly pass the NiC packet pointer to the
> > decryption accelerator, and get the bare data in place. This is the feature that
> > the VFIO container can provide.
>
> Yes and GPU would very much like do the same. There is already out of
> tree solution that allow NiC to stream into GPU memory or GPU to stream
> its memory to a NiC. I am sure we will want to use more accelerator in
> conjunction with GPU in the future.
>
> >
> > > I believe all of the above are the aspects that matters to you. The main
> > > reason i don't like creating a new driver infrastructure is that a lot
> > > of existing drivers will want to use some of the new features that are
> > > coming (memory topology, where to place process memory, pipeline devices,
> > > ...) and thus existing drivers are big (GPU drivers are the biggest of
> > > all the kernel drivers).
> > >
> >
> > I think it is not necessarily to rewrite the GPU driver if they don't need to
> > share their space with others. But if they do, no matter how, they have to create
> > some facility similar to VFIO container. Then why not just create them in VFIO?
>
> No they do not, nor does anyone needs to. We already have that. If you want
> device to share memory object you have either:
> - PASID and everything is just easy no need to create anything, as
> all valid virtual address will work
> - no PASID or one of the device does not support PASID then use the
> existing kernel infrastructure aka dma buffer see Documentation/
> driver-api/dma-buf.rst
>
> Everything you want to do is already happening upstream and they are allready
> working example.

OK, I accept it.

>
> > Actually, some GPUs have already used mdev to manage the resource by different
> > users, it is already part of VFIO.
>
> The current use of mdev with GPU is to "emulate" the SR_IOV of PCIE in
> software so that a single device can be share between multiple guests.
> For this using VFIO make sense, as we want to expose device as a single
> entity that can be manage without the userspace (QEMU) having to know
> or learn about each individual devices.
>
> QEMU just has a well define API to probe and attach device to guest.
> It is the guest that have dedicated drivers for each of those mdev
> devices.
>

Agree

>
> What you are trying to do is recreate a whole driver API inside the
> VFIO subsystem and i do not see any valid reasons for that. Moreover
> you want to restrict future use to only drivers that are part of this
> new driver subsystem and again i do not see any good reasons to
> mandate any of the existing driver to be rewritten inside a new VFIO
> infrastructure.
>

I think you have persuaded me now. I thought VFIO was the only unified place to do
user-land DMA operations, so I tried to make use of it even though it provides
facilities I did not need.

Thank you very much for the help.

>
> You can achieve everything you want to achieve with existing upstream
> solution. Re-inventing a whole new driver infrastructure should really
> be motivated with strong and obvious reasons.

I want to understand your idea better. If I create some unified helper
APIs in drivers/iommu/, say:

wd_create_dev(parent_dev, wd_dev)
wd_release_dev(wd_dev)

The API creates a chrdev to take requests from user space for open (resource
allocation), iomap, epoll (irq), and dma_map (with the PASID handled automatically).

Do you think it is acceptable?

I agree that all of this can be done by the driver itself. But without a unified
interface, it is hard to create a user-land ecosystem in which accelerators can
cooperate.
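
To make the proposal a little more concrete, a rough sketch of what I have in
mind (the wd_* names and the structure layout are only illustrative, nothing
like this exists today):

#include <linux/cdev.h>
#include <linux/device.h>
#include <linux/fs.h>

/* illustrative only: shape of the proposed helper, not an existing API */
struct wd_dev {
    struct device *parent;              /* the real accelerator */
    struct cdev cdev;                   /* per-accelerator chrdev */
    dev_t devt;
    const struct file_operations *fops; /* open/iomap/poll from the parent driver */
};

/* assumed to be created with class_create() at module init */
static struct class *wd_class;

int wd_create_dev(struct device *parent, struct wd_dev *wd)
{
    int ret;

    ret = alloc_chrdev_region(&wd->devt, 0, 1, dev_name(parent));
    if (ret)
        return ret;

    cdev_init(&wd->cdev, wd->fops);
    ret = cdev_add(&wd->cdev, wd->devt, 1);
    if (ret) {
        unregister_chrdev_region(wd->devt, 1);
        return ret;
    }

    /* open() on this node would allocate a queue, bind the PASID of
     * current->mm and set up the IOMMU mapping automatically */
    wd->parent = parent;
    device_create(wd_class, parent, wd->devt, wd, "wd_%s", dev_name(parent));
    return 0;
}

void wd_release_dev(struct wd_dev *wd)
{
    device_destroy(wd_class, wd->devt);
    cdev_del(&wd->cdev);
    unregister_chrdev_region(wd->devt, 1);
}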

>
> Cheers,
> Jérôme

--
-Kenneth(Hisilicon)



2018-08-10 13:12:37

by Jean-Philippe Brucker

[permalink] [raw]
Subject: Re: [RFC PATCH 0/7] A General Accelerator Framework, WarpDrive

Hi Kenneth,

On 10/08/18 04:39, Kenneth Lee wrote:
>> You can achieve everything you want to achieve with existing upstream
>> solution. Re-inventing a whole new driver infrastructure should really
>> be motivated with strong and obvious reasons.
>
> I want to understand better of your idea. If I create some unified helper
> APIs in drivers/iommu/, say:
>
> wd_create_dev(parent_dev, wd_dev)
> wd_release_dev(wd_dev)
>
> The API create chrdev to take request from user space for open(resource
> allocation), iomap, epoll (irq), and dma_map(with pasid automatically).
>
> Do you think it is acceptable?

Maybe not drivers/iommu/ :) That subsystem only contains tools for
dealing with DMA; I don't think epoll, resource enumeration or iomap fit
in there.

Creating new helpers seems to be precisely what we're trying to avoid in
this thread, and vfio-mdev does provide the components that you
describe, so I wouldn't discard it right away. When the GPU, net, block
or another subsystem doesn't fit your needs, either because your
accelerator provides some specialized function, or because for
performance reasons your client wants direct MMIO access, you can at
least build your driver and library on top of those existing VFIO
components:

* open allocates a partition of an accelerator.
* vfio_device_info, vfio_region_info and vfio_irq_info enumerates
available resources.
* vfio_irq_set deals with epoll.
* mmap gives you a private MMIO doorbell.
* vfio_iommu_type1 provides the DMA operations.
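
As a rough illustration of how a user-space library could drive those
components (the mdev UUID, region and IRQ indexes are hypothetical; error
handling omitted):

#include <fcntl.h>
#include <string.h>
#include <sys/eventfd.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/vfio.h>

void setup(int group)
{
    /* "open allocates a partition": get a device fd from the group
     * (the mdev UUID here is hypothetical) */
    int dev = ioctl(group, VFIO_GROUP_GET_DEVICE_FD,
                    "83b8f4f2-509f-382f-3c1e-e6bfe0fa1001");

    /* enumerate the available resources */
    struct vfio_device_info dinfo = { .argsz = sizeof(dinfo) };
    ioctl(dev, VFIO_DEVICE_GET_INFO, &dinfo);

    struct vfio_region_info rinfo = { .argsz = sizeof(rinfo), .index = 0 };
    ioctl(dev, VFIO_DEVICE_GET_REGION_INFO, &rinfo);

    /* private MMIO doorbell for this partition */
    void *doorbell = mmap(NULL, rinfo.size, PROT_READ | PROT_WRITE,
                          MAP_SHARED, dev, rinfo.offset);

    /* wire the device IRQ to an eventfd, which epoll can then wait on */
    int efd = eventfd(0, 0);
    char buf[sizeof(struct vfio_irq_set) + sizeof(int)];
    struct vfio_irq_set *irq = (struct vfio_irq_set *)buf;
    irq->argsz = sizeof(buf);
    irq->flags = VFIO_IRQ_SET_DATA_EVENTFD | VFIO_IRQ_SET_ACTION_TRIGGER;
    irq->index = 0;
    irq->start = 0;
    irq->count = 1;
    memcpy(irq->data, &efd, sizeof(int));
    ioctl(dev, VFIO_DEVICE_SET_IRQS, irq);

    /* DMA mappings themselves go through the container's
     * vfio_iommu_type1 backend (VFIO_IOMMU_MAP_DMA) */
    (void)doorbell;
}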

Currently missing:

* Sharing the parent IOMMU between mdevs, which is also what the "IOMMU
aware mediated device" series tackles and which seems like a logical addition
to VFIO. I'd argue that the existing IOMMU ops (or the ones implemented by
the SVA series) can be used to deal with this.

* The interface to discover an accelerator near your memory node, or one
that you can chain with other devices. If I understood correctly the
conclusion was that the API (a topology description in sysfs?) should be
common to various subsystems, in which case vfio-mdev (or the mediating
driver) could also use it.

* The queue abstraction discussed on patch 3/7. Perhaps the current vfio
resource description of MMIO and IRQ is sufficient here as well, since
vendors tend to each implement their own queue schemes. If you need
additional features, read/write fops give the mediating driver a lot of
freedom. To support features that are too specific for drivers/vfio/ you
can implement a config space with capabilities and registers of your
choice. If you're versioning the capabilities, the code to handle them
could even be shared between different accelerator drivers and libraries.
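
For example, a versioned capability chain in such a config space might look
roughly like this (the layout and names are made up, not an existing format):

#include <linux/types.h>

/* hypothetical config-space layout: capabilities chained by offset */
struct acc_cap_hdr {
    __u16 id;       /* capability type */
    __u16 version;  /* bump when the layout of the body changes */
    __u32 next;     /* offset of the next capability, 0 ends the chain */
};

/* example capability describing one hardware queue */
struct acc_queue_cap {
    struct acc_cap_hdr hdr;
    __u32 depth;            /* number of descriptors in the ring */
    __u32 doorbell_offset;  /* offset of the doorbell in the MMIO region */
};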

Thanks,
Jean

2018-08-10 14:32:31

by Jerome Glisse

[permalink] [raw]
Subject: Re: [RFC PATCH 0/7] A General Accelerator Framework, WarpDrive

On Fri, Aug 10, 2018 at 11:39:13AM +0800, Kenneth Lee wrote:
> On Thu, Aug 09, 2018 at 10:46:13AM -0400, Jerome Glisse wrote:
> > Date: Thu, 9 Aug 2018 10:46:13 -0400
> > From: Jerome Glisse <[email protected]>
> > To: Kenneth Lee <[email protected]>
> > CC: Kenneth Lee <[email protected]>, "Tian, Kevin"
> > <[email protected]>, Alex Williamson <[email protected]>,
> > Herbert Xu <[email protected]>, "[email protected]"
> > <[email protected]>, Jonathan Corbet <[email protected]>, Greg
> > Kroah-Hartman <[email protected]>, Zaibo Xu <[email protected]>,
> > "[email protected]" <[email protected]>, "Kumar, Sanjay K"
> > <[email protected]>, Hao Fang <[email protected]>,
> > "[email protected]" <[email protected]>,
> > "[email protected]" <[email protected]>,
> > "[email protected]" <[email protected]>,
> > "[email protected]" <[email protected]>, Philippe
> > Ombredanne <[email protected]>, Thomas Gleixner <[email protected]>,
> > "David S . Miller" <[email protected]>,
> > "[email protected]"
> > <[email protected]>
> > Subject: Re: [RFC PATCH 0/7] A General Accelerator Framework, WarpDrive
> > User-Agent: Mutt/1.10.0 (2018-05-17)
> > Message-ID: <[email protected]>
> >
> > On Thu, Aug 09, 2018 at 04:03:52PM +0800, Kenneth Lee wrote:
> > > On Wed, Aug 08, 2018 at 11:18:35AM -0400, Jerome Glisse wrote:
> > > > On Wed, Aug 08, 2018 at 09:08:42AM +0800, Kenneth Lee wrote:
> > > > > 在 2018年08月06日 星期一 11:32 下午, Jerome Glisse 写道:
> > > > > > On Mon, Aug 06, 2018 at 11:12:52AM +0800, Kenneth Lee wrote:
> > > > > > > On Fri, Aug 03, 2018 at 10:39:44AM -0400, Jerome Glisse wrote:
> > > > > > > > On Fri, Aug 03, 2018 at 11:47:21AM +0800, Kenneth Lee wrote:
> > > > > > > > > On Thu, Aug 02, 2018 at 10:22:43AM -0400, Jerome Glisse wrote:
> > > > > > > > > > On Thu, Aug 02, 2018 at 12:05:57PM +0800, Kenneth Lee wrote:
> >
> > [...]
> >
> > > > > > > > > > your mechanisms the userspace must have a specific userspace
> > > > > > > > > > drivers for each hardware and thus there are virtually no
> > > > > > > > > > differences between having this userspace driver open a device
> > > > > > > > > > file in vfio or somewhere else in the device filesystem. This is
> > > > > > > > > > just a different path.
> > > > > > > > > >
> > > > > > > > > The basic problem WarpDrive want to solve it to avoid syscall. This is important
> > > > > > > > > to accelerators. We have some data here:
> > > > > > > > > https://www.slideshare.net/linaroorg/progress-and-demonstration-of-wrapdrive-a-accelerator-framework-sfo17317
> > > > > > > > >
> > > > > > > > > (see page 3)
> > > > > > > > >
> > > > > > > > > The performance is different on using kernel and user drivers.
> > > > > > > > Yes and example i point to is exactly that. You have a one time setup
> > > > > > > > cost (creating command buffer binding PASID with command buffer and
> > > > > > > > couple other setup steps). Then userspace no longer have to do any
> > > > > > > > ioctl to schedule work on the GPU. It is all down from userspace and
> > > > > > > > it use a doorbell to notify hardware when it should go look at command
> > > > > > > > buffer for new thing to execute.
> > > > > > > >
> > > > > > > > My point stands on that. You have existing driver already doing so
> > > > > > > > with no new framework and in your scheme you need a userspace driver.
> > > > > > > > So i do not see the value add, using one path or the other in the
> > > > > > > > userspace driver is litteraly one line to change.
> > > > > > > >
> > > > > > > Sorry, I'd got confuse here. I partially agree that the user driver is
> > > > > > > redundance of kernel driver. (But for WarpDrive, the kernel driver is a full
> > > > > > > driver include all preparation and setup stuff for the hardware, the user driver
> > > > > > > is simply to send request and receive answer). Yes, it is just a choice of path.
> > > > > > > But the user path is faster if the request come from use space. And to do that,
> > > > > > > we need user land DMA support. Then why is it invaluable to let VFIO involved?
> > > > > > Some drivers in the kernel already do exactly what you said. The user
> > > > > > space emit commands without ever going into kernel by directly scheduling
> > > > > > commands and ringing a doorbell. They do not need VFIO either and they
> > > > > > can map userspace address into the DMA address space of the device and
> > > > > > again they do not need VFIO for that.
> > > > > Could you please directly point out which driver you refer to here? Thank
> > > > > you.
> > > >
> > > > drivers/gpu/drm/amd/
> > > >
> > > > Sub-directory of interest is amdkfd
> > > >
> > > > Because it is a big driver here is a highlevel overview of how it works
> > > > (this is a simplification):
> > > > - Process can allocate GPUs buffer (through ioclt) and map them into
> > > > its address space (through mmap of device file at buffer object
> > > > specific offset).
> > > > - Process can map any valid range of virtual address space into device
> > > > address space (IOMMU mapping). This must be regular memory ie not an
> > > > mmap of a device file or any special file (this is the non PASID
> > > > path)
> > > > - Process can create a command queue and bind its process to it aka
> > > > PASID, this is done through an ioctl.
> > > > - Process can schedule commands onto queues it created from userspace
> > > > without ioctl. For that it just write command into a ring buffer
> > > > that it mapped during the command queue creation process and it
> > > > rings a doorbell when commands are ready to be consume by the
> > > > hardware.
> > > > - Commands can reference (access) all 3 types of object above ie
> > > > either full GPUs buffer, process regular memory maped as object
> > > > (non PASID) and PASID memory all at the same time ie you can
> > > > mix all of the above in same commands queue.
> > > > - Kernel can evict, unbind any process command queues, unbind commands
> > > > queue are still valid from process point of view but commands
> > > > process schedules on them will not be executed until kernel re-bind
> > > > the queue.
> > > > - Kernel can schedule commands itself onto its dedicated command
> > > > queues (kernel driver create its own command queues).
> > > > - Kernel can control priorities between all the queues ie it can
> > > > decides which queues should the hardware executed first next.
> > > >
> > >
> > > Thank you. Now I think I understand the point. Indeed, I can see some drivers,
> > > such GPU and IB, attach their own iommu_domain to their iommu_group and do their
> > > own iommu_map().
> > >
> > > But we have another requirement which is to combine some device together to
> > > share the same address space. This is a little like these kinds of solution:
> > >
> > > http://tce.technion.ac.il/wp-content/uploads/sites/8/2015/06/SC-7.2-M.-Silberstein.pdf
> > >
> > > With that, the application can directly pass the NiC packet pointer to the
> > > decryption accelerator, and get the bare data in place. This is the feature that
> > > the VFIO container can provide.
> >
> > Yes and GPU would very much like do the same. There is already out of
> > tree solution that allow NiC to stream into GPU memory or GPU to stream
> > its memory to a NiC. I am sure we will want to use more accelerator in
> > conjunction with GPU in the future.
> >
> > >
> > > > I believe all of the above are the aspects that matters to you. The main
> > > > reason i don't like creating a new driver infrastructure is that a lot
> > > > of existing drivers will want to use some of the new features that are
> > > > coming (memory topology, where to place process memory, pipeline devices,
> > > > ...) and thus existing drivers are big (GPU drivers are the biggest of
> > > > all the kernel drivers).
> > > >
> > >
> > > I think it is not necessarily to rewrite the GPU driver if they don't need to
> > > share their space with others. But if they do, no matter how, they have to create
> > > some facility similar to VFIO container. Then why not just create them in VFIO?
> >
> > No they do not, nor does anyone needs to. We already have that. If you want
> > device to share memory object you have either:
> > - PASID and everything is just easy no need to create anything, as
> > all valid virtual address will work
> > - no PASID or one of the device does not support PASID then use the
> > existing kernel infrastructure aka dma buffer see Documentation/
> > driver-api/dma-buf.rst
> >
> > Everything you want to do is already happening upstream and they are allready
> > working example.
>
> OK, I accept it.
>
> >
> > > Actually, some GPUs have already used mdev to manage the resource by different
> > > users, it is already part of VFIO.
> >
> > The current use of mdev with GPU is to "emulate" the SR_IOV of PCIE in
> > software so that a single device can be share between multiple guests.
> > For this using VFIO make sense, as we want to expose device as a single
> > entity that can be manage without the userspace (QEMU) having to know
> > or learn about each individual devices.
> >
> > QEMU just has a well define API to probe and attach device to guest.
> > It is the guest that have dedicated drivers for each of those mdev
> > devices.
> >
>
> Agree
>
> >
> > What you are trying to do is recreate a whole driver API inside the
> > VFIO subsystem and i do not see any valid reasons for that. Moreover
> > you want to restrict future use to only drivers that are part of this
> > new driver subsystem and again i do not see any good reasons to
> > mandate any of the existing driver to be rewritten inside a new VFIO
> > infrastructure.
> >
>
> I think you have persuaded me now. I thought VFIO was the only unified place to do
> user land DMA operations. So I tried to make use of it even there was facility I
> needed not.
>
> Thank you very much for the help.
>
> >
> > You can achieve everything you want to achieve with existing upstream
> > solution. Re-inventing a whole new driver infrastructure should really
> > be motivated with strong and obvious reasons.
>
> I want to understand better of your idea. If I create some unified helper
> APIs in drivers/iommu/, say:
>
> wd_create_dev(parent_dev, wd_dev)
> wd_release_dev(wd_dev)
>
> The API create chrdev to take request from user space for open(resource
> allocation), iomap, epoll (irq), and dma_map(with pasid automatically).
>
> Do you think it is acceptable?
>
> I agree that these all can be done with the driver itself. But without a unified
> interface, it is hard to create a user land ecosystem for accelerator to
> cooperate.

I am fine with creating a new sub-system for pipelining devices (ie some
place where it is easy for userspace to get a list of all devices that can
fit in such a model). I am not sure we want to have everything be part of
it (ie DMA mapping of memory and pipelining as the same thing); I tend to be
more unix oriented, one tool for one job. But maybe it makes sense, as this
would likely avoid having each device driver carry its own code to
create dma-bufs or track SVM pointers.

In any case, for the non-SVM case I would suggest using dma-buf; it
has been widely used and is well tested in many scenarios. Moreover it
would allow many existing device drivers to register themselves in this
new framework with minimal changes (a couple hundred lines).

Cheers,
Jérôme

2018-08-11 14:44:13

by Kenneth Lee

[permalink] [raw]
Subject: Re: [RFC PATCH 0/7] A General Accelerator Framework, WarpDrive



在 2018年08月10日 星期五 10:32 下午, Jerome Glisse 写道:
> On Fri, Aug 10, 2018 at 11:39:13AM +0800, Kenneth Lee wrote:
>> On Thu, Aug 09, 2018 at 10:46:13AM -0400, Jerome Glisse wrote:
>>> Date: Thu, 9 Aug 2018 10:46:13 -0400
>>> From: Jerome Glisse <[email protected]>
>>> To: Kenneth Lee <[email protected]>
>>> CC: Kenneth Lee <[email protected]>, "Tian, Kevin"
>>> <[email protected]>, Alex Williamson <[email protected]>,
>>> Herbert Xu <[email protected]>, "[email protected]"
>>> <[email protected]>, Jonathan Corbet <[email protected]>, Greg
>>> Kroah-Hartman <[email protected]>, Zaibo Xu <[email protected]>,
>>> "[email protected]" <[email protected]>, "Kumar, Sanjay K"
>>> <[email protected]>, Hao Fang <[email protected]>,
>>> "[email protected]" <[email protected]>,
>>> "[email protected]" <[email protected]>,
>>> "[email protected]" <[email protected]>,
>>> "[email protected]" <[email protected]>, Philippe
>>> Ombredanne <[email protected]>, Thomas Gleixner <[email protected]>,
>>> "David S . Miller" <[email protected]>,
>>> "[email protected]"
>>> <[email protected]>
>>> Subject: Re: [RFC PATCH 0/7] A General Accelerator Framework, WarpDrive
>>> User-Agent: Mutt/1.10.0 (2018-05-17)
>>> Message-ID: <[email protected]>
>>>
>>> On Thu, Aug 09, 2018 at 04:03:52PM +0800, Kenneth Lee wrote:
>>>> On Wed, Aug 08, 2018 at 11:18:35AM -0400, Jerome Glisse wrote:
>>>>> On Wed, Aug 08, 2018 at 09:08:42AM +0800, Kenneth Lee wrote:
>>>>>> 在 2018年08月06日 星期一 11:32 下午, Jerome Glisse 写道:
>>>>>>> On Mon, Aug 06, 2018 at 11:12:52AM +0800, Kenneth Lee wrote:
>>>>>>>> On Fri, Aug 03, 2018 at 10:39:44AM -0400, Jerome Glisse wrote:
>>>>>>>>> On Fri, Aug 03, 2018 at 11:47:21AM +0800, Kenneth Lee wrote:
>>>>>>>>>> On Thu, Aug 02, 2018 at 10:22:43AM -0400, Jerome Glisse wrote:
>>>>>>>>>>> On Thu, Aug 02, 2018 at 12:05:57PM +0800, Kenneth Lee wrote:
>>> [...]
>>>
>>>>>>>>>>> your mechanisms the userspace must have a specific userspace
>>>>>>>>>>> drivers for each hardware and thus there are virtually no
>>>>>>>>>>> differences between having this userspace driver open a device
>>>>>>>>>>> file in vfio or somewhere else in the device filesystem. This is
>>>>>>>>>>> just a different path.
>>>>>>>>>>>
>>>>>>>>>> The basic problem WarpDrive want to solve it to avoid syscall. This is important
>>>>>>>>>> to accelerators. We have some data here:
>>>>>>>>>> https://www.slideshare.net/linaroorg/progress-and-demonstration-of-wrapdrive-a-accelerator-framework-sfo17317
>>>>>>>>>>
>>>>>>>>>> (see page 3)
>>>>>>>>>>
>>>>>>>>>> The performance is different on using kernel and user drivers.
>>>>>>>>> Yes and example i point to is exactly that. You have a one time setup
>>>>>>>>> cost (creating command buffer binding PASID with command buffer and
>>>>>>>>> couple other setup steps). Then userspace no longer have to do any
>>>>>>>>> ioctl to schedule work on the GPU. It is all down from userspace and
>>>>>>>>> it use a doorbell to notify hardware when it should go look at command
>>>>>>>>> buffer for new thing to execute.
>>>>>>>>>
>>>>>>>>> My point stands on that. You have existing driver already doing so
>>>>>>>>> with no new framework and in your scheme you need a userspace driver.
>>>>>>>>> So i do not see the value add, using one path or the other in the
>>>>>>>>> userspace driver is litteraly one line to change.
>>>>>>>>>
>>>>>>>> Sorry, I'd got confuse here. I partially agree that the user driver is
>>>>>>>> redundance of kernel driver. (But for WarpDrive, the kernel driver is a full
>>>>>>>> driver include all preparation and setup stuff for the hardware, the user driver
>>>>>>>> is simply to send request and receive answer). Yes, it is just a choice of path.
>>>>>>>> But the user path is faster if the request come from use space. And to do that,
>>>>>>>> we need user land DMA support. Then why is it invaluable to let VFIO involved?
>>>>>>> Some drivers in the kernel already do exactly what you said. The user
>>>>>>> space emit commands without ever going into kernel by directly scheduling
>>>>>>> commands and ringing a doorbell. They do not need VFIO either and they
>>>>>>> can map userspace address into the DMA address space of the device and
>>>>>>> again they do not need VFIO for that.
>>>>>> Could you please directly point out which driver you refer to here? Thank
>>>>>> you.
>>>>> drivers/gpu/drm/amd/
>>>>>
>>>>> Sub-directory of interest is amdkfd
>>>>>
>>>>> Because it is a big driver here is a highlevel overview of how it works
>>>>> (this is a simplification):
>>>>> - Process can allocate GPUs buffer (through ioclt) and map them into
>>>>> its address space (through mmap of device file at buffer object
>>>>> specific offset).
>>>>> - Process can map any valid range of virtual address space into device
>>>>> address space (IOMMU mapping). This must be regular memory ie not an
>>>>> mmap of a device file or any special file (this is the non PASID
>>>>> path)
>>>>> - Process can create a command queue and bind its process to it aka
>>>>> PASID, this is done through an ioctl.
>>>>> - Process can schedule commands onto queues it created from userspace
>>>>> without ioctl. For that it just write command into a ring buffer
>>>>> that it mapped during the command queue creation process and it
>>>>> rings a doorbell when commands are ready to be consume by the
>>>>> hardware.
>>>>> - Commands can reference (access) all 3 types of object above ie
>>>>> either full GPUs buffer, process regular memory maped as object
>>>>> (non PASID) and PASID memory all at the same time ie you can
>>>>> mix all of the above in same commands queue.
>>>>> - Kernel can evict, unbind any process command queues, unbind commands
>>>>> queue are still valid from process point of view but commands
>>>>> process schedules on them will not be executed until kernel re-bind
>>>>> the queue.
>>>>> - Kernel can schedule commands itself onto its dedicated command
>>>>> queues (kernel driver create its own command queues).
>>>>> - Kernel can control priorities between all the queues ie it can
>>>>> decides which queues should the hardware executed first next.
>>>>>
>>>> Thank you. Now I think I understand the point. Indeed, I can see some drivers,
>>>> such GPU and IB, attach their own iommu_domain to their iommu_group and do their
>>>> own iommu_map().
>>>>
>>>> But we have another requirement which is to combine some device together to
>>>> share the same address space. This is a little like these kinds of solution:
>>>>
>>>> http://tce.technion.ac.il/wp-content/uploads/sites/8/2015/06/SC-7.2-M.-Silberstein.pdf
>>>>
>>>> With that, the application can directly pass the NiC packet pointer to the
>>>> decryption accelerator, and get the bare data in place. This is the feature that
>>>> the VFIO container can provide.
>>> Yes and GPU would very much like do the same. There is already out of
>>> tree solution that allow NiC to stream into GPU memory or GPU to stream
>>> its memory to a NiC. I am sure we will want to use more accelerator in
>>> conjunction with GPU in the future.
>>>
>>>>> I believe all of the above are the aspects that matters to you. The main
>>>>> reason i don't like creating a new driver infrastructure is that a lot
>>>>> of existing drivers will want to use some of the new features that are
>>>>> coming (memory topology, where to place process memory, pipeline devices,
>>>>> ...) and thus existing drivers are big (GPU drivers are the biggest of
>>>>> all the kernel drivers).
>>>>>
>>>> I think it is not necessarily to rewrite the GPU driver if they don't need to
>>>> share their space with others. But if they do, no matter how, they have to create
>>>> some facility similar to VFIO container. Then why not just create them in VFIO?
>>> No they do not, nor does anyone needs to. We already have that. If you want
>>> device to share memory object you have either:
>>> - PASID and everything is just easy no need to create anything, as
>>> all valid virtual address will work
>>> - no PASID or one of the device does not support PASID then use the
>>> existing kernel infrastructure aka dma buffer see Documentation/
>>> driver-api/dma-buf.rst
>>>
>>> Everything you want to do is already happening upstream and they are allready
>>> working example.
>> OK, I accept it.
>>
>>>> Actually, some GPUs have already used mdev to manage the resource by different
>>>> users, it is already part of VFIO.
>>> The current use of mdev with GPU is to "emulate" the SR_IOV of PCIE in
>>> software so that a single device can be share between multiple guests.
>>> For this using VFIO make sense, as we want to expose device as a single
>>> entity that can be manage without the userspace (QEMU) having to know
>>> or learn about each individual devices.
>>>
>>> QEMU just has a well define API to probe and attach device to guest.
>>> It is the guest that have dedicated drivers for each of those mdev
>>> devices.
>>>
>> Agree
>>
>>> What you are trying to do is recreate a whole driver API inside the
>>> VFIO subsystem and i do not see any valid reasons for that. Moreover
>>> you want to restrict future use to only drivers that are part of this
>>> new driver subsystem and again i do not see any good reasons to
>>> mandate any of the existing driver to be rewritten inside a new VFIO
>>> infrastructure.
>>>
>> I think you have persuaded me now. I thought VFIO was the only unified place to do
>> user land DMA operations. So I tried to make use of it even there was facility I
>> needed not.
>>
>> Thank you very much for the help.
>>
>>> You can achieve everything you want to achieve with existing upstream
>>> solution. Re-inventing a whole new driver infrastructure should really
>>> be motivated with strong and obvious reasons.
>> I want to understand better of your idea. If I create some unified helper
>> APIs in drivers/iommu/, say:
>>
>> wd_create_dev(parent_dev, wd_dev)
>> wd_release_dev(wd_dev)
>>
>> The API create chrdev to take request from user space for open(resource
>> allocation), iomap, epoll (irq), and dma_map(with pasid automatically).
>>
>> Do you think it is acceptable?
>>
>> I agree that these all can be done with the driver itself. But without a unified
>> interface, it is hard to create a user land ecosystem for accelerator to
>> cooperate.
> I am fine with creating a new sub-system for pipelining devices (ie some
> place where it is easy for userspace to get list of all devices that can
> fit in such model). I am not sure if we want to have everything part of
> it (ie DMA mapping of memory and pipelining as same thing) i tend to be
> more unix oriented one tool for one job. But maybe it makes sense, this
> would likely avoid having each device driver have their own code to
> create dma buffer or track SVM pointer.
Yes. The subsystem will become the point that binds the PASID and the
allocated resources between the hardware and the process. The I/O space
mapping and command queuing can always be done by the original device
driver itself. The subsystem can stay focused on resource (hardware
resource and IOMMU resource) management.
>
> In any case for the non SVM case i would suggest to use dma buffer, it
> has been widely use, it is well tested in many scenarios. Moreover it
> would allow many existing device driver to register themself in this
> new framework with minimal changes (couple hundred lines).
I will consider it. But maybe I should solve the SVM case first.

Thanks again. Your suggestion is very helpful.
> Cheers,
> Jérôme
>

2018-08-11 15:26:48

by Kenneth Lee

[permalink] [raw]
Subject: Re: [RFC PATCH 0/7] A General Accelerator Framework, WarpDrive



在 2018年08月10日 星期五 09:12 下午, Jean-Philippe Brucker 写道:
> Hi Kenneth,
>
> On 10/08/18 04:39, Kenneth Lee wrote:
>>> You can achieve everything you want to achieve with existing upstream
>>> solution. Re-inventing a whole new driver infrastructure should really
>>> be motivated with strong and obvious reasons.
>> I want to understand better of your idea. If I create some unified helper
>> APIs in drivers/iommu/, say:
>>
>> wd_create_dev(parent_dev, wd_dev)
>> wd_release_dev(wd_dev)
>>
>> The API create chrdev to take request from user space for open(resource
>> allocation), iomap, epoll (irq), and dma_map(with pasid automatically).
>>
>> Do you think it is acceptable?
> Maybe not drivers/iommu/ :) That subsystem only contains tools for
> dealing with DMA, I don't think epoll, resource enumeration or iomap fit
> in there.
Yes. I should consider carefully where to put it.
>
> Creating new helpers seems to be precisely what we're trying to avoid in
> this thread, and vfio-mdev does provide the components that you
> describe, so I wouldn't discard it right away. When the GPU, net, block
> or another subsystem doesn't fit your needs, either because your
> accelerator provides some specialized function, or because for
> performance reasons your client wants direct MMIO access, you can at
> least build your driver and library on top of those existing VFIO
> components:
>
> * open allocates a partition of an accelerator.
> * vfio_device_info, vfio_region_info and vfio_irq_info enumerates
> available resources.
> * vfio_irq_set deals with epoll.
> * mmap gives you a private MMIO doorbell.
> * vfio_iommu_type1 provides the DMA operations.
>
> Currently missing:
>
> * Sharing the parent IOMMU between mdev, which is also what the "IOMMU
> aware mediated device" series tackles, and seems like a logical addition
> to VFIO. I'd argue that the existing IOMMU ops (or ones implemented by
> the SVA series) can be used to deal with this
>
> * The interface to discover an accelerator near your memory node, or one
> that you can chain with other devices. If I understood correctly the
> conclusion was that the API (a topology description in sysfs?) should be
> common to various subsystems, in which case vfio-mdev (or the mediating
> driver) could also use it.
>
> * The queue abstraction discussed on patch 3/7. Perhaps the current vfio
> resource description of MMIO and IRQ is sufficient here as well, since
> vendors tend to each implement their own queue schemes. If you need
> additional features, read/write fops give the mediating driver a lot of
> freedom. To support features that are too specific for drivers/vfio/ you
> can implement a config space with capabilities and registers of your
> choice. If you're versioning the capabilities, the code to handle them
> could even be shared between different accelerator drivers and libraries.
Thank you, Jean.

The major reason I want to remove the dependency on VFIO is that I have accepted
that the whole logic of VFIO is built on the idea of creating a virtual
device.

Let's consider it this way: we have hardware with IOMMU support, so
we create a default_domain for the particular IOMMU (unit) in the group
for the kernel driver to use. Now the device is going to be used by a
VM or a container. So we unbind it from the original driver, put the
default_domain away, and create a new domain for this particular use case.
Now the device shows up as a platform or PCI device to user
space. This is what VFIO tries to provide. Mdev extends the scenario but
does not change the intention. And I think that is why Alex emphasizes
pre-allocating resources to the mdev.

But what WarpDrive needs is to get service from the hardware itself and
set mappings in its current domain, aka the default_domain. If we do it in
VFIO-mdev, it looks like the VFIO framework takes all the effort to put
the default_domain away, creates a new one and gets it ready for user space
to use. And then I tell it to stop using the new domain and go back to the
original one...

That is not reasonable, is it? :)

So why don't I just take the request and set it into the default_domain
directly? The true requirement of WarpDrive is to let a process set the
page table for a particular PASID or substream ID, so the device can accept
commands with addresses in the process space. It needs no virtual device.

From this perspective, it seems there is no reason to keep it in VFIO.
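
In other words, the open path WarpDrive really needs is roughly this (a sketch
only; sva_bind_mm() and the acc_* types stand in for whatever the SVA series
and the device driver provide, none of them is an existing API):

#include <linux/device.h>
#include <linux/sched.h>

struct acc_queue { int pasid; void __iomem *db; };
struct acc_device { struct device *dev; };

/* placeholders for the SVA-series bind call and the hardware programming;
 * neither is an existing upstream interface */
int sva_bind_mm(struct device *dev, struct mm_struct *mm, int *pasid);
void acc_hw_set_queue_pasid(struct acc_device *acc, struct acc_queue *q,
                            int pasid);

static int acc_open_queue(struct acc_device *acc, struct acc_queue *q)
{
    int pasid, ret;

    /* bind the current process address space to the device and get a
     * PASID / substream ID back; no new iommu_domain is created, the
     * device keeps using its default_domain */
    ret = sva_bind_mm(acc->dev, current->mm, &pasid);
    if (ret)
        return ret;

    /* program the PASID into the hardware queue; from now on the
     * process can put plain virtual addresses in its requests */
    q->pasid = pasid;
    acc_hw_set_queue_pasid(acc, q, pasid);
    return 0;
}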

Thanks
Kenneth
>
> Thanks,
> Jean
>


2018-08-13 09:29:31

by Kenneth Lee

[permalink] [raw]
Subject: Re: [RFC PATCH 0/7] A General Accelerator Framework, WarpDrive

On Sat, Aug 11, 2018 at 11:26:48PM +0800, Kenneth Lee wrote:
>
> On Friday, August 10, 2018, 09:12 PM, Jean-Philippe Brucker wrote:
> >Hi Kenneth,
> >
> >On 10/08/18 04:39, Kenneth Lee wrote:
> >>>You can achieve everything you want to achieve with existing upstream
> >>>solution. Re-inventing a whole new driver infrastructure should really
> >>>be motivated with strong and obvious reasons.
> >>I want to understand better of your idea. If I create some unified helper
> >>APIs in drivers/iommu/, say:
> >>
> >> wd_create_dev(parent_dev, wd_dev)
> >> wd_release_dev(wd_dev)
> >>
> >>The API create chrdev to take request from user space for open(resource
> >>allocation), iomap, epoll (irq), and dma_map(with pasid automatically).
> >>
> >>Do you think it is acceptable?
> >Maybe not drivers/iommu/ :) That subsystem only contains tools for
> >dealing with DMA, I don't think epoll, resource enumeration or iomap fit
> >in there.
> Yes. I should consider where to put it carefully.
> >
> >Creating new helpers seems to be precisely what we're trying to avoid in
> >this thread, and vfio-mdev does provide the components that you
> >describe, so I wouldn't discard it right away. When the GPU, net, block
> >or another subsystem doesn't fit your needs, either because your
> >accelerator provides some specialized function, or because for
> >performance reasons your client wants direct MMIO access, you can at
> >least build your driver and library on top of those existing VFIO
> >components:
> >
> >* open allocates a partition of an accelerator.
> >* vfio_device_info, vfio_region_info and vfio_irq_info enumerates
> >available resources.
> >* vfio_irq_set deals with epoll.
> >* mmap gives you a private MMIO doorbell.
> >* vfio_iommu_type1 provides the DMA operations.
> >
> >Currently missing:
> >
> >* Sharing the parent IOMMU between mdev, which is also what the "IOMMU
> >aware mediated device" series tackles, and seems like a logical addition
> >to VFIO. I'd argue that the existing IOMMU ops (or ones implemented by
> >the SVA series) can be used to deal with this
> >
> >* The interface to discover an accelerator near your memory node, or one
> >that you can chain with other devices. If I understood correctly the
> >conclusion was that the API (a topology description in sysfs?) should be
> >common to various subsystems, in which case vfio-mdev (or the mediating
> >driver) could also use it.
> >
> >* The queue abstraction discussed on patch 3/7. Perhaps the current vfio
> >resource description of MMIO and IRQ is sufficient here as well, since
> >vendors tend to each implement their own queue schemes. If you need
> >additional features, read/write fops give the mediating driver a lot of
> >freedom. To support features that are too specific for drivers/vfio/ you
> >can implement a config space with capabilities and registers of your
> >choice. If you're versioning the capabilities, the code to handle them
> >could even be shared between different accelerator drivers and libraries.
> Thank you, Jean,
>
> The major reason that I want to remove dependency to VFIO is: I
> accepted that the whole logic of VFIO was built on the idea of
> creating virtual device.
>
> Let's consider it in this way: We have hardware with IOMMU support.
> So we create a default_domain to the particular IOMMU (unit) in the
> group for the kernel driver to use it. Now the device is going to be
> used by a VM or a Container. So we unbind it from the original
> driver, and put the default_domain away,  create a new domain for
> this particular use case.  So now the device shows up as a platform
> or pci device to the user space. This is what VFIO try to provide.
> Mdev extends the scenario but dose not change the intention. And I
> think that is why Alex emphasis pre-allocating resource to the mdev.
>
> But what WarpDrive need is to get service from the hardware itself
> and set mapping to its current domain, aka defaut_domain. If we do
> it in VFIO-mdev, it looks like the VFIO framework takes all the
> effort to put the default_domain away and create a new one and be
> ready for user space to use. But I tell him stop using the new
> domain and try the original one...
>
> It is not reasonable, isn't it:)
>
> So why don't I just take the request and set it into the
> default_domain directly? The true requirement of WarpDrive is to let
> process set the page table for particular pasid or substream id, so
> it can accept command with address in the process space. It needs no
> device.
>
> From this perspective, it seems there is no reason to keep it in VFIO.
>

I made a quick change based on the RFC v1 here:

https://github.com/Kenneth-Lee/linux-kernel-warpdrive/commits/warpdrive-v0.6

I have only made it compile and have not tested it yet, but it shows where
the idea is going.

The pro is: most of the virtual device stuff can be removed. Resource
management is done on the opened files only.

The con is: as Jean said, we have to redo some things that VFIO already
does (a rough sketch of what that tracking could look like is below).
These are mainly:

1. Track the DMA operations and remove them when the resources are released
2. Pin the memory with gup and do the accounting

It is not going to be easy to make a decision...
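
For reference, a rough sketch of the per-fd bookkeeping mentioned above
(names like struct wd_queue and wd_dma_entry are made up; this is not
code from the tree, just the shape of the work VFIO currently does for
us):

/*
 * Sketch only: every mapping done on behalf of an open file is kept on
 * a list in its private_data and torn down in release(), so nothing
 * outlives the fd.
 */
#include <linux/fs.h>
#include <linux/iommu.h>
#include <linux/list.h>
#include <linux/mm.h>
#include <linux/mutex.h>
#include <linux/slab.h>

struct wd_dma_entry {
	struct list_head list;
	unsigned long iova;
	size_t size;
	struct page **pages;	/* pinned with gup */
	int nr_pages;
};

struct wd_queue {
	struct iommu_domain *domain;
	struct list_head dma_list;	/* all live mappings of this fd */
	struct mutex lock;
};

static int wd_release(struct inode *inode, struct file *file)
{
	struct wd_queue *q = file->private_data;
	struct wd_dma_entry *e, *tmp;
	int i;

	mutex_lock(&q->lock);
	list_for_each_entry_safe(e, tmp, &q->dma_list, list) {
		iommu_unmap(q->domain, e->iova, e->size);
		for (i = 0; i < e->nr_pages; i++)
			put_page(e->pages[i]);	/* undo the gup pin */
		list_del(&e->list);
		kfree(e->pages);
		kfree(e);
	}
	mutex_unlock(&q->lock);
	kfree(q);
	return 0;
}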

> Thanks
> Kenneth
> >
> >Thanks,
> >Jean
> >

2018-08-13 19:23:01

by Jerome Glisse

[permalink] [raw]
Subject: Re: [RFC PATCH 0/7] A General Accelerator Framework, WarpDrive

On Mon, Aug 13, 2018 at 05:29:31PM +0800, Kenneth Lee wrote:
>
> I made a quick change basing on the RFCv1 here:
>
> https://github.com/Kenneth-Lee/linux-kernel-warpdrive/commits/warpdrive-v0.6
>
> I just made it compilable and not test it yet. But it shows how the idea is
> going to be.
>
> The Pros is: most of the virtual device stuff can be removed. Resource
> management is on the openned files only.
>
> The Cons is: as Jean said, we have to redo something that has been done by VFIO.
> These mainly are:
>
> 1. Track the dma operation and remove them on resource releasing
> 2. Pin the memory with gup and do accounting
>
> It not going to be easy to make a decision...
>

Maybe it would be good to list the things you want to do. Looking at your
tree, it seems you are re-inventing what dma-buf is already doing.

So here is what I understand for SVM/SVA:
(1) allow userspace to create a command buffer for a device and bind
it to its address space (PASID)
(2) allow userspace to directly schedule commands on its command buffer

No need to do tracking here, as SVM/SVA relies on PASID and something
like PCIe ATS (Address Translation Services). Userspace can shoot itself
in the foot but nothing harmful can happen.
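
As a rough sketch of what (1) could look like inside the accelerator
driver (assuming an interface along the lines of the SVA series'
iommu_sva_bind_device(); the exact signature is approximate, and
struct acc_queue, Q_PASID_REG and mmio_base are made-up names):

#include <linux/io.h>
#include <linux/iommu.h>
#include <linux/sched.h>
#include <linux/sched/mm.h>

#define Q_PASID_REG	0x08	/* hypothetical per-queue PASID register */

struct acc_queue {
	void __iomem *mmio_base;
	int pasid;
};

static int acc_bind_current_process(struct acc_queue *q, struct device *dev)
{
	int pasid, ret;

	/* bind current->mm to the device and get a PASID back */
	ret = iommu_sva_bind_device(dev, current->mm, &pasid, 0, q);
	if (ret)
		return ret;

	q->pasid = pasid;
	/* from now on the queue's DMA is tagged with this PASID, see (2) */
	writel(pasid, q->mmio_base + Q_PASID_REG);
	return 0;
}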


Non SVM/SVA:
(3) allow userspace to wrap a region of its memory into an object so
that it can be DMA mapped (i.e. GUP + dma_map_page())
(4) have userspace schedule commands referencing the object created in
(3) using an ioctl.

We need to keep track of object usage by the hardware so that we know
when it is safe to release resources (dma_unmap_page()). dma-buf
provides everything you want for (3) and (4). With dma-buf you create an
object, and each time it is used by a device you associate a fence with
it. When the fence is signaled it means that the hardware is done using
that object. Fences also allow proper synchronization between multiple
devices, for instance making sure that the second device waits for the
first one before starting to do its thing. The dma-buf documentation
explains all this much more thoroughly.
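
For reference, a rough importer-side sketch of the generic dma-buf calls
involved in (3) and (4) (the device variable and function names are made
up, and the fence/reservation handling described above is left out):

#include <linux/dma-buf.h>
#include <linux/dma-mapping.h>
#include <linux/err.h>

/* import a dma-buf fd from userspace and map it for this device's DMA */
static struct sg_table *accel_import_dmabuf(struct device *accel_dev, int fd,
					struct dma_buf **buf_out,
					struct dma_buf_attachment **att_out)
{
	struct dma_buf *dmabuf = dma_buf_get(fd);
	struct dma_buf_attachment *attach;
	struct sg_table *sgt;

	if (IS_ERR(dmabuf))
		return ERR_CAST(dmabuf);

	attach = dma_buf_attach(dmabuf, accel_dev);
	if (IS_ERR(attach)) {
		dma_buf_put(dmabuf);
		return ERR_CAST(attach);
	}

	sgt = dma_buf_map_attachment(attach, DMA_BIDIRECTIONAL);
	if (IS_ERR(sgt)) {
		dma_buf_detach(dmabuf, attach);
		dma_buf_put(dmabuf);
		return sgt;
	}

	*buf_out = dmabuf;
	*att_out = attach;
	return sgt;	/* hand the sg_table to the hardware queue */
}

/* called once the fence attached to this job has signaled */
static void accel_release_dmabuf(struct dma_buf *dmabuf,
				 struct dma_buf_attachment *attach,
				 struct sg_table *sgt)
{
	dma_buf_unmap_attachment(attach, sgt, DMA_BIDIRECTIONAL);
	dma_buf_detach(dmabuf, attach);
	dma_buf_put(dmabuf);
}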


Now from an implementation point of view, maybe it would be a good idea
to create something like the virtual GEM driver. It is a virtual device
that allows creating GEM objects. So maybe we want a virtual device that
allows creating dma-buf objects from process memory and allows sharing
those dma-bufs between multiple devices.

Userspace would only have to talk to this virtual device to create an
object and wrap its memory, then it could use this object against many
actual devices.

This decouples the memory management, which can be shared between all
devices, from the actual device driver, which is obviously specific to
every single device.

Note that dma-buf uses a file, so that once all file references are gone
the resource can be freed and cleanup can happen (dma_unmap_page() ...).
This properly handles the resource lifetime issue you seem to be worried
about.

Cheers,
Jérôme

2018-08-14 03:46:29

by Kenneth Lee

[permalink] [raw]
Subject: Re: [RFC PATCH 0/7] A General Accelerator Framework, WarpDrive

On Mon, Aug 13, 2018 at 03:23:01PM -0400, Jerome Glisse wrote:
> On Mon, Aug 13, 2018 at 05:29:31PM +0800, Kenneth Lee wrote:
> >
> > I made a quick change basing on the RFCv1 here:
> >
> > https://github.com/Kenneth-Lee/linux-kernel-warpdrive/commits/warpdrive-v0.6
> >
> > I just made it compilable and not test it yet. But it shows how the idea is
> > going to be.
> >
> > The Pros is: most of the virtual device stuff can be removed. Resource
> > management is on the openned files only.
> >
> > The Cons is: as Jean said, we have to redo something that has been done by VFIO.
> > These mainly are:
> >
> > 1. Track the dma operation and remove them on resource releasing
> > 2. Pin the memory with gup and do accounting
> >
> > It not going to be easy to make a decision...
> >
>
> Maybe it would be good to list things you want do. Looking at your tree
> it seems you are re-inventing what dma-buf is already doing.

My English is quite limited ;). I think I did not explain it well in the
WarpDrive document. Please let me try again here:

The requirement of WarpDrive is simple. Many acceleration requirements
come from user space, such as OpenSSL, AI, and so on. We want to provide
a framework for user-land applications to summon the accelerator easily.

So the scenario is simple: the application is doing its job and faces a
tough and boring task, say compression. It drops the data pointer into
the command queue, lets the accelerator do the work at its bidding, and
either continues with its other work until the task is done or waits for
it synchronously.

My understanding of dma-buf is that it is driver oriented. The buffer is
created by one driver and exported as an fd to user space, and user
space can then share it with the attached devices.

But our scenario is that the whole buffer comes from user space. The
application directly puts the pointer into the command queue, and the
hardware uses this address directly. The kernel should set this
particular piece of virtual memory, or the whole process address space,
into the IOMMU with the process's PASID. So the hardware can use the
process's addresses, which may not only be the address placed in the
command queue; it can also be an address inside the referenced memory
itself.

To let this work, we have to pin the memory, or set up a page fault
handler, when the memory is shared with the hardware. And... the big
work of pinning memory is not gup (get_user_pages) itself, but the
rlimit accounting :)
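
For reference, a rough sketch of that accounting (roughly the pattern
vfio_iommu_type1 uses today; wd_charge_locked_vm() is a made-up name):

/*
 * Sketch only: charge npages to the process's RLIMIT_MEMLOCK before
 * pinning them; the caller must uncharge (and unpin) on unmap/release.
 */
#include <linux/capability.h>
#include <linux/mm.h>
#include <linux/sched/mm.h>
#include <linux/sched/signal.h>

static int wd_charge_locked_vm(struct mm_struct *mm, long npages)
{
	unsigned long locked, lock_limit;
	int ret = 0;

	down_write(&mm->mmap_sem);
	locked = mm->locked_vm + npages;
	lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
	if (locked > lock_limit && !capable(CAP_IPC_LOCK))
		ret = -ENOMEM;		/* over the memlock limit */
	else
		mm->locked_vm = locked;
	up_write(&mm->mmap_sem);

	return ret;
}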

>
> So here is what i understand for SVM/SVA:
> (1) allow userspace to create a command buffer for a device and bind
> it to its address space (PASID)
> (2) allow userspace to directly schedule commands on its command buffer
>
> No need to do tracking here as SVM/SVA which rely on PASID and something
> like PCIE ATS (address translation service). Userspace can shoot itself
> in the foot but nothing harmful can happen.
>

Yes, we can release the whole page table based on the PASID (it also
needs Jean to provide this interface ;)). But the gup part still needs
to be tracked. (This is what is done in VFIO.)

>
> Non SVM/SVA:
> (3) allow userspace to wrap a region of its memory into an object so
> that it can be DMA map (ie GUP + dma_map_page())
> (4) Have userspace schedule command referencing object created in (3)
> using an ioctl.

Yes, this is going to be something like the NOIOMMU mode in VFIO. The
hardware has to accept DMA/physical addresses. But anyway, this is not
the major intention of WarpDrive.

>
> We need to keep track of object usage by the hardware so that we know
> when it is safe to release resources (dma_unmap_page()). The dma-buf
> provides everything you want for (3) and (4). With dma-buf you create
> object and each time it is use by a device you associate a fence with
> it. When fence is signaled it means that the hardware is done using
> that object. Fence also allow proper synchronization between multiple
> devices. For instance making sure that the second device wait for the
> first device before starting doing its thing. dma-buf documentations is
> much more thorough explaining all this.
>

Your idea hints to me that the dma-buf design is based on sharing memory
from the driver, not from user space. That's why the signaling is based
on the buffer itself: the buffer can be used again and again.

This is good for performance; the cost is high if the data is remapped
every time. But if we consider that the whole application address space
can be devoted to the hardware with SVM/SVA support, this can become
acceptable.

>
> Now from implementation point of view, maybe it would be a good idea
> to create something like the virtual gem driver. It is a virtual device
> that allow to create GEM object. So maybe we want a virtual device that
> allow to create dma-buf object from process memory and allow sharing of
> those dma-buf between multiple devices.
>
> Userspace would only have to talk to this virtual device to create
> object and wrap its memory around, then it could use this object against
> many actual devices.
>
>
> This decouples the memory management, that can be share between all
> devices, from the actual device driver, which is obviously specific to
> every single device.
>

Forcing the application to use device-allocated memory is an alluring
choice; it makes things simple. Let us consider it for a while...

>
> Note that dma-buf use file so that once all file reference are gone the
> resource can be free and cleanup can happen (dma_unmap_page() ...). This
> properly handle the resource lifetime issue you seem to worried about.
>
> Cheers,
> Jérôme

--
-Kenneth(Hisilicon)