2018-11-12 17:50:32

by Kenneth Lee

Subject: [RFCv3 PATCH 0/6] A General Accelerator Framework, WarpDrive

From: Kenneth Lee <[email protected]>

*WarpDrive* is a general accelerator framework that lets a user application
access the hardware without going through the kernel in the data path.

WarpDrive is the name for the whole framework. The kernel component
is called uacce, meaning "Unified/User-space-access-intended Accelerator
Framework". It uses the capability of the IOMMU to maintain a
unified virtual address space between the hardware and the process.

The first patch in this patchset contains the documentation of the framework.
Please refer to it for more information.

WarpDrive is intended to be used with Jean Philippe Brucker's SVA
patchset[1], which enables IO side page fault and PASID support. This
patchset itself, however, is only the first step: it supports the no-PASID
version.

Patch 1 is the documentation of the framework. Patch 2 adds uacce support.
Patches 3 and 4 are drivers for Hisilicon's ZIP Accelerator, which is
registered to the crypto subsystem. Patch 5 adds uacce support to the
accelerator driver, which can then work in either crypto or uacce mode. It
will be able to support both modes at the same time once PASID mode is added
in the future.

Patch 6 is a user space sample demonstrating how WarpDrive is
used.

The intention of this RFC is to find out whether the idea is acceptable,
so the focus is on uacce itself. Patches 3 and 4 will be sent to mainline
as a separate topic. We assume they will be accepted first, and then we can
upstream uacce. If SVA is upstreamed, the new feature will be added
accordingly.

Change History:
V3 changed from V2:
1. Build uacce on the original IOMMU interface. V2 was built on VFIO.
But the VFIO way of locking the user memory in place does not
work properly if the process forks a child, because the
copy-on-write strategy can make the parent process lose its
page. This is not acceptable to accelerator users.
2. The kernel component is renamed from sdmdev to uacce
accordingly.
3. Document is updated for the new design. The Static Shared
Virtual Memory concept is introduced to replace the User
Memory Sharing concept.
4. Rebase to the latest kernel (4.20.0-rc1)
5. As an RFC, this version is tested only with "test-to-pass"
test cases and has not been tested with Jean's SVA patch.
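
The fork problem described in item 1 can be observed from user space alone.
The following is a minimal sketch and not part of the patchset:
cow_diverges() is a hypothetical helper name, and the pipe exists only to
order the parent's write before the child's read. It shows the CPU-side half
of the problem, namely that parent and child stop sharing the page once the
parent writes to it. Which of the two keeps the original physical page, and
therefore what a device holding the old mapping would see, cannot be told
from this code.

```c
#define _DEFAULT_SOURCE
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper, for illustration only.
 * Returns 1 if the child still reads the pre-fork contents after the
 * parent has written to the page, 0 otherwise, and -1 on a setup error.
 */
static int cow_diverges(void)
{
	long page = sysconf(_SC_PAGESIZE);
	int ready[2];	/* parent -> child: "I have written" */
	int status;
	pid_t pid;
	char *buf;

	/* Private anonymous memory is shared copy-on-write across fork(). */
	buf = mmap(NULL, page, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return -1;
	if (pipe(ready) < 0)
		return -1;

	strcpy(buf, "before fork");

	pid = fork();
	if (pid < 0)
		return -1;

	if (pid == 0) {
		char token;

		/* Child: wait for the parent's write, then look at the page. */
		close(ready[1]);
		if (read(ready[0], &token, 1) != 1)
			_exit(2);
		_exit(strcmp(buf, "before fork") == 0 ? 0 : 1);
	}

	/* Parent: this write breaks the sharing of the page. */
	close(ready[0]);
	strcpy(buf, "parent wrote");
	if (write(ready[1], "x", 1) != 1)
		return -1;
	close(ready[1]);

	if (waitpid(pid, &status, 0) < 0)
		return -1;
	munmap(buf, page);

	return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

Called from a small main() on Linux, cow_diverges() should return 1: the
same virtual address now holds different data in the two processes.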

V2 changed from V1:
1. Change kernel framework name from SPIMDEV (Share Parent IOMMU
Mdev) to SDMDEV (Share Domain Mdev).
2. Allocate Hardware Resource when a new mdev is created (previously
it was allocated when the mdev was opened)
3. Unmap pages from the shared domain when the sdmdev iommu group is
detached. (This procedure is necessary, but missed in V1)
4. Update document accordingly.
5. Rebase to the latest kernel (4.19.0-rc1)


References:
[1] https://www.spinics.net/lists/kernel/msg2651481.html
[2] https://github.com/Kenneth-Lee/linux-kernel-warpdrive/tree/warpdrive-sva-v0.5
[3] https://lkml.org/lkml/2018/7/22/34

Best Regards
Kenneth Lee

Kenneth Lee (6):
uacce: Add documents for WarpDrive/uacce
uacce: add uacce module
crypto/hisilicon: add hisilicon Queue Manager driver
crypto/hisilicon: add Hisilicon zip driver
crypto: add uacce support to Hisilicon qm
uacce: add user sample for uacce/warpdrive

Documentation/warpdrive/warpdrive.rst | 260 ++++++
Documentation/warpdrive/wd-arch.svg | 764 ++++++++++++++++
Documentation/warpdrive/wd.svg | 526 +++++++++++
Documentation/warpdrive/wd_q_addr_space.svg | 359 ++++++++
drivers/Kconfig | 2 +
drivers/Makefile | 1 +
drivers/crypto/hisilicon/Kconfig | 19 +-
drivers/crypto/hisilicon/Makefile | 2 +
drivers/crypto/hisilicon/qm.c | 926 ++++++++++++++++++++
drivers/crypto/hisilicon/qm.h | 227 +++++
drivers/crypto/hisilicon/qm_usr_if.h | 32 +
drivers/crypto/hisilicon/zip/Makefile | 2 +
drivers/crypto/hisilicon/zip/zip.h | 57 ++
drivers/crypto/hisilicon/zip/zip_crypto.c | 362 ++++++++
drivers/crypto/hisilicon/zip/zip_crypto.h | 18 +
drivers/crypto/hisilicon/zip/zip_main.c | 191 ++++
drivers/uacce/Kconfig | 11 +
drivers/uacce/Makefile | 2 +
drivers/uacce/uacce.c | 902 +++++++++++++++++++
include/linux/uacce.h | 117 +++
include/uapi/linux/uacce.h | 33 +
samples/warpdrive/AUTHORS | 3 +
samples/warpdrive/ChangeLog | 1 +
samples/warpdrive/Makefile.am | 9 +
samples/warpdrive/NEWS | 1 +
samples/warpdrive/README | 32 +
samples/warpdrive/autogen.sh | 3 +
samples/warpdrive/cleanup.sh | 13 +
samples/warpdrive/conf.sh | 4 +
samples/warpdrive/configure.ac | 52 ++
samples/warpdrive/drv/hisi_qm_udrv.c | 228 +++++
samples/warpdrive/drv/hisi_qm_udrv.h | 57 ++
samples/warpdrive/drv/wd_drv.h | 19 +
samples/warpdrive/test/Makefile.am | 7 +
samples/warpdrive/test/test_hisi_zip.c | 150 ++++
samples/warpdrive/wd.c | 96 ++
samples/warpdrive/wd.h | 97 ++
samples/warpdrive/wd_adapter.c | 71 ++
samples/warpdrive/wd_adapter.h | 36 +
39 files changed, 5691 insertions(+), 1 deletion(-)
create mode 100644 Documentation/warpdrive/warpdrive.rst
create mode 100644 Documentation/warpdrive/wd-arch.svg
create mode 100644 Documentation/warpdrive/wd.svg
create mode 100644 Documentation/warpdrive/wd_q_addr_space.svg
create mode 100644 drivers/crypto/hisilicon/qm.c
create mode 100644 drivers/crypto/hisilicon/qm.h
create mode 100644 drivers/crypto/hisilicon/qm_usr_if.h
create mode 100644 drivers/crypto/hisilicon/zip/Makefile
create mode 100644 drivers/crypto/hisilicon/zip/zip.h
create mode 100644 drivers/crypto/hisilicon/zip/zip_crypto.c
create mode 100644 drivers/crypto/hisilicon/zip/zip_crypto.h
create mode 100644 drivers/crypto/hisilicon/zip/zip_main.c
create mode 100644 drivers/uacce/Kconfig
create mode 100644 drivers/uacce/Makefile
create mode 100644 drivers/uacce/uacce.c
create mode 100644 include/linux/uacce.h
create mode 100644 include/uapi/linux/uacce.h
create mode 100644 samples/warpdrive/AUTHORS
create mode 100644 samples/warpdrive/ChangeLog
create mode 100644 samples/warpdrive/Makefile.am
create mode 100644 samples/warpdrive/NEWS
create mode 100644 samples/warpdrive/README
create mode 100755 samples/warpdrive/autogen.sh
create mode 100755 samples/warpdrive/cleanup.sh
create mode 100755 samples/warpdrive/conf.sh
create mode 100644 samples/warpdrive/configure.ac
create mode 100644 samples/warpdrive/drv/hisi_qm_udrv.c
create mode 100644 samples/warpdrive/drv/hisi_qm_udrv.h
create mode 100644 samples/warpdrive/drv/wd_drv.h
create mode 100644 samples/warpdrive/test/Makefile.am
create mode 100644 samples/warpdrive/test/test_hisi_zip.c
create mode 100644 samples/warpdrive/wd.c
create mode 100644 samples/warpdrive/wd.h
create mode 100644 samples/warpdrive/wd_adapter.c
create mode 100644 samples/warpdrive/wd_adapter.h

--
2.17.1


2018-11-20 05:50:23

by Jerome Glisse

Subject: Re: [RFCv3 PATCH 1/6] uacce: Add documents for WarpDrive/uacce

On Mon, Nov 19, 2018 at 07:19:04PM +0000, Christopher Lameter wrote:
> On Mon, 19 Nov 2018, Jerome Glisse wrote:
>
> > > IIRC this is solved in IB by automatically calling
> > > madvise(MADV_DONTFORK) before creating the MR.
> > >
> > > MADV_DONTFORK
> > > .. This is useful to prevent copy-on-write semantics from changing the
> > > physical location of a page if the parent writes to it after a
> > > fork(2) ..
> >
> > This would work around the issue but this is not transparent, i.e. a
> > range marked with DONTFORK no longer behaves as expected from the
> > application point of view.
>
> Why would anyone expect a range registered via MR to behave as normal? Device
> I/O is going on into that range. Memory is already special.
>
>
> > Also it relies on userspace doing the right thing (which is not
> > something i usually trust :)).
>
> This has been established practice for 15 or so years in a couple of use
> cases. Again user space already has to be doing special things in order to
> handle RDMA in that area.

Yes, RDMA has an existing historical track record and thus people
should by now be aware of its limitations. What I am fighting against
is new additions to the kernel that pretend to do SVA (shared virtual
address) while their hardware is not really doing SVA. SVA with
IOMMU and ATS/PASID is fine; SVA in software with a device driver
that abides by mmu notifiers is fine. Anything else is not.

So I am just worried about new users, and want to make sure they
understand what is happening and do not sell something false to
their users.

Cheers,
Jérôme

2018-11-20 05:27:31

by Leon Romanovsky

Subject: Re: [RFCv3 PATCH 1/6] uacce: Add documents for WarpDrive/uacce

On Mon, Nov 19, 2018 at 01:42:16PM -0500, Jerome Glisse wrote:
> On Mon, Nov 19, 2018 at 11:27:52AM -0700, Jason Gunthorpe wrote:
> > On Mon, Nov 19, 2018 at 11:48:54AM -0500, Jerome Glisse wrote:
> >
> > > Just to comment on this, any infiniband driver which uses umem and does
> > > not have ODP (here ODP for me means listening to mmu notifiers, so all
> > > infiniband drivers except mlx5) will be affected by the same issue AFAICT.
> > >
> > > AFAICT there is nothing special happening after fork() inside any of
> > > those drivers. So if the parent creates a umem mr before fork() and programs
> > > the hardware with it, then after fork() the parent might start using a new
> > > page for the umem range while the old memory is used by the child. The
> > > reverse is also true (parent using old memory and child new memory).
> > > Bottom line: you cannot predict which memory the child or the parent
> > > will use for the range after fork().
> > >
> > > So no matter which you consider, the child or the parent, what the hw
> > > will use for the mr is unlikely to match what the CPU uses for the
> > > same virtual address. In other words:
> > >
> > > Before fork:
> > > CPU parent: virtual addr ptr1 -> physical address = 0xCAFE
> > > HARDWARE: virtual addr ptr1 -> physical address = 0xCAFE
> > >
> > > Case 1:
> > > CPU parent: virtual addr ptr1 -> physical address = 0xCAFE
> > > CPU child: virtual addr ptr1 -> physical address = 0xDEAD
> > > HARDWARE: virtual addr ptr1 -> physical address = 0xCAFE
> > >
> > > Case 2:
> > > CPU parent: virtual addr ptr1 -> physical address = 0xBEEF
> > > CPU child: virtual addr ptr1 -> physical address = 0xCAFE
> > > HARDWARE: virtual addr ptr1 -> physical address = 0xCAFE
> >
> > IIRC this is solved in IB by automatically calling
> > madvise(MADV_DONTFORK) before creating the MR.
> >
> > MADV_DONTFORK
> > .. This is useful to prevent copy-on-write semantics from changing the
> > physical location of a page if the parent writes to it after a
> > fork(2) ..
>
> This would work around the issue but this is not transparent, i.e. a
> range marked with DONTFORK no longer behaves as expected from the
> application point of view.
>
> Also it relies on userspace doing the right thing (which is not
> something i usually trust :)).

The good thing is that we have not seen anyone succeed in running the
IB stack without our user space, which does the right thing under
the hood :).
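
For readers unfamiliar with the workaround being discussed, here is a
minimal user-space sketch of what marking a range with MADV_DONTFORK does.
It is an illustration only and is not taken from any IB library:
dontfork_hides_range() is a hypothetical helper name, and mincore() is used
merely as a way to probe whether the range is still mapped in the child
(it fails with ENOMEM on unmapped memory). It shows why the approach is not
transparent: after fork() the child simply has no mapping at that address.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper, for illustration only.
 * Returns 1 if a range marked MADV_DONTFORK is absent from the child's
 * address space after fork(), 0 otherwise, and -1 on a setup error.
 */
static int dontfork_hides_range(void)
{
	long page = sysconf(_SC_PAGESIZE);
	unsigned char vec;
	int status;
	pid_t pid;
	char *buf;

	buf = mmap(NULL, page, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return -1;
	buf[0] = 1;	/* fault the page in */

	/* Ask the kernel not to copy this mapping into children. */
	if (madvise(buf, page, MADV_DONTFORK) < 0)
		return -1;

	pid = fork();
	if (pid < 0)
		return -1;

	if (pid == 0) {
		/* Child: probe the range without dereferencing it. */
		int rc = mincore(buf, page, &vec);

		_exit(rc < 0 && errno == ENOMEM ? 0 : 1);
	}

	if (waitpid(pid, &status, 0) < 0)
		return -1;
	munmap(buf, page);

	return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

The parent keeps its page in place, which is what the hardware needs, but a
child that dereferences the same pointer would fault. That is the behaviour
change from the application's point of view.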

>
> Cheers,
> Jérôme



2018-11-12 17:52:39

by Kenneth Lee

Subject: [RFCv3 PATCH 4/6] crypto/hisilicon: add Hisilicon zip driver

From: Kenneth Lee <[email protected]>

The Hisilicon ZIP accelerator implements the zlib and gzip algorithms. It
uses Hisilicon QM as the interface to the CPU.

This patch provides a PCIe driver for the accelerator and registers it with
the crypto subsystem.

Signed-off-by: Kenneth Lee <[email protected]>
Signed-off-by: Zhou Wang <[email protected]>
Signed-off-by: Hao Fang <[email protected]>
---
drivers/crypto/hisilicon/Kconfig | 7 +
drivers/crypto/hisilicon/Makefile | 1 +
drivers/crypto/hisilicon/zip/Makefile | 2 +
drivers/crypto/hisilicon/zip/zip.h | 57 ++++
drivers/crypto/hisilicon/zip/zip_crypto.c | 362 ++++++++++++++++++++++
drivers/crypto/hisilicon/zip/zip_crypto.h | 18 ++
drivers/crypto/hisilicon/zip/zip_main.c | 174 +++++++++++
7 files changed, 621 insertions(+)
create mode 100644 drivers/crypto/hisilicon/zip/Makefile
create mode 100644 drivers/crypto/hisilicon/zip/zip.h
create mode 100644 drivers/crypto/hisilicon/zip/zip_crypto.c
create mode 100644 drivers/crypto/hisilicon/zip/zip_crypto.h
create mode 100644 drivers/crypto/hisilicon/zip/zip_main.c

diff --git a/drivers/crypto/hisilicon/Kconfig b/drivers/crypto/hisilicon/Kconfig
index 0e40f4a6666b..ce9deefbf037 100644
--- a/drivers/crypto/hisilicon/Kconfig
+++ b/drivers/crypto/hisilicon/Kconfig
@@ -15,3 +15,10 @@ config CRYPTO_DEV_HISI_SEC
config CRYPTO_DEV_HISI_QM
tristate
depends on ARM64 && PCI
+
+config CRYPTO_DEV_HISI_ZIP
+ tristate "Support for HISI ZIP Driver"
+ depends on ARM64
+ select CRYPTO_DEV_HISI_QM
+ help
+ Support for HiSilicon HIP08 ZIP Driver.
diff --git a/drivers/crypto/hisilicon/Makefile b/drivers/crypto/hisilicon/Makefile
index 05e9052e0f52..c97c5b27c3cb 100644
--- a/drivers/crypto/hisilicon/Makefile
+++ b/drivers/crypto/hisilicon/Makefile
@@ -1,3 +1,4 @@
# SPDX-License-Identifier: GPL-2.0
obj-$(CONFIG_CRYPTO_DEV_HISI_SEC) += sec/
obj-$(CONFIG_CRYPTO_DEV_HISI_QM) += qm.o
+obj-$(CONFIG_CRYPTO_DEV_HISI_ZIP) += zip/
diff --git a/drivers/crypto/hisilicon/zip/Makefile b/drivers/crypto/hisilicon/zip/Makefile
new file mode 100644
index 000000000000..a936f099ee22
--- /dev/null
+++ b/drivers/crypto/hisilicon/zip/Makefile
@@ -0,0 +1,2 @@
+obj-$(CONFIG_CRYPTO_DEV_HISI_ZIP) += hisi_zip.o
+hisi_zip-objs = zip_main.o zip_crypto.o
diff --git a/drivers/crypto/hisilicon/zip/zip.h b/drivers/crypto/hisilicon/zip/zip.h
new file mode 100644
index 000000000000..26d72f7153b0
--- /dev/null
+++ b/drivers/crypto/hisilicon/zip/zip.h
@@ -0,0 +1,57 @@
+/* SPDX-License-Identifier: GPL-2.0+ */
+#ifndef HISI_ZIP_H
+#define HISI_ZIP_H
+
+#include <linux/list.h>
+#include "../qm.h"
+
+#define HZIP_SQE_SIZE 128
+#define HZIP_SQ_SIZE (HZIP_SQE_SIZE * QM_Q_DEPTH)
+#define QM_CQ_SIZE (QM_CQE_SIZE * QM_Q_DEPTH)
+#define HZIP_PF_DEF_Q_NUM 64
+#define HZIP_PF_DEF_Q_BASE 0
+
+struct hisi_zip {
+ struct qm_info qm;
+ struct list_head list;
+
+#ifdef CONFIG_UACCE
+ struct uacce *uacce;
+#endif
+};
+
+struct hisi_zip_sqe {
+ __u32 consumed;
+ __u32 produced;
+ __u32 comp_data_length;
+ __u32 dw3;
+ __u32 input_data_length;
+ __u32 lba_l;
+ __u32 lba_h;
+ __u32 dw7;
+ __u32 dw8;
+ __u32 dw9;
+ __u32 dw10;
+ __u32 priv_info;
+ __u32 dw12;
+ __u32 tag;
+ __u32 dest_avail_out;
+ __u32 rsvd0;
+ __u32 comp_head_addr_l;
+ __u32 comp_head_addr_h;
+ __u32 source_addr_l;
+ __u32 source_addr_h;
+ __u32 dest_addr_l;
+ __u32 dest_addr_h;
+ __u32 stream_ctx_addr_l;
+ __u32 stream_ctx_addr_h;
+ __u32 cipher_key1_addr_l;
+ __u32 cipher_key1_addr_h;
+ __u32 cipher_key2_addr_l;
+ __u32 cipher_key2_addr_h;
+ __u32 rsvd1[4];
+};
+
+extern struct list_head hisi_zip_list;
+
+#endif
diff --git a/drivers/crypto/hisilicon/zip/zip_crypto.c b/drivers/crypto/hisilicon/zip/zip_crypto.c
new file mode 100644
index 000000000000..104e5140bf4b
--- /dev/null
+++ b/drivers/crypto/hisilicon/zip/zip_crypto.c
@@ -0,0 +1,362 @@
+// SPDX-License-Identifier: GPL-2.0+
+/*
+ * Copyright 2018 (c) HiSilicon Limited.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include <linux/crypto.h>
+#include <linux/dma-mapping.h>
+#include <linux/pci.h>
+#include <linux/topology.h>
+#include "../qm.h"
+#include "zip.h"
+
+#define INPUT_BUFFER_SIZE (64 * 1024)
+#define OUTPUT_BUFFER_SIZE (64 * 1024)
+
+#define COMP_NAME_TO_TYPE(alg_name) \
+ (!strcmp((alg_name), "zlib-deflate") ? 0x02 : \
+ !strcmp((alg_name), "gzip") ? 0x03 : 0)
+
+struct hisi_zip_buffer {
+ u8 *input;
+ dma_addr_t input_dma;
+ u8 *output;
+ dma_addr_t output_dma;
+};
+
+struct hisi_zip_qp_ctx {
+ struct hisi_zip_buffer buffer;
+ struct hisi_qp *qp;
+ struct hisi_zip_sqe zip_sqe;
+};
+
+struct hisi_zip_ctx {
+#define QPC_COMP 0
+#define QPC_DECOMP 1
+ struct hisi_zip_qp_ctx qp_ctx[2];
+};
+
+static struct hisi_zip *find_zip_device(int node)
+{
+ struct hisi_zip *hisi_zip, *ret = NULL;
+ struct device *dev;
+ int min_distance = 100;
+
+ list_for_each_entry(hisi_zip, &hisi_zip_list, list) {
+ dev = &hisi_zip->qm.pdev->dev;
+ if (node_distance(dev->numa_node, node) < min_distance) {
+ ret = hisi_zip;
+ min_distance = node_distance(dev->numa_node, node);
+ }
+ }
+
+ return ret;
+}
+
+static void hisi_zip_qp_event_notifier(struct hisi_qp *qp)
+{
+ complete(&qp->completion);
+}
+
+static int hisi_zip_fill_sqe_v1(void *sqe, void *q_parm, u32 len)
+{
+ struct hisi_zip_sqe *zip_sqe = (struct hisi_zip_sqe *)sqe;
+ struct hisi_zip_qp_ctx *qp_ctx = (struct hisi_zip_qp_ctx *)q_parm;
+ struct hisi_zip_buffer *buffer = &qp_ctx->buffer;
+
+ memset(zip_sqe, 0, sizeof(struct hisi_zip_sqe));
+
+ zip_sqe->input_data_length = len;
+ zip_sqe->dw9 = qp_ctx->qp->req_type;
+ zip_sqe->dest_avail_out = OUTPUT_BUFFER_SIZE;
+ zip_sqe->source_addr_l = lower_32_bits(buffer->input_dma);
+ zip_sqe->source_addr_h = upper_32_bits(buffer->input_dma);
+ zip_sqe->dest_addr_l = lower_32_bits(buffer->output_dma);
+ zip_sqe->dest_addr_h = upper_32_bits(buffer->output_dma);
+
+ return 0;
+}
+
+/* let's allocate one buffer now, may have problem in async case */
+static int hisi_zip_alloc_qp_buffer(struct hisi_zip_qp_ctx *hisi_zip_qp_ctx)
+{
+ struct hisi_zip_buffer *buffer = &hisi_zip_qp_ctx->buffer;
+ struct hisi_qp *qp = hisi_zip_qp_ctx->qp;
+ struct device *dev = &qp->qm->pdev->dev;
+ int ret;
+
+ /* todo: we are using dma api here. it should be updated to uacce api
+ * for user and kernel mode working at the same time
+ */
+ buffer->input = dma_alloc_coherent(dev, INPUT_BUFFER_SIZE,
+ &buffer->input_dma, GFP_KERNEL);
+ if (!buffer->input)
+ return -ENOMEM;
+
+ buffer->output = dma_alloc_coherent(dev, OUTPUT_BUFFER_SIZE,
+ &buffer->output_dma, GFP_KERNEL);
+ if (!buffer->output) {
+ ret = -ENOMEM;
+ goto err_alloc_output_buffer;
+ }
+
+ return 0;
+
+err_alloc_output_buffer:
+ dma_free_coherent(dev, INPUT_BUFFER_SIZE, buffer->input,
+ buffer->input_dma);
+ return ret;
+}
+
+static void hisi_zip_free_qp_buffer(struct hisi_zip_qp_ctx *hisi_zip_qp_ctx)
+{
+ struct hisi_zip_buffer *buffer = &hisi_zip_qp_ctx->buffer;
+ struct hisi_qp *qp = hisi_zip_qp_ctx->qp;
+ struct device *dev = &qp->qm->pdev->dev;
+
+ dma_free_coherent(dev, INPUT_BUFFER_SIZE, buffer->input,
+ buffer->input_dma);
+ dma_free_coherent(dev, OUTPUT_BUFFER_SIZE, buffer->output,
+ buffer->output_dma);
+}
+
+static int hisi_zip_create_qp(struct qm_info *qm, struct hisi_zip_qp_ctx *ctx,
+ int alg_type, int req_type)
+{
+ struct hisi_qp *qp;
+ int ret;
+
+ qp = hisi_qm_create_qp(qm, alg_type);
+
+ if (IS_ERR(qp))
+ return PTR_ERR(qp);
+
+ qp->event_cb = hisi_zip_qp_event_notifier;
+ qp->req_type = req_type;
+
+ qp->qp_ctx = ctx;
+ ctx->qp = qp;
+
+ ret = hisi_zip_alloc_qp_buffer(ctx);
+ if (ret)
+ goto err_with_qp;
+
+ ret = hisi_qm_start_qp(qp, 0);
+ if (ret)
+ goto err_with_qp_buffer;
+
+ return 0;
+err_with_qp_buffer:
+ hisi_zip_free_qp_buffer(ctx);
+err_with_qp:
+ hisi_qm_release_qp(qp);
+ return ret;
+}
+
+static void hisi_zip_release_qp(struct hisi_zip_qp_ctx *ctx)
+{
+ hisi_zip_free_qp_buffer(ctx);
+ hisi_qm_release_qp(ctx->qp);
+}
+
+static int hisi_zip_alloc_comp_ctx(struct crypto_tfm *tfm)
+{
+ struct hisi_zip_ctx *hisi_zip_ctx = crypto_tfm_ctx(tfm);
+ const char *alg_name = crypto_tfm_alg_name(tfm);
+ struct hisi_zip *hisi_zip;
+ struct qm_info *qm;
+ int ret, i, j;
+ u8 req_type = COMP_NAME_TO_TYPE(alg_name);
+
+ pr_debug("hisi_zip init %s \n", alg_name);
+
+ /* find the proper zip device */
+ hisi_zip = find_zip_device(cpu_to_node(smp_processor_id()));
+ if (!hisi_zip) {
+ pr_err("Can not find proper ZIP device!\n");
+ return -ENODEV;
+ }
+ qm = &hisi_zip->qm;
+
+ for (i = 0; i < 2; i++) {
+ /* it just happens that alg_type 0 is compress and 1 is decompress */
+ ret = hisi_zip_create_qp(qm, &hisi_zip_ctx->qp_ctx[i], i,
+ req_type);
+ if (ret)
+ goto err;
+ }
+
+ return 0;
+err:
+ for (j = i-1; j >= 0; j--)
+ hisi_zip_release_qp(&hisi_zip_ctx->qp_ctx[j]);
+
+ return ret;
+}
+
+static void hisi_zip_free_comp_ctx(struct crypto_tfm *tfm)
+{
+ struct hisi_zip_ctx *hisi_zip_ctx = crypto_tfm_ctx(tfm);
+ int i;
+
+ /* release the qp */
+ for (i = 1; i >= 0; i--)
+ hisi_zip_release_qp(&hisi_zip_ctx->qp_ctx[i]);
+}
+
+static int hisi_zip_copy_data_to_buffer(struct hisi_zip_qp_ctx *qp_ctx,
+ const u8 *src, unsigned int slen)
+{
+ struct hisi_zip_buffer *buffer = &qp_ctx->buffer;
+
+ if (slen > INPUT_BUFFER_SIZE)
+ return -EINVAL;
+
+ memcpy(buffer->input, src, slen);
+
+ return 0;
+}
+
+static struct hisi_zip_sqe *hisi_zip_get_writeback_sqe(struct hisi_qp *qp)
+{
+ struct hisi_acc_qp_status *qp_status = &qp->qp_status;
+ struct hisi_zip_sqe *sq_base = qp->sqe;
+ u16 sq_head = qp_status->sq_head;
+
+ return sq_base + sq_head;
+}
+
+static int hisi_zip_copy_data_from_buffer(struct hisi_zip_qp_ctx *qp_ctx,
+ u8 *dst, unsigned int *dlen)
+{
+ struct hisi_zip_buffer *buffer = &qp_ctx->buffer;
+ struct hisi_qp *qp = qp_ctx->qp;
+ struct hisi_zip_sqe *zip_sqe = hisi_zip_get_writeback_sqe(qp);
+ u32 status = zip_sqe->dw3 & 0xff;
+
+ if (status != 0) {
+ pr_err("hisi zip: %s fail!\n", (qp->alg_type == 0) ?
+ "compression" : "decompression");
+ return -EIO;
+ }
+
+ if (zip_sqe->produced > OUTPUT_BUFFER_SIZE)
+ return -ENOMEM;
+
+ memcpy(dst, buffer->output, zip_sqe->produced);
+ *dlen = zip_sqe->produced;
+ qp->qp_status.sq_head++;
+
+ return 0;
+}
+
+static int hisi_zip_compress(struct crypto_tfm *tfm, const u8 *src,
+ unsigned int slen, u8 *dst, unsigned int *dlen)
+{
+ struct hisi_zip_ctx *hisi_zip_ctx = crypto_tfm_ctx(tfm);
+ struct hisi_zip_qp_ctx *qp_ctx = &hisi_zip_ctx->qp_ctx[QPC_COMP];
+ struct hisi_qp *qp = qp_ctx->qp;
+ struct hisi_zip_sqe *zip_sqe = &qp_ctx->zip_sqe;
+ int ret;
+
+ ret = hisi_zip_copy_data_to_buffer(qp_ctx, src, slen);
+ if (ret < 0)
+ return ret;
+
+ hisi_zip_fill_sqe_v1(zip_sqe, qp_ctx, slen);
+
+ /* send command to start the compress job */
+ hisi_qp_send(qp, zip_sqe);
+
+ return hisi_zip_copy_data_from_buffer(qp_ctx, dst, dlen);
+}
+
+static int hisi_zip_decompress(struct crypto_tfm *tfm, const u8 *src,
+ unsigned int slen, u8 *dst, unsigned int *dlen)
+{
+ struct hisi_zip_ctx *hisi_zip_ctx = crypto_tfm_ctx(tfm);
+ struct hisi_zip_qp_ctx *qp_ctx = &hisi_zip_ctx->qp_ctx[QPC_DECOMP];
+ struct hisi_qp *qp = qp_ctx->qp;
+ struct hisi_zip_sqe *zip_sqe = &qp_ctx->zip_sqe;
+ int ret;
+
+ ret = hisi_zip_copy_data_to_buffer(qp_ctx, src, slen);
+ if (ret < 0)
+ return ret;
+
+ hisi_zip_fill_sqe_v1(zip_sqe, qp_ctx, slen);
+
+ /* send command to start the decompress job */
+ hisi_qp_send(qp, zip_sqe);
+
+ return hisi_zip_copy_data_from_buffer(qp_ctx, dst, dlen);
+}
+
+static struct crypto_alg hisi_zip_zlib = {
+ .cra_name = "zlib-deflate",
+ .cra_flags = CRYPTO_ALG_TYPE_COMPRESS,
+ .cra_ctxsize = sizeof(struct hisi_zip_ctx),
+ .cra_priority = 300,
+ .cra_module = THIS_MODULE,
+ .cra_init = hisi_zip_alloc_comp_ctx,
+ .cra_exit = hisi_zip_free_comp_ctx,
+ .cra_u = {
+ .compress = {
+ .coa_compress = hisi_zip_compress,
+ .coa_decompress = hisi_zip_decompress
+ }
+ }
+};
+
+static struct crypto_alg hisi_zip_gzip = {
+ .cra_name = "gzip",
+ .cra_flags = CRYPTO_ALG_TYPE_COMPRESS,
+ .cra_ctxsize = sizeof(struct hisi_zip_ctx),
+ .cra_priority = 300,
+ .cra_module = THIS_MODULE,
+ .cra_init = hisi_zip_alloc_comp_ctx,
+ .cra_exit = hisi_zip_free_comp_ctx,
+ .cra_u = {
+ .compress = {
+ .coa_compress = hisi_zip_compress,
+ .coa_decompress = hisi_zip_decompress
+ }
+ }
+};
+
+int hisi_zip_register_to_crypto(void)
+{
+ int ret;
+
+ ret = crypto_register_alg(&hisi_zip_zlib);
+ if (ret) {
+ pr_err("Zlib algorithm registration failed\n");
+ return ret;
+ } else
+ pr_debug("hisi_zip: registered algorithm zlib\n");
+
+ ret = crypto_register_alg(&hisi_zip_gzip);
+ if (ret) {
+ pr_err("Gzip algorithm registration failed\n");
+ goto err_unregister_zlib;
+ } else
+ pr_debug("hisi_zip: registered algorithm gzip\n");
+
+ return 0;
+
+err_unregister_zlib:
+ crypto_unregister_alg(&hisi_zip_zlib);
+
+ return ret;
+}
+
+void hisi_zip_unregister_from_crypto(void)
+{
+ crypto_unregister_alg(&hisi_zip_zlib);
+ crypto_unregister_alg(&hisi_zip_gzip);
+}
diff --git a/drivers/crypto/hisilicon/zip/zip_crypto.h b/drivers/crypto/hisilicon/zip/zip_crypto.h
new file mode 100644
index 000000000000..6fae34b4df3a
--- /dev/null
+++ b/drivers/crypto/hisilicon/zip/zip_crypto.h
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: GPL-2.0+ */
+/*
+ * Copyright (c) 2018 HiSilicon Limited.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ */
+
+#ifndef HISI_ZIP_CRYPTO_H
+#define HISI_ZIP_CRYPTO_H
+
+int hisi_zip_register_to_crypto(void);
+void hisi_zip_unregister_from_crypto(void);
+
+#endif
diff --git a/drivers/crypto/hisilicon/zip/zip_main.c b/drivers/crypto/hisilicon/zip/zip_main.c
new file mode 100644
index 000000000000..f4f3b6d89340
--- /dev/null
+++ b/drivers/crypto/hisilicon/zip/zip_main.c
@@ -0,0 +1,174 @@
+// SPDX-License-Identifier: GPL-2.0+
+#include <linux/io.h>
+#include <linux/bitops.h>
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/pci.h>
+#include "zip.h"
+#include "zip_crypto.h"
+
+#define HZIP_VF_NUM 63
+#define HZIP_QUEUE_NUM_V1 4096
+#define HZIP_QUEUE_NUM_V2 1024
+
+#define HZIP_FSM_MAX_CNT 0x301008
+
+#define HZIP_PORT_ARCA_CHE_0 0x301040
+#define HZIP_PORT_ARCA_CHE_1 0x301044
+#define HZIP_PORT_AWCA_CHE_0 0x301060
+#define HZIP_PORT_AWCA_CHE_1 0x301064
+
+#define HZIP_BD_RUSER_32_63 0x301110
+#define HZIP_SGL_RUSER_32_63 0x30111c
+#define HZIP_DATA_RUSER_32_63 0x301128
+#define HZIP_DATA_WUSER_32_63 0x301134
+#define HZIP_BD_WUSER_32_63 0x301140
+
+LIST_HEAD(hisi_zip_list);
+DEFINE_MUTEX(hisi_zip_list_lock);
+
+static const char hisi_zip_name[] = "hisi_zip";
+
+static const struct pci_device_id hisi_zip_dev_ids[] = {
+ { PCI_DEVICE(PCI_VENDOR_ID_HUAWEI, 0xa250) },
+ { 0, }
+};
+
+static inline void hisi_zip_add_to_list(struct hisi_zip *hisi_zip)
+{
+ mutex_lock(&hisi_zip_list_lock);
+ list_add_tail(&hisi_zip->list, &hisi_zip_list);
+ mutex_unlock(&hisi_zip_list_lock);
+}
+
+static void hisi_zip_set_user_domain_and_cache(struct hisi_zip *hisi_zip)
+{
+ u32 val;
+
+ /* qm user domain */
+ writel(0x40001070, hisi_zip->qm.io_base + QM_ARUSER_M_CFG_1);
+ writel(0xfffffffe, hisi_zip->qm.io_base + QM_ARUSER_M_CFG_ENABLE);
+ writel(0x40001070, hisi_zip->qm.io_base + QM_AWUSER_M_CFG_1);
+ writel(0xfffffffe, hisi_zip->qm.io_base + QM_AWUSER_M_CFG_ENABLE);
+ writel(0xffffffff, hisi_zip->qm.io_base + QM_WUSER_M_CFG_ENABLE);
+ writel(0x4893, hisi_zip->qm.io_base + QM_CACHE_CTL);
+
+ val = readl(hisi_zip->qm.io_base + QM_PEH_AXUSER_CFG);
+ val |= (1 << 11);
+ writel(val, hisi_zip->qm.io_base + QM_PEH_AXUSER_CFG);
+
+ /* qm cache */
+ writel(0xffff, hisi_zip->qm.io_base + QM_AXI_M_CFG);
+ writel(0xffffffff, hisi_zip->qm.io_base + QM_AXI_M_CFG_ENABLE);
+ writel(0xffffffff, hisi_zip->qm.io_base + QM_PEH_AXUSER_CFG_ENABLE);
+
+ /* cache */
+ writel(0xffffffff, hisi_zip->qm.io_base + HZIP_PORT_ARCA_CHE_0);
+ writel(0xffffffff, hisi_zip->qm.io_base + HZIP_PORT_ARCA_CHE_1);
+ writel(0xffffffff, hisi_zip->qm.io_base + HZIP_PORT_AWCA_CHE_0);
+ writel(0xffffffff, hisi_zip->qm.io_base + HZIP_PORT_AWCA_CHE_1);
+ /* user domain configurations */
+ writel(0x40001070, hisi_zip->qm.io_base + HZIP_BD_RUSER_32_63);
+ writel(0x40001070, hisi_zip->qm.io_base + HZIP_SGL_RUSER_32_63);
+ writel(0x40001071, hisi_zip->qm.io_base + HZIP_DATA_RUSER_32_63);
+ writel(0x40001071, hisi_zip->qm.io_base + HZIP_DATA_WUSER_32_63);
+ writel(0x40001070, hisi_zip->qm.io_base + HZIP_BD_WUSER_32_63);
+
+ /* fsm count */
+ writel(0xfffffff, hisi_zip->qm.io_base + HZIP_FSM_MAX_CNT);
+
+ /* clock gating, core, decompress verify enable */
+ writel(0x10005, hisi_zip->qm.io_base + 0x301004);
+}
+
+static int hisi_zip_probe(struct pci_dev *pdev, const struct pci_device_id *id)
+{
+ struct hisi_zip *hisi_zip;
+ struct qm_info *qm;
+ int ret;
+ u8 rev_id;
+
+ hisi_zip = devm_kzalloc(&pdev->dev, sizeof(*hisi_zip), GFP_KERNEL);
+ if (!hisi_zip)
+ return -ENOMEM;
+ hisi_zip_add_to_list(hisi_zip); //todo: this is needed only by crypto
+
+ qm = &hisi_zip->qm;
+ qm->pdev = pdev;
+ qm->qp_base = HZIP_PF_DEF_Q_BASE;
+ qm->qp_num = HZIP_PF_DEF_Q_NUM;
+ qm->sqe_size = HZIP_SQE_SIZE;
+
+ pci_set_drvdata(pdev, hisi_zip);
+
+ pci_read_config_byte(pdev, PCI_REVISION_ID, &rev_id);
+ if (rev_id == 0x20)
+ qm->ver = 1;
+
+ ret = hisi_qm_init(qm);
+ if (ret)
+ return ret;
+
+ if (pdev->is_physfn)
+ hisi_zip_set_user_domain_and_cache(hisi_zip);
+
+ ret = hisi_qm_start(qm);
+ if (ret)
+ goto err_with_qm_init;
+
+ return 0;
+
+err_with_qm_init:
+ hisi_qm_uninit(qm);
+ return ret;
+}
+
+static void hisi_zip_remove(struct pci_dev *pdev)
+{
+ struct hisi_zip *hisi_zip = pci_get_drvdata(pdev);
+ struct qm_info *qm = &hisi_zip->qm;
+
+ hisi_qm_stop(qm);
+ hisi_qm_uninit(qm);
+}
+
+static struct pci_driver hisi_zip_pci_driver = {
+ .name = "hisi_zip",
+ .id_table = hisi_zip_dev_ids,
+ .probe = hisi_zip_probe,
+ .remove = hisi_zip_remove,
+};
+
+static int __init hisi_zip_init(void)
+{
+ int ret;
+
+ ret = pci_register_driver(&hisi_zip_pci_driver);
+ if (ret < 0) {
+ pr_err("zip: can't register hisi zip driver.\n");
+ return ret;
+ }
+
+ ret = hisi_zip_register_to_crypto();
+ if (ret < 0) {
+ pci_unregister_driver(&hisi_zip_pci_driver);
+ return ret;
+ }
+
+ return 0;
+}
+
+static void __exit hisi_zip_exit(void)
+{
+ hisi_zip_unregister_from_crypto();
+ pci_unregister_driver(&hisi_zip_pci_driver);
+}
+
+module_init(hisi_zip_init);
+module_exit(hisi_zip_exit);
+
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR("Zhou Wang <[email protected]>");
+MODULE_DESCRIPTION("Driver for HiSilicon ZIP accelerator");
+MODULE_DEVICE_TABLE(pci, hisi_zip_dev_ids);
--
2.17.1
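
The device selection in find_zip_device() above is a linear scan that keeps
the entry with the smallest NUMA distance to the caller's node. The
following stand-alone sketch restates that logic in user space so it can be
compiled and experimented with. It is not driver code: struct accel_dev,
node_distance_stub() and find_nearest() are hypothetical names, and the stub
only assumes the common convention of distance 10 for the local node and 20
for any remote node.

```c
#include <stddef.h>

/* Hypothetical stand-in for a device with a home NUMA node. */
struct accel_dev {
	int numa_node;
};

/* Assumed distances: 10 for the same node, 20 for any other node. */
static int node_distance_stub(int a, int b)
{
	return a == b ? 10 : 20;
}

/*
 * Return the device closest to @node, or NULL if the list is empty or
 * every device is at distance 100 or more (the same cap the patch uses).
 * On a tie the first device in the array wins.
 */
static const struct accel_dev *find_nearest(const struct accel_dev *devs,
					    size_t n, int node)
{
	const struct accel_dev *best = NULL;
	int min_distance = 100;
	size_t i;

	for (i = 0; i < n; i++) {
		int d = node_distance_stub(devs[i].numa_node, node);

		if (d < min_distance) {
			best = &devs[i];
			min_distance = d;
		}
	}

	return best;
}
```

One consequence of the fixed starting value of 100 is that a caller has to
handle a NULL result even when devices exist, if the platform ever reports
distances at or above that cap.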

2018-11-21 16:39:42

by Kenneth Lee

Subject: Re: [RFCv3 PATCH 1/6] uacce: Add documents for WarpDrive/uacce

On Mon, Nov 19, 2018 at 08:29:39PM -0700, Jason Gunthorpe wrote:
> Date: Mon, 19 Nov 2018 20:29:39 -0700
> From: Jason Gunthorpe <[email protected]>
> To: Kenneth Lee <[email protected]>
> CC: Leon Romanovsky <[email protected]>, Kenneth Lee <[email protected]>,
> Tim Sell <[email protected]>, [email protected], Alexander
> Shishkin <[email protected]>, Zaibo Xu
> <[email protected]>, [email protected], [email protected],
> [email protected], Christoph Lameter <[email protected]>, Hao Fang
> <[email protected]>, Gavin Schenk <[email protected]>, RDMA mailing
> list <[email protected]>, Zhou Wang <[email protected]>,
> Doug Ledford <[email protected]>, Uwe Kleine-König
> <[email protected]>, David Kershner
> <[email protected]>, Johan Hovold <[email protected]>, Cyrille
> Pitchen <[email protected]>, Sagar Dharia
> <[email protected]>, Jens Axboe <[email protected]>,
> [email protected], linux-netdev <[email protected]>, Randy Dunlap
> <[email protected]>, [email protected], Vinod Koul
> <[email protected]>, [email protected], Philippe Ombredanne
> <[email protected]>, Sanyog Kale <[email protected]>, "David S.
> Miller" <[email protected]>, [email protected]
> Subject: Re: [RFCv3 PATCH 1/6] uacce: Add documents for WarpDrive/uacce
> User-Agent: Mutt/1.9.4 (2018-02-28)
> Message-ID: <[email protected]>
>
> On Tue, Nov 20, 2018 at 11:07:02AM +0800, Kenneth Lee wrote:
> > On Mon, Nov 19, 2018 at 11:49:54AM -0700, Jason Gunthorpe wrote:
> > >
> > > On Mon, Nov 19, 2018 at 05:14:05PM +0800, Kenneth Lee wrote:
> > >
> > > > If the hardware cannot share the page table with the CPU, we then need to have
> > > > some way to change the device page table. This is what happens in ODP. It
> > > > invalidates the page table in the device upon the mmu_notifier callback. But this cannot
> > > > solve the COW problem: if the user process A shares a page P with the device, and A
> > > > forks a new process B, and A continues to write to the page, then by COW
> > > > process B will keep the page P, while A will get a new page P'. But you have
> > > > no way to let the device know it should use P' rather than P.
> > >
> > > Is this true? I thought mmu_notifiers covered all these cases.
> > >
> > > The mm_notifier for A should fire if B causes the physical address of
> > > A's pages to change via COW.
> > >
> > > And this causes the device page tables to re-synchronize.
> >
> > I don't see such code. The current do_cow_fault() implementation has nothing to
> > do with mmu_notifier.
>
> Well, that sure sounds like it would be a bug in mmu_notifiers..

Yes, it can be taken that way:) But it is going to be a tough bug.

>
> But considering Jean's SVA stuff seems based on mmu notifiers, I have
> a hard time believing that it has any different behavior from RDMA's
> ODP, and if it does have different behavior, then it is probably just
> a bug in the ODP implementation.

As Jean has explained, his solution is based on page table sharing. I think ODP
should also consider this new feature.

>
> > > > In WarpDrive/uacce, we make this simple. If you have an IOMMU and it supports
> > > > SVM/SVA, everything will be fine, just like ODP implicit mode, and you don't need
> > > > to write any code for that because it has been done by the IOMMU framework. If it
> > >
> > > Looks like the IOMMU code uses mmu_notifier, so it is identical to
> > > IB's ODP. The only difference is that IB tends to have the IOMMU page
> > > table in the device, not in the CPU.
> > >
> > > The only case I know if that is different is the new-fangled CAPI
> > > stuff where the IOMMU can directly use the CPU's page table and the
> > > IOMMU page table (in device or CPU) is eliminated.
> >
> > Yes. We are not focusing on the current implementation. As mentioned in the
> > cover letter, we are expecting Jean-Philippe's SVA patchset:
> > git://linux-arm.org/linux-jpb.
>
> This SVA stuff does not look comparable to CAPI as it still requires
> maintaining separate IOMMU page tables.
>
> Also, those patches from Jean have a lot of references to
> mmu_notifiers (ie look at iommu_mmu_notifier).
>
> Are you really sure it is actually any different at all?
>
> > > Anyhow, I don't think a single instance of hardware should justify an
> > > entire new subsystem. Subsystems are hard to make and without multiple
> > > hardware examples there is no way to expect that it would cover any
> > > future use cases.
> >
> > Yes. That's our first expectation. We can keep it with our driver. But
> > there is no user driver support for any accelerator in the mainline kernel. Even the
> > well-known QuickAssist has to be maintained out of tree. So we are trying to see if
> > people are interested in working together to solve the problem.
>
> Well, you should come with patches ack'ed by these other groups.

Yes, that is what we are doing.

>
> > > If all your driver needs is to mmap some PCI bar space, route
> > > interrupts and do DMA mapping then mediated VFIO is probably a good
> > > choice.
> >
> > Yes. That is what is done in our RFCv1/v2. But we accepted Jerome's opinion and
> > try not to add complexity to the mm subsystem.
>
> Why would a mediated VFIO driver touch the mm subsystem? Sounds like
> you don't have a VFIO driver if it needs to do stuff like that...

VFIO has no ODP-like solution, and if we want to solve the fork problem, we have
to make some changes to the IOMMU and the fork procedure. Further, VFIO treats every
queue as an independent device. This creates a lot of trouble for resource
management. For example, you will need a manager process to reclaim unused
devices, and you need to let the user process know the PASID of the queue, and
so on.

>
> > > If it needs to do a bunch of other stuff, not related to PCI bar
> > > space, interrupts and DMA mapping (ie special code for compression,
> > > crypto, AI, whatever) then you should probably do what Jerome said and
> > > make a drivers/char/hisillicon_foo_bar.c that exposes just what your
> > > hardware does.
> >
> > Yes. If no other accelerator driver writer is interested. That is the
> > expectation:)
>
> I don't think it matters what other drivers do.
>
> If your driver does not need any other kernel code then VFIO is
> sensible. In this kind of world you will probably have a RDMA-like
> userspace driver that can bring this to a common user space API, even
> if one driver use VFIO and a different driver uses something else.

Yes, in some ways. But on the other hand, if they don't have the same address space
logic, it won't be easy for them to cooperate. In VFIO, you have to use DMA addresses for
the device, while you cannot directly share RDMA-ODP memory with other devices.
WarpDrive/uacce can solve all these problems.

>
> > You create some connections (queues) to the NIC, RSA, and AI engines. Then you get
> > data directly from the NIC and pass the pointer to the RSA engine for decryption. The
> > CPU then finishes some data handling or operation and then passes the result to the AI
> > engine for CNN calculation.... This will need a place to maintain the same
> > address space by some means.
>
> How is this any different from what we have today?
>
> SVA is not something even remotely new, IB has been doing various
> versions of it for 20 years.

But now we can have them unified under the IOMMU framework.

>
> Jason

--

2018-11-19 19:35:31

by Kenneth Lee

Subject: Re: [RFCv3 PATCH 1/6] uacce: Add documents for WarpDrive/uacce

On Thu, Nov 15, 2018 at 04:54:55PM +0200, Leon Romanovsky wrote:
>
> On Thu, Nov 15, 2018 at 04:51:09PM +0800, Kenneth Lee wrote:
> > On Wed, Nov 14, 2018 at 06:00:17PM +0200, Leon Romanovsky wrote:
> > >
> > > On Wed, Nov 14, 2018 at 10:58:09AM +0800, Kenneth Lee wrote:
> > > >
> > > > On 2018/11/13 at 8:23 AM, Leon Romanovsky wrote:
> > > > > On Mon, Nov 12, 2018 at 03:58:02PM +0800, Kenneth Lee wrote:
> > > > > > From: Kenneth Lee <[email protected]>
> > > > > >
> > > > > > WarpDrive is a general accelerator framework for the user application to
> > > > > > access the hardware without going through the kernel in data path.
> > > > > >
> > > > > > The kernel component to provide kernel facility to driver for expose the
> > > > > > user interface is called uacce. It a short name for
> > > > > > "Unified/User-space-access-intended Accelerator Framework".
> > > > > >
> > > > > > This patch add document to explain how it works.
> > > > > + RDMA and netdev folks
> > > > >
> > > > > Sorry, to be late in the game, I don't see other patches, but from
> > > > > the description below it seems like you are reinventing RDMA verbs
> > > > > model. I have hard time to see the differences in the proposed
> > > > > framework to already implemented in drivers/infiniband/* for the kernel
> > > > > space and for the https://github.com/linux-rdma/rdma-core/ for the user
> > > > > space parts.
> > > >
> > > > Thanks Leon,
> > > >
> > > > Yes, we tried to solve a problem similar to RDMA's. We also learned a lot from
> > > > the existing RDMA code. But we have to make a new one because we cannot
> > > > register accelerators such as AI operation, encryption or compression to the
> > > > RDMA framework:)
> > >
> > > Assuming that you did everything right and still failed to use the RDMA
> > > framework, you were supposed to fix it and not to reinvent a new, exactly
> > > identical one. It is how we develop the kernel, by reusing existing code.
> >
> > Yes, but we don't force other system such as NIC or GPU into RDMA, do we?
>
> You don't introduce new NIC or GPU, but proposing another interface to
> directly access HW memory and bypass kernel for the data path. This is
> whole idea of RDMA and this is why it is already present in the kernel.
>
> The various hardware devices supported in our stack allow a ton of crazy
> stuff, including GPU interconnects and NIC functionalities.

Yes. We don't want to reinvent the wheel. That is why we built it on VFIO in RFC
v1 and v2. But finally we were persuaded by Mr. Jerome Glisse that VFIO was not
a good place to solve the problem.

And currently, as you see, IB is bound to devices doing RDMA. The registration
function, ib_register_device(), hints that the device is a netdev (get_netdev() callback); it knows
about gid, pkey, and Memory Window. IB is not simply an address space management
framework. And the verbs are not transparent to IB. If we start to add
compression/decompression, AI (RNN, CNN stuff) operations, and encryption/decryption
to the verbs set, it will become very complex. Or maybe I misunderstand the
IB idea? But I don't see any compression hardware integrated in the mainline
kernel. Could you point out one that I can use as a reference?

>
> >
> > I assume you would not agree to register a zip accelerator to infiniband? :)
>
> "infiniband" name in the "drivers/infiniband/" is legacy one and the
> current code supports IB, RoCE, iWARP and OmniPath as a transport layers.
> For a long time, we wanted to rename that folder to be "drivers/rdma",
> but didn't find enough brave men/women to do it, due to backport mess
> for such move.
>
> The addition of zip accelerator to RDMA is possible and depends on how
> you will model such new functionality - new driver, or maybe new ULP.
>
> >
> > Further, I don't think it is wise to break an exist system (RDMA) to fulfill a
> > totally new scenario. The better choice is to let them run in parallel for some
> > time and try to merge them accordingly.
>
> Awesome, so please run your code out-of-tree for now and once you are ready
> for submission let's try to merge it.

Yes, yes. We know trust takes time to earn. But the fact is that no
accelerator user driver can be added to the mainline kernel today. We should raise the
topic from time to time, to help the communication and close the gap, right?

We are also open to cooperating with IB to do it within the IB framework. But
please let me know where to start. I feel it is quite weird to call
ib_register_device() for a zip or RSA accelerator.

>
> >
> > >
> > > >
> > > > Another problem we tried to address is the way to pin the memory for dma
> > > > operation. The RDMA way to pin the memory cannot avoid the page lost due to
> > > > copy-on-write operation during the memory is used by the device. This may
> > > > not be important to RDMA library. But it is important to accelerator.
> > >
> > > Such support exists in drivers/infiniband/ from late 2014 and
> > > it is called ODP (on demand paging).
> >
> > I reviewed ODP and I think it is a solution bound to infiniband. It is part of
> > MR semantics and required a infiniband specific hook
> > (ucontext->invalidate_range()). And the hook requires the device to be able to
> > stop using the page for a while for the copying. It is ok for infiniband
> > (actually, only mlx5 uses it). I don't think most accelerators can support
> > this mode. But WarpDrive works fully on top of IOMMU interface, it has no this
> > limitation.
>
> 1. It has nothing to do with infiniband.

But it must be an ib_dev first.

> 2. MR and ucontext are verbs semantics and are needed to ensure that host
> memory exposed to the user is properly protected from a security point of view.
> 3. "stop using the page for a while for the copying" - I don't fully
> understand this claim, maybe this article will help you to better
> describe it: https://lwn.net/Articles/753027/

This topic was discussed in RFCv2. The key problem here is that:

The device needs to hold the memory for its own calculation, but the CPU/software
wants to stop that for a while in order to synchronize with disk or handle COW.

If the hardware supports SVM/SVA (Shared Virtual Memory/Address), it is easy: the
device shares the page table with the CPU, so the device will raise a page fault when the
CPU downgrades the PTE to read-only.

If the hardware cannot share the page table with the CPU, we then need to have
some way to change the device page table. This is what happens in ODP. It
invalidates the page table in the device upon the mmu_notifier callback. But this cannot
solve the COW problem: if the user process A shares a page P with the device, and A
forks a new process B, and A continues to write to the page, then by COW
process B will keep the page P, while A will get a new page P'. But you have
no way to let the device know it should use P' rather than P.
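
To make the scenario concrete, here is a minimal user-space sketch (not part of
the patchset). There is no device in it: the child process simply stands in for
anything that still references the original page P after the parent has written
to it. The helper name child_view_after_parent_write() is made up for this
illustration.

```c
#define _GNU_SOURCE
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: fill a private anonymous page with 'before', fork,
 * let the parent overwrite it with 'after', and return the byte the child
 * still observes. Returns -1 on any setup or I/O error.
 */
int child_view_after_parent_write(unsigned char before, unsigned char after)
{
	long pagesz = sysconf(_SC_PAGESIZE);
	int to_child[2], to_parent[2];
	unsigned char seen = 0;
	unsigned char *p;
	char go = 'x';
	int ret = -1;
	pid_t pid;

	p = mmap(NULL, pagesz, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -1;
	p[0] = before;		/* the page is now backed: this is "P" */

	if (pipe(to_child) != 0) {
		munmap(p, pagesz);
		return -1;
	}
	if (pipe(to_parent) != 0) {
		close(to_child[0]);
		close(to_child[1]);
		munmap(p, pagesz);
		return -1;
	}

	pid = fork();
	if (pid == 0) {
		/* Child: wait until the parent has written, then report. */
		close(to_child[1]);
		close(to_parent[0]);
		if (read(to_child[0], &go, 1) != 1)
			_exit(1);
		if (write(to_parent[1], &p[0], 1) != 1)
			_exit(1);
		_exit(0);
	}

	if (pid > 0) {
		p[0] = after;	/* write fault: the parent gets its own copy */
		if (write(to_child[1], &go, 1) == 1 &&
		    read(to_parent[0], &seen, 1) == 1)
			ret = seen;
	}

	/* Closing the pipes also unblocks the child if the write failed. */
	close(to_child[0]);
	close(to_child[1]);
	close(to_parent[0]);
	close(to_parent[1]);
	if (pid > 0)
		waitpid(pid, NULL, 0);
	munmap(p, pagesz);
	return ret;
}
```

Calling child_view_after_parent_write(0x11, 0x22) should return 0x11: the
parent's write went to a private copy, and the other holder of the page never
sees it.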

This may be OK for an RDMA application, because RDMA is a big thing and we can ask
the programmer to avoid the situation. But for an accelerator, I don't think we
can ask a programmer to care about this when using a zlib.
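
For reference, one way an application can "avoid the situation" by hand is to
exclude the DMA buffer from fork with madvise(MADV_DONTFORK), which is an
existing Linux call. The sketch below only checks whether a child still has
the buffer mapped; child_inherits_buffer() is a hypothetical helper written
for this illustration, and nothing here describes how uacce itself is
implemented.

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: map one private page, optionally mark it
 * MADV_DONTFORK, fork, and report whether the child still has the page
 * mapped. Returns 1 if the child inherits it, 0 if not, -1 on error.
 */
int child_inherits_buffer(int dontfork)
{
	long pagesz = sysconf(_SC_PAGESIZE);
	int status;
	pid_t pid;
	void *buf;

	buf = mmap(NULL, pagesz, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return -1;
	memset(buf, 0, pagesz);	/* touch the page so it is really backed */

	if (dontfork && madvise(buf, pagesz, MADV_DONTFORK) != 0) {
		munmap(buf, pagesz);
		return -1;
	}

	pid = fork();
	if (pid < 0) {
		munmap(buf, pagesz);
		return -1;
	}
	if (pid == 0) {
		unsigned char vec;

		/* mincore() fails with ENOMEM when the range is unmapped. */
		_exit(mincore(buf, pagesz, &vec) == 0 ? 1 : 0);
	}

	waitpid(pid, &status, 0);
	munmap(buf, pagesz);
	if (!WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status);
}
```

With dontfork set, the child has no mapping for the buffer at all, so a write
in the parent cannot trigger copy-on-write for it. The cost is that every
caller has to remember to do this for every buffer handed to the hardware,
which is the burden the paragraph above argues a zlib user should not carry.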

In WarpDrive/uacce, we make this simple. If you have an IOMMU and it supports
SVM/SVA, everything will be fine, just like ODP implicit mode, and you don't need
to write any code for that because it has been done by the IOMMU framework. If it
does not, you have to use kernel-allocated memory which has the same IOVA as
the VA in user space. So we can still maintain a unified address space among the
devices and the application.

> 4. mlx5 supports ODP not because of being partially IB device,
> but because HW performance oriented implementation is not an easy task.
>
> >
> > >
> > > >
> > > > Hope this can help the understanding.
> > >
> > > Yes, it helped me a lot.
> > > Now, I'm more than before convinced that this whole patchset shouldn't
> > > exist in the first place.
> >
> > Then maybe you can tell me how I can register my accelerator to the user space?
>
> Write kernel driver and write user space part of it.
> https://github.com/linux-rdma/rdma-core/
>
> I have no doubts that your colleagues who wrote and maintain
> drivers/infiniband/hw/hns driver know best how to do it.
> They did it very successfully.
>
> Thanks
>
> >
> > >
> > > To be clear, NAK.
> > >
> > > Thanks
> > >
> > > >
> > > > Cheers
> > > >
> > > > >
> > > > > Hard NAK from RDMA side.
> > > > >
> > > > > Thanks
> > > > >
> > > > > > Signed-off-by: Kenneth Lee <[email protected]>
> > > > > > ---
> > > > > > Documentation/warpdrive/warpdrive.rst | 260 +++++++
> > > > > > Documentation/warpdrive/wd-arch.svg | 764 ++++++++++++++++++++
> > > > > > Documentation/warpdrive/wd.svg | 526 ++++++++++++++
> > > > > > Documentation/warpdrive/wd_q_addr_space.svg | 359 +++++++++
> > > > > > 4 files changed, 1909 insertions(+)
> > > > > > create mode 100644 Documentation/warpdrive/warpdrive.rst
> > > > > > create mode 100644 Documentation/warpdrive/wd-arch.svg
> > > > > > create mode 100644 Documentation/warpdrive/wd.svg
> > > > > > create mode 100644 Documentation/warpdrive/wd_q_addr_space.svg
> > > > > >
> > > > > > diff --git a/Documentation/warpdrive/warpdrive.rst b/Documentation/warpdrive/warpdrive.rst
> > > > > > new file mode 100644
> > > > > > index 000000000000..ef84d3a2d462
> > > > > > --- /dev/null
> > > > > > +++ b/Documentation/warpdrive/warpdrive.rst
> > > > > > @@ -0,0 +1,260 @@
> > > > > > +Introduction of WarpDrive
> > > > > > +=========================
> > > > > > +
> > > > > > +*WarpDrive* is a general accelerator framework for the user application to
> > > > > > +access the hardware without going through the kernel in data path.
> > > > > > +
> > > > > > +It can be used as the quick channel for accelerators, network adaptors or
> > > > > > +other hardware for application in user space.
> > > > > > +
> > > > > > +This may make some implementation simpler. E.g. you can reuse most of the
> > > > > > +*netdev* driver in kernel and just share some ring buffer to the user space
> > > > > > +driver for *DPDK* [4] or *ODP* [5]. Or you can combine the RSA accelerator with
> > > > > > +the *netdev* in the user space as a https reversed proxy, etc.
> > > > > > +
> > > > > > +*WarpDrive* takes the hardware accelerator as a heterogeneous processor which
> > > > > > +can share particular load from the CPU:
> > > > > > +
> > > > > > +.. image:: wd.svg
> > > > > > + :alt: WarpDrive Concept
> > > > > > +
> > > > > > > +The virtual concept, queue, is used to manage the requests sent to the
> > > > > > > +accelerator. The application sends requests to the queue by writing to some
> > > > > > > +particular address, while the hardware takes the requests directly from the
> > > > > > > +address and sends feedback accordingly.
> > > > > > > +
> > > > > > > +The format of the queue may differ from hardware to hardware. But the
> > > > > > > +application does not need to make any system call for the communication.
> > > > > > +
> > > > > > +*WarpDrive* tries to create a shared virtual address space for all involved
> > > > > > +accelerators. Within this space, the requests sent to queue can refer to any
> > > > > > +virtual address, which will be valid to the application and all involved
> > > > > > +accelerators.
> > > > > > +
> > > > > > +The name *WarpDrive* is simply a cool and general name meaning the framework
> > > > > > +makes the application faster. It includes general user library, kernel
> > > > > > +management module and drivers for the hardware. In kernel, the management
> > > > > > +module is called *uacce*, meaning "Unified/User-space-access-intended
> > > > > > +Accelerator Framework".
> > > > > > +
> > > > > > +
> > > > > > +How does it work
> > > > > > +================
> > > > > > +
> > > > > > +*WarpDrive* uses *mmap* and *IOMMU* to play the trick.
> > > > > > +
> > > > > > +*Uacce* creates a chrdev for the device registered to it. A "queue" will be
> > > > > > > +created when the chrdev is opened. The application accesses the queue by mmapping
> > > > > > > +different address regions of the queue file.
> > > > > > > +
> > > > > > > +The following figure demonstrates the queue file address space:
> > > > > > +
> > > > > > +.. image:: wd_q_addr_space.svg
> > > > > > + :alt: WarpDrive Queue Address Space
> > > > > > +
> > > > > > +The first region of the space, device region, is used for the application to
> > > > > > +write request or read answer to or from the hardware.
> > > > > > +
> > > > > > +Normally, there can be three types of device regions mmio and memory regions.
> > > > > > +It is recommended to use common memory for request/answer descriptors and use
> > > > > > +the mmio space for device notification, such as doorbell. But of course, this
> > > > > > +is all up to the interface designer.
> > > > > > +
> > > > > > +There can be two types of device memory regions, kernel-only and user-shared.
> > > > > > +This will be explained in the "kernel APIs" section.
> > > > > > +
> > > > > > +The Static Share Virtual Memory region is necessary only when the device IOMMU
> > > > > > +does not support "Share Virtual Memory". This will be explained after the
> > > > > > +*IOMMU* idea.
> > > > > > +
> > > > > > +
> > > > > > +Architecture
> > > > > > +------------
> > > > > > +
> > > > > > +The full *WarpDrive* architecture is represented in the following class
> > > > > > +diagram:
> > > > > > +
> > > > > > +.. image:: wd-arch.svg
> > > > > > + :alt: WarpDrive Architecture
> > > > > > +
> > > > > > +
> > > > > > +The user API
> > > > > > +------------
> > > > > > +
> > > > > > +We adopt a polling style interface in the user space: ::
> > > > > > +
> > > > > > + int wd_request_queue(struct wd_queue *q);
> > > > > > + void wd_release_queue(struct wd_queue *q);
> > > > > > +
> > > > > > + int wd_send(struct wd_queue *q, void *req);
> > > > > > + int wd_recv(struct wd_queue *q, void **req);
> > > > > > + int wd_recv_sync(struct wd_queue *q, void **req);
> > > > > > + void wd_flush(struct wd_queue *q);
> > > > > > +
> > > > > > > +wd_recv_sync() is a wrapper around its non-sync version. It traps into the
> > > > > > > +kernel and waits until the queue becomes available.
> > > > > > > +
> > > > > > > +If the queue does not support SVA/SVM, the following helper function
> > > > > > > +can be used to create Static Virtual Share Memory: ::
> > > > > > +
> > > > > > + void *wd_preserve_share_memory(struct wd_queue *q, size_t size);
> > > > > > +
> > > > > > > +The user API is not mandatory. It is simply a suggestion and a hint of what the
> > > > > > > +kernel interface is supposed to support.
> > > > > > +
> > > > > > +
> > > > > > +The user driver
> > > > > > +---------------
> > > > > > +
> > > > > > +The queue file mmap space will need a user driver to wrap the communication
> > > > > > +protocol. *UACCE* provides some attributes in sysfs for the user driver to
> > > > > > +match the right accelerator accordingly.
> > > > > > +
> > > > > > +The *UACCE* device attribute is under the following directory:
> > > > > > +
> > > > > > +/sys/class/uacce/<dev-name>/params
> > > > > > +
> > > > > > > +The following attributes are supported:
> > > > > > > +
> > > > > > > +nr_queue_remained (ro)
> > > > > > > + number of queues remaining
> > > > > > +
> > > > > > +api_version (ro)
> > > > > > + a string to identify the queue mmap space format and its version
> > > > > > +
> > > > > > +device_attr (ro)
> > > > > > + attributes of the device, see UACCE_DEV_xxx flag defined in uacce.h
> > > > > > +
> > > > > > +numa_node (ro)
> > > > > > + id of numa node
> > > > > > +
> > > > > > +priority (rw)
> > > > > > > + Priority of the device, bigger is higher
> > > > > > +
> > > > > > +(This is not yet implemented in RFC version)
> > > > > > +
> > > > > > +
> > > > > > +The kernel API
> > > > > > +--------------
> > > > > > +
> > > > > > > +The *uacce* kernel API is defined in uacce.h. If the hardware supports SVM/SVA,
> > > > > > > +the driver needs only the following API functions: ::
> > > > > > +
> > > > > > + int uacce_register(uacce);
> > > > > > + void uacce_unregister(uacce);
> > > > > > + void uacce_wake_up(q);
> > > > > > +
> > > > > > > +*uacce_wake_up* is used to notify the process that calls epoll() on the queue file.
> > > > > > > +
> > > > > > > +According to the IOMMU capability, *uacce* categorizes the devices as follows:
> > > > > > +
> > > > > > +UACCE_DEV_NOIOMMU
> > > > > > + The device has no IOMMU. The user process cannot use VA on the hardware
> > > > > > + This mode is not recommended.
> > > > > > +
> > > > > > +UACCE_DEV_SVA (UACCE_DEV_PASID | UACCE_DEV_FAULT_FROM_DEV)
> > > > > > + The device has IOMMU which can share the same page table with user
> > > > > > + process
> > > > > > +
> > > > > > +UACCE_DEV_SHARE_DOMAIN
> > > > > > + The device has IOMMU which has no multiple page table and device page
> > > > > > + fault support
> > > > > > +
> > > > > > +If the device works in mode other than UACCE_DEV_NOIOMMU, *uacce* will set its
> > > > > > +IOMMU to IOMMU_DOMAIN_UNMANAGED. So the driver must not use any kernel
> > > > > > +DMA API but the following ones from *uacce* instead: ::
> > > > > > +
> > > > > > + uacce_dma_map(q, va, size, prot);
> > > > > > + uacce_dma_unmap(q, va, size, prot);
> > > > > > +
> > > > > > +*uacce_dma_map/unmap* is valid only for UACCE_DEV_SVA device. It creates a
> > > > > > +particular PASID and page table for the kernel in the IOMMU (Not yet
> > > > > > +implemented in the RFC)
> > > > > > +
> > > > > > +For the UACCE_DEV_SHARE_DOMAIN device, uacce_dma_map/unmap is not valid.
> > > > > > > +*Uacce* calls back start_queue only when the DUS and DKO regions are mmapped. The
> > > > > > > +accelerator driver must use those dma buffers, via uacce_queue->qfrs[], in the
> > > > > > > +start_queue callback. The size of the queue file region is defined by
> > > > > > > +uacce->ops->qf_pg_start[].
> > > > > > +
> > > > > > > +We have to do it this way because most current IOMMUs cannot support
> > > > > > > +kernel and user virtual addresses at the same time. So we have to let them both
> > > > > > > +share the same user virtual address space.
> > > > > > > +
> > > > > > > +If the device has to support kernel and user at the same time, both the kernel
> > > > > > > +and the user should use these DMA APIs. This is not convenient. A better
> > > > > > > +solution is to change the future DMA/IOMMU design to separate the
> > > > > > > +address space between user and kernel space. But that is not going to happen
> > > > > > > +soon.
> > > > > > +
> > > > > > +
> > > > > > +Multiple processes support
> > > > > > +==========================
> > > > > > +
> > > > > > > +In the latest mainline kernel (4.19) when this document is written, the IOMMU
> > > > > > > +subsystem does not support multiple process page tables yet.
> > > > > > > +
> > > > > > > +Most IOMMU hardware implementations support multi-process with the concept
> > > > > > > +of PASID. But they may use different names, e.g. it is called sub-stream-id in
> > > > > > > +the SMMU of ARM. With PASID or a similar design, multiple page tables can be added to
> > > > > > > +the IOMMU and referred to by PASID.
> > > > > > > +
> > > > > > > +*JPB* has a patchset to enable this[1]_. We have tested it with our hardware
> > > > > > > +(which is known as *D06*). It works well. *WarpDrive* relies on it to support
> > > > > > > +UACCE_DEV_SVA. If it is not enabled, *WarpDrive* can still work, but it
> > > > > > > +supports only one process; the device will be set to UACCE_DEV_SHARE_DOMAIN
> > > > > > > +even if it is set to UACCE_DEV_SVA initially.
> > > > > > +
> > > > > > +Static Share Virtual Memory is mainly used by UACCE_DEV_SHARE_DOMAIN device.
> > > > > > +
> > > > > > +
> > > > > > +Legacy Mode Support
> > > > > > +===================
> > > > > > +For the hardware without IOMMU, WarpDrive can still work, the only problem is
> > > > > > +VA cannot be used in the device. The driver should adopt another strategy for
> > > > > > +the shared memory. It is only for testing, and not recommended.
> > > > > > +
> > > > > > +
> > > > > > > +The Fork Scenario
> > > > > > > +=================
> > > > > > > +For a process with allocated queues and shared memory, what happens if it forks
> > > > > > > +a child?
> > > > > > > +
> > > > > > > +The fd of the queue will be duplicated on fork, so the child can send requests
> > > > > > > +to the same queue as its parent. But requests sent from any process
> > > > > > > +other than the one that opened the queue will be blocked.
> > > > > > > +
> > > > > > > +It is recommended to add O_CLOEXEC to the queue file.
> > > > > > > +
> > > > > > > +The queue mmap space has VM_DONTCOPY set in its VMA. So the child will lose all
> > > > > > > +those VMAs.
> > > > > > > +
> > > > > > > +This is why *WarpDrive* does not adopt the mode used in *VFIO* and *InfiniBand*.
> > > > > > > +Both solutions can set any user pointer for hardware sharing. But they cannot
> > > > > > > +support fork while DMA is in progress. Otherwise the "Copy-On-Write" procedure will
> > > > > > > +make the parent process lose its physical pages.
> > > > > > +
> > > > > > +
> > > > > > +The Sample Code
> > > > > > +===============
> > > > > > +There is a sample user land implementation with a simple driver for Hisilicon
> > > > > > +Hi1620 ZIP Accelerator.
> > > > > > +
> > > > > > +To test, do the following in samples/warpdrive (for the case of PC host): ::
> > > > > > + ./autogen.sh
> > > > > > + ./conf.sh # or simply ./configure if you build on target system
> > > > > > + make
> > > > > > +
> > > > > > +Then you can get test_hisi_zip in the test subdirectory. Copy it to the target
> > > > > > +system and make sure the hisi_zip driver is enabled (the major and minor of
> > > > > > +the uacce chrdev can be gotten from the dmesg or sysfs), and run: ::
> > > > > > > + mknod /dev/ua1 c <major> <minor>
> > > > > > + test/test_hisi_zip -z < data > data.zip
> > > > > > + test/test_hisi_zip -g < data > data.gzip
> > > > > > +
> > > > > > +
> > > > > > +References
> > > > > > +==========
> > > > > > +.. [1] https://patchwork.kernel.org/patch/10394851/
> > > > > > +
> > > > > > +.. vim: tw=78
> > [...]
> > > > > > --
> > > > > > 2.17.1
> > > > > >

2018-11-20 13:32:20

by Kenneth Lee

Subject: Re: [RFCv3 PATCH 1/6] uacce: Add documents for WarpDrive/uacce

On Mon, Nov 19, 2018 at 11:49:54AM -0700, Jason Gunthorpe wrote:
>
> On Mon, Nov 19, 2018 at 05:14:05PM +0800, Kenneth Lee wrote:
>
> > If the hardware cannot share a page table with the CPU, we then need to have
> > some way to change the device page table. This is what happens in ODP. It
> > invalidates the page table in the device upon mmu_notifier callback. But this
> > cannot solve the COW problem: if the user process A shares a page P with the
> > device, and A forks a new process B and then continues to write to the page,
> > then by COW process B will keep the page P, while A will get a new page P'.
> > But you have no way to let the device know it should use P' rather than P.
>
> Is this true? I thought mmu_notifiers covered all these cases.
>
> The mm_notifier for A should fire if B causes the physical address of
> A's pages to change via COW.
>
> And this causes the device page tables to re-synchronize.

I don't see such code. The current do_cow_fault() implementation has nothing to
do with mmu_notifier.
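
To make the COW scenario above concrete, here is a minimal user-space sketch
(an illustration only, not part of the patchset). It shows only the effect
visible to the processes: after fork(), a write by the parent lands in a private
copy, so the child still sees the old content. It does not inspect physical
addresses, the IOMMU, or any device. The function name
cow_splits_page_after_fork() is hypothetical.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns 1 if a write made by the parent after fork()
 * is invisible to the child (the two no longer share the page), 0 if the
 * child sees the write, and -1 on a setup error.
 */
int cow_splits_page_after_fork(void)
{
	long psz = sysconf(_SC_PAGESIZE);
	int to_child[2];
	int status, sent;
	char *page;
	pid_t pid;

	assert(psz > 0);

	/* A private anonymous page, i.e. the kind of memory COW applies to. */
	page = mmap(NULL, (size_t)psz, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (page == MAP_FAILED)
		return -1;
	page[0] = 'P';			/* fault the page in before fork */

	if (pipe(to_child) < 0) {
		munmap(page, (size_t)psz);
		return -1;
	}

	pid = fork();
	if (pid < 0) {
		close(to_child[0]);
		close(to_child[1]);
		munmap(page, (size_t)psz);
		return -1;
	}

	if (pid == 0) {
		char go;

		/* Child: wait until the parent has written, then look. */
		close(to_child[1]);
		if (read(to_child[0], &go, 1) != 1)
			_exit(2);
		_exit(page[0] == 'P' ? 0 : 1);
	}

	/* Parent: this write triggers COW; the parent now has its own copy. */
	close(to_child[0]);
	page[0] = 'Q';
	sent = (write(to_child[1], "g", 1) == 1);
	close(to_child[1]);

	if (waitpid(pid, &status, 0) < 0) {
		munmap(page, (size_t)psz);
		return -1;
	}
	munmap(page, (size_t)psz);

	if (!sent || !WIFEXITED(status) || WEXITSTATUS(status) > 1)
		return -1;
	return WEXITSTATUS(status) == 0;
}
```

A device that was handed the pre-fork page and is never told about the split is
in the same position as the child in this sketch: it keeps looking at the old
content.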

>
> > In WarpDrive/uacce, we make this simple. If you support IOMMU and it supports
> > SVM/SVA, everything will be fine just like ODP implicit mode. And you don't need
> > to write any code for that, because it has been done by the IOMMU framework. If it
>
> Looks like the IOMMU code uses mmu_notifier, so it is identical to
> IB's ODP. The only difference is that IB tends to have the IOMMU page
> table in the device, not in the CPU.
>
> The only case I know of that is different is the new-fangled CAPI
> stuff where the IOMMU can directly use the CPU's page table and the
> IOMMU page table (in device or CPU) is eliminated.
>

Yes. We are not focusing on the current implementation. As mentioned in the
cover letter, we are expecting Jean-Philippe's SVA patch:
git://linux-arm.org/linux-jpb.

> Anyhow, I don't think a single instance of hardware should justify an
> entire new subsystem. Subsystems are hard to make and without multiple
> hardware examples there is no way to expect that it would cover any
> future use cases.

Yes. That was our first expectation: we can keep it with our driver. But there
is no user driver support for any accelerator in the mainline kernel; even the
well-known QuickAssist has to be maintained out of tree. So we are trying to see
if people are interested in working together to solve the problem.

>
> If all your driver needs is to mmap some PCI bar space, route
> interrupts and do DMA mapping then mediated VFIO is probably a good
> choice.

Yes. That is what was done in our RFCv1/v2. But we accepted Jerome's opinion and
are trying not to add complexity to the mm subsystem.

>
> If it needs to do a bunch of other stuff, not related to PCI bar
> space, interrupts and DMA mapping (ie special code for compression,
> crypto, AI, whatever) then you should probably do what Jerome said and
> make a drivers/char/hisillicon_foo_bar.c that exposes just what your
> hardware does.

Yes. If no other accelerator driver writer is interested, that is the
expectation :)

But we would really like to have a public solution here. Consider this scenario:

You create some connections (queues) to the NIC, RSA, and AI engines. Then you
get data directly from the NIC and pass the pointer to the RSA engine for
decryption. The CPU then does some data handling or other operation and passes
the result on to the AI engine for CNN calculation... This needs a place to
maintain the same address space by some means.

It is not complex, but it is helpful.

>
> If you have networking involved in here then consider RDMA,
> particularly if this functionality is already part of the same
> hardware that the hns infiniband driver is servicing.
>
> 'computational MRs' are a reasonable approach to a side-car offload of
> already existing RDMA support.

OK. Thanks. I will spend some time on it. But personally, I really don't like
RDMA's complexity. I cannot even try a single function without some expensive
hardware and a complex connection setup in the lab. This is not like an
open source way.

>
> Jason

2018-11-12 17:51:46

by Kenneth Lee

Subject: [RFCv3 PATCH 2/6] uacce: add uacce module

From: Kenneth Lee <[email protected]>

Uacce is the kernel component that supports the WarpDrive accelerator
framework. It provides a register/unregister interface for device drivers
to expose their hardware resources to user space. The resource is
treated as a "queue" in WarpDrive.

Uacce creates a chrdev for every registration; the queue is allocated to
the process when the chrdev is opened. The process can then access the
hardware resource by interacting with the queue file. By mmapping the queue
file space to user space, the process can put requests to the hardware
directly, without a syscall into kernel space.

Uacce also manages unified addresses between the hardware and the user
space of the process, so they can share the same virtual addresses in
communication.

Please see Documentation/warpdrive/warpdrive.rst for details.

Signed-off-by: Kenneth Lee <[email protected]>
Signed-off-by: Zaibo Xu <[email protected]>
Signed-off-by: Zhou Wang <[email protected]>
---
drivers/Kconfig | 2 +
drivers/Makefile | 1 +
drivers/uacce/Kconfig | 11 +
drivers/uacce/Makefile | 2 +
drivers/uacce/uacce.c | 902 +++++++++++++++++++++++++++++++++++++
include/linux/uacce.h | 117 +++++
include/uapi/linux/uacce.h | 33 ++
7 files changed, 1068 insertions(+)
create mode 100644 drivers/uacce/Kconfig
create mode 100644 drivers/uacce/Makefile
create mode 100644 drivers/uacce/uacce.c
create mode 100644 include/linux/uacce.h
create mode 100644 include/uapi/linux/uacce.h

diff --git a/drivers/Kconfig b/drivers/Kconfig
index ab4d43923c4d..b8782be0e7e5 100644
--- a/drivers/Kconfig
+++ b/drivers/Kconfig
@@ -219,4 +219,6 @@ source "drivers/siox/Kconfig"

source "drivers/slimbus/Kconfig"

+source "drivers/uacce/Kconfig"
+
endmenu
diff --git a/drivers/Makefile b/drivers/Makefile
index 578f469f72fb..9416c49b7501 100644
--- a/drivers/Makefile
+++ b/drivers/Makefile
@@ -186,3 +186,4 @@ obj-$(CONFIG_MULTIPLEXER) += mux/
obj-$(CONFIG_UNISYS_VISORBUS) += visorbus/
obj-$(CONFIG_SIOX) += siox/
obj-$(CONFIG_GNSS) += gnss/
+obj-y += uacce/
diff --git a/drivers/uacce/Kconfig b/drivers/uacce/Kconfig
new file mode 100644
index 000000000000..e0e6462f6a42
--- /dev/null
+++ b/drivers/uacce/Kconfig
@@ -0,0 +1,11 @@
+menuconfig UACCE
+ tristate "Accelerator Framework for User Land"
+ depends on IOMMU_API
+ select ANON_INODES
+ help
+ UACCE provides an interface for user processes to access the hardware
+ without interaction with kernel space in the data path.
+
+ See Documentation/warpdrive/warpdrive.rst for more details.
+
+ If you don't know what to do here, say N.
diff --git a/drivers/uacce/Makefile b/drivers/uacce/Makefile
new file mode 100644
index 000000000000..5b4374e8b5f2
--- /dev/null
+++ b/drivers/uacce/Makefile
@@ -0,0 +1,2 @@
+# SPDX-License-Identifier: GPL-2.0-or-later
+obj-$(CONFIG_UACCE) += uacce.o
diff --git a/drivers/uacce/uacce.c b/drivers/uacce/uacce.c
new file mode 100644
index 000000000000..07e3b9887f28
--- /dev/null
+++ b/drivers/uacce/uacce.c
@@ -0,0 +1,902 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+#include <linux/compat.h>
+#include <linux/file.h>
+#include <linux/idr.h>
+#include <linux/list.h>
+#include <linux/module.h>
+#include <linux/poll.h>
+#include <linux/sched/signal.h>
+#include <linux/slab.h>
+#include <linux/uacce.h>
+#include <linux/wait.h>
+
+static struct class *uacce_class;
+static DEFINE_IDR(uacce_idr);
+static dev_t uacce_devt;
+static DEFINE_MUTEX(uacce_mutex); /* mutex to protect uacce */
+static DEFINE_RWLOCK(uacce_lock); /* lock to protect all queues management */
+
+static const struct file_operations uacce_fops;
+
+static const char *const qfrt_str[] = {
+ "dko",
+ "dus",
+ "ss",
+ "mmio",
+ "invalid"
+};
+
+const char *uacce_qfrt_str(struct uacce_qfile_region *qfr)
+{
+ enum uacce_qfrt type = qfr->type;
+
+ if (type >= UACCE_QFRT_INVALID)
+ type = UACCE_QFRT_INVALID;
+
+ return qfrt_str[type];
+}
+EXPORT_SYMBOL_GPL(uacce_qfrt_str);
+
+/**
+ * uacce_wake_up - Wake up the process that is waiting on this queue
+ * @q: the accelerator queue to wake up
+ */
+void uacce_wake_up(struct uacce_queue *q)
+{
+ dev_dbg(q->uacce->dev, "wake up\n");
+ wake_up_interruptible(&q->wait);
+}
+EXPORT_SYMBOL_GPL(uacce_wake_up);
+
+static void uacce_cls_release(struct device *dev) { }
+
+static inline int uacce_iommu_map_qfr(struct uacce_queue *q,
+ struct uacce_qfile_region *qfr)
+{
+ struct device *dev = q->uacce->dev;
+ struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
+ int i, j, ret;
+
+ if (!domain)
+ return -ENODEV;
+
+ for (i = 0; i < qfr->nr_pages; i++) {
+ get_page(qfr->pages[i]);
+ ret = iommu_map(domain, qfr->iova + i * PAGE_SIZE,
+ page_to_phys(qfr->pages[i]),
+ PAGE_SIZE, qfr->prot | q->uacce->prot);
+ if (ret) {
+ dev_err(dev, "iommu_map page %i fail %d\n", i, ret);
+ goto err_with_map_pages;
+ }
+ }
+
+ return 0;
+
+err_with_map_pages:
+ for (j = i-1; j >= 0; j--) {
+ iommu_unmap(domain, qfr->iova + j * PAGE_SIZE, PAGE_SIZE);
+ put_page(qfr->pages[j]);
+ }
+ return ret;
+}
+
+static inline void uacce_iommu_unmap_qfr(struct uacce_queue *q,
+ struct uacce_qfile_region *qfr)
+{
+ struct device *dev = q->uacce->dev;
+ struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
+ int i;
+
+ if (!domain || !qfr)
+ return;
+
+ for (i = qfr->nr_pages-1; i >= 0; i--) {
+ iommu_unmap(domain, qfr->iova + i * PAGE_SIZE, PAGE_SIZE);
+ put_page(qfr->pages[i]);
+ }
+}
+
+static int uacce_queue_map_qfr(struct uacce_queue *q,
+ struct uacce_qfile_region *qfr)
+{
+ if (!(qfr->flags & UACCE_QFRF_MAP))
+ return 0;
+
+ dev_dbg(q->uacce->dev, "queue map %s qfr(npage=%d, iova=%lx)\n",
+ uacce_qfrt_str(qfr), qfr->nr_pages, qfr->iova);
+
+ if (q->uacce->ops->flags & UACCE_DEV_NOIOMMU)
+ return q->uacce->ops->map(q, qfr);
+
+ return uacce_iommu_map_qfr(q, qfr);
+}
+
+static void uacce_queue_unmap_qfr(struct uacce_queue *q,
+ struct uacce_qfile_region *qfr)
+{
+ if (!(qfr->flags & UACCE_QFRF_MAP))
+ return;
+
+ if (q->uacce->ops->flags & UACCE_DEV_NOIOMMU)
+ q->uacce->ops->unmap(q, qfr);
+ else
+ uacce_iommu_unmap_qfr(q, qfr);
+}
+
+static vm_fault_t uacce_shm_vm_fault(struct vm_fault *vmf)
+{
+ struct vm_area_struct *vma = vmf->vma;
+ struct uacce_qfile_region *qfr;
+ pgoff_t page_offset = (vmf->address - vma->vm_start) >> PAGE_SHIFT;
+ int ret;
+
+ read_lock_irq(&uacce_lock);
+
+ qfr = vma->vm_private_data;
+ if (!qfr) {
+ pr_info("this page is not valid to user space\n");
+ ret = VM_FAULT_SIGBUS;
+ goto out;
+ }
+
+ pr_debug("uacce: fault on %s qfr page %ld/%d\n", uacce_qfrt_str(qfr),
+ page_offset, qfr->nr_pages);
+
+ if (page_offset >= qfr->nr_pages) {
+ ret = VM_FAULT_SIGBUS;
+ goto out;
+ }
+
+ get_page(qfr->pages[page_offset]);
+ vmf->page = qfr->pages[page_offset];
+ ret = 0;
+
+out:
+ read_unlock_irq(&uacce_lock);
+ return ret;
+}
+
+static const struct vm_operations_struct uacce_shm_vm_ops = {
+ .fault = uacce_shm_vm_fault,
+};
+
+static struct uacce_qfile_region *uacce_create_region(struct uacce_queue *q,
+ struct vm_area_struct *vma, enum uacce_qfrt type, int flags)
+{
+ struct uacce_qfile_region *qfr;
+ int i, j, ret = -ENOMEM;
+
+ qfr = kzalloc(sizeof(*qfr), GFP_KERNEL | GFP_ATOMIC);
+ if (!qfr)
+ return ERR_PTR(-ENOMEM);
+
+ qfr->type = type;
+ qfr->flags = flags;
+ qfr->iova = vma->vm_start;
+ qfr->nr_pages = vma_pages(vma);
+
+ if (vma->vm_flags & VM_READ)
+ qfr->prot |= IOMMU_READ;
+
+ if (vma->vm_flags & VM_WRITE)
+ qfr->prot |= IOMMU_WRITE;
+
+ qfr->pages = kcalloc(qfr->nr_pages, sizeof(*qfr->pages),
+ GFP_KERNEL | GFP_ATOMIC);
+ if (!qfr->pages)
+ goto err_with_qfr;
+
+ for (i = 0; i < qfr->nr_pages; i++) {
+ qfr->pages[i] = alloc_page(GFP_KERNEL | GFP_ATOMIC |
+ __GFP_ZERO);
+ if (!qfr->pages[i])
+ goto err_with_pages;
+ }
+
+ ret = uacce_queue_map_qfr(q, qfr);
+ if (ret)
+ goto err_with_pages;
+
+ if (flags & UACCE_QFRF_KMAP) {
+ qfr->kaddr = vmap(qfr->pages, qfr->nr_pages, VM_MAP,
+ PAGE_KERNEL);
+ if (!qfr->kaddr) {
+ ret = -ENOMEM;
+ goto err_with_q_map;
+ }
+
+ dev_dbg(q->uacce->dev, "kmap %s qfr to %p\n",
+ uacce_qfrt_str(qfr), qfr->kaddr);
+ }
+
+ if (flags & UACCE_QFRF_MMAP) {
+ vma->vm_private_data = qfr;
+ vma->vm_ops = &uacce_shm_vm_ops;
+ }
+
+ return qfr;
+
+err_with_q_map:
+ uacce_queue_unmap_qfr(q, qfr);
+err_with_pages:
+ for (j = i-1; j >= 0; j--)
+ put_page(qfr->pages[j]);
+
+ kfree(qfr->pages);
+err_with_qfr:
+ kfree(qfr);
+
+ return ERR_PTR(ret);
+}
+
+static void uacce_destroy_region(struct uacce_qfile_region *qfr)
+{
+ int i;
+
+ if (qfr->pages) {
+ for (i = 0; i < qfr->nr_pages; i++)
+ put_page(qfr->pages[i]);
+
+ if (qfr->flags & UACCE_QFRF_KMAP)
+ vunmap(qfr->kaddr);
+
+ kfree(qfr->pages);
+ }
+ kfree(qfr);
+}
+
+static long uacce_cmd_share_qfr(struct uacce_queue *tgt, int fd)
+{
+ struct file *filep = fget(fd);
+ struct uacce_queue *src;
+ int ret;
+
+ if (!filep || filep->f_op != &uacce_fops)
+ return -EINVAL;
+
+ src = (struct uacce_queue *)filep->private_data;
+ if (!src)
+ return -EINVAL;
+
+ /* no ssva is needed if the dev can do fault-from-dev */
+ if (tgt->uacce->ops->flags & UACCE_DEV_FAULT_FROM_DEV)
+ return -EINVAL;
+
+ write_lock(&uacce_lock);
+ if (!src->qfrs[UACCE_QFRT_SS] || tgt->qfrs[UACCE_QFRT_SS]) {
+ ret = -EINVAL;
+ goto out_with_lock;
+ }
+
+ ret = uacce_queue_map_qfr(tgt, src->qfrs[UACCE_QFRT_SS]);
+ if (ret)
+ goto out_with_lock;
+
+ tgt->qfrs[UACCE_QFRT_SS] = src->qfrs[UACCE_QFRT_SS];
+ list_add(&tgt->list, &src->qfrs[UACCE_QFRT_SS]->qs);
+
+out_with_lock:
+ write_unlock(&uacce_lock);
+ return ret;
+}
+
+static long uacce_fops_unl_ioctl(struct file *filep,
+ unsigned int cmd, unsigned long arg)
+{
+ struct uacce_queue *q = (struct uacce_queue *)filep->private_data;
+ struct uacce *uacce = q->uacce;
+
+ switch (cmd) {
+ case UACCE_CMD_SHARE_SVAS:
+ return uacce_cmd_share_qfr(q, arg);
+
+ default:
+ if (uacce->ops->ioctl)
+ return uacce->ops->ioctl(q, cmd, arg);
+
+ dev_err(uacce->dev, "ioctl cmd (%d) is not supported!\n", cmd);
+ return -EINVAL;
+ }
+}
+
+#ifdef CONFIG_COMPAT
+static long uacce_fops_compat_ioctl(struct file *filep,
+ unsigned int cmd, unsigned long arg)
+{
+ arg = (unsigned long)compat_ptr(arg);
+ return uacce_fops_unl_ioctl(filep, cmd, arg);
+}
+#endif
+
+static int uacce_dev_check(struct uacce *uacce)
+{
+ if (uacce->ops->flags & UACCE_DEV_NOIOMMU)
+ return 0;
+
+ /*
+ * The device can be opened only once if it does not support multiple
+ * page tables. The better way to check this is to count it per
+ * iommu_domain; this is just a temporary solution.
+ */
+ if (!(uacce->ops->flags & UACCE_DEV_PASID))
+ if (atomic_cmpxchg(&uacce->state,
+ UACCE_ST_INIT, UACCE_ST_OPENNED))
+ return -EBUSY;
+
+ return 0;
+}
+
+static int uacce_start_queue(struct uacce_queue *q)
+{
+ int ret;
+
+ ret = q->uacce->ops->start_queue(q);
+ if (ret)
+ return ret;
+
+ dev_dbg(q->uacce->dev, "queue started\n");
+ atomic_set(&q->uacce->state, UACCE_ST_STARTED);
+ return 0;
+}
+
+static int uacce_fops_open(struct inode *inode, struct file *filep)
+{
+ struct uacce_queue *q;
+ struct uacce *uacce;
+ int ret;
+ int pasid = 0;
+
+ uacce = idr_find(&uacce_idr, iminor(inode));
+ if (!uacce)
+ return -ENODEV;
+
+ if (!uacce->ops->get_queue)
+ return -EINVAL;
+
+ ret = uacce_dev_check(uacce);
+
+#ifdef CONFIG_IOMMU_SVA
+ if (uacce->ops->flags & UACCE_DEV_PASID)
+ ret = __iommu_sva_bind_device(uacce->dev, current->mm, &pasid,
+ IOMMU_SVA_FEAT_IOPF, NULL);
+#endif
+
+ if (ret)
+ return ret;
+
+ ret = uacce->ops->get_queue(uacce, pasid, &q);
+ if (ret < 0)
+ return ret;
+
+ q->uacce = uacce;
+ q->mm = current->mm;
+ init_waitqueue_head(&q->wait);
+ filep->private_data = q;
+
+ /* if DKO or DUS is set, the q is started when they are ready */
+ if (uacce->ops->qf_pg_start[UACCE_QFRT_DKO] == UACCE_QFR_NA &&
+ uacce->ops->qf_pg_start[UACCE_QFRT_DUS] == UACCE_QFR_NA) {
+ ret = uacce_start_queue(q);
+ if (ret)
+ goto err_with_queue;
+ }
+
+ __module_get(uacce->ops->owner);
+
+ return 0;
+
+err_with_queue:
+ if (uacce->ops->put_queue)
+ uacce->ops->put_queue(q);
+ atomic_set(&uacce->state, UACCE_ST_INIT);
+ return ret;
+}
+
+static int uacce_fops_release(struct inode *inode, struct file *filep)
+{
+ struct uacce_queue *q = (struct uacce_queue *)filep->private_data;
+ struct uacce *uacce;
+ int i;
+ bool is_to_free_region;
+ int free_pages = 0;
+
+ uacce = q->uacce;
+
+ if (atomic_read(&uacce->state) == UACCE_ST_STARTED &&
+ uacce->ops->stop_queue)
+ uacce->ops->stop_queue(q);
+
+ write_lock_irq(&uacce_lock);
+
+ for (i = 0; i < UACCE_QFRT_MAX; i++) {
+ is_to_free_region = false;
+ if (q->qfrs[i]) {
+ uacce_queue_unmap_qfr(q, q->qfrs[i]);
+ if (i == UACCE_QFRT_SS) {
+ list_del(&q->list);
+ if (list_empty(&q->qfrs[i]->qs))
+ is_to_free_region = true;
+ } else
+ is_to_free_region = true;
+ }
+
+ if (is_to_free_region) {
+ free_pages += q->qfrs[i]->nr_pages;
+ uacce_destroy_region(q->qfrs[i]);
+ }
+
+ q->qfrs[i] = NULL;
+ }
+
+ write_unlock_irq(&uacce_lock);
+
+ down_write(&q->mm->mmap_sem);
+ q->mm->data_vm -= free_pages;
+ up_write(&q->mm->mmap_sem);
+
+#ifdef CONFIG_IOMMU_SVA
+ if (uacce->ops->flags & UACCE_DEV_SVA)
+ iommu_sva_unbind_device(uacce->dev, q->pasid);
+#endif
+
+ if (uacce->ops->put_queue)
+ uacce->ops->put_queue(q);
+
+ module_put(uacce->ops->owner);
+ atomic_set(&uacce->state, UACCE_ST_INIT);
+
+ return 0;
+}
+
+static enum uacce_qfrt uacce_get_region_type(struct uacce *uacce,
+ struct vm_area_struct *vma)
+{
+ enum uacce_qfrt type = UACCE_QFRT_MAX;
+ int i;
+ size_t next_size = UACCE_QFR_NA;
+
+ for (i = UACCE_QFRT_MAX - 1; i >= 0; i--) {
+ if (vma->vm_pgoff >= uacce->ops->qf_pg_start[i]) {
+ type = i;
+ break;
+ }
+ }
+
+ switch (type) {
+ case UACCE_QFRT_MMIO:
+ if (!uacce->ops->mmap) {
+ dev_err(uacce->dev, "no driver mmap!\n");
+ return UACCE_QFRT_INVALID;
+ }
+ break;
+
+ case UACCE_QFRT_DKO:
+ if (uacce->ops->flags & UACCE_DEV_PASID)
+ return UACCE_QFRT_INVALID;
+ break;
+
+ case UACCE_QFRT_DUS:
+ if (uacce->ops->flags & UACCE_DEV_FAULT_FROM_DEV)
+ return UACCE_QFRT_INVALID;
+ break;
+
+ case UACCE_QFRT_SS:
+ if (uacce->ops->flags & UACCE_DEV_FAULT_FROM_DEV)
+ return UACCE_QFRT_INVALID;
+ break;
+
+ default:
+ dev_err(uacce->dev, "uacce bug (%d)!\n", type);
+ break;
+ }
+
+ if (type < UACCE_QFRT_SS) {
+ for (i = type + 1; i < UACCE_QFRT_MAX; i++)
+ if (uacce->ops->qf_pg_start[i] != UACCE_QFR_NA) {
+ next_size = uacce->ops->qf_pg_start[i];
+ break;
+ }
+
+ if (next_size == UACCE_QFR_NA) {
+ dev_err(uacce->dev, "uacce config error. "
+ "make sure setting SS offset properly\n");
+ return UACCE_QFRT_INVALID;
+ }
+
+ if (vma_pages(vma) !=
+ next_size - uacce->ops->qf_pg_start[type]) {
+ dev_err(uacce->dev,
+ "invalid mmap size (%ld page) for region %s.\n",
+ vma_pages(vma), qfrt_str[type]);
+ return UACCE_QFRT_INVALID;
+ }
+ }
+
+ return type;
+}
+
+static int uacce_fops_mmap(struct file *filep, struct vm_area_struct *vma)
+{
+ struct uacce_queue *q = (struct uacce_queue *)filep->private_data;
+ struct uacce *uacce = q->uacce;
+ enum uacce_qfrt type = uacce_get_region_type(uacce, vma);
+ struct uacce_qfile_region *qfr;
+ int flags, ret;
+ bool to_start = false;
+
+ dev_dbg(uacce->dev, "mmap q file(t=%s, off=%lx, start=%lx, end=%lx)\n",
+ qfrt_str[type], vma->vm_pgoff, vma->vm_start, vma->vm_end);
+
+ if (type == UACCE_QFRT_INVALID)
+ return -EINVAL;
+
+ vma->vm_flags |= VM_DONTCOPY | VM_DONTEXPAND;
+
+ write_lock_irq(&uacce_lock);
+
+ if (q->mm->data_vm + vma_pages(vma) >
+ rlimit(RLIMIT_DATA) >> PAGE_SHIFT) {
+ ret = -ENOMEM;
+ goto out_with_lock;
+ }
+
+ if (type == UACCE_QFRT_MMIO) {
+ ret = uacce->ops->mmap(q, vma);
+ goto out_with_lock;
+ }
+
+ if (q->qfrs[type]) {
+ ret = -EBUSY;
+ goto out_with_lock;
+ }
+
+ switch (type) {
+ case UACCE_QFRT_SS:
+ if ((q->uacce->ops->flags & UACCE_DEV_FAULT_FROM_DEV) ||
+ (atomic_read(&uacce->state) != UACCE_ST_STARTED)) {
+ ret = -EINVAL;
+ goto out_with_lock;
+ }
+
+ flags = UACCE_QFRF_MAP | UACCE_QFRF_MMAP;
+ break;
+
+ case UACCE_QFRT_DKO:
+ flags = UACCE_QFRF_MAP | UACCE_QFRF_KMAP;
+ break;
+
+ case UACCE_QFRT_DUS:
+ flags = UACCE_QFRF_MAP | UACCE_QFRF_MMAP;
+ if (q->uacce->ops->flags & UACCE_DEV_KMAP_DUS)
+ flags |= UACCE_QFRF_KMAP;
+ break;
+
+ default:
+ dev_err(uacce->dev, "bug\n");
+ break;
+ }
+
+ qfr = q->qfrs[type] = uacce_create_region(q, vma, type, flags);
+ if (IS_ERR(qfr)) {
+ ret = PTR_ERR(qfr);
+ goto out_with_lock;
+ }
+
+ switch (type) {
+ case UACCE_QFRT_SS:
+ INIT_LIST_HEAD(&qfr->qs);
+ list_add(&q->list, &q->qfrs[type]->qs);
+ break;
+
+ case UACCE_QFRT_DKO:
+ case UACCE_QFRT_DUS:
+ if (q->uacce->ops->qf_pg_start[UACCE_QFRT_DUS] == UACCE_QFR_NA
+ &&
+ q->uacce->ops->qf_pg_start[UACCE_QFRT_DKO] == UACCE_QFR_NA)
+ break;
+
+ if ((q->uacce->ops->qf_pg_start[UACCE_QFRT_DUS] == UACCE_QFR_NA
+ || q->qfrs[UACCE_QFRT_DUS]) &&
+ (q->uacce->ops->qf_pg_start[UACCE_QFRT_DKO] == UACCE_QFR_NA
+ || q->qfrs[UACCE_QFRT_DKO]))
+ to_start = true;
+
+ break;
+
+ default:
+ break;
+ }
+
+ write_unlock_irq(&uacce_lock);
+
+ if (to_start) {
+ ret = uacce_start_queue(q);
+ if (ret) {
+ write_lock_irq(&uacce_lock);
+ goto err_with_region;
+ }
+ }
+
+ q->mm->data_vm += qfr->nr_pages;
+ return 0;
+
+err_with_region:
+ uacce_destroy_region(q->qfrs[type]);
+ q->qfrs[type] = NULL;
+out_with_lock:
+ write_unlock_irq(&uacce_lock);
+ return ret;
+}
+
+static __poll_t uacce_fops_poll(struct file *file, poll_table *wait)
+{
+ struct uacce_queue *q = (struct uacce_queue *)file->private_data;
+ struct uacce *uacce = q->uacce;
+
+ poll_wait(file, &q->wait, wait);
+ if (uacce->ops->is_q_updated && uacce->ops->is_q_updated(q))
+ return EPOLLIN | EPOLLRDNORM;
+ else
+ return 0;
+}
+
+static const struct file_operations uacce_fops = {
+ .owner = THIS_MODULE,
+ .open = uacce_fops_open,
+ .release = uacce_fops_release,
+ .unlocked_ioctl = uacce_fops_unl_ioctl,
+#ifdef CONFIG_COMPAT
+ .compat_ioctl = uacce_fops_compat_ioctl,
+#endif
+ .mmap = uacce_fops_mmap,
+ .poll = uacce_fops_poll,
+};
+
+static int uacce_create_chrdev(struct uacce *uacce)
+{
+ int ret;
+
+ ret = idr_alloc(&uacce_idr, uacce, 0, 0, GFP_KERNEL);
+ if (ret < 0)
+ return ret;
+
+ uacce->dev_id = ret;
+ uacce->cdev = cdev_alloc();
+ if (!uacce->cdev) {
+ ret = -ENOMEM;
+ goto err_with_idr;
+ }
+
+ uacce->cdev->ops = &uacce_fops;
+ uacce->cdev->owner = uacce->ops->owner;
+ ret = cdev_add(uacce->cdev, MKDEV(MAJOR(uacce_devt), uacce->dev_id), 1);
+ if (ret)
+ goto err_with_cdev;
+
+ dev_dbg(uacce->dev, "create uacce minor=%d\n", uacce->dev_id);
+ return 0;
+
+err_with_cdev:
+ cdev_del(uacce->cdev);
+err_with_idr:
+ idr_remove(&uacce_idr, uacce->dev_id);
+ return ret;
+}
+
+static void uacce_destroy_chrdev(struct uacce *uacce)
+{
+ cdev_del(uacce->cdev);
+ idr_remove(&uacce_idr, uacce->dev_id);
+}
+
+static int uacce_default_map(struct uacce_queue *q,
+ struct uacce_qfile_region *qfr)
+{
+ dev_dbg(q->uacce->dev, "fake map %s qfr(npage=%d, iova=%lx)\n",
+ uacce_qfrt_str(qfr), qfr->nr_pages, qfr->iova);
+ return -ENODEV;
+}
+
+static int uacce_default_start_queue(struct uacce_queue *q)
+{
+ dev_dbg(q->uacce->dev, "fake start queue\n");
+ return 0;
+}
+
+static void uacce_default_unmap(struct uacce_queue *q,
+ struct uacce_qfile_region *qfr)
+{
+ dev_dbg(q->uacce->dev, "fake unmap %s qfr(npage=%d, iova=%lx)\n",
+ uacce_qfrt_str(qfr), qfr->nr_pages, qfr->iova);
+}
+
+static int uacce_dev_match(struct device *dev, void *data)
+{
+ if (dev->parent == data)
+ return -EBUSY;
+
+ return 0;
+}
+
+static int uacce_set_iommu_domain(struct uacce *uacce)
+{
+ struct iommu_domain *domain;
+ int ret;
+
+ if (uacce->ops->flags & UACCE_DEV_NOIOMMU)
+ return 0;
+
+ /*
+ * We don't support multiple registrations for the same dev in the RFC
+ * version; this will be added in the formal version.
+ */
+ ret = class_for_each_device(uacce_class, NULL, uacce->dev,
+ uacce_dev_match);
+ if (ret)
+ return ret;
+
+ /* allocate and attach an unmanaged domain */
+ domain = iommu_domain_alloc(uacce->dev->bus);
+ if (!domain)
+ return -ENODEV;
+
+ ret = iommu_attach_device(domain, uacce->dev);
+ if (ret)
+ goto err_with_domain;
+
+ if (iommu_capable(uacce->dev->bus, IOMMU_CAP_CACHE_COHERENCY)) {
+ uacce->prot |= IOMMU_CACHE;
+ dev_dbg(uacce->dev, "Enable uacce with c-coherent cap\n");
+ } else {
+ dev_dbg(uacce->dev, "Enable uacce without c-coherent cap\n");
+ }
+
+ return 0;
+
+err_with_domain:
+ iommu_domain_free(domain);
+ return ret;
+}
+
+static void uacce_unset_iommu_domain(struct uacce *uacce)
+{
+ struct iommu_domain *domain;
+
+ domain = iommu_get_domain_for_dev(uacce->dev);
+ if (domain) {
+ iommu_detach_device(domain, uacce->dev);
+ iommu_domain_free(domain);
+ } else
+ dev_err(uacce->dev, "bug: no domain attached to device\n");
+}
+
+/**
+ * uacce_register - register an accelerator
+ * @uacce: the accelerator structure
+ */
+int uacce_register(struct uacce *uacce)
+{
+ int ret;
+
+ if (!uacce->dev)
+ return -ENODEV;
+
+ /* if dev support fault-from-dev, it should support pasid */
+ if ((uacce->ops->flags & UACCE_DEV_FAULT_FROM_DEV) &&
+ !(uacce->ops->flags & UACCE_DEV_PASID)) {
+ dev_warn(uacce->dev, "SVM/SVA device should support PASID\n");
+ return -EINVAL;
+ }
+
+ if (!uacce->ops->map)
+ uacce->ops->map = uacce_default_map;
+
+ if (!uacce->ops->unmap)
+ uacce->ops->unmap = uacce_default_unmap;
+
+ if (!uacce->ops->start_queue)
+ uacce->ops->start_queue = uacce_default_start_queue;
+
+ ret = uacce_set_iommu_domain(uacce);
+ if (ret)
+ return ret;
+
+ mutex_lock(&uacce_mutex);
+
+ ret = uacce_create_chrdev(uacce);
+ if (ret)
+ goto err_with_lock;
+
+ uacce->cls_dev.parent = uacce->dev;
+ uacce->cls_dev.class = uacce_class;
+ uacce->cls_dev.release = uacce_cls_release;
+ dev_set_name(&uacce->cls_dev, "%s", dev_name(uacce->dev));
+ ret = device_register(&uacce->cls_dev);
+ if (ret)
+ goto err_with_chrdev;
+
+#ifdef CONFIG_IOMMU_SVA
+ ret = iommu_sva_init_device(uacce->dev, IOMMU_SVA_FEAT_IOPF, 0, 0,
+ NULL);
+ if (ret) {
+ device_unregister(&uacce->cls_dev);
+ goto err_with_chrdev;
+ }
+#else
+ if (uacce->ops->flags & UACCE_DEV_PASID)
+ uacce->ops->flags &=
+ ~(UACCE_DEV_FAULT_FROM_DEV | UACCE_DEV_PASID);
+#endif
+
+ atomic_set(&uacce->state, UACCE_ST_INIT);
+ mutex_unlock(&uacce_mutex);
+ return 0;
+
+err_with_chrdev:
+ uacce_destroy_chrdev(uacce);
+err_with_lock:
+ mutex_unlock(&uacce_mutex);
+ return ret;
+}
+EXPORT_SYMBOL_GPL(uacce_register);
+
+/**
+ * uacce_unregister - unregisters a uacce
+ * @uacce: the accelerator to unregister
+ *
+ * Unregister an accelerator that was previously successfully registered with
+ * uacce_register().
+ */
+void uacce_unregister(struct uacce *uacce)
+{
+ mutex_lock(&uacce_mutex);
+
+#ifdef CONFIG_IOMMU_SVA
+ iommu_sva_shutdown_device(uacce->dev);
+#endif
+ device_unregister(&uacce->cls_dev);
+ uacce_destroy_chrdev(uacce);
+ uacce_unset_iommu_domain(uacce);
+
+ mutex_unlock(&uacce_mutex);
+}
+EXPORT_SYMBOL_GPL(uacce_unregister);
+
+static int __init uacce_init(void)
+{
+ int ret;
+
+ uacce_class = class_create(THIS_MODULE, UACCE_CLASS_NAME);
+ if (IS_ERR(uacce_class)) {
+ ret = PTR_ERR(uacce_class);
+ goto err;
+ }
+
+ ret = alloc_chrdev_region(&uacce_devt, 0, MINORMASK, "uacce");
+ if (ret)
+ goto err_with_class;
+
+ pr_info("uacce init with major number:%d\n", MAJOR(uacce_devt));
+
+ return 0;
+
+err_with_class:
+ class_destroy(uacce_class);
+err:
+ return ret;
+}
+
+static __exit void uacce_exit(void)
+{
+ unregister_chrdev_region(uacce_devt, MINORMASK);
+ class_destroy(uacce_class);
+ idr_destroy(&uacce_idr);
+}
+
+subsys_initcall(uacce_init);
+module_exit(uacce_exit);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Hisilicon Tech. Co., Ltd.");
+MODULE_DESCRIPTION("Accelerator interface for Userland applications");
diff --git a/include/linux/uacce.h b/include/linux/uacce.h
new file mode 100644
index 000000000000..7b7bc5821811
--- /dev/null
+++ b/include/linux/uacce.h
@@ -0,0 +1,117 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+#ifndef __UACCE_H
+#define __UACCE_H
+
+#include <linux/cdev.h>
+#include <linux/device.h>
+#include <linux/list.h>
+#include <linux/iommu.h>
+#include <uapi/linux/uacce.h>
+
+struct uacce_queue;
+struct uacce;
+
+#define UACCE_QFRF_MAP (1<<0) /* map to current queue */
+#define UACCE_QFRF_MMAP (1<<1) /* map to user space */
+#define UACCE_QFRF_KMAP (1<<2) /* map to kernel space */
+
+#define UACCE_QFR_NA ((unsigned long)-1)
+enum uacce_qfrt {
+ UACCE_QFRT_DKO = 0, /* device kernel-only */
+ UACCE_QFRT_DUS, /* device user share */
+ UACCE_QFRT_SS, /* static share memory */
+ UACCE_QFRT_MAX, /* used also for IO region */
+ UACCE_QFRT_INVALID
+};
+#define UACCE_QFRT_MMIO (UACCE_QFRT_MAX)
+
+struct uacce_qfile_region {
+ enum uacce_qfrt type;
+ unsigned long iova;
+ struct page **pages;
+ int nr_pages;
+ unsigned long prot;
+ int flags;
+ union {
+ struct list_head qs; /* qs sharing the same region, for ss */
+ void *kaddr; /* kernel addr, for dko */
+ };
+};
+
+/**
+ * struct uacce_ops - WD device operations
+ * @get_queue: get a queue from the device according to algorithm
+ * @put_queue: free a queue to the device
+ * @start_queue: make the queue start work after get_queue
+ * @stop_queue: make the queue stop work before put_queue
+ * @is_q_updated: check whether the task is finished
+ * @mask_notify: mask the task irq of queue
+ * @mmap: mmap addresses of queue to user space
+ * @map: map queue to device (for NOIOMMU device)
+ * @unmap: unmap queue from device (for NOIOMMU device)
+ * @reset: reset the WD device
+ * @reset_queue: reset the queue
+ * @ioctl: ioctl for user space users of the queue
+ */
+struct uacce_ops {
+ struct module *owner;
+ const char *api_ver;
+ int flags;
+ unsigned long qf_pg_start[UACCE_QFRT_MAX];
+
+ int (*get_queue)(struct uacce *uacce, unsigned long arg,
+ struct uacce_queue **q);
+ void (*put_queue)(struct uacce_queue *q);
+ int (*start_queue)(struct uacce_queue *q);
+ void (*stop_queue)(struct uacce_queue *q);
+ int (*is_q_updated)(struct uacce_queue *q);
+ void (*mask_notify)(struct uacce_queue *q, int event_mask);
+ int (*mmap)(struct uacce_queue *q, struct vm_area_struct *vma);
+ int (*map)(struct uacce_queue *q, struct uacce_qfile_region *qfr);
+ void (*unmap)(struct uacce_queue *q, struct uacce_qfile_region *qfr);
+ int (*reset)(struct uacce *uacce);
+ int (*reset_queue)(struct uacce_queue *q);
+ long (*ioctl)(struct uacce_queue *q, unsigned int cmd,
+ unsigned long arg);
+};
+
+struct uacce_queue {
+ struct uacce *uacce;
+ __u32 flags;
+ void *priv;
+ wait_queue_head_t wait;
+
+#ifdef CONFIG_IOMMU_SVA
+ int pasid;
+#endif
+ struct list_head list; /* as list for as->qs */
+
+ struct mm_struct *mm;
+
+ struct uacce_qfile_region *qfrs[UACCE_QFRT_MAX];
+};
+
+#define UACCE_ST_INIT 0
+#define UACCE_ST_OPENNED 1
+#define UACCE_ST_STARTED 2
+
+struct uacce {
+ const char *name;
+ int status;
+ struct uacce_ops *ops;
+ struct device *dev;
+ struct device cls_dev;
+ bool is_vf;
+ u32 dev_id;
+ struct cdev *cdev;
+ void *priv;
+ atomic_t state;
+ int prot;
+};
+
+int uacce_register(struct uacce *uacce);
+void uacce_unregister(struct uacce *uacce);
+void uacce_wake_up(struct uacce_queue *q);
+const char *uacce_qfrt_str(struct uacce_qfile_region *qfr);
+
+#endif
diff --git a/include/uapi/linux/uacce.h b/include/uapi/linux/uacce.h
new file mode 100644
index 000000000000..b30fd92dc07e
--- /dev/null
+++ b/include/uapi/linux/uacce.h
@@ -0,0 +1,33 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+#ifndef _UAPIUUACCE_H
+#define _UAPIUUACCE_H
+
+#include <linux/types.h>
+#include <linux/ioctl.h>
+
+#define UACCE_CLASS_NAME "uacce"
+
+#define UACCE_CMD_SHARE_SVAS _IO('W', 0)
+
+/**
+ * UACCE Device Attributes:
+ *
+ * NOIOMMU: the device has no IOMMU support
+ * can do ssva, but no map to the dev
+ * PASID: the device has an IOMMU which supports PASID setting
+ * can do ssva, mapped to dev per process
+ * FAULT_FROM_DEV: the device has an IOMMU which can do page fault requests
+ * no need for ssva, should be used with PASID
+ * KMAP_DUS: map the Device user-shared space to kernel
+ * SVA: full function device
+ * SHARE_DOMAIN: no PASID, can do ssva only for one process and the kernel
+ */
+#define UACCE_DEV_NOIOMMU (1<<0)
+#define UACCE_DEV_PASID (1<<1)
+#define UACCE_DEV_FAULT_FROM_DEV (1<<2)
+#define UACCE_DEV_KMAP_DUS (1<<3)
+
+#define UACCE_DEV_SVA (UACCE_DEV_PASID | UACCE_DEV_FAULT_FROM_DEV)
+#define UACCE_DEV_SHARE_DOMAIN (0)
+
+#endif
--
2.17.1
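
For readers skimming the uapi header above, here is a minimal C sketch of how the UACCE_DEV_* bits might be decoded. The flag values are copied from the patch. The helper name uacce_mode_name() and the strings it returns are hypothetical and not part of the patchset. Because UACCE_DEV_SHARE_DOMAIN is defined as 0, it cannot be tested with a bitwise AND and has to be recognised by the absence of the other mode bits.

```c
/* Flag values copied from include/uapi/linux/uacce.h in this patch. */
#define UACCE_DEV_NOIOMMU        (1 << 0)
#define UACCE_DEV_PASID          (1 << 1)
#define UACCE_DEV_FAULT_FROM_DEV (1 << 2)
#define UACCE_DEV_KMAP_DUS       (1 << 3)

#define UACCE_DEV_SVA          (UACCE_DEV_PASID | UACCE_DEV_FAULT_FROM_DEV)
#define UACCE_DEV_SHARE_DOMAIN (0)

/*
 * Hypothetical helper (an assumption, not in the patch): map a device
 * attribute word to a short mode name. UACCE_DEV_KMAP_DUS is treated as an
 * orthogonal bit and masked out before the mode is examined.
 */
const char *uacce_mode_name(unsigned int attr)
{
	unsigned int mode = attr & (UACCE_DEV_NOIOMMU | UACCE_DEV_SVA);

	if (mode & UACCE_DEV_NOIOMMU)
		return "noiommu";
	if (mode == UACCE_DEV_SVA)
		return "sva";
	if (mode == UACCE_DEV_PASID)
		return "pasid";
	if (mode == UACCE_DEV_SHARE_DOMAIN)
		return "share_domain";	/* no mode bit set at all */
	/* FAULT_FROM_DEV without PASID: the header says the two go together */
	return "unknown";
}
```

A user driver could feed this the value read from the device_attr sysfs attribute described in the documentation patch. How the patchset itself interprets combinations such as FAULT_FROM_DEV without PASID is not stated, so the "unknown" branch is only a guess.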

2018-11-15 02:04:11

by Leon Romanovsky

[permalink] [raw]
Subject: Re: [RFCv3 PATCH 1/6] uacce: Add documents for WarpDrive/uacce

On Wed, Nov 14, 2018 at 10:58:09AM +0800, Kenneth Lee wrote:
>
> On 2018/11/13 8:23 AM, Leon Romanovsky wrote:
> > On Mon, Nov 12, 2018 at 03:58:02PM +0800, Kenneth Lee wrote:
> > > From: Kenneth Lee <[email protected]>
> > >
> > > WarpDrive is a general accelerator framework for the user application to
> > > access the hardware without going through the kernel in data path.
> > >
> > > The kernel component that provides the facility for drivers to expose the
> > > user interface is called uacce. It is a short name for
> > > "Unified/User-space-access-intended Accelerator Framework".
> > >
> > > This patch adds a document to explain how it works.
> > + RDMA and netdev folks
> >
> > Sorry to be late to the game. I don't see the other patches, but from
> > the description below it seems like you are reinventing the RDMA verbs
> > model. I have a hard time seeing how the proposed framework differs
> > from what is already implemented in drivers/infiniband/* for the kernel
> > space and in https://github.com/linux-rdma/rdma-core/ for the user
> > space parts.
>
> Thanks Leon,
>
> Yes, we tried to solve a problem similar to the one in RDMA. We also learned
> a lot from the existing RDMA code. But we have to make a new one because we
> cannot register accelerators such as AI operation, encryption or compression
> with the RDMA framework :)

Assuming that you did everything right and still failed to use the RDMA
framework, you were supposed to fix it, not to reinvent exactly the
same thing. That is how we develop the kernel: by reusing existing code.

>
> Another problem we tried to address is the way to pin the memory for DMA
> operations. The RDMA way of pinning memory cannot prevent pages from being
> lost to copy-on-write while the memory is in use by the device. This may
> not be important to the RDMA library. But it is important to accelerators.

Such support has existed in drivers/infiniband/ since late 2014 and
it is called ODP (on demand paging).

>
> Hope this can help the understanding.

Yes, it helped me a lot.
Now I'm more convinced than before that this whole patchset shouldn't
exist in the first place.

To be clear, NAK.

Thanks

>
> Cheers
>
> >
> > Hard NAK from RDMA side.
> >
> > Thanks
> >
> > > Signed-off-by: Kenneth Lee <[email protected]>
> > > ---
> > > Documentation/warpdrive/warpdrive.rst | 260 +++++++
> > > Documentation/warpdrive/wd-arch.svg | 764 ++++++++++++++++++++
> > > Documentation/warpdrive/wd.svg | 526 ++++++++++++++
> > > Documentation/warpdrive/wd_q_addr_space.svg | 359 +++++++++
> > > 4 files changed, 1909 insertions(+)
> > > create mode 100644 Documentation/warpdrive/warpdrive.rst
> > > create mode 100644 Documentation/warpdrive/wd-arch.svg
> > > create mode 100644 Documentation/warpdrive/wd.svg
> > > create mode 100644 Documentation/warpdrive/wd_q_addr_space.svg
> > >
> > > diff --git a/Documentation/warpdrive/warpdrive.rst b/Documentation/warpdrive/warpdrive.rst
> > > new file mode 100644
> > > index 000000000000..ef84d3a2d462
> > > --- /dev/null
> > > +++ b/Documentation/warpdrive/warpdrive.rst
> > > @@ -0,0 +1,260 @@
> > > +Introduction of WarpDrive
> > > +=========================
> > > +
> > > +*WarpDrive* is a general accelerator framework for the user application to
> > > +access the hardware without going through the kernel in data path.
> > > +
> > > +It can be used as a fast channel to accelerators, network adaptors or
> > > +other hardware for applications in user space.
> > > +
> > > +This may make some implementations simpler. E.g. you can reuse most of the
> > > +*netdev* driver in kernel and just share some ring buffers with the user space
> > > +driver for *DPDK* [4] or *ODP* [5]. Or you can combine the RSA accelerator with
> > > +the *netdev* in the user space as an HTTPS reverse proxy, etc.
> > > +
> > > +*WarpDrive* takes the hardware accelerator as a heterogeneous processor which
> > > +can share particular load from the CPU:
> > > +
> > > +.. image:: wd.svg
> > > + :alt: WarpDrive Concept
> > > +
> > > +A virtual concept, the queue, is used to manage the requests sent to the
> > > +accelerator. The application sends requests to the queue by writing to some
> > > +particular address, while the hardware takes the requests directly from the
> > > +address and sends feedback accordingly.
> > > +
> > > +The format of the queue may differ from hardware to hardware. But the
> > > +application does not need to make any system call for the communication.
> > > +
> > > +*WarpDrive* tries to create a shared virtual address space for all involved
> > > +accelerators. Within this space, the requests sent to queue can refer to any
> > > +virtual address, which will be valid to the application and all involved
> > > +accelerators.
> > > +
> > > +The name *WarpDrive* is simply a cool and general name meaning the framework
> > > +makes the application faster. It includes a general user library, a kernel
> > > +management module and drivers for the hardware. In kernel, the management
> > > +module is called *uacce*, meaning "Unified/User-space-access-intended
> > > +Accelerator Framework".
> > > +
> > > +
> > > +How does it work
> > > +================
> > > +
> > > +*WarpDrive* uses *mmap* and *IOMMU* to play the trick.
> > > +
> > > +*Uacce* creates a chrdev for the device registered to it. A "queue" will be
> > > +created when the chrdev is opened. The application accesses the queue by
> > > +mmapping different address regions of the queue file.
> > > +
> > > +The following figure shows the queue file address space:
> > > +
> > > +.. image:: wd_q_addr_space.svg
> > > + :alt: WarpDrive Queue Address Space
> > > +
> > > +The first region of the space, the device region, is used by the application
> > > +to write requests to or read answers from the hardware.
> > > +
> > > +Normally, there can be three types of device regions: mmio and two types of memory regions.
> > > +It is recommended to use common memory for request/answer descriptors and use
> > > +the mmio space for device notification, such as doorbell. But of course, this
> > > +is all up to the interface designer.
> > > +
> > > +There can be two types of device memory regions, kernel-only and user-shared.
> > > +This will be explained in the "kernel APIs" section.
> > > +
> > > +The Static Share Virtual Memory region is necessary only when the device IOMMU
> > > +does not support "Share Virtual Memory". This will be explained after the
> > > +*IOMMU* idea.
> > > +
> > > +
> > > +Architecture
> > > +------------
> > > +
> > > +The full *WarpDrive* architecture is represented in the following class
> > > +diagram:
> > > +
> > > +.. image:: wd-arch.svg
> > > + :alt: WarpDrive Architecture
> > > +
> > > +
> > > +The user API
> > > +------------
> > > +
> > > +We adopt a polling style interface in the user space: ::
> > > +
> > > + int wd_request_queue(struct wd_queue *q);
> > > + void wd_release_queue(struct wd_queue *q);
> > > +
> > > + int wd_send(struct wd_queue *q, void *req);
> > > + int wd_recv(struct wd_queue *q, void **req);
> > > + int wd_recv_sync(struct wd_queue *q, void **req);
> > > + void wd_flush(struct wd_queue *q);
> > > +
> > > +wd_recv_sync() is a wrapper of its non-sync version. It traps into the
> > > +kernel and waits until the queue becomes available.
> > > +
> > > +If the queue does not support SVA/SVM, the following helper function
> > > +can be used to create Static Virtual Share Memory: ::
> > > +
> > > + void *wd_preserve_share_memory(struct wd_queue *q, size_t size);
> > > +
> > > +The user API is not mandatory. It is simply a suggestion and a hint of what
> > > +the kernel interface is supposed to support.
> > > +
> > > +
> > > +The user driver
> > > +---------------
> > > +
> > > +The queue file mmap space will need a user driver to wrap the communication
> > > +protocol. *UACCE* provides some attributes in sysfs for the user driver to
> > > +match the right accelerator accordingly.
> > > +
> > > +The *UACCE* device attribute is under the following directory:
> > > +
> > > +/sys/class/uacce/<dev-name>/params
> > > +
> > > +The following attributes are supported:
> > > +
> > > +nr_queue_remained (ro)
> > > + number of queues remaining
> > > +
> > > +api_version (ro)
> > > + a string to identify the queue mmap space format and its version
> > > +
> > > +device_attr (ro)
> > > + attributes of the device, see UACCE_DEV_xxx flag defined in uacce.h
> > > +
> > > +numa_node (ro)
> > > + id of numa node
> > > +
> > > +priority (rw)
> > > + Priority of the device, bigger is higher
> > > +
> > > +(This is not yet implemented in RFC version)
> > > +
> > > +
> > > +The kernel API
> > > +--------------
> > > +
> > > +The *uacce* kernel API is defined in uacce.h. If the hardware supports SVM/SVA,
> > > +the driver needs only the following API functions: ::
> > > +
> > > + int uacce_register(uacce);
> > > + void uacce_unregister(uacce);
> > > + void uacce_wake_up(q);
> > > +
> > > +*uacce_wake_up* notifies the process that calls epoll() on the queue file.
> > > +
> > > +According to the IOMMU capability, *uacce* categorizes the devices as follows:
> > > +
> > > +UACCE_DEV_NOIOMMU
> > > + The device has no IOMMU. The user process cannot use VA on the hardware.
> > > + This mode is not recommended.
> > > +
> > > +UACCE_DEV_SVA (UACCE_DEV_PASID | UACCE_DEV_FAULT_FROM_DEV)
> > > + The device has an IOMMU which can share the same page table with the user
> > > + process
> > > +
> > > +UACCE_DEV_SHARE_DOMAIN
> > > + The device has an IOMMU which supports neither multiple page tables nor
> > > + device page faults
> > > +
> > > +If the device works in a mode other than UACCE_DEV_NOIOMMU, *uacce* will set
> > > +its IOMMU to IOMMU_DOMAIN_UNMANAGED. So the driver must not use any kernel
> > > +DMA API but the following ones from *uacce* instead: ::
> > > +
> > > + uacce_dma_map(q, va, size, prot);
> > > + uacce_dma_unmap(q, va, size, prot);
> > > +
> > > +*uacce_dma_map/unmap* is valid only for UACCE_DEV_SVA devices. It creates a
> > > +particular PASID and page table for the kernel in the IOMMU (not yet
> > > +implemented in the RFC).
> > > +
> > > +For UACCE_DEV_SHARE_DOMAIN devices, uacce_dma_map/unmap is not valid.
> > > +*Uacce* calls back start_queue only when the DUS and DKO regions are mmapped.
> > > +The accelerator driver must use those DMA buffers, via uacce_queue->qfrs[], in
> > > +the start_queue callback. The size of the queue file region is defined by
> > > +uacce->ops->qf_pg_start[].
> > > +
> > > +We have to do it this way because most current IOMMUs cannot support kernel
> > > +and user virtual addresses at the same time. So we have to let them both
> > > +share the same user virtual address space.
> > > +
> > > +If the device has to support kernel and user at the same time, both the kernel
> > > +and the user should use these DMA APIs. This is not convenient. A better
> > > +solution is to change the future DMA/IOMMU design to let them separate the
> > > +address space between the user and kernel space. But that is not going to
> > > +happen any time soon.
> > > +
> > > +
> > > +Multiple processes support
> > > +==========================
> > > +
> > > +In the latest mainline kernel (4.19) when this document is written, the IOMMU
> > > +subsystem does not support multiple process page tables yet.
> > > +
> > > +Most IOMMU hardware implementations support multi-process with the concept
> > > +of PASID. But they may use a different name, e.g. it is called sub-stream-id
> > > +in the SMMU of ARM. With PASID or a similar design, multiple page tables can
> > > +be added to the IOMMU and referred to by their PASIDs.
> > > +
> > > +*JPB* has a patchset to enable this[1]_. We have tested it with our hardware
> > > +(which is known as *D06*). It works well. *WarpDrive* relies on it to support
> > > +UACCE_DEV_SVA. If it is not enabled, *WarpDrive* can still work. But it
> > > +supports only one process: the device will be set to UACCE_DEV_SHARE_DOMAIN
> > > +even if it is set to UACCE_DEV_SVA initially.
> > > +
> > > +Static Share Virtual Memory is mainly used by UACCE_DEV_SHARE_DOMAIN device.
> > > +
> > > +
> > > +Legacy Mode Support
> > > +===================
> > > +For hardware without an IOMMU, WarpDrive can still work. The only problem is
> > > +that VA cannot be used in the device. The driver should adopt another strategy
> > > +for the shared memory. It is only for testing, and not recommended.
> > > +
> > > +
> > > +The Fork Scenario
> > > +=================
> > > +For a process with allocated queues and shared memory, what happens if it forks
> > > +a child?
> > > +
> > > +The fd of the queue will be duplicated on fork, so the child can send requests
> > > +to the same queue as its parent. But requests sent from any process other
> > > +than the one that opened the queue will be blocked.
> > > +
> > > +It is recommended to add O_CLOEXEC to the queue file.
> > > +
> > > +The queue mmap space has VM_DONTCOPY set in its VMA. So the child will lose all
> > > +those VMAs.
> > > +
> > > +This is why *WarpDrive* does not adopt the mode used in *VFIO* and *InfiniBand*.
> > > +Both solutions can set any user pointer for hardware sharing. But they cannot
> > > +support fork while DMA is in progress: the "Copy-On-Write" procedure will
> > > +make the parent process lose its physical pages.
> > > +
> > > +
> > > +The Sample Code
> > > +===============
> > > +There is a sample user land implementation with a simple driver for Hisilicon
> > > +Hi1620 ZIP Accelerator.
> > > +
> > > +To test, do the following in samples/warpdrive (for the case of PC host): ::
> > > + ./autogen.sh
> > > + ./conf.sh # or simply ./configure if you build on target system
> > > + make
> > > +
> > > +Then you can get test_hisi_zip in the test subdirectory. Copy it to the target
> > > +system and make sure the hisi_zip driver is enabled (the major and minor of
> > > +the uacce chrdev can be obtained from dmesg or sysfs), and run: ::
> > > + mknod /dev/ua1 c <major> <minor>
> > > + test/test_hisi_zip -z < data > data.zip
> > > + test/test_hisi_zip -g < data > data.gzip
> > > +
> > > +
> > > +References
> > > +==========
> > > +.. [1] https://patchwork.kernel.org/patch/10394851/
> > > +
> > > +.. vim: tw=78
> > > diff --git a/Documentation/warpdrive/wd-arch.svg b/Documentation/warpdrive/wd-arch.svg
> > > new file mode 100644
> > > index 000000000000..e59934188443
> > > --- /dev/null
> > > +++ b/Documentation/warpdrive/wd-arch.svg
> > > @@ -0,0 +1,764 @@
> > > +<?xml version="1.0" encoding="UTF-8" standalone="no"?>
> > > +<!-- Created with Inkscape (http://www.inkscape.org/) -->
> > > +
> > > +<svg
> > > + xmlns:dc="http://purl.org/dc/elements/1.1/"
> > > + xmlns:cc="http://creativecommons.org/ns#"
> > > + xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
> > > + xmlns:svg="http://www.w3.org/2000/svg"
> > > + xmlns="http://www.w3.org/2000/svg"
> > > + xmlns:xlink="http://www.w3.org/1999/xlink"
> > > + xmlns:sodipodi="http://sodipodi.sourceforge.net/DTD/sodipodi-0.dtd"
> > > + xmlns:inkscape="http://www.inkscape.org/namespaces/inkscape"
> > > + width="210mm"
> > > + height="193mm"
> > > + viewBox="0 0 744.09449 683.85823"
> > > + id="svg2"
> > > + version="1.1"
> > > + inkscape:version="0.92.3 (2405546, 2018-03-11)"
> > > + sodipodi:docname="wd-arch.svg">
> > > + <defs
> > > + id="defs4">
> > > + <linearGradient
> > > + inkscape:collect="always"
> > > + id="linearGradient6830">
> > > + <stop
> > > + style="stop-color:#000000;stop-opacity:1;"
> > > + offset="0"
> > > + id="stop6832" />
> > > + <stop
> > > + style="stop-color:#000000;stop-opacity:0;"
> > > + offset="1"
> > > + id="stop6834" />
> > > + </linearGradient>
> > > + <linearGradient
> > > + inkscape:collect="always"
> > > + xlink:href="#linearGradient5026"
> > > + id="linearGradient5032"
> > > + x1="353"
> > > + y1="211.3622"
> > > + x2="565.5"
> > > + y2="174.8622"
> > > + gradientUnits="userSpaceOnUse"
> > > + gradientTransform="translate(-89.949614,405.94594)" />
> > > + <linearGradient
> > > + inkscape:collect="always"
> > > + id="linearGradient5026">
> > > + <stop
> > > + style="stop-color:#f2f2f2;stop-opacity:1;"
> > > + offset="0"
> > > + id="stop5028" />
> > > + <stop
> > > + style="stop-color:#f2f2f2;stop-opacity:0;"
> > > + offset="1"
> > > + id="stop5030" />
> > > + </linearGradient>
> > > + <filter
> > > + inkscape:collect="always"
> > > + style="color-interpolation-filters:sRGB"
> > > + id="filter4169-3"
> > > + x="-0.031597666"
> > > + width="1.0631953"
> > > + y="-0.099812768"
> > > + height="1.1996255">
> > > + <feGaussianBlur
> > > + inkscape:collect="always"
> > > + stdDeviation="1.3307599"
> > > + id="feGaussianBlur4171-6" />
> > > + </filter>
> > > + <linearGradient
> > > + inkscape:collect="always"
> > > + xlink:href="#linearGradient5026"
> > > + id="linearGradient5032-1"
> > > + x1="353"
> > > + y1="211.3622"
> > > + x2="565.5"
> > > + y2="174.8622"
> > > + gradientUnits="userSpaceOnUse"
> > > + gradientTransform="translate(175.77842,400.29111)" />
> > > + <filter
> > > + inkscape:collect="always"
> > > + style="color-interpolation-filters:sRGB"
> > > + id="filter4169-3-0"
> > > + x="-0.031597666"
> > > + width="1.0631953"
> > > + y="-0.099812768"
> > > + height="1.1996255">
> > > + <feGaussianBlur
> > > + inkscape:collect="always"
> > > + stdDeviation="1.3307599"
> > > + id="feGaussianBlur4171-6-9" />
> > > + </filter>
> > > + <marker
> > > + markerWidth="18.960653"
> > > + markerHeight="11.194658"
> > > + refX="9.4803267"
> > > + refY="5.5973287"
> > > + orient="auto"
> > > + id="marker4613">
> > > + <rect
> > > + y="-5.1589785"
> > > + x="5.8504119"
> > > + height="10.317957"
> > > + width="10.317957"
> > > + id="rect4212"
> > > + style="fill:#ffffff;stroke:#000000;stroke-width:0.69143367;stroke-miterlimit:4;stroke-dasharray:none"
> > > + transform="matrix(0.86111274,0.50841405,-0.86111274,0.50841405,0,0)">
> > > + <title
> > > + id="title4262">generation</title>
> > > + </rect>
> > > + </marker>
> > > + <marker
> > > + markerWidth="11.227358"
> > > + markerHeight="12.355258"
> > > + refX="10"
> > > + refY="6.177629"
> > > + orient="auto"
> > > + id="marker4825">
> > > + <path
> > > + inkscape:connector-curvature="0"
> > > + id="path4757"
> > > + d="M 0.42024733,0.42806444 10.231357,6.3500844 0.24347733,11.918544"
> > > + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
> > > + </marker>
> > > + <marker
> > > + markerWidth="11.227358"
> > > + markerHeight="12.355258"
> > > + refX="10"
> > > + refY="6.177629"
> > > + orient="auto"
> > > + id="marker4825-6">
> > > + <path
> > > + inkscape:connector-curvature="0"
> > > + id="path4757-1"
> > > + d="M 0.42024733,0.42806444 10.231357,6.3500844 0.24347733,11.918544"
> > > + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
> > > + </marker>
> > > + <linearGradient
> > > + inkscape:collect="always"
> > > + xlink:href="#linearGradient5026"
> > > + id="linearGradient5032-3-9"
> > > + x1="353"
> > > + y1="211.3622"
> > > + x2="565.5"
> > > + y2="174.8622"
> > > + gradientUnits="userSpaceOnUse"
> > > + gradientTransform="matrix(1.2452511,0,0,0.98513016,-190.95632,540.33156)" />
> > > + <filter
> > > + inkscape:collect="always"
> > > + style="color-interpolation-filters:sRGB"
> > > + id="filter4169-3-5-8"
> > > + x="-0.031597666"
> > > + width="1.0631953"
> > > + y="-0.099812768"
> > > + height="1.1996255">
> > > + <feGaussianBlur
> > > + inkscape:collect="always"
> > > + stdDeviation="1.3307599"
> > > + id="feGaussianBlur4171-6-3-9" />
> > > + </filter>
> > > + <marker
> > > + markerWidth="11.227358"
> > > + markerHeight="12.355258"
> > > + refX="10"
> > > + refY="6.177629"
> > > + orient="auto"
> > > + id="marker4825-6-2">
> > > + <path
> > > + inkscape:connector-curvature="0"
> > > + id="path4757-1-9"
> > > + d="M 0.42024733,0.42806444 10.231357,6.3500844 0.24347733,11.918544"
> > > + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
> > > + </marker>
> > > + <marker
> > > + markerWidth="11.227358"
> > > + markerHeight="12.355258"
> > > + refX="10"
> > > + refY="6.177629"
> > > + orient="auto"
> > > + id="marker4825-6-2-1">
> > > + <path
> > > + inkscape:connector-curvature="0"
> > > + id="path4757-1-9-9"
> > > + d="M 0.42024733,0.42806444 10.231357,6.3500844 0.24347733,11.918544"
> > > + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
> > > + </marker>
> > > + <linearGradient
> > > + inkscape:collect="always"
> > > + xlink:href="#linearGradient5026"
> > > + id="linearGradient5032-3-9-7"
> > > + x1="353"
> > > + y1="211.3622"
> > > + x2="565.5"
> > > + y2="174.8622"
> > > + gradientUnits="userSpaceOnUse"
> > > + gradientTransform="matrix(1.3742742,0,0,0.97786398,-234.52617,654.63367)" />
> > > + <filter
> > > + inkscape:collect="always"
> > > + style="color-interpolation-filters:sRGB"
> > > + id="filter4169-3-5-8-5"
> > > + x="-0.031597666"
> > > + width="1.0631953"
> > > + y="-0.099812768"
> > > + height="1.1996255">
> > > + <feGaussianBlur
> > > + inkscape:collect="always"
> > > + stdDeviation="1.3307599"
> > > + id="feGaussianBlur4171-6-3-9-0" />
> > > + </filter>
> > > + <marker
> > > + markerWidth="11.227358"
> > > + markerHeight="12.355258"
> > > + refX="10"
> > > + refY="6.177629"
> > > + orient="auto"
> > > + id="marker4825-6-2-6">
> > > + <path
> > > + inkscape:connector-curvature="0"
> > > + id="path4757-1-9-1"
> > > + d="M 0.42024733,0.42806444 10.231357,6.3500844 0.24347733,11.918544"
> > > + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
> > > + </marker>
> > > + <linearGradient
> > > + inkscape:collect="always"
> > > + xlink:href="#linearGradient5026"
> > > + id="linearGradient5032-3-9-4"
> > > + x1="353"
> > > + y1="211.3622"
> > > + x2="565.5"
> > > + y2="174.8622"
> > > + gradientUnits="userSpaceOnUse"
> > > + gradientTransform="matrix(1.3742912,0,0,2.0035845,-468.34428,342.56603)" />
> > > + <filter
> > > + inkscape:collect="always"
> > > + style="color-interpolation-filters:sRGB"
> > > + id="filter4169-3-5-8-54"
> > > + x="-0.031597666"
> > > + width="1.0631953"
> > > + y="-0.099812768"
> > > + height="1.1996255">
> > > + <feGaussianBlur
> > > + inkscape:collect="always"
> > > + stdDeviation="1.3307599"
> > > + id="feGaussianBlur4171-6-3-9-7" />
> > > + </filter>
> > > + <marker
> > > + markerWidth="11.227358"
> > > + markerHeight="12.355258"
> > > + refX="10"
> > > + refY="6.177629"
> > > + orient="auto"
> > > + id="marker4825-6-2-1-8">
> > > + <path
> > > + inkscape:connector-curvature="0"
> > > + id="path4757-1-9-9-6"
> > > + d="M 0.42024733,0.42806444 10.231357,6.3500844 0.24347733,11.918544"
> > > + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
> > > + </marker>
> > > + <marker
> > > + markerWidth="11.227358"
> > > + markerHeight="12.355258"
> > > + refX="10"
> > > + refY="6.177629"
> > > + orient="auto"
> > > + id="marker4825-6-2-1-8-8">
> > > + <path
> > > + inkscape:connector-curvature="0"
> > > + id="path4757-1-9-9-6-9"
> > > + d="M 0.42024733,0.42806444 10.231357,6.3500844 0.24347733,11.918544"
> > > + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
> > > + </marker>
> > > + <marker
> > > + markerWidth="11.227358"
> > > + markerHeight="12.355258"
> > > + refX="10"
> > > + refY="6.177629"
> > > + orient="auto"
> > > + id="marker4825-6-0">
> > > + <path
> > > + inkscape:connector-curvature="0"
> > > + id="path4757-1-93"
> > > + d="M 0.42024733,0.42806444 10.231357,6.3500844 0.24347733,11.918544"
> > > + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
> > > + </marker>
> > > + <marker
> > > + markerWidth="11.227358"
> > > + markerHeight="12.355258"
> > > + refX="10"
> > > + refY="6.177629"
> > > + orient="auto"
> > > + id="marker4825-6-0-2">
> > > + <path
> > > + inkscape:connector-curvature="0"
> > > + id="path4757-1-93-6"
> > > + d="M 0.42024733,0.42806444 10.231357,6.3500844 0.24347733,11.918544"
> > > + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
> > > + </marker>
> > > + <filter
> > > + inkscape:collect="always"
> > > + style="color-interpolation-filters:sRGB"
> > > + id="filter5382"
> > > + x="-0.089695387"
> > > + width="1.1793908"
> > > + y="-0.10052069"
> > > + height="1.2010413">
> > > + <feGaussianBlur
> > > + inkscape:collect="always"
> > > + stdDeviation="0.86758925"
> > > + id="feGaussianBlur5384" />
> > > + </filter>
> > > + <linearGradient
> > > + inkscape:collect="always"
> > > + xlink:href="#linearGradient6830"
> > > + id="linearGradient6836"
> > > + x1="362.73923"
> > > + y1="700.04059"
> > > + x2="340.4751"
> > > + y2="678.25488"
> > > + gradientUnits="userSpaceOnUse"
> > > + gradientTransform="translate(-23.771026,-135.76835)" />
> > > + <marker
> > > + markerWidth="11.227358"
> > > + markerHeight="12.355258"
> > > + refX="10"
> > > + refY="6.177629"
> > > + orient="auto"
> > > + id="marker4825-6-2-6-2">
> > > + <path
> > > + inkscape:connector-curvature="0"
> > > + id="path4757-1-9-1-9"
> > > + d="M 0.42024733,0.42806444 10.231357,6.3500844 0.24347733,11.918544"
> > > + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
> > > + </marker>
> > > + <linearGradient
> > > + inkscape:collect="always"
> > > + xlink:href="#linearGradient5026"
> > > + id="linearGradient5032-3-9-7-3"
> > > + x1="353"
> > > + y1="211.3622"
> > > + x2="565.5"
> > > + y2="174.8622"
> > > + gradientUnits="userSpaceOnUse"
> > > + gradientTransform="matrix(1.3742742,0,0,0.97786395,-57.357186,649.55786)" />
> > > + <filter
> > > + inkscape:collect="always"
> > > + style="color-interpolation-filters:sRGB"
> > > + id="filter4169-3-5-8-5-0"
> > > + x="-0.031597666"
> > > + width="1.0631953"
> > > + y="-0.099812768"
> > > + height="1.1996255">
> > > + <feGaussianBlur
> > > + inkscape:collect="always"
> > > + stdDeviation="1.3307599"
> > > + id="feGaussianBlur4171-6-3-9-0-2" />
> > > + </filter>
> > > + <marker
> > > + markerWidth="11.227358"
> > > + markerHeight="12.355258"
> > > + refX="10"
> > > + refY="6.177629"
> > > + orient="auto"
> > > + id="marker4825-6-2-1-1">
> > > + <path
> > > + inkscape:connector-curvature="0"
> > > + id="path4757-1-9-9-0"
> > > + d="M 0.42024733,0.42806444 10.231357,6.3500844 0.24347733,11.918544"
> > > + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
> > > + </marker>
> > > + </defs>
> > > + <sodipodi:namedview
> > > + id="base"
> > > + pagecolor="#ffffff"
> > > + bordercolor="#666666"
> > > + borderopacity="1.0"
> > > + inkscape:pageopacity="0.0"
> > > + inkscape:pageshadow="2"
> > > + inkscape:zoom="0.98994949"
> > > + inkscape:cx="222.32868"
> > > + inkscape:cy="370.44492"
> > > + inkscape:document-units="px"
> > > + inkscape:current-layer="layer1"
> > > + showgrid="false"
> > > + inkscape:window-width="1916"
> > > + inkscape:window-height="1033"
> > > + inkscape:window-x="0"
> > > + inkscape:window-y="22"
> > > + inkscape:window-maximized="0"
> > > + fit-margin-right="0.3"
> > > + inkscape:snap-global="false" />
> > > + <metadata
> > > + id="metadata7">
> > > + <rdf:RDF>
> > > + <cc:Work
> > > + rdf:about="">
> > > + <dc:format>image/svg+xml</dc:format>
> > > + <dc:type
> > > + rdf:resource="http://purl.org/dc/dcmitype/StillImage" />
> > > + <dc:title />
> > > + </cc:Work>
> > > + </rdf:RDF>
> > > + </metadata>
> > > + <g
> > > + inkscape:label="Layer 1"
> > > + inkscape:groupmode="layer"
> > > + id="layer1"
> > > + transform="translate(0,-368.50374)">
> > > + <rect
> > > + style="fill:#000000;stroke:#000000;stroke-width:0.6465112;filter:url(#filter4169-3)"
> > > + id="rect4136-3-6"
> > > + width="101.07784"
> > > + height="31.998148"
> > > + x="283.01144"
> > > + y="588.80896" />
> > > + <rect
> > > + style="fill:url(#linearGradient5032);fill-opacity:1;stroke:#000000;stroke-width:0.6465112"
> > > + id="rect4136-2"
> > > + width="101.07784"
> > > + height="31.998148"
> > > + x="281.63498"
> > > + y="586.75739" />
> > > + <text
> > > + xml:space="preserve"
> > > + style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
> > > + x="294.21747"
> > > + y="612.50073"
> > > + id="text4138-6"><tspan
> > > + sodipodi:role="line"
> > > + id="tspan4140-1"
> > > + x="294.21747"
> > > + y="612.50073"
> > > + style="font-size:15px;line-height:1.25">WarpDrive</tspan></text>
> > > + <rect
> > > + style="fill:#000000;stroke:#000000;stroke-width:0.6465112;filter:url(#filter4169-3-0)"
> > > + id="rect4136-3-6-3"
> > > + width="101.07784"
> > > + height="31.998148"
> > > + x="548.7395"
> > > + y="583.15417" />
> > > + <rect
> > > + style="fill:url(#linearGradient5032-1);fill-opacity:1;stroke:#000000;stroke-width:0.6465112"
> > > + id="rect4136-2-60"
> > > + width="101.07784"
> > > + height="31.998148"
> > > + x="547.36304"
> > > + y="581.1026" />
> > > + <text
> > > + xml:space="preserve"
> > > + style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
> > > + x="557.83484"
> > > + y="602.32745"
> > > + id="text4138-6-6"><tspan
> > > + sodipodi:role="line"
> > > + id="tspan4140-1-2"
> > > + x="557.83484"
> > > + y="602.32745"
> > > + style="font-size:15px;line-height:1.25">user_driver</tspan></text>
> > > + <path
> > > + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;marker-end:url(#marker4613)"
> > > + d="m 547.36304,600.78954 -156.58203,0.0691"
> > > + id="path4855"
> > > + inkscape:connector-curvature="0"
> > > + sodipodi:nodetypes="cc" />
> > > + <rect
> > > + style="fill:#000000;stroke:#000000;stroke-width:0.6465112;filter:url(#filter4169-3-5-8)"
> > > + id="rect4136-3-6-5-7"
> > > + width="101.07784"
> > > + height="31.998148"
> > > + x="128.74678"
> > > + y="80.648842"
> > > + transform="matrix(1.2452511,0,0,0.98513016,113.15182,641.02594)" />
> > > + <rect
> > > + style="fill:url(#linearGradient5032-3-9);fill-opacity:1;stroke:#000000;stroke-width:0.71606314"
> > > + id="rect4136-2-6-3"
> > > + width="125.86729"
> > > + height="31.522341"
> > > + x="271.75983"
> > > + y="718.45435" />
> > > + <text
> > > + xml:space="preserve"
> > > + style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
> > > + x="309.13705"
> > > + y="745.55371"
> > > + id="text4138-6-2-6"><tspan
> > > + sodipodi:role="line"
> > > + id="tspan4140-1-9-1"
> > > + x="309.13705"
> > > + y="745.55371"
> > > + style="font-size:15px;line-height:1.25">uacce</tspan></text>
> > > + <path
> > > + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;marker-end:url(#marker4825-6-2)"
> > > + d="m 329.57309,619.72453 5.0373,97.14447"
> > > + id="path4661-3"
> > > + inkscape:connector-curvature="0"
> > > + sodipodi:nodetypes="cc" />
> > > + <path
> > > + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;marker-end:url(#marker4825-6-2-1)"
> > > + d="m 342.57219,830.63108 -5.67699,-79.2841"
> > > + id="path4661-3-4"
> > > + inkscape:connector-curvature="0"
> > > + sodipodi:nodetypes="cc" />
> > > + <rect
> > > + style="fill:#000000;stroke:#000000;stroke-width:0.6465112;filter:url(#filter4169-3-5-8-5)"
> > > + id="rect4136-3-6-5-7-3"
> > > + width="101.07784"
> > > + height="31.998148"
> > > + x="128.74678"
> > > + y="80.648842"
> > > + transform="matrix(1.3742742,0,0,0.97786398,101.09126,754.58534)" />
> > > + <rect
> > > + style="fill:url(#linearGradient5032-3-9-7);fill-opacity:1;stroke:#000000;stroke-width:0.74946606"
> > > + id="rect4136-2-6-3-6"
> > > + width="138.90866"
> > > + height="31.289837"
> > > + x="276.13297"
> > > + y="831.44263" />
> > > + <text
> > > + xml:space="preserve"
> > > + style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
> > > + x="295.67819"
> > > + y="852.98224"
> > > + id="text4138-6-2-6-1"><tspan
> > > + sodipodi:role="line"
> > > + id="tspan4140-1-9-1-0"
> > > + x="295.67819"
> > > + y="852.98224"
> > > + style="font-size:15px;line-height:1.25">Device Driver</tspan></text>
> > > + <path
> > > + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;marker-end:url(#marker4825-6-2-6)"
> > > + d="m 623.05084,615.00104 0.51369,333.80219"
> > > + id="path4661-3-5"
> > > + inkscape:connector-curvature="0"
> > > + sodipodi:nodetypes="cc" />
> > > + <text
> > > + xml:space="preserve"
> > > + style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
> > > + x="392.63568"
> > > + y="660.83667"
> > > + id="text4138-6-2-6-1-6-2-5"><tspan
> > > + sodipodi:role="line"
> > > + x="392.63568"
> > > + y="660.83667"
> > > + id="tspan4305"
> > > + style="font-size:15px;line-height:1.25">&lt;&lt;anon_file&gt;&gt;</tspan><tspan
> > > + sodipodi:role="line"
> > > + x="392.63568"
> > > + y="679.58667"
> > > + style="font-size:15px;line-height:1.25"
> > > + id="tspan1139">Queue FD</tspan></text>
> > > + <text
> > > + xml:space="preserve"
> > > + style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
> > > + x="389.92969"
> > > + y="587.44836"
> > > + id="text4138-6-2-6-1-6-2-56"><tspan
> > > + sodipodi:role="line"
> > > + id="tspan4140-1-9-1-0-3-0-9"
> > > + x="389.92969"
> > > + y="587.44836"
> > > + style="font-size:15px;line-height:1.25">1</tspan></text>
> > > + <text
> > > + xml:space="preserve"
> > > + style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
> > > + x="528.64813"
> > > + y="600.08429"
> > > + id="text4138-6-2-6-1-6-3"><tspan
> > > + sodipodi:role="line"
> > > + id="tspan4140-1-9-1-0-3-7"
> > > + x="528.64813"
> > > + y="600.08429"
> > > + style="font-size:15px;line-height:1.25">*</tspan></text>
> > > + <rect
> > > + style="fill:#000000;stroke:#000000;stroke-width:0.6465112;filter:url(#filter4169-3-5-8-54)"
> > > + id="rect4136-3-6-5-7-4"
> > > + width="101.07784"
> > > + height="31.998148"
> > > + x="128.74678"
> > > + y="80.648842"
> > > + transform="matrix(1.3745874,0,0,1.8929066,-132.7754,556.04505)" />
> > > + <rect
> > > + style="fill:url(#linearGradient5032-3-9-4);fill-opacity:1;stroke:#000000;stroke-width:1.07280123"
> > > + id="rect4136-2-6-3-4"
> > > + width="138.91039"
> > > + height="64.111"
> > > + x="42.321312"
> > > + y="704.8371" />
> > > + <text
> > > + xml:space="preserve"
> > > + style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
> > > + x="110.30745"
> > > + y="722.94025"
> > > + id="text4138-6-2-6-3"><tspan
> > > + sodipodi:role="line"
> > > + x="111.99202"
> > > + y="722.94025"
> > > + id="tspan4366"
> > > + style="font-size:15px;line-height:1.25;text-align:center;text-anchor:middle">other standard </tspan><tspan
> > > + sodipodi:role="line"
> > > + x="110.30745"
> > > + y="741.69025"
> > > + id="tspan4368"
> > > + style="font-size:15px;line-height:1.25;text-align:center;text-anchor:middle">framework</tspan><tspan
> > > + sodipodi:role="line"
> > > + x="110.30745"
> > > + y="760.44025"
> > > + style="font-size:15px;line-height:1.25;text-align:center;text-anchor:middle"
> > > + id="tspan6840">(crypto/nic/others)</tspan></text>
> > > + <path
> > > + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;marker-end:url(#marker4825-6-2-1-8)"
> > > + d="M 276.29661,849.04109 134.04449,771.90853"
> > > + id="path4661-3-4-8"
> > > + inkscape:connector-curvature="0"
> > > + sodipodi:nodetypes="cc" />
> > > + <text
> > > + xml:space="preserve"
> > > + style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
> > > + x="313.70813"
> > > + y="730.06366"
> > > + id="text4138-6-2-6-36"><tspan
> > > + sodipodi:role="line"
> > > + id="tspan4140-1-9-1-7"
> > > + x="313.70813"
> > > + y="730.06366"
> > > + style="font-size:10px;line-height:1.25">&lt;&lt;lkm&gt;&gt;</tspan></text>
> > > + <text
> > > + xml:space="preserve"
> > > + style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;text-align:start;letter-spacing:0px;word-spacing:0px;text-anchor:start;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
> > > + x="259.53165"
> > > + y="797.8056"
> > > + id="text4138-6-2-6-1-6-2-5-7-5"><tspan
> > > + sodipodi:role="line"
> > > + x="259.53165"
> > > + y="797.8056"
> > > + style="font-size:15px;line-height:1.25;text-align:start;text-anchor:start"
> > > + id="tspan2357">uacce register api</tspan></text>
> > > + <text
> > > + xml:space="preserve"
> > > + style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
> > > + x="29.145819"
> > > + y="833.44244"
> > > + id="text4138-6-2-6-1-6-2-5-7-5-2"><tspan
> > > + sodipodi:role="line"
> > > + x="29.145819"
> > > + y="833.44244"
> > > + id="tspan4301"
> > > + style="font-size:15px;line-height:1.25">register to other subsystem</tspan></text>
> > > + <text
> > > + xml:space="preserve"
> > > + style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
> > > + x="301.20813"
> > > + y="597.29437"
> > > + id="text4138-6-2-6-36-1"><tspan
> > > + sodipodi:role="line"
> > > + id="tspan4140-1-9-1-7-2"
> > > + x="301.20813"
> > > + y="597.29437"
> > > + style="font-size:10px;line-height:1.25">&lt;&lt;user_lib&gt;&gt;</tspan></text>
> > > + <text
> > > + xml:space="preserve"
> > > + style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
> > > + x="615.9505"
> > > + y="739.44012"
> > > + id="text4138-6-2-6-1-6-2-5-3"><tspan
> > > + sodipodi:role="line"
> > > + x="615.9505"
> > > + y="739.44012"
> > > + id="tspan4274-7"
> > > + style="font-size:15px;line-height:1.25">mmapped memory r/w interface</tspan></text>
> > > + <text
> > > + xml:space="preserve"
> > > + style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
> > > + x="371.01291"
> > > + y="529.23682"
> > > + id="text4138-6-2-6-1-6-2-5-36"><tspan
> > > + sodipodi:role="line"
> > > + x="371.01291"
> > > + y="529.23682"
> > > + id="tspan4305-3"
> > > + style="font-size:15px;line-height:1.25">wd user api</tspan></text>
> > > + <path
> > > + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
> > > + d="m 328.19325,585.87943 0,-23.57142"
> > > + id="path4348"
> > > + inkscape:connector-curvature="0" />
> > > + <ellipse
> > > + style="opacity:1;fill:#ffffff;fill-opacity:1;fill-rule:evenodd;stroke:#000000;stroke-width:1;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0"
> > > + id="path4350"
> > > + cx="328.01468"
> > > + cy="551.95081"
> > > + rx="11.607142"
> > > + ry="10.357142" />
> > > + <path
> > > + style="opacity:0.444;fill:url(#linearGradient6836);fill-opacity:1;fill-rule:evenodd;stroke:none;stroke-width:1;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;filter:url(#filter5382)"
> > > + id="path4350-2"
> > > + sodipodi:type="arc"
> > > + sodipodi:cx="329.44327"
> > > + sodipodi:cy="553.37933"
> > > + sodipodi:rx="11.607142"
> > > + sodipodi:ry="10.357142"
> > > + sodipodi:start="0"
> > > + sodipodi:end="6.2509098"
> > > + d="m 341.05041,553.37933 a 11.607142,10.357142 0 0 1 -11.51349,10.35681 11.607142,10.357142 0 0 1 -11.69928,-10.18967 11.607142,10.357142 0 0 1 11.32469,-10.52124 11.607142,10.357142 0 0 1 11.88204,10.01988"
> > > + sodipodi:open="true" />
> > > + <text
> > > + xml:space="preserve"
> > > + style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
> > > + x="619.67596"
> > > + y="978.22363"
> > > + id="text4138-6-2-6-1-6-2-5-36-3"><tspan
> > > + sodipodi:role="line"
> > > + x="619.67596"
> > > + y="978.22363"
> > > + id="tspan4305-3-67"
> > > + style="font-size:15px;line-height:1.25">Device(Hardware)</tspan></text>
> > > + <path
> > > + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;marker-end:url(#marker4825-6-2-6-2)"
> > > + d="m 347.51164,865.4527 193.91929,99.10053"
> > > + id="path4661-3-5-1"
> > > + inkscape:connector-curvature="0"
> > > + sodipodi:nodetypes="cc" />
> > > + <rect
> > > + style="fill:#000000;stroke:#000000;stroke-width:0.6465112;filter:url(#filter4169-3-5-8-5-0)"
> > > + id="rect4136-3-6-5-7-3-1"
> > > + width="101.07784"
> > > + height="31.998148"
> > > + x="128.74678"
> > > + y="80.648842"
> > > + transform="matrix(1.3742742,0,0,0.97786395,278.26025,749.50952)" />
> > > + <rect
> > > + style="fill:url(#linearGradient5032-3-9-7-3);fill-opacity:1;stroke:#000000;stroke-width:0.74946606"
> > > + id="rect4136-2-6-3-6-0"
> > > + width="138.90868"
> > > + height="31.289839"
> > > + x="453.30197"
> > > + y="826.36682" />
> > > + <text
> > > + xml:space="preserve"
> > > + style="font-style:normal;font-weight:normal;font-size:12px;line-height:0%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
> > > + x="493.68158"
> > > + y="847.90643"
> > > + id="text4138-6-2-6-1-5"><tspan
> > > + sodipodi:role="line"
> > > + id="tspan4140-1-9-1-0-1"
> > > + x="493.68158"
> > > + y="847.90643"
> > > + style="font-size:15px;line-height:1.25;stroke-width:1px">IOMMU</tspan></text>
> > > + <path
> > > + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;marker-end:url(#marker4825-6-2-1-1)"
> > > + d="m 389.49372,755.46667 111.75324,68.4507"
> > > + id="path4661-3-4-85"
> > > + inkscape:connector-curvature="0"
> > > + sodipodi:nodetypes="cc" />
> > > + <text
> > > + xml:space="preserve"
> > > + style="font-style:normal;font-weight:normal;font-size:12px;line-height:0%;font-family:sans-serif;text-align:start;letter-spacing:0px;word-spacing:0px;text-anchor:start;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
> > > + x="427.70282"
> > > + y="776.91418"
> > > + id="text4138-6-2-6-1-6-2-5-7-5-0"><tspan
> > > + sodipodi:role="line"
> > > + x="427.70282"
> > > + y="776.91418"
> > > + style="font-size:15px;line-height:1.25;text-align:start;text-anchor:start;stroke-width:1px"
> > > + id="tspan2357-6">manage the driver iommu state</tspan></text>
> > > + </g>
> > > +</svg>
> > > diff --git a/Documentation/warpdrive/wd.svg b/Documentation/warpdrive/wd.svg
> > > new file mode 100644
> > > index 000000000000..87ab92ebfbc6
> > > --- /dev/null
> > > +++ b/Documentation/warpdrive/wd.svg
> > > @@ -0,0 +1,526 @@
> > > +<?xml version="1.0" encoding="UTF-8" standalone="no"?>
> > > +<!-- Created with Inkscape (http://www.inkscape.org/) -->
> > > +
> > > +<svg
> > > + xmlns:dc="http://purl.org/dc/elements/1.1/"
> > > + xmlns:cc="http://creativecommons.org/ns#"
> > > + xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
> > > + xmlns:svg="http://www.w3.org/2000/svg"
> > > + xmlns="http://www.w3.org/2000/svg"
> > > + xmlns:xlink="http://www.w3.org/1999/xlink"
> > > + xmlns:sodipodi="http://sodipodi.sourceforge.net/DTD/sodipodi-0.dtd"
> > > + xmlns:inkscape="http://www.inkscape.org/namespaces/inkscape"
> > > + width="210mm"
> > > + height="116mm"
> > > + viewBox="0 0 744.09449 411.02338"
> > > + id="svg2"
> > > + version="1.1"
> > > + inkscape:version="0.92.3 (2405546, 2018-03-11)"
> > > + sodipodi:docname="wd.svg">
> > > + <defs
> > > + id="defs4">
> > > + <linearGradient
> > > + inkscape:collect="always"
> > > + id="linearGradient5026">
> > > + <stop
> > > + style="stop-color:#f2f2f2;stop-opacity:1;"
> > > + offset="0"
> > > + id="stop5028" />
> > > + <stop
> > > + style="stop-color:#f2f2f2;stop-opacity:0;"
> > > + offset="1"
> > > + id="stop5030" />
> > > + </linearGradient>
> > > + <linearGradient
> > > + inkscape:collect="always"
> > > + xlink:href="#linearGradient5026"
> > > + id="linearGradient5032-3"
> > > + x1="353"
> > > + y1="211.3622"
> > > + x2="565.5"
> > > + y2="174.8622"
> > > + gradientUnits="userSpaceOnUse"
> > > + gradientTransform="matrix(2.7384117,0,0,0.91666329,-952.8283,571.10143)" />
> > > + <filter
> > > + inkscape:collect="always"
> > > + style="color-interpolation-filters:sRGB"
> > > + id="filter4169-3-5"
> > > + x="-0.031597666"
> > > + width="1.0631953"
> > > + y="-0.099812768"
> > > + height="1.1996255">
> > > + <feGaussianBlur
> > > + inkscape:collect="always"
> > > + stdDeviation="1.3307599"
> > > + id="feGaussianBlur4171-6-3" />
> > > + </filter>
> > > + <marker
> > > + markerWidth="18.960653"
> > > + markerHeight="11.194658"
> > > + refX="9.4803267"
> > > + refY="5.5973287"
> > > + orient="auto"
> > > + id="marker4613">
> > > + <rect
> > > + y="-5.1589785"
> > > + x="5.8504119"
> > > + height="10.317957"
> > > + width="10.317957"
> > > + id="rect4212"
> > > + style="fill:#ffffff;stroke:#000000;stroke-width:0.69143367;stroke-miterlimit:4;stroke-dasharray:none"
> > > + transform="matrix(0.86111274,0.50841405,-0.86111274,0.50841405,0,0)">
> > > + <title
> > > + id="title4262">generation</title>
> > > + </rect>
> > > + </marker>
> > > + <marker
> > > + markerWidth="11.227358"
> > > + markerHeight="12.355258"
> > > + refX="10"
> > > + refY="6.177629"
> > > + orient="auto"
> > > + id="marker4825">
> > > + <path
> > > + inkscape:connector-curvature="0"
> > > + id="path4757"
> > > + d="M 0.42024733,0.42806444 10.231357,6.3500844 0.24347733,11.918544"
> > > + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
> > > + </marker>
> > > + <marker
> > > + markerWidth="11.227358"
> > > + markerHeight="12.355258"
> > > + refX="10"
> > > + refY="6.177629"
> > > + orient="auto"
> > > + id="marker4825-6">
> > > + <path
> > > + inkscape:connector-curvature="0"
> > > + id="path4757-1"
> > > + d="M 0.42024733,0.42806444 10.231357,6.3500844 0.24347733,11.918544"
> > > + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
> > > + </marker>
> > > + <linearGradient
> > > + inkscape:collect="always"
> > > + xlink:href="#linearGradient5026"
> > > + id="linearGradient5032-3-9"
> > > + x1="353"
> > > + y1="211.3622"
> > > + x2="565.5"
> > > + y2="174.8622"
> > > + gradientUnits="userSpaceOnUse"
> > > + gradientTransform="matrix(1.2452511,0,0,0.98513016,-190.95632,540.33156)" />
> > > + <marker
> > > + markerWidth="11.227358"
> > > + markerHeight="12.355258"
> > > + refX="10"
> > > + refY="6.177629"
> > > + orient="auto"
> > > + id="marker4825-6-2">
> > > + <path
> > > + inkscape:connector-curvature="0"
> > > + id="path4757-1-9"
> > > + d="M 0.42024733,0.42806444 10.231357,6.3500844 0.24347733,11.918544"
> > > + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
> > > + </marker>
> > > + <marker
> > > + markerWidth="11.227358"
> > > + markerHeight="12.355258"
> > > + refX="10"
> > > + refY="6.177629"
> > > + orient="auto"
> > > + id="marker4825-6-2-1">
> > > + <path
> > > + inkscape:connector-curvature="0"
> > > + id="path4757-1-9-9"
> > > + d="M 0.42024733,0.42806444 10.231357,6.3500844 0.24347733,11.918544"
> > > + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
> > > + </marker>
> > > + <marker
> > > + markerWidth="11.227358"
> > > + markerHeight="12.355258"
> > > + refX="10"
> > > + refY="6.177629"
> > > + orient="auto"
> > > + id="marker4825-6-2-6">
> > > + <path
> > > + inkscape:connector-curvature="0"
> > > + id="path4757-1-9-1"
> > > + d="M 0.42024733,0.42806444 10.231357,6.3500844 0.24347733,11.918544"
> > > + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
> > > + </marker>
> > > + <marker
> > > + markerWidth="11.227358"
> > > + markerHeight="12.355258"
> > > + refX="10"
> > > + refY="6.177629"
> > > + orient="auto"
> > > + id="marker4825-6-2-1-8">
> > > + <path
> > > + inkscape:connector-curvature="0"
> > > + id="path4757-1-9-9-6"
> > > + d="M 0.42024733,0.42806444 10.231357,6.3500844 0.24347733,11.918544"
> > > + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
> > > + </marker>
> > > + <marker
> > > + markerWidth="11.227358"
> > > + markerHeight="12.355258"
> > > + refX="10"
> > > + refY="6.177629"
> > > + orient="auto"
> > > + id="marker4825-6-2-1-8-8">
> > > + <path
> > > + inkscape:connector-curvature="0"
> > > + id="path4757-1-9-9-6-9"
> > > + d="M 0.42024733,0.42806444 10.231357,6.3500844 0.24347733,11.918544"
> > > + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
> > > + </marker>
> > > + <marker
> > > + markerWidth="11.227358"
> > > + markerHeight="12.355258"
> > > + refX="10"
> > > + refY="6.177629"
> > > + orient="auto"
> > > + id="marker4825-6-0">
> > > + <path
> > > + inkscape:connector-curvature="0"
> > > + id="path4757-1-93"
> > > + d="M 0.42024733,0.42806444 10.231357,6.3500844 0.24347733,11.918544"
> > > + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
> > > + </marker>
> > > + <marker
> > > + markerWidth="11.227358"
> > > + markerHeight="12.355258"
> > > + refX="10"
> > > + refY="6.177629"
> > > + orient="auto"
> > > + id="marker4825-6-0-2">
> > > + <path
> > > + inkscape:connector-curvature="0"
> > > + id="path4757-1-93-6"
> > > + d="M 0.42024733,0.42806444 10.231357,6.3500844 0.24347733,11.918544"
> > > + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
> > > + </marker>
> > > + <marker
> > > + markerWidth="11.227358"
> > > + markerHeight="12.355258"
> > > + refX="10"
> > > + refY="6.177629"
> > > + orient="auto"
> > > + id="marker4825-6-2-6-2">
> > > + <path
> > > + inkscape:connector-curvature="0"
> > > + id="path4757-1-9-1-9"
> > > + d="M 0.42024733,0.42806444 10.231357,6.3500844 0.24347733,11.918544"
> > > + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
> > > + </marker>
> > > + <linearGradient
> > > + inkscape:collect="always"
> > > + xlink:href="#linearGradient5026"
> > > + id="linearGradient5032-3-8"
> > > + x1="353"
> > > + y1="211.3622"
> > > + x2="565.5"
> > > + y2="174.8622"
> > > + gradientUnits="userSpaceOnUse"
> > > + gradientTransform="matrix(1.0104674,0,0,1.0052679,-218.642,661.15448)" />
> > > + <filter
> > > + inkscape:collect="always"
> > > + style="color-interpolation-filters:sRGB"
> > > + id="filter4169-3-5-8"
> > > + x="-0.031597666"
> > > + width="1.0631953"
> > > + y="-0.099812768"
> > > + height="1.1996255">
> > > + <feGaussianBlur
> > > + inkscape:collect="always"
> > > + stdDeviation="1.3307599"
> > > + id="feGaussianBlur4171-6-3-9" />
> > > + </filter>
> > > + <linearGradient
> > > + inkscape:collect="always"
> > > + xlink:href="#linearGradient5026"
> > > + id="linearGradient5032-3-8-2"
> > > + x1="353"
> > > + y1="211.3622"
> > > + x2="565.5"
> > > + y2="174.8622"
> > > + gradientUnits="userSpaceOnUse"
> > > + gradientTransform="matrix(2.1450559,0,0,1.0052679,-521.97704,740.76422)" />
> > > + <filter
> > > + inkscape:collect="always"
> > > + style="color-interpolation-filters:sRGB"
> > > + id="filter4169-3-5-8-5"
> > > + x="-0.031597666"
> > > + width="1.0631953"
> > > + y="-0.099812768"
> > > + height="1.1996255">
> > > + <feGaussianBlur
> > > + inkscape:collect="always"
> > > + stdDeviation="1.3307599"
> > > + id="feGaussianBlur4171-6-3-9-1" />
> > > + </filter>
> > > + <linearGradient
> > > + inkscape:collect="always"
> > > + xlink:href="#linearGradient5026"
> > > + id="linearGradient5032-3-8-0"
> > > + x1="353"
> > > + y1="211.3622"
> > > + x2="565.5"
> > > + y2="174.8622"
> > > + gradientUnits="userSpaceOnUse"
> > > + gradientTransform="matrix(1.0104674,0,0,1.0052679,83.456748,660.20747)" />
> > > + <filter
> > > + inkscape:collect="always"
> > > + style="color-interpolation-filters:sRGB"
> > > + id="filter4169-3-5-8-6"
> > > + x="-0.031597666"
> > > + width="1.0631953"
> > > + y="-0.099812768"
> > > + height="1.1996255">
> > > + <feGaussianBlur
> > > + inkscape:collect="always"
> > > + stdDeviation="1.3307599"
> > > + id="feGaussianBlur4171-6-3-9-2" />
> > > + </filter>
> > > + <linearGradient
> > > + inkscape:collect="always"
> > > + xlink:href="#linearGradient5026"
> > > + id="linearGradient5032-3-84"
> > > + x1="353"
> > > + y1="211.3622"
> > > + x2="565.5"
> > > + y2="174.8622"
> > > + gradientUnits="userSpaceOnUse"
> > > + gradientTransform="matrix(1.9884948,0,0,0.94903536,-318.42665,564.37696)" />
> > > + <filter
> > > + inkscape:collect="always"
> > > + style="color-interpolation-filters:sRGB"
> > > + id="filter4169-3-5-4"
> > > + x="-0.031597666"
> > > + width="1.0631953"
> > > + y="-0.099812768"
> > > + height="1.1996255">
> > > + <feGaussianBlur
> > > + inkscape:collect="always"
> > > + stdDeviation="1.3307599"
> > > + id="feGaussianBlur4171-6-3-0" />
> > > + </filter>
> > > + <marker
> > > + markerWidth="11.227358"
> > > + markerHeight="12.355258"
> > > + refX="10"
> > > + refY="6.177629"
> > > + orient="auto"
> > > + id="marker4825-6-0-0">
> > > + <path
> > > + inkscape:connector-curvature="0"
> > > + id="path4757-1-93-8"
> > > + d="M 0.42024733,0.42806444 10.231357,6.3500844 0.24347733,11.918544"
> > > + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
> > > + </marker>
> > > + <marker
> > > + markerWidth="11.227358"
> > > + markerHeight="12.355258"
> > > + refX="10"
> > > + refY="6.177629"
> > > + orient="auto"
> > > + id="marker4825-6-3">
> > > + <path
> > > + inkscape:connector-curvature="0"
> > > + id="path4757-1-1"
> > > + d="M 0.42024733,0.42806444 10.231357,6.3500844 0.24347733,11.918544"
> > > + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
> > > + </marker>
> > > + </defs>
> > > + <sodipodi:namedview
> > > + id="base"
> > > + pagecolor="#ffffff"
> > > + bordercolor="#666666"
> > > + borderopacity="1.0"
> > > + inkscape:pageopacity="0.0"
> > > + inkscape:pageshadow="2"
> > > + inkscape:zoom="0.98994949"
> > > + inkscape:cx="457.47339"
> > > + inkscape:cy="250.14781"
> > > + inkscape:document-units="px"
> > > + inkscape:current-layer="layer1"
> > > + showgrid="false"
> > > + inkscape:window-width="1916"
> > > + inkscape:window-height="1033"
> > > + inkscape:window-x="0"
> > > + inkscape:window-y="22"
> > > + inkscape:window-maximized="0"
> > > + fit-margin-right="0.3" />
> > > + <metadata
> > > + id="metadata7">
> > > + <rdf:RDF>
> > > + <cc:Work
> > > + rdf:about="">
> > > + <dc:format>image/svg+xml</dc:format>
> > > + <dc:type
> > > + rdf:resource="http://purl.org/dc/dcmitype/StillImage" />
> > > + <dc:title></dc:title>
> > > + </cc:Work>
> > > + </rdf:RDF>
> > > + </metadata>
> > > + <g
> > > + inkscape:label="Layer 1"
> > > + inkscape:groupmode="layer"
> > > + id="layer1"
> > > + transform="translate(0,-641.33861)">
> > > + <rect
> > > + style="fill:#000000;stroke:#000000;stroke-width:0.6465112;filter:url(#filter4169-3-5)"
> > > + id="rect4136-3-6-5"
> > > + width="101.07784"
> > > + height="31.998148"
> > > + x="128.74678"
> > > + y="80.648842"
> > > + transform="matrix(2.7384116,0,0,0.91666328,-284.06895,664.79751)" />
> > > + <rect
> > > + style="fill:url(#linearGradient5032-3);fill-opacity:1;stroke:#000000;stroke-width:1.02430749"
> > > + id="rect4136-2-6"
> > > + width="276.79272"
> > > + height="29.331528"
> > > + x="64.723419"
> > > + y="736.84473" />
> > > + <text
> > > + xml:space="preserve"
> > > + style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
> > > + x="78.223282"
> > > + y="756.79803"
> > > + id="text4138-6-2"><tspan
> > > + sodipodi:role="line"
> > > + id="tspan4140-1-9"
> > > + x="78.223282"
> > > + y="756.79803"
> > > + style="font-size:15px;line-height:1.25">user application (running on the CPU)</tspan></text>
> > > + <path
> > > + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;marker-end:url(#marker4825-6)"
> > > + d="m 217.67507,876.6738 113.40331,45.0758"
> > > + id="path4661"
> > > + inkscape:connector-curvature="0"
> > > + sodipodi:nodetypes="cc" />
> > > + <path
> > > + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;marker-end:url(#marker4825-6-0)"
> > > + d="m 208.10197,767.69811 0.29362,76.03656"
> > > + id="path4661-6"
> > > + inkscape:connector-curvature="0"
> > > + sodipodi:nodetypes="cc" />
> > > + <rect
> > > + style="fill:#000000;stroke:#000000;stroke-width:0.6465112;filter:url(#filter4169-3-5-8)"
> > > + id="rect4136-3-6-5-3"
> > > + width="101.07784"
> > > + height="31.998148"
> > > + x="128.74678"
> > > + y="80.648842"
> > > + transform="matrix(1.0104673,0,0,1.0052679,28.128628,763.90722)" />
> > > + <rect
> > > + style="fill:url(#linearGradient5032-3-8);fill-opacity:1;stroke:#000000;stroke-width:0.65159565"
> > > + id="rect4136-2-6-6"
> > > + width="102.13586"
> > > + height="32.16671"
> > > + x="156.83217"
> > > + y="842.91852" />
> > > + <text
> > > + xml:space="preserve"
> > > + style="font-style:normal;font-weight:normal;font-size:12px;line-height:0%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
> > > + x="188.58519"
> > > + y="864.47125"
> > > + id="text4138-6-2-8"><tspan
> > > + sodipodi:role="line"
> > > + id="tspan4140-1-9-0"
> > > + x="188.58519"
> > > + y="864.47125"
> > > + style="font-size:15px;line-height:1.25;stroke-width:1px">MMU</tspan></text>
> > > + <rect
> > > + style="fill:#000000;stroke:#000000;stroke-width:0.6465112;filter:url(#filter4169-3-5-8-5)"
> > > + id="rect4136-3-6-5-3-1"
> > > + width="101.07784"
> > > + height="31.998148"
> > > + x="128.74678"
> > > + y="80.648842"
> > > + transform="matrix(2.1450556,0,0,1.0052679,1.87637,843.51696)" />
> > > + <rect
> > > + style="fill:url(#linearGradient5032-3-8-2);fill-opacity:1;stroke:#000000;stroke-width:0.94937181"
> > > + id="rect4136-2-6-6-0"
> > > + width="216.8176"
> > > + height="32.16671"
> > > + x="275.09283"
> > > + y="922.5282" />
> > > + <text
> > > + xml:space="preserve"
> > > + style="font-style:normal;font-weight:normal;font-size:12px;line-height:0%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
> > > + x="347.81482"
> > > + y="943.23291"
> > > + id="text4138-6-2-8-8"><tspan
> > > + sodipodi:role="line"
> > > + id="tspan4140-1-9-0-5"
> > > + x="347.81482"
> > > + y="943.23291"
> > > + style="font-size:15px;line-height:1.25;stroke-width:1px">Memory</tspan></text>
> > > + <rect
> > > + style="fill:#000000;stroke:#000000;stroke-width:0.6465112;filter:url(#filter4169-3-5-8-6)"
> > > + id="rect4136-3-6-5-3-5"
> > > + width="101.07784"
> > > + height="31.998148"
> > > + x="128.74678"
> > > + y="80.648842"
> > > + transform="matrix(1.0104673,0,0,1.0052679,330.22737,762.9602)" />
> > > + <rect
> > > + style="fill:url(#linearGradient5032-3-8-0);fill-opacity:1;stroke:#000000;stroke-width:0.65159565"
> > > + id="rect4136-2-6-6-8"
> > > + width="102.13586"
> > > + height="32.16671"
> > > + x="458.93091"
> > > + y="841.9715" />
> > > + <text
> > > + xml:space="preserve"
> > > + style="font-style:normal;font-weight:normal;font-size:12px;line-height:0%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
> > > + x="490.68393"
> > > + y="863.52423"
> > > + id="text4138-6-2-8-6"><tspan
> > > + sodipodi:role="line"
> > > + id="tspan4140-1-9-0-2"
> > > + x="490.68393"
> > > + y="863.52423"
> > > + style="font-size:15px;line-height:1.25;stroke-width:1px">IOMMU</tspan></text>
> > > + <rect
> > > + style="fill:#000000;stroke:#000000;stroke-width:0.6465112;filter:url(#filter4169-3-5-4)"
> > > + id="rect4136-3-6-5-6"
> > > + width="101.07784"
> > > + height="31.998148"
> > > + x="128.74678"
> > > + y="80.648842"
> > > + transform="matrix(1.9884947,0,0,0.94903537,167.19229,661.38193)" />
> > > + <rect
> > > + style="fill:url(#linearGradient5032-3-84);fill-opacity:1;stroke:#000000;stroke-width:0.88813609"
> > > + id="rect4136-2-6-2"
> > > + width="200.99274"
> > > + height="30.367374"
> > > + x="420.4675"
> > > + y="735.97351" />
> > > + <text
> > > + xml:space="preserve"
> > > + style="font-style:normal;font-weight:normal;font-size:12px;line-height:0%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
> > > + x="441.95297"
> > > + y="755.9068"
> > > + id="text4138-6-2-9"><tspan
> > > + sodipodi:role="line"
> > > + id="tspan4140-1-9-9"
> > > + x="441.95297"
> > > + y="755.9068"
> > > + style="font-size:15px;line-height:1.25;stroke-width:1px">Hardware Accelerator</tspan></text>
> > > + <path
> > > + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;marker-end:url(#marker4825-6-0-0)"
> > > + d="m 508.2914,766.55885 0.29362,76.03656"
> > > + id="path4661-6-1"
> > > + inkscape:connector-curvature="0"
> > > + sodipodi:nodetypes="cc" />
> > > + <path
> > > + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;marker-end:url(#marker4825-6-3)"
> > > + d="M 499.70201,876.47297 361.38296,920.80258"
> > > + id="path4661-1"
> > > + inkscape:connector-curvature="0"
> > > + sodipodi:nodetypes="cc" />
> > > + </g>
> > > +</svg>
> > > diff --git a/Documentation/warpdrive/wd_q_addr_space.svg b/Documentation/warpdrive/wd_q_addr_space.svg
> > > new file mode 100644
> > > index 000000000000..5e6cf8e89908
> > > --- /dev/null
> > > +++ b/Documentation/warpdrive/wd_q_addr_space.svg
> > > @@ -0,0 +1,359 @@
> > > +<?xml version="1.0" encoding="UTF-8" standalone="no"?>
> > > +<!-- Created with Inkscape (http://www.inkscape.org/) -->
> > > +
> > > +<svg
> > > + xmlns:dc="http://purl.org/dc/elements/1.1/"
> > > + xmlns:cc="http://creativecommons.org/ns#"
> > > + xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
> > > + xmlns:svg="http://www.w3.org/2000/svg"
> > > + xmlns="http://www.w3.org/2000/svg"
> > > + xmlns:sodipodi="http://sodipodi.sourceforge.net/DTD/sodipodi-0.dtd"
> > > + xmlns:inkscape="http://www.inkscape.org/namespaces/inkscape"
> > > + width="210mm"
> > > + height="124mm"
> > > + viewBox="0 0 210 124"
> > > + version="1.1"
> > > + id="svg8"
> > > + inkscape:version="0.92.3 (2405546, 2018-03-11)"
> > > + sodipodi:docname="wd_q_addr_space.svg">
> > > + <defs
> > > + id="defs2">
> > > + <marker
> > > + inkscape:stockid="Arrow1Mend"
> > > + orient="auto"
> > > + refY="0"
> > > + refX="0"
> > > + id="marker5428"
> > > + style="overflow:visible"
> > > + inkscape:isstock="true">
> > > + <path
> > > + id="path5426"
> > > + d="M 0,0 5,-5 -12.5,0 5,5 Z"
> > > + style="fill:#000000;fill-opacity:1;fill-rule:evenodd;stroke:#000000;stroke-width:1.00000003pt;stroke-opacity:1"
> > > + transform="matrix(-0.4,0,0,-0.4,-4,0)"
> > > + inkscape:connector-curvature="0" />
> > > + </marker>
> > > + <marker
> > > + inkscape:isstock="true"
> > > + style="overflow:visible"
> > > + id="marker2922"
> > > + refX="0"
> > > + refY="0"
> > > + orient="auto"
> > > + inkscape:stockid="Arrow1Mend"
> > > + inkscape:collect="always">
> > > + <path
> > > + transform="matrix(-0.4,0,0,-0.4,-4,0)"
> > > + style="fill:#000000;fill-opacity:1;fill-rule:evenodd;stroke:#000000;stroke-width:1.00000003pt;stroke-opacity:1"
> > > + d="M 0,0 5,-5 -12.5,0 5,5 Z"
> > > + id="path2920"
> > > + inkscape:connector-curvature="0" />
> > > + </marker>
> > > + <marker
> > > + inkscape:stockid="Arrow1Mstart"
> > > + orient="auto"
> > > + refY="0"
> > > + refX="0"
> > > + id="Arrow1Mstart"
> > > + style="overflow:visible"
> > > + inkscape:isstock="true">
> > > + <path
> > > + id="path840"
> > > + d="M 0,0 5,-5 -12.5,0 5,5 Z"
> > > + style="fill:#000000;fill-opacity:1;fill-rule:evenodd;stroke:#000000;stroke-width:1.00000003pt;stroke-opacity:1"
> > > + transform="matrix(0.4,0,0,0.4,4,0)"
> > > + inkscape:connector-curvature="0" />
> > > + </marker>
> > > + <marker
> > > + inkscape:stockid="Arrow1Mend"
> > > + orient="auto"
> > > + refY="0"
> > > + refX="0"
> > > + id="Arrow1Mend"
> > > + style="overflow:visible"
> > > + inkscape:isstock="true"
> > > + inkscape:collect="always">
> > > + <path
> > > + id="path843"
> > > + d="M 0,0 5,-5 -12.5,0 5,5 Z"
> > > + style="fill:#000000;fill-opacity:1;fill-rule:evenodd;stroke:#000000;stroke-width:1.00000003pt;stroke-opacity:1"
> > > + transform="matrix(-0.4,0,0,-0.4,-4,0)"
> > > + inkscape:connector-curvature="0" />
> > > + </marker>
> > > + <marker
> > > + inkscape:stockid="Arrow1Mstart"
> > > + orient="auto"
> > > + refY="0"
> > > + refX="0"
> > > + id="Arrow1Mstart-5"
> > > + style="overflow:visible"
> > > + inkscape:isstock="true">
> > > + <path
> > > + inkscape:connector-curvature="0"
> > > + id="path840-1"
> > > + d="M 0,0 5,-5 -12.5,0 5,5 Z"
> > > + style="fill:#000000;fill-opacity:1;fill-rule:evenodd;stroke:#000000;stroke-width:1.00000003pt;stroke-opacity:1"
> > > + transform="matrix(0.4,0,0,0.4,4,0)" />
> > > + </marker>
> > > + <marker
> > > + inkscape:stockid="Arrow1Mend"
> > > + orient="auto"
> > > + refY="0"
> > > + refX="0"
> > > + id="Arrow1Mend-1"
> > > + style="overflow:visible"
> > > + inkscape:isstock="true">
> > > + <path
> > > + inkscape:connector-curvature="0"
> > > + id="path843-0"
> > > + d="M 0,0 5,-5 -12.5,0 5,5 Z"
> > > + style="fill:#000000;fill-opacity:1;fill-rule:evenodd;stroke:#000000;stroke-width:1.00000003pt;stroke-opacity:1"
> > > + transform="matrix(-0.4,0,0,-0.4,-4,0)" />
> > > + </marker>
> > > + <marker
> > > + inkscape:isstock="true"
> > > + style="overflow:visible"
> > > + id="marker2922-2"
> > > + refX="0"
> > > + refY="0"
> > > + orient="auto"
> > > + inkscape:stockid="Arrow1Mend"
> > > + inkscape:collect="always">
> > > + <path
> > > + transform="matrix(-0.4,0,0,-0.4,-4,0)"
> > > + style="fill:#000000;fill-opacity:1;fill-rule:evenodd;stroke:#000000;stroke-width:1.00000003pt;stroke-opacity:1"
> > > + d="M 0,0 5,-5 -12.5,0 5,5 Z"
> > > + id="path2920-9"
> > > + inkscape:connector-curvature="0" />
> > > + </marker>
> > > + <marker
> > > + inkscape:isstock="true"
> > > + style="overflow:visible"
> > > + id="marker2922-27"
> > > + refX="0"
> > > + refY="0"
> > > + orient="auto"
> > > + inkscape:stockid="Arrow1Mend"
> > > + inkscape:collect="always">
> > > + <path
> > > + transform="matrix(-0.4,0,0,-0.4,-4,0)"
> > > + style="fill:#000000;fill-opacity:1;fill-rule:evenodd;stroke:#000000;stroke-width:1.00000003pt;stroke-opacity:1"
> > > + d="M 0,0 5,-5 -12.5,0 5,5 Z"
> > > + id="path2920-0"
> > > + inkscape:connector-curvature="0" />
> > > + </marker>
> > > + <marker
> > > + inkscape:isstock="true"
> > > + style="overflow:visible"
> > > + id="marker2922-27-8"
> > > + refX="0"
> > > + refY="0"
> > > + orient="auto"
> > > + inkscape:stockid="Arrow1Mend"
> > > + inkscape:collect="always">
> > > + <path
> > > + transform="matrix(-0.4,0,0,-0.4,-4,0)"
> > > + style="fill:#000000;fill-opacity:1;fill-rule:evenodd;stroke:#000000;stroke-width:1.00000003pt;stroke-opacity:1"
> > > + d="M 0,0 5,-5 -12.5,0 5,5 Z"
> > > + id="path2920-0-0"
> > > + inkscape:connector-curvature="0" />
> > > + </marker>
> > > + </defs>
> > > + <sodipodi:namedview
> > > + id="base"
> > > + pagecolor="#ffffff"
> > > + bordercolor="#666666"
> > > + borderopacity="1.0"
> > > + inkscape:pageopacity="0.0"
> > > + inkscape:pageshadow="2"
> > > + inkscape:zoom="1.4"
> > > + inkscape:cx="401.66654"
> > > + inkscape:cy="218.12255"
> > > + inkscape:document-units="mm"
> > > + inkscape:current-layer="layer1"
> > > + showgrid="false"
> > > + inkscape:window-width="1916"
> > > + inkscape:window-height="1033"
> > > + inkscape:window-x="0"
> > > + inkscape:window-y="22"
> > > + inkscape:window-maximized="0" />
> > > + <metadata
> > > + id="metadata5">
> > > + <rdf:RDF>
> > > + <cc:Work
> > > + rdf:about="">
> > > + <dc:format>image/svg+xml</dc:format>
> > > + <dc:type
> > > + rdf:resource="http://purl.org/dc/dcmitype/StillImage" />
> > > + <dc:title />
> > > + </cc:Work>
> > > + </rdf:RDF>
> > > + </metadata>
> > > + <g
> > > + inkscape:label="Layer 1"
> > > + inkscape:groupmode="layer"
> > > + id="layer1"
> > > + transform="translate(0,-173)">
> > > + <rect
> > > + style="opacity:0.82999998;fill:none;fill-opacity:1;fill-rule:evenodd;stroke:#000000;stroke-width:0.4;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:0.82745098"
> > > + id="rect815"
> > > + width="21.262758"
> > > + height="40.350552"
> > > + x="55.509361"
> > > + y="195.00098"
> > > + ry="0" />
> > > + <rect
> > > + style="opacity:0.82999998;fill:none;fill-opacity:1;fill-rule:evenodd;stroke:#000000;stroke-width:0.4;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:0.82745098"
> > > + id="rect815-1"
> > > + width="21.24276"
> > > + height="43.732346"
> > > + x="55.519352"
> > > + y="235.26543"
> > > + ry="0" />
> > > + <text
> > > + xml:space="preserve"
> > > + style="font-style:normal;font-weight:normal;font-size:2.9765625px;line-height:125%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:0.26458332px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
> > > + x="50.549229"
> > > + y="190.6078"
> > > + id="text1118"><tspan
> > > + sodipodi:role="line"
> > > + id="tspan1116"
> > > + x="50.549229"
> > > + y="190.6078"
> > > + style="stroke-width:0.26458332px">queue file address space</tspan></text>
> > > + <path
> > > + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:0.26458332px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
> > > + d="M 76.818568,194.95453 H 97.229281"
> > > + id="path1126"
> > > + inkscape:connector-curvature="0"
> > > + sodipodi:nodetypes="cc" />
> > > + <path
> > > + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:0.26458332px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
> > > + d="M 76.818568,235.20899 H 96.095361"
> > > + id="path1126-8"
> > > + inkscape:connector-curvature="0" />
> > > + <path
> > > + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:0.26458332px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
> > > + d="m 76.762111,278.99778 h 19.27678"
> > > + id="path1126-0"
> > > + inkscape:connector-curvature="0" />
> > > + <path
> > > + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:0.26458332px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
> > > + d="m 55.519355,265.20165 v 19.27678"
> > > + id="path1126-2"
> > > + inkscape:connector-curvature="0" />
> > > + <path
> > > + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:0.26458332px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
> > > + d="m 76.762111,265.20165 v 19.27678"
> > > + id="path1126-2-1"
> > > + inkscape:connector-curvature="0" />
> > > + <path
> > > + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:0.26458332px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;marker-start:url(#Arrow1Mstart);marker-end:url(#Arrow1Mend)"
> > > + d="m 87.590896,194.76554 0,39.87648"
> > > + id="path1126-2-1-0"
> > > + inkscape:connector-curvature="0"
> > > + sodipodi:nodetypes="cc" />
> > > + <path
> > > + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:0.26458332px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;marker-start:url(#Arrow1Mstart-5);marker-end:url(#Arrow1Mend-1)"
> > > + d="m 82.48822,235.77596 v 42.90029"
> > > + id="path1126-2-1-0-8"
> > > + inkscape:connector-curvature="0"
> > > + sodipodi:nodetypes="cc" />
> > > + <path
> > > + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:0.26458332px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;marker-end:url(#marker2922)"
> > > + d="M 44.123633,195.3325 H 55.651907"
> > > + id="path2912"
> > > + inkscape:connector-curvature="0" />
> > > + <text
> > > + xml:space="preserve"
> > > + style="font-style:normal;font-weight:normal;font-size:2.9765625px;line-height:125%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:0.26458332px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
> > > + x="32.217381"
> > > + y="196.27745"
> > > + id="text2968"><tspan
> > > + sodipodi:role="line"
> > > + id="tspan2966"
> > > + x="32.217381"
> > > + y="196.27745"
> > > + style="stroke-width:0.26458332px">offset 0</tspan></text>
> > > + <text
> > > + xml:space="preserve"
> > > + style="font-style:normal;font-weight:normal;font-size:2.9765625px;line-height:125%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:0.26458332px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
> > > + x="91.199554"
> > > + y="216.03946"
> > > + id="text1118-5"><tspan
> > > + sodipodi:role="line"
> > > + id="tspan1116-0"
> > > + x="91.199554"
> > > + y="216.03946"
> > > + style="stroke-width:0.26458332px">device region (mapped to device mmio or shared kernel driver memory)</tspan></text>
> > > + <text
> > > + xml:space="preserve"
> > > + style="font-style:normal;font-weight:normal;font-size:2.9765625px;line-height:125%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:0.26458332px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
> > > + x="86.188072"
> > > + y="244.50081"
> > > + id="text1118-5-6"><tspan
> > > + sodipodi:role="line"
> > > + id="tspan1116-0-4"
> > > + x="86.188072"
> > > + y="244.50081"
> > > + style="stroke-width:0.26458332px">static share virtual memory region (for device without share virtual memory)</tspan></text>
> > > + <flowRoot
> > > + xml:space="preserve"
> > > + id="flowRoot5699"
> > > + style="font-style:normal;font-weight:normal;font-size:11.25px;line-height:125%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"><flowRegion
> > > + id="flowRegion5701"><rect
> > > + id="rect5703"
> > > + width="5182.8569"
> > > + height="385.71429"
> > > + x="34.285713"
> > > + y="71.09111" /></flowRegion><flowPara
> > > + id="flowPara5705" /></flowRoot> <path
> > > + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:0.26458332px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;marker-end:url(#marker2922-2)"
> > > + d="M 43.679028,206.85268 H 55.207302"
> > > + id="path2912-1"
> > > + inkscape:connector-curvature="0" />
> > > + <path
> > > + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:0.26458332px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;marker-end:url(#marker2922-27)"
> > > + d="M 44.057004,224.23959 H 55.585278"
> > > + id="path2912-9"
> > > + inkscape:connector-curvature="0" />
> > > + <text
> > > + xml:space="preserve"
> > > + style="font-style:normal;font-weight:normal;font-size:2.9765625px;line-height:125%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:0.26458332px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
> > > + x="24.139778"
> > > + y="202.40636"
> > > + id="text1118-5-3"><tspan
> > > + sodipodi:role="line"
> > > + id="tspan1116-0-6"
> > > + x="24.139778"
> > > + y="202.40636"
> > > + style="stroke-width:0.26458332px">device mmio region</tspan></text>
> > > + <text
> > > + xml:space="preserve"
> > > + style="font-style:normal;font-weight:normal;font-size:2.9765625px;line-height:125%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:0.26458332px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
> > > + x="17.010948"
> > > + y="216.73672"
> > > + id="text1118-5-3-3"><tspan
> > > + sodipodi:role="line"
> > > + id="tspan1116-0-6-6"
> > > + x="17.010948"
> > > + y="216.73672"
> > > + style="stroke-width:0.26458332px">device kernel only region</tspan></text>
> > > + <path
> > > + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:0.26458332px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;marker-end:url(#marker2922-27-8)"
> > > + d="M 43.981087,235.35153 H 55.509361"
> > > + id="path2912-9-2"
> > > + inkscape:connector-curvature="0" />
> > > + <text
> > > + xml:space="preserve"
> > > + style="font-style:normal;font-weight:normal;font-size:2.9765625px;line-height:125%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:0.26458332px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
> > > + x="17.575975"
> > > + y="230.53285"
> > > + id="text1118-5-3-3-0"><tspan
> > > + sodipodi:role="line"
> > > + id="tspan1116-0-6-6-5"
> > > + x="17.575975"
> > > + y="230.53285"
> > > + style="stroke-width:0.26458332px">device user share region</tspan></text>
> > > + </g>
> > > +</svg>
> > > --
> > > 2.17.1
> > >



2018-11-22 13:35:59

by Jason Gunthorpe

Subject: Re: [RFCv3 PATCH 1/6] uacce: Add documents for WarpDrive/uacce

On Wed, Nov 21, 2018 at 02:08:05PM +0800, Kenneth Lee wrote:

> > But considering Jean's SVA stuff seems based on mmu notifiers, I have
> > a hard time believing that it has any different behavior from RDMA's
> > ODP, and if it does have different behavior, then it is probably just
> > a bug in the ODP implementation.
>
> As Jean has explained, his solution is based on page table sharing. I think ODP
> should also consider this new feature.

Shared page tables would require the HW to walk the page table format
of the CPU directly, not sure how that would be possible for ODP?

Presumably the implementation for ARM relies on the IOMMU hardware
doing this?

> > > > If all your driver needs is to mmap some PCI bar space, route
> > > > interrupts and do DMA mapping then mediated VFIO is probably a good
> > > > choice.
> > >
> > > Yes. That is what is done in our RFCv1/v2. But we accepted Jerome's opinion and
> > > try not to add complexity to the mm subsystem.
> >
> > Why would a mediated VFIO driver touch the mm subsystem? Sounds like
> > you don't have a VFIO driver if it needs to do stuff like that...
>
> VFIO has no ODP-like solution, and if we want to solve the fork problem, we have
> to make some changes to the iommu and the fork procedure. Further, VFIO treats
> every queue as an independent device. This creates a lot of trouble for resource
> management. For example, you will need a manager process to withdraw unused
> devices and you need to let the user process know the PASID of the queue, and
> so on.

Well, I would think you'd add SVA support to the VFIO driver as a
generic capability - it seems pretty useful for any VFIO user as it
avoids all the kernel upcalls to do memory pinning and DMA address
translation.

Once the VFIO driver knows about this as a generic capability then the
device it exposes to userspace would use CPU addresses instead of DMA
addresses.

The question is if your driver needs much more than the device
agnostic generic services VFIO provides.

I'm not sure what you have in mind with resource management.. It is
hard to revoke resources from userspace, unless you are doing
kernel syscalls, but then why do all this?

Jason

2018-11-20 05:07:10

by Jerome Glisse

Subject: Re: [RFCv3 PATCH 1/6] uacce: Add documents for WarpDrive/uacce

On Mon, Nov 19, 2018 at 11:27:52AM -0700, Jason Gunthorpe wrote:
> On Mon, Nov 19, 2018 at 11:48:54AM -0500, Jerome Glisse wrote:
>
> > Just to comment on this, any infiniband driver which uses umem and does
> > not have ODP (here ODP for me means listening to mmu notifiers, so all
> > infiniband drivers except mlx5) will be affected by the same issue AFAICT.
> >
> > AFAICT there is nothing special happening after fork() inside any of
> > those drivers. So if the parent creates a umem mr before fork() and programs
> > the hardware with it, then after fork() the parent might start using a new
> > page for the umem range while the old memory is used by the child. The
> > reverse is also true (parent using old memory and child new memory).
> > Bottom line: you cannot predict which memory the child or the parent
> > will use for the range after fork().
> >
> > So no matter whether you consider the child or the parent, what the hw
> > will use for the mr is unlikely to match what the CPU uses for the
> > same virtual address. In other words:
> >
> > Before fork:
> > CPU parent: virtual addr ptr1 -> physical address = 0xCAFE
> > HARDWARE: virtual addr ptr1 -> physical address = 0xCAFE
> >
> > Case 1:
> > CPU parent: virtual addr ptr1 -> physical address = 0xCAFE
> > CPU child: virtual addr ptr1 -> physical address = 0xDEAD
> > HARDWARE: virtual addr ptr1 -> physical address = 0xCAFE
> >
> > Case 2:
> > CPU parent: virtual addr ptr1 -> physical address = 0xBEEF
> > CPU child: virtual addr ptr1 -> physical address = 0xCAFE
> > HARDWARE: virtual addr ptr1 -> physical address = 0xCAFE
>
> IIRC this is solved in IB by automatically calling
> madvise(MADV_DONTFORK) before creating the MR.
>
> MADV_DONTFORK
> .. This is useful to prevent copy-on-write semantics from changing the
> physical location of a page if the parent writes to it after a
> fork(2) ..

This would work around the issue, but it is not transparent, i.e. a
range marked with DONTFORK no longer behaves as expected from the
application's point of view.

Also, it relies on userspace doing the right thing (which is not
something I usually trust :)).
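
To make the "not transparent" point concrete, here is a minimal sketch,
assuming a Linux system. The helper name child_sees_range() is hypothetical
and is not taken from any driver or library discussed in this thread. It maps
one anonymous page, optionally marks it with madvise(MADV_DONTFORK), forks,
and lets the child probe the range with msync(), which fails with ENOMEM on
an unmapped range.

```c
#define _DEFAULT_SOURCE /* exposes madvise(), MADV_DONTFORK, MAP_ANONYMOUS */
#include <errno.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper for illustration only.
 * Returns 1 if the child still has the range mapped after fork(),
 * 0 if the range is missing in the child, and -1 on any error.
 */
int child_sees_range(int use_dontfork)
{
    long page_size = sysconf(_SC_PAGESIZE);
    char *buf;
    pid_t pid;
    int status;

    if (page_size <= 0)
        return -1;

    buf = mmap(NULL, (size_t)page_size, PROT_READ | PROT_WRITE,
               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return -1;
    buf[0] = 1; /* touch the page so it is actually backed by memory */

    /* This is the step an IB library would do before creating the MR. */
    if (use_dontfork &&
        madvise(buf, (size_t)page_size, MADV_DONTFORK) != 0) {
        munmap(buf, (size_t)page_size);
        return -1;
    }

    pid = fork();
    if (pid < 0) {
        munmap(buf, (size_t)page_size);
        return -1;
    }
    if (pid == 0) {
        /* Child: probe the range without dereferencing the pointer. */
        if (msync(buf, (size_t)page_size, MS_ASYNC) == 0)
            _exit(0);                   /* range is mapped */
        _exit(errno == ENOMEM ? 1 : 2); /* 1: unmapped, 2: other error */
    }

    /* Parent: collect the child's verdict from its exit status. */
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status)) {
        munmap(buf, (size_t)page_size);
        return -1;
    }
    munmap(buf, (size_t)page_size);

    if (WEXITSTATUS(status) == 0)
        return 1;
    if (WEXITSTATUS(status) == 1)
        return 0;
    return -1;
}
```

On Linux, child_sees_range(0) is expected to report that the child sees the
range and child_sees_range(1) that it does not. A child that dereferenced
the pointer in the second case would take a SIGSEGV, which is the change in
behaviour the application has to be aware of.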

Cheers,
Jérôme

2018-11-16 01:03:08

by Leon Romanovsky

Subject: Re: [RFCv3 PATCH 1/6] uacce: Add documents for WarpDrive/uacce

On Thu, Nov 15, 2018 at 04:51:09PM +0800, Kenneth Lee wrote:
> On Wed, Nov 14, 2018 at 06:00:17PM +0200, Leon Romanovsky wrote:
> > On Wed, Nov 14, 2018 at 10:58:09AM +0800, Kenneth Lee wrote:
> > >
> > > On 2018/11/13 at 8:23 AM, Leon Romanovsky wrote:
> > > > On Mon, Nov 12, 2018 at 03:58:02PM +0800, Kenneth Lee wrote:
> > > > > From: Kenneth Lee <[email protected]>
> > > > >
> > > > > WarpDrive is a general accelerator framework for the user application to
> > > > > access the hardware without going through the kernel in data path.
> > > > >
> > > > > The kernel component that provides the facility for drivers to expose the
> > > > > user interface is called uacce. It is a short name for
> > > > > "Unified/User-space-access-intended Accelerator Framework".
> > > > >
> > > > > This patch adds a document to explain how it works.
> > > > + RDMA and netdev folks
> > > >
> > > > Sorry to be late to the game. I don't see the other patches, but from
> > > > the description below it seems like you are reinventing the RDMA verbs
> > > > model. I have a hard time seeing how the proposed framework differs
> > > > from what is already implemented in drivers/infiniband/* for the kernel
> > > > space and in https://github.com/linux-rdma/rdma-core/ for the user
> > > > space parts.
> > >
> > > Thanks Leon,
> > >
> > > Yes, we are trying to solve a problem similar to RDMA's. We also learned a
> > > lot from the existing RDMA code. But we have to make a new one because we
> > > cannot register accelerators such as AI operation, encryption or compression
> > > to the RDMA framework :)
> >
> > Assuming that you did everything right and still failed to use the RDMA
> > framework, you were supposed to fix it and not reinvent a new, exactly
> > identical one. That is how we develop the kernel, by reusing existing code.
>
> Yes, but we don't force other systems such as NICs or GPUs into RDMA, do we?

You are not introducing a new NIC or GPU, but proposing another interface to
directly access HW memory and bypass the kernel in the data path. This is the
whole idea of RDMA and this is why it is already present in the kernel.

The various hardware devices supported in our stack allow a ton of crazy
stuff, including GPU interconnects and NIC functionality.

>
> I assume you would not agree to register a zip accelerator to infiniband? :)

The "infiniband" name in "drivers/infiniband/" is a legacy one and the
current code supports IB, RoCE, iWARP and OmniPath as transport layers.
For a long time, we have wanted to rename that folder to "drivers/rdma",
but didn't find men/women brave enough to do it, due to the backport mess
such a move would cause.

Adding a zip accelerator to RDMA is possible and depends on how
you model such new functionality - as a new driver, or maybe a new ULP.

>
> Further, I don't think it is wise to break an existing system (RDMA) to serve a
> totally new scenario. The better choice is to let them run in parallel for some
> time and try to merge them accordingly.

Awesome, so please run your code out-of-tree for now and once you are ready
for submission let's try to merge it.

>
> >
> > >
> > > Another problem we tried to address is the way to pin the memory for dma
> > > operation. The RDMA way of pinning the memory cannot avoid losing the page
> > > to a copy-on-write operation while the memory is in use by the device. This
> > > may not be important to the RDMA library. But it is important to accelerators.
> >
> > Such support has existed in drivers/infiniband/ since late 2014 and
> > it is called ODP (on demand paging).
>
> I reviewed ODP and I think it is a solution bound to infiniband. It is part of
> MR semantics and requires an infiniband specific hook
> (ucontext->invalidate_range()). And the hook requires the device to be able to
> stop using the page for a while for the copying. It is ok for infiniband
> (actually, only mlx5 uses it). I don't think most accelerators can support
> this mode. But WarpDrive works fully on top of the IOMMU interface, so it does
> not have this limitation.

1. It has nothing to do with infiniband.
2. MR and ucontext are verbs semantics and are needed to ensure that host
memory exposed to the user is properly protected from a security point of view.
3. "stop using the page for a while for the copying" - I don't fully
understand this claim, maybe this article will help you to describe it
better: https://lwn.net/Articles/753027/
4. mlx5 supports ODP not because it is partially an IB device,
but because a HW performance oriented implementation is not an easy task.

>
> >
> > >
> > > Hope this can help the understanding.
> >
> > Yes, it helped me a lot.
> > Now I'm more convinced than before that this whole patchset shouldn't
> > exist in the first place.
>
> Then maybe you can tell me how I can register my accelerator to the user space?

Write a kernel driver and write the user space part of it.
https://github.com/linux-rdma/rdma-core/

I have no doubts that your colleagues who wrote and maintain
drivers/infiniband/hw/hns driver know best how to do it.
They did it very successfully.

Thanks

>
> >
> > To be clear, NAK.
> >
> > Thanks
> >
> > >
> > > Cheers
> > >
> > > >
> > > > Hard NAK from RDMA side.
> > > >
> > > > Thanks
> > > >
> > > > > Signed-off-by: Kenneth Lee <[email protected]>
> > > > > ---
> > > > > Documentation/warpdrive/warpdrive.rst | 260 +++++++
> > > > > Documentation/warpdrive/wd-arch.svg | 764 ++++++++++++++++++++
> > > > > Documentation/warpdrive/wd.svg | 526 ++++++++++++++
> > > > > Documentation/warpdrive/wd_q_addr_space.svg | 359 +++++++++
> > > > > 4 files changed, 1909 insertions(+)
> > > > > create mode 100644 Documentation/warpdrive/warpdrive.rst
> > > > > create mode 100644 Documentation/warpdrive/wd-arch.svg
> > > > > create mode 100644 Documentation/warpdrive/wd.svg
> > > > > create mode 100644 Documentation/warpdrive/wd_q_addr_space.svg
> > > > >
> > > > > diff --git a/Documentation/warpdrive/warpdrive.rst b/Documentation/warpdrive/warpdrive.rst
> > > > > new file mode 100644
> > > > > index 000000000000..ef84d3a2d462
> > > > > --- /dev/null
> > > > > +++ b/Documentation/warpdrive/warpdrive.rst
> > > > > @@ -0,0 +1,260 @@
> > > > > +Introduction of WarpDrive
> > > > > +=========================
> > > > > +
> > > > > +*WarpDrive* is a general accelerator framework for the user application to
> > > > > +access the hardware without going through the kernel in data path.
> > > > > +
> > > > > +It can be used as the quick channel for accelerators, network adaptors or
> > > > > +other hardware for application in user space.
> > > > > +
> > > > > +This may make some implementations simpler. E.g. you can reuse most of the
> > > > > +*netdev* driver in kernel and just share some ring buffer to the user space
> > > > > +driver for *DPDK* [4] or *ODP* [5]. Or you can combine the RSA accelerator with
> > > > > +the *netdev* in the user space as an HTTPS reverse proxy, etc.
> > > > > +
> > > > > +*WarpDrive* takes the hardware accelerator as a heterogeneous processor which
> > > > > +can share particular load from the CPU:
> > > > > +
> > > > > +.. image:: wd.svg
> > > > > + :alt: WarpDrive Concept
> > > > > +
> > > > > +The virtual concept, queue, is used to manage the requests sent to the
> > > > > +accelerator. The application sends requests to the queue by writing to some
> > > > > +particular address, while the hardware takes the requests directly from the
> > > > > +address and sends feedback accordingly.
> > > > > +
> > > > > +The format of the queue may differ from hardware to hardware. But the
> > > > > +application does not need to make any system call for the communication.
> > > > > +
> > > > > +*WarpDrive* tries to create a shared virtual address space for all involved
> > > > > +accelerators. Within this space, the requests sent to queue can refer to any
> > > > > +virtual address, which will be valid to the application and all involved
> > > > > +accelerators.
> > > > > +
> > > > > +The name *WarpDrive* is simply a cool and general name meaning the framework
> > > > > +makes the application faster. It includes general user library, kernel
> > > > > +management module and drivers for the hardware. In kernel, the management
> > > > > +module is called *uacce*, meaning "Unified/User-space-access-intended
> > > > > +Accelerator Framework".
> > > > > +
> > > > > +
> > > > > +How does it work
> > > > > +================
> > > > > +
> > > > > +*WarpDrive* uses *mmap* and *IOMMU* to play the trick.
> > > > > +
> > > > > > +*Uacce* creates a chrdev for the device registered to it. A "queue" will be
> > > > > > +created when the chrdev is opened. The application accesses the queue by
> > > > > > +mmapping different address regions of the queue file.
> > > > > > +
> > > > > > +The following figure demonstrates the queue file address space:
> > > > > +
> > > > > +.. image:: wd_q_addr_space.svg
> > > > > + :alt: WarpDrive Queue Address Space
> > > > > +
> > > > > > +The first region of the space, device region, is used for the application to
> > > > > > +write requests to or read answers from the hardware.
> > > > > > +
> > > > > > +Normally, there can be two types of device regions: mmio and memory regions.
> > > > > +It is recommended to use common memory for request/answer descriptors and use
> > > > > +the mmio space for device notification, such as doorbell. But of course, this
> > > > > +is all up to the interface designer.
> > > > > +
> > > > > +There can be two types of device memory regions, kernel-only and user-shared.
> > > > > +This will be explained in the "kernel APIs" section.
> > > > > +
> > > > > +The Static Share Virtual Memory region is necessary only when the device IOMMU
> > > > > +does not support "Share Virtual Memory". This will be explained after the
> > > > > +*IOMMU* idea.
> > > > > +
> > > > > +
> > > > > +Architecture
> > > > > +------------
> > > > > +
> > > > > +The full *WarpDrive* architecture is represented in the following class
> > > > > +diagram:
> > > > > +
> > > > > +.. image:: wd-arch.svg
> > > > > + :alt: WarpDrive Architecture
> > > > > +
> > > > > +
> > > > > +The user API
> > > > > +------------
> > > > > +
> > > > > +We adopt a polling style interface in the user space: ::
> > > > > +
> > > > > + int wd_request_queue(struct wd_queue *q);
> > > > > + void wd_release_queue(struct wd_queue *q);
> > > > > +
> > > > > + int wd_send(struct wd_queue *q, void *req);
> > > > > + int wd_recv(struct wd_queue *q, void **req);
> > > > > + int wd_recv_sync(struct wd_queue *q, void **req);
> > > > > + void wd_flush(struct wd_queue *q);
> > > > > +
> > > > > > +wd_recv_sync() is a wrapper around its non-sync version. It traps into the
> > > > > > +kernel and waits until the queue becomes available.
> > > > > > +
> > > > > > +If the queue does not support SVA/SVM, the following helper function
> > > > > > +can be used to create Static Share Virtual Memory: ::
> > > > > +
> > > > > + void *wd_preserve_share_memory(struct wd_queue *q, size_t size);
> > > > > +
> > > > > > +The user API is not mandatory. It is simply a suggestion and a hint of what
> > > > > > +the kernel interface is supposed to support.
> > > > > +
> > > > > +
> > > > > +The user driver
> > > > > +---------------
> > > > > +
> > > > > +The queue file mmap space will need a user driver to wrap the communication
> > > > > +protocol. *UACCE* provides some attributes in sysfs for the user driver to
> > > > > +match the right accelerator accordingly.
> > > > > +
> > > > > > +The *UACCE* device attributes are under the following directory:
> > > > > > +
> > > > > > +/sys/class/uacce/<dev-name>/params
> > > > > > +
> > > > > > +The following attributes are supported:
> > > > > > +
> > > > > > +nr_queue_remained (ro)
> > > > > > + number of queues remaining
> > > > > +
> > > > > +api_version (ro)
> > > > > + a string to identify the queue mmap space format and its version
> > > > > +
> > > > > +device_attr (ro)
> > > > > + attributes of the device, see UACCE_DEV_xxx flag defined in uacce.h
> > > > > +
> > > > > +numa_node (ro)
> > > > > + id of numa node
> > > > > +
> > > > > +priority (rw)
> > > > > > + Priority of the device, bigger is higher
> > > > > +
> > > > > +(This is not yet implemented in RFC version)
> > > > > +
> > > > > +
> > > > > +The kernel API
> > > > > +--------------
> > > > > +
> > > > > > +The *uacce* kernel API is defined in uacce.h. If the hardware supports SVM/SVA,
> > > > > > +the driver needs only the following API functions: ::
> > > > > +
> > > > > + int uacce_register(uacce);
> > > > > + void uacce_unregister(uacce);
> > > > > + void uacce_wake_up(q);
> > > > > +
> > > > > > +*uacce_wake_up* is used to notify the process that epoll()s on the queue file.
> > > > > > +
> > > > > > +According to the IOMMU capability, *uacce* categorizes the devices as follows:
> > > > > +
> > > > > +UACCE_DEV_NOIOMMU
> > > > > > + The device has no IOMMU. The user process cannot use VA on the hardware.
> > > > > > + This mode is not recommended.
> > > > > +
> > > > > +UACCE_DEV_SVA (UACCE_DEV_PASID | UACCE_DEV_FAULT_FROM_DEV)
> > > > > > + The device has an IOMMU which can share the same page table with the user
> > > > > > + process
> > > > > > +
> > > > > > +UACCE_DEV_SHARE_DOMAIN
> > > > > > + The device has an IOMMU without multiple page table or device page
> > > > > > + fault support
> > > > > > +
> > > > > > +If the device works in a mode other than UACCE_DEV_NOIOMMU, *uacce* will set its
> > > > > > +IOMMU to IOMMU_DOMAIN_UNMANAGED. So the driver must not use any kernel
> > > > > > +DMA API but the following ones from *uacce* instead: ::
> > > > > +
> > > > > + uacce_dma_map(q, va, size, prot);
> > > > > + uacce_dma_unmap(q, va, size, prot);
> > > > > +
> > > > > > +*uacce_dma_map/unmap* is valid only for UACCE_DEV_SVA devices. It creates a
> > > > > > +particular PASID and page table for the kernel in the IOMMU (not yet
> > > > > > +implemented in the RFC).
> > > > > > +
> > > > > > +For a UACCE_DEV_SHARE_DOMAIN device, uacce_dma_map/unmap is not valid.
> > > > > > +*Uacce* calls back start_queue only when the DUS and DKO regions are mmapped.
> > > > > > +The accelerator driver must use those dma buffers, via uacce_queue->qfrs[], in
> > > > > > +the start_queue callback. The size of the queue file region is defined by
> > > > > > +uacce->ops->qf_pg_start[].
> > > > > +
> > > > > > +We have to do it this way because most current IOMMUs cannot support
> > > > > > +kernel and user virtual addresses at the same time. So we have to let them both
> > > > > > +share the same user virtual address space.
> > > > > > +
> > > > > > +If the device has to support kernel and user at the same time, both the kernel
> > > > > > +and the user should use these DMA APIs. This is not convenient. A better
> > > > > > +solution is to change the future DMA/IOMMU design to separate the
> > > > > > +address space between user and kernel space. But this is not going to happen
> > > > > > +soon.
> > > > > +
> > > > > +
> > > > > +Multiple processes support
> > > > > +==========================
> > > > > +
> > > > > > +In the latest mainline kernel (4.19) when this document was written, the IOMMU
> > > > > > +subsystem does not support multiple process page tables yet.
> > > > > > +
> > > > > > +Most IOMMU hardware implementations support multi-process with the concept
> > > > > > +of PASID. But they may use a different name, e.g. it is called sub-stream-id in
> > > > > > +the SMMU of ARM. With PASID or a similar design, multiple page tables can be
> > > > > > +added to the IOMMU and referred to by PASID.
> > > > > > +
> > > > > > +*JPB* has a patchset to enable this[1]_. We have tested it with our hardware
> > > > > > +(which is known as *D06*). It works well. *WarpDrive* relies on it to support
> > > > > > +UACCE_DEV_SVA. If it is not enabled, *WarpDrive* can still work, but it
> > > > > > +supports only one process: the device will be set to UACCE_DEV_SHARE_DOMAIN
> > > > > > +even if it is set to UACCE_DEV_SVA initially.
> > > > > > +
> > > > > > +Static Share Virtual Memory is mainly used by UACCE_DEV_SHARE_DOMAIN devices.
> > > > > +
> > > > > +
> > > > > +Legacy Mode Support
> > > > > +===================
> > > > > > +For hardware without an IOMMU, WarpDrive can still work; the only problem is
> > > > > > +that VA cannot be used in the device. The driver should adopt another strategy
> > > > > > +for the shared memory. It is only for testing, and not recommended.
> > > > > +
> > > > > +
> > > > > > +The Fork Scenario
> > > > > > +=================
> > > > > > +For a process with allocated queues and shared memory, what happens if it forks
> > > > > > +a child?
> > > > > > +
> > > > > > +The fd of the queue will be duplicated on fork, so the child can send requests
> > > > > > +to the same queue as its parent. But requests sent from any process
> > > > > > +other than the one that opened the queue will be blocked.
> > > > > > +
> > > > > > +It is recommended to add O_CLOEXEC to the queue file.
> > > > > > +
> > > > > > +The queue mmap space has VM_DONTCOPY in its VMA. So the child will lose all
> > > > > > +those VMAs.
> > > > > > +
> > > > > > +This is why *WarpDrive* does not adopt the mode used in *VFIO* and *InfiniBand*.
> > > > > > +Both solutions can set any user pointer for hardware sharing. But they cannot
> > > > > > +support fork while the dma is in progress. Otherwise the "Copy-On-Write"
> > > > > > +procedure will make the parent process lose its physical pages.
> > > > > +
> > > > > +
> > > > > +The Sample Code
> > > > > +===============
> > > > > +There is a sample user land implementation with a simple driver for Hisilicon
> > > > > +Hi1620 ZIP Accelerator.
> > > > > +
> > > > > +To test, do the following in samples/warpdrive (for the case of PC host): ::
> > > > > + ./autogen.sh
> > > > > + ./conf.sh # or simply ./configure if you build on target system
> > > > > + make
> > > > > +
> > > > > > +Then you can get test_hisi_zip in the test subdirectory. Copy it to the target
> > > > > > +system and make sure the hisi_zip driver is enabled (the major and minor of
> > > > > > +the uacce chrdev can be obtained from dmesg or sysfs), and run: ::
> > > > > > + mknod /dev/ua1 c <major> <minor>
> > > > > + test/test_hisi_zip -z < data > data.zip
> > > > > + test/test_hisi_zip -g < data > data.gzip
> > > > > +
> > > > > +
> > > > > +References
> > > > > +==========
> > > > > +.. [1] https://patchwork.kernel.org/patch/10394851/
> > > > > +
> > > > > +.. vim: tw=78
> [...]
> > > > > --
> > > > > 2.17.1
> > > > >



2018-11-20 06:37:12

by Jason Gunthorpe

Subject: Re: [RFCv3 PATCH 1/6] uacce: Add documents for WarpDrive/uacce

On Mon, Nov 19, 2018 at 02:46:32PM -0500, Jerome Glisse wrote:

> > ?? How can O_DIRECT be fine but RDMA not? They use exactly the same
> > get_user_pages flow, right? Can we do what O_DIRECT does in RDMA and
> > be fine too?
> >
> > AFAIK the only difference is the length of the race window. You'd have
> > to fork and fault during the shorter time O_DIRECT has get_user_pages
> > open.
>
> Well in O_DIRECT case there is only one page table, the CPU
> page table and it gets updated during fork() so there is an
> ordering there and the race window is small.

Not really, in O_DIRECT case there is another 'page table', we just
call it a DMA scatter/gather list and it is sent directly to the block
device's DMA HW. The sgl plays exactly the same role as the various HW
page list data structures that underlie RDMA MRs.

It is not the page table that matters here, it is whether the DMA address
of the page is active for DMA on HW.

Like you say, the only difference is that the race is hopefully small
with O_DIRECT (though that is not really small, NVMeof for instance
has windows as large as connection timeouts, if you try hard enough).

So we probably can trigger this trouble with O_DIRECT and fork(), and
I would call it a bug :(

> > Why? Keep track in each mm if there are any active get_user_pages
> > FOLL_WRITE pages in the mm, if yes then sweep the VMAs and fix the
> > issue for the FOLL_WRITE pages.
>
> This has a cost and you don't want to do it for O_DIRECT. I am pretty
> sure that any such patch to modify fork() code path would be rejected.
> At least i would not like it and vote against.

I was thinking the incremental cost on top of what John is already
doing would be very small in the common case and only be triggered in
cases that matter (which apps should avoid anyhow).

Jason

2018-11-20 19:45:28

by Jonathan Cameron

Subject: Re: [RFCv3 PATCH 1/6] uacce: Add documents for WarpDrive/uacce

+CC Jean-Philippe and iommu list.


On Mon, 19 Nov 2018 20:29:39 -0700
Jason Gunthorpe <[email protected]> wrote:

> On Tue, Nov 20, 2018 at 11:07:02AM +0800, Kenneth Lee wrote:
> > On Mon, Nov 19, 2018 at 11:49:54AM -0700, Jason Gunthorpe wrote:
> > >
> > > On Mon, Nov 19, 2018 at 05:14:05PM +0800, Kenneth Lee wrote:
> > >
> > > > If the hardware cannot share a page table with the CPU, we then need to have
> > > > some way to change the device page table. This is what happens in ODP. It
> > > > invalidates the page table in the device upon mmu_notifier call back. But this
> > > > cannot solve the COW problem: if the user process A shares a page P with the
> > > > device, and A forks a new process B, and it continues to write to the page. By
> > > > COW, the process B will keep the page P, while A will get a new page P'. But you
> > > > have no way to let the device know it should use P' rather than P.
> > >
> > > Is this true? I thought mmu_notifiers covered all these cases.
> > >
> > > The mmu_notifier for A should fire if B causes the physical address of
> > > A's pages to change via COW.
> > >
> > > And this causes the device page tables to re-synchronize.
> >
> > I don't see such code. The current do_cow_fault() implementation has nothing to
> > do with mmu_notifier.
>
> Well, that sure sounds like it would be a bug in mmu_notifiers..
>
> But considering Jean's SVA stuff seems based on mmu notifiers, I have
> a hard time believing that it has any different behavior from RDMA's
> ODP, and if it does have different behavior, then it is probably just
> a bug in the ODP implementation.
>
> > > > In WarpDrive/uacce, we make this simple. If you support IOMMU and it supports
> > > > SVM/SVA, everything will be fine just like ODP implicit mode. And you don't need
> > > > to write any code for that, because it has been done by the IOMMU framework. If it
> > >
> > > Looks like the IOMMU code uses mmu_notifier, so it is identical to
> > > IB's ODP. The only difference is that IB tends to have the IOMMU page
> > > table in the device, not in the CPU.
> > >
> > > The only case I know of that is different is the new-fangled CAPI
> > > stuff where the IOMMU can directly use the CPU's page table and the
> > > IOMMU page table (in device or CPU) is eliminated.
> >
> > Yes. We are not focusing on the current implementation. As mentioned in the
> > cover letter, we are expecting Jean-Philippe's SVA patch:
> > git://linux-arm.org/linux-jpb.
>
> This SVA stuff does not look comparable to CAPI as it still requires
> maintaining separate IOMMU page tables.
>
> Also, those patches from Jean have a lot of references to
> mmu_notifiers (ie look at iommu_mmu_notifier).
>
> Are you really sure it is actually any different at all?
>
> > > Anyhow, I don't think a single instance of hardware should justify an
> > > entire new subsystem. Subsystems are hard to make and without multiple
> > > hardware examples there is no way to expect that it would cover any
> > > future use cases.
> >
> > Yes. That's our first expectation. We can keep it with our driver. But
> > there is no user driver support for any accelerator in the mainline kernel. Even
> > the well known QuickAssist has to be maintained out of tree. So we try to see if
> > people are interested in working together to solve the problem.
>
> Well, you should come with patches ack'ed by these other groups.
>
> > > If all your driver needs is to mmap some PCI bar space, route
> > > interrupts and do DMA mapping then mediated VFIO is probably a good
> > > choice.
> >
> > Yes. That is what is done in our RFCv1/v2. But we accepted Jerome's opinion and
> > try not to add complexity to the mm subsystem.
>
> Why would a mediated VFIO driver touch the mm subsystem? Sounds like
> you don't have a VFIO driver if it needs to do stuff like that...
>
> > > If it needs to do a bunch of other stuff, not related to PCI bar
> > > space, interrupts and DMA mapping (ie special code for compression,
> > > crypto, AI, whatever) then you should probably do what Jerome said and
> > > make a drivers/char/hisillicon_foo_bar.c that exposes just what your
> > > hardware does.
> >
> > Yes. If no other accelerator driver writer is interested. That is the
> > expectation:)
>
> I don't think it matters what other drivers do.
>
> If your driver does not need any other kernel code then VFIO is
> sensible. In this kind of world you will probably have a RDMA-like
> userspace driver that can bring this to a common user space API, even
> if one driver uses VFIO and a different driver uses something else.
>
> > You create some connections (queues) to the NIC, RSA, and AI engine. Then you get
> > data directly from the NIC and pass the pointer to the RSA engine for decryption.
> > The CPU then finishes some data taking or operation and then passes it through to
> > the AI engine for CNN calculation.... This will need a place to maintain the same
> > address space by some means.
>
> How is this any different from what we have today?
>
> SVA is not something even remotely new, IB has been doing various
> versions of it for 20 years.
>
> Jason

Subject: Re: [RFCv3 PATCH 1/6] uacce: Add documents for WarpDrive/uacce

On Mon, 19 Nov 2018, Jerome Glisse wrote:

> > IIRC this is solved in IB by automatically calling
> > madvise(MADV_DONTFORK) before creating the MR.
> >
> > MADV_DONTFORK
> > .. This is useful to prevent copy-on-write semantics from changing the
> > physical location of a page if the parent writes to it after a
> > fork(2) ..
>
> This would work around the issue but this is not transparent ie
> range marked with DONTFORK no longer behave as expected from the
> application point of view.

Why would anyone expect a range registered via MR to behave as normal? Device
I/O is going on into that range. Memory is already special.


> Also it relies on userspace doing the right thing (which is not
> something i usually trust :)).

This has been established practice for 15 or so years in a couple of use
cases. Again user space already has to be doing special things in order to
handle RDMA in that area.
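
For readers unfamiliar with the madvise() behaviour quoted above, here is a minimal C sketch of it, assuming Linux. The helper name child_sees_mapping() is hypothetical and is not taken from rdma-core or from the WarpDrive patches. It maps one anonymous page, optionally marks it MADV_DONTFORK, forks, and lets the child use mincore() to check whether the range is still mapped:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper, for illustration only.
 * Returns 1 if the child still has the range mapped after fork(),
 * 0 if it does not, and -1 on error.
 */
int child_sees_mapping(int dontfork)
{
    long page = sysconf(_SC_PAGESIZE);
    unsigned char vec;
    int status;
    pid_t pid;
    void *buf;

    assert(page > 0);
    buf = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return -1;
    memset(buf, 0x5a, (size_t)page);    /* fault the page in */

    /* Ask the kernel not to copy this range into children. */
    if (dontfork && madvise(buf, (size_t)page, MADV_DONTFORK) != 0) {
        munmap(buf, (size_t)page);
        return -1;
    }

    pid = fork();
    if (pid < 0) {
        munmap(buf, (size_t)page);
        return -1;
    }
    if (pid == 0) {
        /* mincore() fails with ENOMEM when the range is not mapped. */
        _exit(mincore(buf, (size_t)page, &vec) == 0 ? 1 : 0);
    }

    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status)) {
        munmap(buf, (size_t)page);
        return -1;
    }
    munmap(buf, (size_t)page);
    return WEXITSTATUS(status);
}
```

Calling child_sees_mapping(0) is expected to report that the child has the page, and child_sees_mapping(1) that it does not. That is the non-transparency Jerome refers to: the range simply does not exist in the child.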

2018-11-14 12:59:44

by Kenneth Lee

Subject: Re: [RFCv3 PATCH 1/6] uacce: Add documents for WarpDrive/uacce


On 2018/11/13 8:23 AM, Leon Romanovsky wrote:
> On Mon, Nov 12, 2018 at 03:58:02PM +0800, Kenneth Lee wrote:
>> From: Kenneth Lee <[email protected]>
>>
>> WarpDrive is a general accelerator framework for the user application to
>> access the hardware without going through the kernel in data path.
>>
>> The kernel component that provides the kernel facility for drivers to expose
>> the user interface is called uacce. It is a short name for
>> "Unified/User-space-access-intended Accelerator Framework".
>>
>> This patch adds a document to explain how it works.
> + RDMA and netdev folks
>
> Sorry to be late in the game. I don't see other patches, but from
> the description below it seems like you are reinventing the RDMA verbs
> model. I have a hard time seeing the differences between the proposed
> framework and what is already implemented in drivers/infiniband/* for the
> kernel space and in https://github.com/linux-rdma/rdma-core/ for the user
> space parts.

Thanks Leon,

Yes, we tried to solve a problem similar to RDMA's. We also learned a lot
from the existing RDMA code. But we have to make a new one because we
cannot register accelerators such as AI operation, encryption or
compression to the RDMA framework:)

Another problem we tried to address is the way to pin the memory for dma
operation. The RDMA way of pinning the memory cannot avoid losing pages
to copy-on-write while the memory is in use by the device. This
may not be important to the RDMA library. But it is important to accelerators.

Hope this helps the understanding.
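
The copy-on-write effect mentioned above can be seen from user space with a small C sketch, assuming Linux. The helper name child_view_after_parent_write() is hypothetical and no device is involved. It only shows that after fork() a write by the parent makes parent and child see different data at the same virtual address, so a device that pinned the physical page before the fork may be left on a page the parent no longer uses. Which process keeps the original physical page depends on the kernel version and is not something this sketch determines.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper, for illustration only.
 * Writes 'A' into a private anonymous page, forks, lets the parent
 * overwrite it with 'B', and returns the byte the child reads afterwards
 * (or -1 on error).
 */
int child_view_after_parent_write(void)
{
    long page = sysconf(_SC_PAGESIZE);
    int sync_pipe[2];
    int status, wrote;
    char token = 0;
    pid_t pid;
    char *buf;

    assert(page > 0);
    buf = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return -1;
    buf[0] = 'A';

    if (pipe(sync_pipe) != 0) {
        munmap(buf, (size_t)page);
        return -1;
    }

    pid = fork();
    if (pid < 0) {
        close(sync_pipe[0]);
        close(sync_pipe[1]);
        munmap(buf, (size_t)page);
        return -1;
    }
    if (pid == 0) {
        close(sync_pipe[1]);
        /* Block until the parent has written to its copy. */
        if (read(sync_pipe[0], &token, 1) != 1)
            _exit(255);
        _exit((unsigned char)buf[0]);
    }

    close(sync_pipe[0]);
    buf[0] = 'B';   /* triggers copy-on-write; the two processes diverge */
    wrote = (write(sync_pipe[1], &token, 1) == 1);
    close(sync_pipe[1]);

    if (waitpid(pid, &status, 0) < 0 || !wrote || !WIFEXITED(status)) {
        munmap(buf, (size_t)page);
        return -1;
    }
    munmap(buf, (size_t)page);
    return WEXITSTATUS(status);
}
```

The child still reads 'A' although the parent has stored 'B' at the same virtual address.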

Cheers

>
> Hard NAK from RDMA side.
>
> Thanks
>
>> Signed-off-by: Kenneth Lee <[email protected]>
>> ---
>> Documentation/warpdrive/warpdrive.rst | 260 +++++++
>> Documentation/warpdrive/wd-arch.svg | 764 ++++++++++++++++++++
>> Documentation/warpdrive/wd.svg | 526 ++++++++++++++
>> Documentation/warpdrive/wd_q_addr_space.svg | 359 +++++++++
>> 4 files changed, 1909 insertions(+)
>> create mode 100644 Documentation/warpdrive/warpdrive.rst
>> create mode 100644 Documentation/warpdrive/wd-arch.svg
>> create mode 100644 Documentation/warpdrive/wd.svg
>> create mode 100644 Documentation/warpdrive/wd_q_addr_space.svg
>>
>> diff --git a/Documentation/warpdrive/warpdrive.rst b/Documentation/warpdrive/warpdrive.rst
>> new file mode 100644
>> index 000000000000..ef84d3a2d462
>> --- /dev/null
>> +++ b/Documentation/warpdrive/warpdrive.rst
>> @@ -0,0 +1,260 @@
>> +Introduction of WarpDrive
>> +=========================
>> +
>> +*WarpDrive* is a general accelerator framework for the user application to
>> +access the hardware without going through the kernel in data path.
>> +
>> +It can be used as the quick channel for accelerators, network adaptors or
>> +other hardware for application in user space.
>> +
>> +This may make some implementation simpler. E.g. you can reuse most of the
>> +*netdev* driver in kernel and just share some ring buffer to the user space
>> +driver for *DPDK* [4] or *ODP* [5]. Or you can combine the RSA accelerator with
>> +the *netdev* in the user space as an HTTPS reverse proxy, etc.
>> +
>> +*WarpDrive* takes the hardware accelerator as a heterogeneous processor which
>> +can share particular load from the CPU:
>> +
>> +.. image:: wd.svg
>> + :alt: WarpDrive Concept
>> +
>> +The virtual concept, queue, is used to manage the requests sent to the
>> +accelerator. The application sends requests to the queue by writing to some
>> +particular address, while the hardware takes the requests directly from the
>> +address and sends feedback accordingly.
>> +
>> +The format of the queue may differ from hardware to hardware. But the
>> +application need not make any system call for the communication.
>> +
>> +*WarpDrive* tries to create a shared virtual address space for all involved
>> +accelerators. Within this space, the requests sent to queue can refer to any
>> +virtual address, which will be valid to the application and all involved
>> +accelerators.
>> +
>> +The name *WarpDrive* is simply a cool and general name meaning the framework
>> +makes the application faster. It includes general user library, kernel
>> +management module and drivers for the hardware. In kernel, the management
>> +module is called *uacce*, meaning "Unified/User-space-access-intended
>> +Accelerator Framework".
>> +
>> +
>> +How does it work
>> +================
>> +
>> +*WarpDrive* uses *mmap* and *IOMMU* to play the trick.
>> +
>> +*Uacce* creates a chrdev for the device registered to it. A "queue" will be
>> +created when the chrdev is opened. The application accesses the queue by
>> +mmapping different address regions of the queue file.
>> +
>> +The following figure demonstrates the queue file address space:
>> +
>> +.. image:: wd_q_addr_space.svg
>> + :alt: WarpDrive Queue Address Space
>> +
>> +The first region of the space, device region, is used for the application to
>> +write requests to or read answers from the hardware.
>> +
>> +Normally, there can be two types of device regions: mmio and memory regions.
>> +It is recommended to use common memory for request/answer descriptors and use
>> +the mmio space for device notification, such as doorbell. But of course, this
>> +is all up to the interface designer.
>> +
>> +There can be two types of device memory regions, kernel-only and user-shared.
>> +This will be explained in the "kernel APIs" section.
>> +
>> +The Static Share Virtual Memory region is necessary only when the device IOMMU
>> +does not support "Share Virtual Memory". This will be explained after the
>> +*IOMMU* idea.
>> +
>> +
>> +Architecture
>> +------------
>> +
>> +The full *WarpDrive* architecture is represented in the following class
>> +diagram:
>> +
>> +.. image:: wd-arch.svg
>> + :alt: WarpDrive Architecture
>> +
>> +
>> +The user API
>> +------------
>> +
>> +We adopt a polling style interface in the user space: ::
>> +
>> + int wd_request_queue(struct wd_queue *q);
>> + void wd_release_queue(struct wd_queue *q);
>> +
>> + int wd_send(struct wd_queue *q, void *req);
>> + int wd_recv(struct wd_queue *q, void **req);
>> + int wd_recv_sync(struct wd_queue *q, void **req);
>> + void wd_flush(struct wd_queue *q);
>> +
>> +wd_recv_sync() is a wrapper around its non-sync version. It traps into the
>> +kernel and waits until the queue becomes available.
>> +
>> +If the queue does not support SVA/SVM, the following helper function
>> +can be used to create Static Share Virtual Memory: ::
>> +
>> + void *wd_preserve_share_memory(struct wd_queue *q, size_t size);
>> +
>> +The user API is not mandatory. It is simply a suggestion and a hint of what
>> +the kernel interface is supposed to support.
>> +
>> +
>> +The user driver
>> +---------------
>> +
>> +The queue file mmap space will need a user driver to wrap the communication
>> +protocol. *UACCE* provides some attributes in sysfs for the user driver to
>> +match the right accelerator accordingly.
>> +
>> +The *UACCE* device attributes are under the following directory:
>> +
>> +/sys/class/uacce/<dev-name>/params
>> +
>> +The following attributes are supported:
>> +
>> +nr_queue_remained (ro)
>> + number of queues remaining
>> +
>> +api_version (ro)
>> + a string to identify the queue mmap space format and its version
>> +
>> +device_attr (ro)
>> + attributes of the device, see UACCE_DEV_xxx flag defined in uacce.h
>> +
>> +numa_node (ro)
>> + id of numa node
>> +
>> +priority (rw)
>> + Priority of the device, bigger is higher
>> +
>> +(This is not yet implemented in RFC version)
>> +
>> +
>> +The kernel API
>> +--------------
>> +
>> +The *uacce* kernel API is defined in uacce.h. If the hardware supports SVM/SVA,
>> +the driver needs only the following API functions: ::
>> +
>> + int uacce_register(uacce);
>> + void uacce_unregister(uacce);
>> + void uacce_wake_up(q);
>> +
>> +*uacce_wake_up* is used to notify the process that epoll()s on the queue file.
>> +
>> +According to the IOMMU capability, *uacce* categorizes the devices as follows:
>> +
>> +UACCE_DEV_NOIOMMU
>> + The device has no IOMMU. The user process cannot use VA on the hardware.
>> + This mode is not recommended.
>> +
>> +UACCE_DEV_SVA (UACCE_DEV_PASID | UACCE_DEV_FAULT_FROM_DEV)
>> + The device has an IOMMU which can share the same page table with the user
>> + process
>> +
>> +UACCE_DEV_SHARE_DOMAIN
>> + The device has an IOMMU without multiple page table or device page
>> + fault support
>> +
>> +If the device works in a mode other than UACCE_DEV_NOIOMMU, *uacce* will set its
>> +IOMMU to IOMMU_DOMAIN_UNMANAGED. So the driver must not use any kernel
>> +DMA API but the following ones from *uacce* instead: ::
>> +
>> + uacce_dma_map(q, va, size, prot);
>> + uacce_dma_unmap(q, va, size, prot);
>> +
>> +*uacce_dma_map/unmap* is valid only for UACCE_DEV_SVA devices. It creates a
>> +particular PASID and page table for the kernel in the IOMMU (not yet
>> +implemented in the RFC).
>> +
>> +For a UACCE_DEV_SHARE_DOMAIN device, uacce_dma_map/unmap is not valid.
>> +*Uacce* calls back start_queue only when the DUS and DKO regions are mmapped.
>> +The accelerator driver must use those dma buffers, via uacce_queue->qfrs[], in
>> +the start_queue callback. The size of the queue file region is defined by
>> +uacce->ops->qf_pg_start[].
>> +
>> +We have to do it this way because most current IOMMUs cannot support
>> +kernel and user virtual addresses at the same time. So we have to let them both
>> +share the same user virtual address space.
>> +
>> +If the device has to support kernel and user at the same time, both the kernel
>> +and the user should use these DMA APIs. This is not convenient. A better
>> +solution is to change the future DMA/IOMMU design to separate the
>> +address space between user and kernel space. But this is not going to happen
>> +soon.
>> +
>> +
>> +Multiple processes support
>> +==========================
>> +
>> +In the latest mainline kernel (4.19) when this document was written, the IOMMU
>> +subsystem does not support multiple process page tables yet.
>> +
>> +Most IOMMU hardware implementations support multi-process with the concept
>> +of PASID. But they may use a different name, e.g. it is called sub-stream-id in
>> +the SMMU of ARM. With PASID or a similar design, multiple page tables can be
>> +added to the IOMMU and referred to by PASID.
>> +
>> +*JPB* has a patchset to enable this[1]_. We have tested it with our hardware
>> +(which is known as *D06*). It works well. *WarpDrive* relies on it to support
>> +UACCE_DEV_SVA. If it is not enabled, *WarpDrive* can still work, but it
>> +supports only one process: the device will be set to UACCE_DEV_SHARE_DOMAIN
>> +even if it is set to UACCE_DEV_SVA initially.
>> +
>> +Static Share Virtual Memory is mainly used by UACCE_DEV_SHARE_DOMAIN devices.
>> +
>> +
>> +Legacy Mode Support
>> +===================
>> +For hardware without an IOMMU, WarpDrive can still work; the only problem is
>> +that VA cannot be used in the device. The driver should adopt another strategy
>> +for the shared memory. It is only for testing, and not recommended.
>> +
>> +
>> +The Fork Scenario
>> +=================
>> +For a process with allocated queues and shared memory, what happens if it forks
>> +a child?
>> +
>> +The fd of the queue will be duplicated on fork, so the child can send requests
>> +to the same queue as its parent. But requests sent from any process
>> +other than the one that opened the queue will be blocked.
>> +
>> +It is recommended to add O_CLOEXEC to the queue file.
>> +
>> +The queue mmap space has VM_DONTCOPY in its VMA. So the child will lose all
>> +those VMAs.
>> +
>> +This is why *WarpDrive* does not adopt the mode used in *VFIO* and *InfiniBand*.
>> +Both solutions can set any user pointer for hardware sharing. But they cannot
>> +support fork while the dma is in progress. Otherwise the "Copy-On-Write"
>> +procedure will make the parent process lose its physical pages.
>> +
>> +
>> +The Sample Code
>> +===============
>> +There is a sample user land implementation with a simple driver for Hisilicon
>> +Hi1620 ZIP Accelerator.
>> +
>> +To test, do the following in samples/warpdrive (for the case of PC host): ::
>> + ./autogen.sh
>> + ./conf.sh # or simply ./configure if you build on target system
>> + make
>> +
>> +Then you can get test_hisi_zip in the test subdirectory. Copy it to the target
>> +system and make sure the hisi_zip driver is enabled (the major and minor of
>> +the uacce chrdev can be obtained from dmesg or sysfs), and run: ::
>> + mknod /dev/ua1 c <major> <minor>
>> + test/test_hisi_zip -z < data > data.zip
>> + test/test_hisi_zip -g < data > data.gzip
>> +
>> +
>> +References
>> +==========
>> +.. [1] https://patchwork.kernel.org/patch/10394851/
>> +
>> +.. vim: tw=78
>> diff --git a/Documentation/warpdrive/wd-arch.svg b/Documentation/warpdrive/wd-arch.svg
>> new file mode 100644
>> index 000000000000..e59934188443
>> --- /dev/null
>> +++ b/Documentation/warpdrive/wd-arch.svg
>> @@ -0,0 +1,764 @@
>> +<?xml version="1.0" encoding="UTF-8" standalone="no"?>
>> +<!-- Created with Inkscape (http://www.inkscape.org/) -->
>> +
>> +<svg
>> + xmlns:dc="http://purl.org/dc/elements/1.1/"
>> + xmlns:cc="http://creativecommons.org/ns#"
>> + xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
>> + xmlns:svg="http://www.w3.org/2000/svg"
>> + xmlns="http://www.w3.org/2000/svg"
>> + xmlns:xlink="http://www.w3.org/1999/xlink"
>> + xmlns:sodipodi="http://sodipodi.sourceforge.net/DTD/sodipodi-0.dtd"
>> + xmlns:inkscape="http://www.inkscape.org/namespaces/inkscape"
>> + width="210mm"
>> + height="193mm"
>> + viewBox="0 0 744.09449 683.85823"
>> + id="svg2"
>> + version="1.1"
>> + inkscape:version="0.92.3 (2405546, 2018-03-11)"
>> + sodipodi:docname="wd-arch.svg">
>> + <defs
>> + id="defs4">
>> + <linearGradient
>> + inkscape:collect="always"
>> + id="linearGradient6830">
>> + <stop
>> + style="stop-color:#000000;stop-opacity:1;"
>> + offset="0"
>> + id="stop6832" />
>> + <stop
>> + style="stop-color:#000000;stop-opacity:0;"
>> + offset="1"
>> + id="stop6834" />
>> + </linearGradient>
>> + <linearGradient
>> + inkscape:collect="always"
>> + xlink:href="#linearGradient5026"
>> + id="linearGradient5032"
>> + x1="353"
>> + y1="211.3622"
>> + x2="565.5"
>> + y2="174.8622"
>> + gradientUnits="userSpaceOnUse"
>> + gradientTransform="translate(-89.949614,405.94594)" />
>> + <linearGradient
>> + inkscape:collect="always"
>> + id="linearGradient5026">
>> + <stop
>> + style="stop-color:#f2f2f2;stop-opacity:1;"
>> + offset="0"
>> + id="stop5028" />
>> + <stop
>> + style="stop-color:#f2f2f2;stop-opacity:0;"
>> + offset="1"
>> + id="stop5030" />
>> + </linearGradient>
>> + <filter
>> + inkscape:collect="always"
>> + style="color-interpolation-filters:sRGB"
>> + id="filter4169-3"
>> + x="-0.031597666"
>> + width="1.0631953"
>> + y="-0.099812768"
>> + height="1.1996255">
>> + <feGaussianBlur
>> + inkscape:collect="always"
>> + stdDeviation="1.3307599"
>> + id="feGaussianBlur4171-6" />
>> + </filter>
>> + <linearGradient
>> + inkscape:collect="always"
>> + xlink:href="#linearGradient5026"
>> + id="linearGradient5032-1"
>> + x1="353"
>> + y1="211.3622"
>> + x2="565.5"
>> + y2="174.8622"
>> + gradientUnits="userSpaceOnUse"
>> + gradientTransform="translate(175.77842,400.29111)" />
>> + <filter
>> + inkscape:collect="always"
>> + style="color-interpolation-filters:sRGB"
>> + id="filter4169-3-0"
>> + x="-0.031597666"
>> + width="1.0631953"
>> + y="-0.099812768"
>> + height="1.1996255">
>> + <feGaussianBlur
>> + inkscape:collect="always"
>> + stdDeviation="1.3307599"
>> + id="feGaussianBlur4171-6-9" />
>> + </filter>
>> + <marker
>> + markerWidth="18.960653"
>> + markerHeight="11.194658"
>> + refX="9.4803267"
>> + refY="5.5973287"
>> + orient="auto"
>> + id="marker4613">
>> + <rect
>> + y="-5.1589785"
>> + x="5.8504119"
>> + height="10.317957"
>> + width="10.317957"
>> + id="rect4212"
>> + style="fill:#ffffff;stroke:#000000;stroke-width:0.69143367;stroke-miterlimit:4;stroke-dasharray:none"
>> + transform="matrix(0.86111274,0.50841405,-0.86111274,0.50841405,0,0)">
>> + <title
>> + id="title4262">generation</title>
>> + </rect>
>> + </marker>
>> + <marker
>> + markerWidth="11.227358"
>> + markerHeight="12.355258"
>> + refX="10"
>> + refY="6.177629"
>> + orient="auto"
>> + id="marker4825">
>> + <path
>> + inkscape:connector-curvature="0"
>> + id="path4757"
>> + d="M 0.42024733,0.42806444 10.231357,6.3500844 0.24347733,11.918544"
>> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
>> + </marker>
>> + <marker
>> + markerWidth="11.227358"
>> + markerHeight="12.355258"
>> + refX="10"
>> + refY="6.177629"
>> + orient="auto"
>> + id="marker4825-6">
>> + <path
>> + inkscape:connector-curvature="0"
>> + id="path4757-1"
>> + d="M 0.42024733,0.42806444 10.231357,6.3500844 0.24347733,11.918544"
>> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
>> + </marker>
>> + <linearGradient
>> + inkscape:collect="always"
>> + xlink:href="#linearGradient5026"
>> + id="linearGradient5032-3-9"
>> + x1="353"
>> + y1="211.3622"
>> + x2="565.5"
>> + y2="174.8622"
>> + gradientUnits="userSpaceOnUse"
>> + gradientTransform="matrix(1.2452511,0,0,0.98513016,-190.95632,540.33156)" />
>> + <filter
>> + inkscape:collect="always"
>> + style="color-interpolation-filters:sRGB"
>> + id="filter4169-3-5-8"
>> + x="-0.031597666"
>> + width="1.0631953"
>> + y="-0.099812768"
>> + height="1.1996255">
>> + <feGaussianBlur
>> + inkscape:collect="always"
>> + stdDeviation="1.3307599"
>> + id="feGaussianBlur4171-6-3-9" />
>> + </filter>
>> + <marker
>> + markerWidth="11.227358"
>> + markerHeight="12.355258"
>> + refX="10"
>> + refY="6.177629"
>> + orient="auto"
>> + id="marker4825-6-2">
>> + <path
>> + inkscape:connector-curvature="0"
>> + id="path4757-1-9"
>> + d="M 0.42024733,0.42806444 10.231357,6.3500844 0.24347733,11.918544"
>> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
>> + </marker>
>> + <marker
>> + markerWidth="11.227358"
>> + markerHeight="12.355258"
>> + refX="10"
>> + refY="6.177629"
>> + orient="auto"
>> + id="marker4825-6-2-1">
>> + <path
>> + inkscape:connector-curvature="0"
>> + id="path4757-1-9-9"
>> + d="M 0.42024733,0.42806444 10.231357,6.3500844 0.24347733,11.918544"
>> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
>> + </marker>
>> + <linearGradient
>> + inkscape:collect="always"
>> + xlink:href="#linearGradient5026"
>> + id="linearGradient5032-3-9-7"
>> + x1="353"
>> + y1="211.3622"
>> + x2="565.5"
>> + y2="174.8622"
>> + gradientUnits="userSpaceOnUse"
>> + gradientTransform="matrix(1.3742742,0,0,0.97786398,-234.52617,654.63367)" />
>> + <filter
>> + inkscape:collect="always"
>> + style="color-interpolation-filters:sRGB"
>> + id="filter4169-3-5-8-5"
>> + x="-0.031597666"
>> + width="1.0631953"
>> + y="-0.099812768"
>> + height="1.1996255">
>> + <feGaussianBlur
>> + inkscape:collect="always"
>> + stdDeviation="1.3307599"
>> + id="feGaussianBlur4171-6-3-9-0" />
>> + </filter>
>> + <marker
>> + markerWidth="11.227358"
>> + markerHeight="12.355258"
>> + refX="10"
>> + refY="6.177629"
>> + orient="auto"
>> + id="marker4825-6-2-6">
>> + <path
>> + inkscape:connector-curvature="0"
>> + id="path4757-1-9-1"
>> + d="M 0.42024733,0.42806444 10.231357,6.3500844 0.24347733,11.918544"
>> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
>> + </marker>
>> + <linearGradient
>> + inkscape:collect="always"
>> + xlink:href="#linearGradient5026"
>> + id="linearGradient5032-3-9-4"
>> + x1="353"
>> + y1="211.3622"
>> + x2="565.5"
>> + y2="174.8622"
>> + gradientUnits="userSpaceOnUse"
>> + gradientTransform="matrix(1.3742912,0,0,2.0035845,-468.34428,342.56603)" />
>> + <filter
>> + inkscape:collect="always"
>> + style="color-interpolation-filters:sRGB"
>> + id="filter4169-3-5-8-54"
>> + x="-0.031597666"
>> + width="1.0631953"
>> + y="-0.099812768"
>> + height="1.1996255">
>> + <feGaussianBlur
>> + inkscape:collect="always"
>> + stdDeviation="1.3307599"
>> + id="feGaussianBlur4171-6-3-9-7" />
>> + </filter>
>> + <marker
>> + markerWidth="11.227358"
>> + markerHeight="12.355258"
>> + refX="10"
>> + refY="6.177629"
>> + orient="auto"
>> + id="marker4825-6-2-1-8">
>> + <path
>> + inkscape:connector-curvature="0"
>> + id="path4757-1-9-9-6"
>> + d="M 0.42024733,0.42806444 10.231357,6.3500844 0.24347733,11.918544"
>> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
>> + </marker>
>> + <marker
>> + markerWidth="11.227358"
>> + markerHeight="12.355258"
>> + refX="10"
>> + refY="6.177629"
>> + orient="auto"
>> + id="marker4825-6-2-1-8-8">
>> + <path
>> + inkscape:connector-curvature="0"
>> + id="path4757-1-9-9-6-9"
>> + d="M 0.42024733,0.42806444 10.231357,6.3500844 0.24347733,11.918544"
>> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
>> + </marker>
>> + <marker
>> + markerWidth="11.227358"
>> + markerHeight="12.355258"
>> + refX="10"
>> + refY="6.177629"
>> + orient="auto"
>> + id="marker4825-6-0">
>> + <path
>> + inkscape:connector-curvature="0"
>> + id="path4757-1-93"
>> + d="M 0.42024733,0.42806444 10.231357,6.3500844 0.24347733,11.918544"
>> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
>> + </marker>
>> + <marker
>> + markerWidth="11.227358"
>> + markerHeight="12.355258"
>> + refX="10"
>> + refY="6.177629"
>> + orient="auto"
>> + id="marker4825-6-0-2">
>> + <path
>> + inkscape:connector-curvature="0"
>> + id="path4757-1-93-6"
>> + d="M 0.42024733,0.42806444 10.231357,6.3500844 0.24347733,11.918544"
>> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
>> + </marker>
>> + <filter
>> + inkscape:collect="always"
>> + style="color-interpolation-filters:sRGB"
>> + id="filter5382"
>> + x="-0.089695387"
>> + width="1.1793908"
>> + y="-0.10052069"
>> + height="1.2010413">
>> + <feGaussianBlur
>> + inkscape:collect="always"
>> + stdDeviation="0.86758925"
>> + id="feGaussianBlur5384" />
>> + </filter>
>> + <linearGradient
>> + inkscape:collect="always"
>> + xlink:href="#linearGradient6830"
>> + id="linearGradient6836"
>> + x1="362.73923"
>> + y1="700.04059"
>> + x2="340.4751"
>> + y2="678.25488"
>> + gradientUnits="userSpaceOnUse"
>> + gradientTransform="translate(-23.771026,-135.76835)" />
>> + <marker
>> + markerWidth="11.227358"
>> + markerHeight="12.355258"
>> + refX="10"
>> + refY="6.177629"
>> + orient="auto"
>> + id="marker4825-6-2-6-2">
>> + <path
>> + inkscape:connector-curvature="0"
>> + id="path4757-1-9-1-9"
>> + d="M 0.42024733,0.42806444 10.231357,6.3500844 0.24347733,11.918544"
>> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
>> + </marker>
>> + <linearGradient
>> + inkscape:collect="always"
>> + xlink:href="#linearGradient5026"
>> + id="linearGradient5032-3-9-7-3"
>> + x1="353"
>> + y1="211.3622"
>> + x2="565.5"
>> + y2="174.8622"
>> + gradientUnits="userSpaceOnUse"
>> + gradientTransform="matrix(1.3742742,0,0,0.97786395,-57.357186,649.55786)" />
>> + <filter
>> + inkscape:collect="always"
>> + style="color-interpolation-filters:sRGB"
>> + id="filter4169-3-5-8-5-0"
>> + x="-0.031597666"
>> + width="1.0631953"
>> + y="-0.099812768"
>> + height="1.1996255">
>> + <feGaussianBlur
>> + inkscape:collect="always"
>> + stdDeviation="1.3307599"
>> + id="feGaussianBlur4171-6-3-9-0-2" />
>> + </filter>
>> + <marker
>> + markerWidth="11.227358"
>> + markerHeight="12.355258"
>> + refX="10"
>> + refY="6.177629"
>> + orient="auto"
>> + id="marker4825-6-2-1-1">
>> + <path
>> + inkscape:connector-curvature="0"
>> + id="path4757-1-9-9-0"
>> + d="M 0.42024733,0.42806444 10.231357,6.3500844 0.24347733,11.918544"
>> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
>> + </marker>
>> + </defs>
>> + <sodipodi:namedview
>> + id="base"
>> + pagecolor="#ffffff"
>> + bordercolor="#666666"
>> + borderopacity="1.0"
>> + inkscape:pageopacity="0.0"
>> + inkscape:pageshadow="2"
>> + inkscape:zoom="0.98994949"
>> + inkscape:cx="222.32868"
>> + inkscape:cy="370.44492"
>> + inkscape:document-units="px"
>> + inkscape:current-layer="layer1"
>> + showgrid="false"
>> + inkscape:window-width="1916"
>> + inkscape:window-height="1033"
>> + inkscape:window-x="0"
>> + inkscape:window-y="22"
>> + inkscape:window-maximized="0"
>> + fit-margin-right="0.3"
>> + inkscape:snap-global="false" />
>> + <metadata
>> + id="metadata7">
>> + <rdf:RDF>
>> + <cc:Work
>> + rdf:about="">
>> + <dc:format>image/svg+xml</dc:format>
>> + <dc:type
>> + rdf:resource="http://purl.org/dc/dcmitype/StillImage" />
>> + <dc:title />
>> + </cc:Work>
>> + </rdf:RDF>
>> + </metadata>
>> + <g
>> + inkscape:label="Layer 1"
>> + inkscape:groupmode="layer"
>> + id="layer1"
>> + transform="translate(0,-368.50374)">
>> + <rect
>> + style="fill:#000000;stroke:#000000;stroke-width:0.6465112;filter:url(#filter4169-3)"
>> + id="rect4136-3-6"
>> + width="101.07784"
>> + height="31.998148"
>> + x="283.01144"
>> + y="588.80896" />
>> + <rect
>> + style="fill:url(#linearGradient5032);fill-opacity:1;stroke:#000000;stroke-width:0.6465112"
>> + id="rect4136-2"
>> + width="101.07784"
>> + height="31.998148"
>> + x="281.63498"
>> + y="586.75739" />
>> + <text
>> + xml:space="preserve"
>> + style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
>> + x="294.21747"
>> + y="612.50073"
>> + id="text4138-6"><tspan
>> + sodipodi:role="line"
>> + id="tspan4140-1"
>> + x="294.21747"
>> + y="612.50073"
>> + style="font-size:15px;line-height:1.25">WarpDrive</tspan></text>
>> + <rect
>> + style="fill:#000000;stroke:#000000;stroke-width:0.6465112;filter:url(#filter4169-3-0)"
>> + id="rect4136-3-6-3"
>> + width="101.07784"
>> + height="31.998148"
>> + x="548.7395"
>> + y="583.15417" />
>> + <rect
>> + style="fill:url(#linearGradient5032-1);fill-opacity:1;stroke:#000000;stroke-width:0.6465112"
>> + id="rect4136-2-60"
>> + width="101.07784"
>> + height="31.998148"
>> + x="547.36304"
>> + y="581.1026" />
>> + <text
>> + xml:space="preserve"
>> + style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
>> + x="557.83484"
>> + y="602.32745"
>> + id="text4138-6-6"><tspan
>> + sodipodi:role="line"
>> + id="tspan4140-1-2"
>> + x="557.83484"
>> + y="602.32745"
>> + style="font-size:15px;line-height:1.25">user_driver</tspan></text>
>> + <path
>> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;marker-end:url(#marker4613)"
>> + d="m 547.36304,600.78954 -156.58203,0.0691"
>> + id="path4855"
>> + inkscape:connector-curvature="0"
>> + sodipodi:nodetypes="cc" />
>> + <rect
>> + style="fill:#000000;stroke:#000000;stroke-width:0.6465112;filter:url(#filter4169-3-5-8)"
>> + id="rect4136-3-6-5-7"
>> + width="101.07784"
>> + height="31.998148"
>> + x="128.74678"
>> + y="80.648842"
>> + transform="matrix(1.2452511,0,0,0.98513016,113.15182,641.02594)" />
>> + <rect
>> + style="fill:url(#linearGradient5032-3-9);fill-opacity:1;stroke:#000000;stroke-width:0.71606314"
>> + id="rect4136-2-6-3"
>> + width="125.86729"
>> + height="31.522341"
>> + x="271.75983"
>> + y="718.45435" />
>> + <text
>> + xml:space="preserve"
>> + style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
>> + x="309.13705"
>> + y="745.55371"
>> + id="text4138-6-2-6"><tspan
>> + sodipodi:role="line"
>> + id="tspan4140-1-9-1"
>> + x="309.13705"
>> + y="745.55371"
>> + style="font-size:15px;line-height:1.25">uacce</tspan></text>
>> + <path
>> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;marker-end:url(#marker4825-6-2)"
>> + d="m 329.57309,619.72453 5.0373,97.14447"
>> + id="path4661-3"
>> + inkscape:connector-curvature="0"
>> + sodipodi:nodetypes="cc" />
>> + <path
>> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;marker-end:url(#marker4825-6-2-1)"
>> + d="m 342.57219,830.63108 -5.67699,-79.2841"
>> + id="path4661-3-4"
>> + inkscape:connector-curvature="0"
>> + sodipodi:nodetypes="cc" />
>> + <rect
>> + style="fill:#000000;stroke:#000000;stroke-width:0.6465112;filter:url(#filter4169-3-5-8-5)"
>> + id="rect4136-3-6-5-7-3"
>> + width="101.07784"
>> + height="31.998148"
>> + x="128.74678"
>> + y="80.648842"
>> + transform="matrix(1.3742742,0,0,0.97786398,101.09126,754.58534)" />
>> + <rect
>> + style="fill:url(#linearGradient5032-3-9-7);fill-opacity:1;stroke:#000000;stroke-width:0.74946606"
>> + id="rect4136-2-6-3-6"
>> + width="138.90866"
>> + height="31.289837"
>> + x="276.13297"
>> + y="831.44263" />
>> + <text
>> + xml:space="preserve"
>> + style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
>> + x="295.67819"
>> + y="852.98224"
>> + id="text4138-6-2-6-1"><tspan
>> + sodipodi:role="line"
>> + id="tspan4140-1-9-1-0"
>> + x="295.67819"
>> + y="852.98224"
>> + style="font-size:15px;line-height:1.25">Device Driver</tspan></text>
>> + <path
>> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;marker-end:url(#marker4825-6-2-6)"
>> + d="m 623.05084,615.00104 0.51369,333.80219"
>> + id="path4661-3-5"
>> + inkscape:connector-curvature="0"
>> + sodipodi:nodetypes="cc" />
>> + <text
>> + xml:space="preserve"
>> + style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
>> + x="392.63568"
>> + y="660.83667"
>> + id="text4138-6-2-6-1-6-2-5"><tspan
>> + sodipodi:role="line"
>> + x="392.63568"
>> + y="660.83667"
>> + id="tspan4305"
>> + style="font-size:15px;line-height:1.25">&lt;&lt;anon_file&gt;&gt;</tspan><tspan
>> + sodipodi:role="line"
>> + x="392.63568"
>> + y="679.58667"
>> + style="font-size:15px;line-height:1.25"
>> + id="tspan1139">Queue FD</tspan></text>
>> + <text
>> + xml:space="preserve"
>> + style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
>> + x="389.92969"
>> + y="587.44836"
>> + id="text4138-6-2-6-1-6-2-56"><tspan
>> + sodipodi:role="line"
>> + id="tspan4140-1-9-1-0-3-0-9"
>> + x="389.92969"
>> + y="587.44836"
>> + style="font-size:15px;line-height:1.25">1</tspan></text>
>> + <text
>> + xml:space="preserve"
>> + style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
>> + x="528.64813"
>> + y="600.08429"
>> + id="text4138-6-2-6-1-6-3"><tspan
>> + sodipodi:role="line"
>> + id="tspan4140-1-9-1-0-3-7"
>> + x="528.64813"
>> + y="600.08429"
>> + style="font-size:15px;line-height:1.25">*</tspan></text>
>> + <rect
>> + style="fill:#000000;stroke:#000000;stroke-width:0.6465112;filter:url(#filter4169-3-5-8-54)"
>> + id="rect4136-3-6-5-7-4"
>> + width="101.07784"
>> + height="31.998148"
>> + x="128.74678"
>> + y="80.648842"
>> + transform="matrix(1.3745874,0,0,1.8929066,-132.7754,556.04505)" />
>> + <rect
>> + style="fill:url(#linearGradient5032-3-9-4);fill-opacity:1;stroke:#000000;stroke-width:1.07280123"
>> + id="rect4136-2-6-3-4"
>> + width="138.91039"
>> + height="64.111"
>> + x="42.321312"
>> + y="704.8371" />
>> + <text
>> + xml:space="preserve"
>> + style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
>> + x="110.30745"
>> + y="722.94025"
>> + id="text4138-6-2-6-3"><tspan
>> + sodipodi:role="line"
>> + x="111.99202"
>> + y="722.94025"
>> + id="tspan4366"
>> + style="font-size:15px;line-height:1.25;text-align:center;text-anchor:middle">other standard </tspan><tspan
>> + sodipodi:role="line"
>> + x="110.30745"
>> + y="741.69025"
>> + id="tspan4368"
>> + style="font-size:15px;line-height:1.25;text-align:center;text-anchor:middle">framework</tspan><tspan
>> + sodipodi:role="line"
>> + x="110.30745"
>> + y="760.44025"
>> + style="font-size:15px;line-height:1.25;text-align:center;text-anchor:middle"
>> + id="tspan6840">(crypto/nic/others)</tspan></text>
>> + <path
>> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;marker-end:url(#marker4825-6-2-1-8)"
>> + d="M 276.29661,849.04109 134.04449,771.90853"
>> + id="path4661-3-4-8"
>> + inkscape:connector-curvature="0"
>> + sodipodi:nodetypes="cc" />
>> + <text
>> + xml:space="preserve"
>> + style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
>> + x="313.70813"
>> + y="730.06366"
>> + id="text4138-6-2-6-36"><tspan
>> + sodipodi:role="line"
>> + id="tspan4140-1-9-1-7"
>> + x="313.70813"
>> + y="730.06366"
>> + style="font-size:10px;line-height:1.25">&lt;&lt;lkm&gt;&gt;</tspan></text>
>> + <text
>> + xml:space="preserve"
>> + style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;text-align:start;letter-spacing:0px;word-spacing:0px;text-anchor:start;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
>> + x="259.53165"
>> + y="797.8056"
>> + id="text4138-6-2-6-1-6-2-5-7-5"><tspan
>> + sodipodi:role="line"
>> + x="259.53165"
>> + y="797.8056"
>> + style="font-size:15px;line-height:1.25;text-align:start;text-anchor:start"
>> + id="tspan2357">uacce register api</tspan></text>
>> + <text
>> + xml:space="preserve"
>> + style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
>> + x="29.145819"
>> + y="833.44244"
>> + id="text4138-6-2-6-1-6-2-5-7-5-2"><tspan
>> + sodipodi:role="line"
>> + x="29.145819"
>> + y="833.44244"
>> + id="tspan4301"
>> + style="font-size:15px;line-height:1.25">register to other subsystem</tspan></text>
>> + <text
>> + xml:space="preserve"
>> + style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
>> + x="301.20813"
>> + y="597.29437"
>> + id="text4138-6-2-6-36-1"><tspan
>> + sodipodi:role="line"
>> + id="tspan4140-1-9-1-7-2"
>> + x="301.20813"
>> + y="597.29437"
>> + style="font-size:10px;line-height:1.25">&lt;&lt;user_lib&gt;&gt;</tspan></text>
>> + <text
>> + xml:space="preserve"
>> + style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
>> + x="615.9505"
>> + y="739.44012"
>> + id="text4138-6-2-6-1-6-2-5-3"><tspan
>> + sodipodi:role="line"
>> + x="615.9505"
>> + y="739.44012"
>> + id="tspan4274-7"
>> + style="font-size:15px;line-height:1.25">mmapped memory r/w interface</tspan></text>
>> + <text
>> + xml:space="preserve"
>> + style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
>> + x="371.01291"
>> + y="529.23682"
>> + id="text4138-6-2-6-1-6-2-5-36"><tspan
>> + sodipodi:role="line"
>> + x="371.01291"
>> + y="529.23682"
>> + id="tspan4305-3"
>> + style="font-size:15px;line-height:1.25">wd user api</tspan></text>
>> + <path
>> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
>> + d="m 328.19325,585.87943 0,-23.57142"
>> + id="path4348"
>> + inkscape:connector-curvature="0" />
>> + <ellipse
>> + style="opacity:1;fill:#ffffff;fill-opacity:1;fill-rule:evenodd;stroke:#000000;stroke-width:1;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0"
>> + id="path4350"
>> + cx="328.01468"
>> + cy="551.95081"
>> + rx="11.607142"
>> + ry="10.357142" />
>> + <path
>> + style="opacity:0.444;fill:url(#linearGradient6836);fill-opacity:1;fill-rule:evenodd;stroke:none;stroke-width:1;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;filter:url(#filter5382)"
>> + id="path4350-2"
>> + sodipodi:type="arc"
>> + sodipodi:cx="329.44327"
>> + sodipodi:cy="553.37933"
>> + sodipodi:rx="11.607142"
>> + sodipodi:ry="10.357142"
>> + sodipodi:start="0"
>> + sodipodi:end="6.2509098"
>> + d="m 341.05041,553.37933 a 11.607142,10.357142 0 0 1 -11.51349,10.35681 11.607142,10.357142 0 0 1 -11.69928,-10.18967 11.607142,10.357142 0 0 1 11.32469,-10.52124 11.607142,10.357142 0 0 1 11.88204,10.01988"
>> + sodipodi:open="true" />
>> + <text
>> + xml:space="preserve"
>> + style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
>> + x="619.67596"
>> + y="978.22363"
>> + id="text4138-6-2-6-1-6-2-5-36-3"><tspan
>> + sodipodi:role="line"
>> + x="619.67596"
>> + y="978.22363"
>> + id="tspan4305-3-67"
>> + style="font-size:15px;line-height:1.25">Device(Hardware)</tspan></text>
>> + <path
>> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;marker-end:url(#marker4825-6-2-6-2)"
>> + d="m 347.51164,865.4527 193.91929,99.10053"
>> + id="path4661-3-5-1"
>> + inkscape:connector-curvature="0"
>> + sodipodi:nodetypes="cc" />
>> + <rect
>> + style="fill:#000000;stroke:#000000;stroke-width:0.6465112;filter:url(#filter4169-3-5-8-5-0)"
>> + id="rect4136-3-6-5-7-3-1"
>> + width="101.07784"
>> + height="31.998148"
>> + x="128.74678"
>> + y="80.648842"
>> + transform="matrix(1.3742742,0,0,0.97786395,278.26025,749.50952)" />
>> + <rect
>> + style="fill:url(#linearGradient5032-3-9-7-3);fill-opacity:1;stroke:#000000;stroke-width:0.74946606"
>> + id="rect4136-2-6-3-6-0"
>> + width="138.90868"
>> + height="31.289839"
>> + x="453.30197"
>> + y="826.36682" />
>> + <text
>> + xml:space="preserve"
>> + style="font-style:normal;font-weight:normal;font-size:12px;line-height:0%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
>> + x="493.68158"
>> + y="847.90643"
>> + id="text4138-6-2-6-1-5"><tspan
>> + sodipodi:role="line"
>> + id="tspan4140-1-9-1-0-1"
>> + x="493.68158"
>> + y="847.90643"
>> + style="font-size:15px;line-height:1.25;stroke-width:1px">IOMMU</tspan></text>
>> + <path
>> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;marker-end:url(#marker4825-6-2-1-1)"
>> + d="m 389.49372,755.46667 111.75324,68.4507"
>> + id="path4661-3-4-85"
>> + inkscape:connector-curvature="0"
>> + sodipodi:nodetypes="cc" />
>> + <text
>> + xml:space="preserve"
>> + style="font-style:normal;font-weight:normal;font-size:12px;line-height:0%;font-family:sans-serif;text-align:start;letter-spacing:0px;word-spacing:0px;text-anchor:start;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
>> + x="427.70282"
>> + y="776.91418"
>> + id="text4138-6-2-6-1-6-2-5-7-5-0"><tspan
>> + sodipodi:role="line"
>> + x="427.70282"
>> + y="776.91418"
>> + style="font-size:15px;line-height:1.25;text-align:start;text-anchor:start;stroke-width:1px"
>> + id="tspan2357-6">manage the driver iommu state</tspan></text>
>> + </g>
>> +</svg>
>> diff --git a/Documentation/warpdrive/wd.svg b/Documentation/warpdrive/wd.svg
>> new file mode 100644
>> index 000000000000..87ab92ebfbc6
>> --- /dev/null
>> +++ b/Documentation/warpdrive/wd.svg
>> @@ -0,0 +1,526 @@
>> +<?xml version="1.0" encoding="UTF-8" standalone="no"?>
>> +<!-- Created with Inkscape (http://www.inkscape.org/) -->
>> +
>> +<svg
>> + xmlns:dc="http://purl.org/dc/elements/1.1/"
>> + xmlns:cc="http://creativecommons.org/ns#"
>> + xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
>> + xmlns:svg="http://www.w3.org/2000/svg"
>> + xmlns="http://www.w3.org/2000/svg"
>> + xmlns:xlink="http://www.w3.org/1999/xlink"
>> + xmlns:sodipodi="http://sodipodi.sourceforge.net/DTD/sodipodi-0.dtd"
>> + xmlns:inkscape="http://www.inkscape.org/namespaces/inkscape"
>> + width="210mm"
>> + height="116mm"
>> + viewBox="0 0 744.09449 411.02338"
>> + id="svg2"
>> + version="1.1"
>> + inkscape:version="0.92.3 (2405546, 2018-03-11)"
>> + sodipodi:docname="wd.svg">
>> + <defs
>> + id="defs4">
>> + <linearGradient
>> + inkscape:collect="always"
>> + id="linearGradient5026">
>> + <stop
>> + style="stop-color:#f2f2f2;stop-opacity:1;"
>> + offset="0"
>> + id="stop5028" />
>> + <stop
>> + style="stop-color:#f2f2f2;stop-opacity:0;"
>> + offset="1"
>> + id="stop5030" />
>> + </linearGradient>
>> + <linearGradient
>> + inkscape:collect="always"
>> + xlink:href="#linearGradient5026"
>> + id="linearGradient5032-3"
>> + x1="353"
>> + y1="211.3622"
>> + x2="565.5"
>> + y2="174.8622"
>> + gradientUnits="userSpaceOnUse"
>> + gradientTransform="matrix(2.7384117,0,0,0.91666329,-952.8283,571.10143)" />
>> + <filter
>> + inkscape:collect="always"
>> + style="color-interpolation-filters:sRGB"
>> + id="filter4169-3-5"
>> + x="-0.031597666"
>> + width="1.0631953"
>> + y="-0.099812768"
>> + height="1.1996255">
>> + <feGaussianBlur
>> + inkscape:collect="always"
>> + stdDeviation="1.3307599"
>> + id="feGaussianBlur4171-6-3" />
>> + </filter>
>> + <marker
>> + markerWidth="18.960653"
>> + markerHeight="11.194658"
>> + refX="9.4803267"
>> + refY="5.5973287"
>> + orient="auto"
>> + id="marker4613">
>> + <rect
>> + y="-5.1589785"
>> + x="5.8504119"
>> + height="10.317957"
>> + width="10.317957"
>> + id="rect4212"
>> + style="fill:#ffffff;stroke:#000000;stroke-width:0.69143367;stroke-miterlimit:4;stroke-dasharray:none"
>> + transform="matrix(0.86111274,0.50841405,-0.86111274,0.50841405,0,0)">
>> + <title
>> + id="title4262">generation</title>
>> + </rect>
>> + </marker>
>> + <marker
>> + markerWidth="11.227358"
>> + markerHeight="12.355258"
>> + refX="10"
>> + refY="6.177629"
>> + orient="auto"
>> + id="marker4825">
>> + <path
>> + inkscape:connector-curvature="0"
>> + id="path4757"
>> + d="M 0.42024733,0.42806444 10.231357,6.3500844 0.24347733,11.918544"
>> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
>> + </marker>
>> + <marker
>> + markerWidth="11.227358"
>> + markerHeight="12.355258"
>> + refX="10"
>> + refY="6.177629"
>> + orient="auto"
>> + id="marker4825-6">
>> + <path
>> + inkscape:connector-curvature="0"
>> + id="path4757-1"
>> + d="M 0.42024733,0.42806444 10.231357,6.3500844 0.24347733,11.918544"
>> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
>> + </marker>
>> + <linearGradient
>> + inkscape:collect="always"
>> + xlink:href="#linearGradient5026"
>> + id="linearGradient5032-3-9"
>> + x1="353"
>> + y1="211.3622"
>> + x2="565.5"
>> + y2="174.8622"
>> + gradientUnits="userSpaceOnUse"
>> + gradientTransform="matrix(1.2452511,0,0,0.98513016,-190.95632,540.33156)" />
>> + <marker
>> + markerWidth="11.227358"
>> + markerHeight="12.355258"
>> + refX="10"
>> + refY="6.177629"
>> + orient="auto"
>> + id="marker4825-6-2">
>> + <path
>> + inkscape:connector-curvature="0"
>> + id="path4757-1-9"
>> + d="M 0.42024733,0.42806444 10.231357,6.3500844 0.24347733,11.918544"
>> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
>> + </marker>
>> + <marker
>> + markerWidth="11.227358"
>> + markerHeight="12.355258"
>> + refX="10"
>> + refY="6.177629"
>> + orient="auto"
>> + id="marker4825-6-2-1">
>> + <path
>> + inkscape:connector-curvature="0"
>> + id="path4757-1-9-9"
>> + d="M 0.42024733,0.42806444 10.231357,6.3500844 0.24347733,11.918544"
>> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
>> + </marker>
>> + <marker
>> + markerWidth="11.227358"
>> + markerHeight="12.355258"
>> + refX="10"
>> + refY="6.177629"
>> + orient="auto"
>> + id="marker4825-6-2-6">
>> + <path
>> + inkscape:connector-curvature="0"
>> + id="path4757-1-9-1"
>> + d="M 0.42024733,0.42806444 10.231357,6.3500844 0.24347733,11.918544"
>> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
>> + </marker>
>> + <marker
>> + markerWidth="11.227358"
>> + markerHeight="12.355258"
>> + refX="10"
>> + refY="6.177629"
>> + orient="auto"
>> + id="marker4825-6-2-1-8">
>> + <path
>> + inkscape:connector-curvature="0"
>> + id="path4757-1-9-9-6"
>> + d="M 0.42024733,0.42806444 10.231357,6.3500844 0.24347733,11.918544"
>> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
>> + </marker>
>> + <marker
>> + markerWidth="11.227358"
>> + markerHeight="12.355258"
>> + refX="10"
>> + refY="6.177629"
>> + orient="auto"
>> + id="marker4825-6-2-1-8-8">
>> + <path
>> + inkscape:connector-curvature="0"
>> + id="path4757-1-9-9-6-9"
>> + d="M 0.42024733,0.42806444 10.231357,6.3500844 0.24347733,11.918544"
>> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
>> + </marker>
>> + <marker
>> + markerWidth="11.227358"
>> + markerHeight="12.355258"
>> + refX="10"
>> + refY="6.177629"
>> + orient="auto"
>> + id="marker4825-6-0">
>> + <path
>> + inkscape:connector-curvature="0"
>> + id="path4757-1-93"
>> + d="M 0.42024733,0.42806444 10.231357,6.3500844 0.24347733,11.918544"
>> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
>> + </marker>
>> + <marker
>> + markerWidth="11.227358"
>> + markerHeight="12.355258"
>> + refX="10"
>> + refY="6.177629"
>> + orient="auto"
>> + id="marker4825-6-0-2">
>> + <path
>> + inkscape:connector-curvature="0"
>> + id="path4757-1-93-6"
>> + d="M 0.42024733,0.42806444 10.231357,6.3500844 0.24347733,11.918544"
>> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
>> + </marker>
>> + <marker
>> + markerWidth="11.227358"
>> + markerHeight="12.355258"
>> + refX="10"
>> + refY="6.177629"
>> + orient="auto"
>> + id="marker4825-6-2-6-2">
>> + <path
>> + inkscape:connector-curvature="0"
>> + id="path4757-1-9-1-9"
>> + d="M 0.42024733,0.42806444 10.231357,6.3500844 0.24347733,11.918544"
>> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
>> + </marker>
>> + <linearGradient
>> + inkscape:collect="always"
>> + xlink:href="#linearGradient5026"
>> + id="linearGradient5032-3-8"
>> + x1="353"
>> + y1="211.3622"
>> + x2="565.5"
>> + y2="174.8622"
>> + gradientUnits="userSpaceOnUse"
>> + gradientTransform="matrix(1.0104674,0,0,1.0052679,-218.642,661.15448)" />
>> + <filter
>> + inkscape:collect="always"
>> + style="color-interpolation-filters:sRGB"
>> + id="filter4169-3-5-8"
>> + x="-0.031597666"
>> + width="1.0631953"
>> + y="-0.099812768"
>> + height="1.1996255">
>> + <feGaussianBlur
>> + inkscape:collect="always"
>> + stdDeviation="1.3307599"
>> + id="feGaussianBlur4171-6-3-9" />
>> + </filter>
>> + <linearGradient
>> + inkscape:collect="always"
>> + xlink:href="#linearGradient5026"
>> + id="linearGradient5032-3-8-2"
>> + x1="353"
>> + y1="211.3622"
>> + x2="565.5"
>> + y2="174.8622"
>> + gradientUnits="userSpaceOnUse"
>> + gradientTransform="matrix(2.1450559,0,0,1.0052679,-521.97704,740.76422)" />
>> + <filter
>> + inkscape:collect="always"
>> + style="color-interpolation-filters:sRGB"
>> + id="filter4169-3-5-8-5"
>> + x="-0.031597666"
>> + width="1.0631953"
>> + y="-0.099812768"
>> + height="1.1996255">
>> + <feGaussianBlur
>> + inkscape:collect="always"
>> + stdDeviation="1.3307599"
>> + id="feGaussianBlur4171-6-3-9-1" />
>> + </filter>
>> + <linearGradient
>> + inkscape:collect="always"
>> + xlink:href="#linearGradient5026"
>> + id="linearGradient5032-3-8-0"
>> + x1="353"
>> + y1="211.3622"
>> + x2="565.5"
>> + y2="174.8622"
>> + gradientUnits="userSpaceOnUse"
>> + gradientTransform="matrix(1.0104674,0,0,1.0052679,83.456748,660.20747)" />
>> + <filter
>> + inkscape:collect="always"
>> + style="color-interpolation-filters:sRGB"
>> + id="filter4169-3-5-8-6"
>> + x="-0.031597666"
>> + width="1.0631953"
>> + y="-0.099812768"
>> + height="1.1996255">
>> + <feGaussianBlur
>> + inkscape:collect="always"
>> + stdDeviation="1.3307599"
>> + id="feGaussianBlur4171-6-3-9-2" />
>> + </filter>
>> + <linearGradient
>> + inkscape:collect="always"
>> + xlink:href="#linearGradient5026"
>> + id="linearGradient5032-3-84"
>> + x1="353"
>> + y1="211.3622"
>> + x2="565.5"
>> + y2="174.8622"
>> + gradientUnits="userSpaceOnUse"
>> + gradientTransform="matrix(1.9884948,0,0,0.94903536,-318.42665,564.37696)" />
>> + <filter
>> + inkscape:collect="always"
>> + style="color-interpolation-filters:sRGB"
>> + id="filter4169-3-5-4"
>> + x="-0.031597666"
>> + width="1.0631953"
>> + y="-0.099812768"
>> + height="1.1996255">
>> + <feGaussianBlur
>> + inkscape:collect="always"
>> + stdDeviation="1.3307599"
>> + id="feGaussianBlur4171-6-3-0" />
>> + </filter>
>> + <marker
>> + markerWidth="11.227358"
>> + markerHeight="12.355258"
>> + refX="10"
>> + refY="6.177629"
>> + orient="auto"
>> + id="marker4825-6-0-0">
>> + <path
>> + inkscape:connector-curvature="0"
>> + id="path4757-1-93-8"
>> + d="M 0.42024733,0.42806444 10.231357,6.3500844 0.24347733,11.918544"
>> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
>> + </marker>
>> + <marker
>> + markerWidth="11.227358"
>> + markerHeight="12.355258"
>> + refX="10"
>> + refY="6.177629"
>> + orient="auto"
>> + id="marker4825-6-3">
>> + <path
>> + inkscape:connector-curvature="0"
>> + id="path4757-1-1"
>> + d="M 0.42024733,0.42806444 10.231357,6.3500844 0.24347733,11.918544"
>> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
>> + </marker>
>> + </defs>
>> + <sodipodi:namedview
>> + id="base"
>> + pagecolor="#ffffff"
>> + bordercolor="#666666"
>> + borderopacity="1.0"
>> + inkscape:pageopacity="0.0"
>> + inkscape:pageshadow="2"
>> + inkscape:zoom="0.98994949"
>> + inkscape:cx="457.47339"
>> + inkscape:cy="250.14781"
>> + inkscape:document-units="px"
>> + inkscape:current-layer="layer1"
>> + showgrid="false"
>> + inkscape:window-width="1916"
>> + inkscape:window-height="1033"
>> + inkscape:window-x="0"
>> + inkscape:window-y="22"
>> + inkscape:window-maximized="0"
>> + fit-margin-right="0.3" />
>> + <metadata
>> + id="metadata7">
>> + <rdf:RDF>
>> + <cc:Work
>> + rdf:about="">
>> + <dc:format>image/svg+xml</dc:format>
>> + <dc:type
>> + rdf:resource="http://purl.org/dc/dcmitype/StillImage" />
>> + <dc:title></dc:title>
>> + </cc:Work>
>> + </rdf:RDF>
>> + </metadata>
>> + <g
>> + inkscape:label="Layer 1"
>> + inkscape:groupmode="layer"
>> + id="layer1"
>> + transform="translate(0,-641.33861)">
>> + <rect
>> + style="fill:#000000;stroke:#000000;stroke-width:0.6465112;filter:url(#filter4169-3-5)"
>> + id="rect4136-3-6-5"
>> + width="101.07784"
>> + height="31.998148"
>> + x="128.74678"
>> + y="80.648842"
>> + transform="matrix(2.7384116,0,0,0.91666328,-284.06895,664.79751)" />
>> + <rect
>> + style="fill:url(#linearGradient5032-3);fill-opacity:1;stroke:#000000;stroke-width:1.02430749"
>> + id="rect4136-2-6"
>> + width="276.79272"
>> + height="29.331528"
>> + x="64.723419"
>> + y="736.84473" />
>> + <text
>> + xml:space="preserve"
>> + style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
>> + x="78.223282"
>> + y="756.79803"
>> + id="text4138-6-2"><tspan
>> + sodipodi:role="line"
>> + id="tspan4140-1-9"
>> + x="78.223282"
>> + y="756.79803"
>> + style="font-size:15px;line-height:1.25">user application (running by the CPU</tspan></text>
>> + <path
>> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;marker-end:url(#marker4825-6)"
>> + d="m 217.67507,876.6738 113.40331,45.0758"
>> + id="path4661"
>> + inkscape:connector-curvature="0"
>> + sodipodi:nodetypes="cc" />
>> + <path
>> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;marker-end:url(#marker4825-6-0)"
>> + d="m 208.10197,767.69811 0.29362,76.03656"
>> + id="path4661-6"
>> + inkscape:connector-curvature="0"
>> + sodipodi:nodetypes="cc" />
>> + <rect
>> + style="fill:#000000;stroke:#000000;stroke-width:0.6465112;filter:url(#filter4169-3-5-8)"
>> + id="rect4136-3-6-5-3"
>> + width="101.07784"
>> + height="31.998148"
>> + x="128.74678"
>> + y="80.648842"
>> + transform="matrix(1.0104673,0,0,1.0052679,28.128628,763.90722)" />
>> + <rect
>> + style="fill:url(#linearGradient5032-3-8);fill-opacity:1;stroke:#000000;stroke-width:0.65159565"
>> + id="rect4136-2-6-6"
>> + width="102.13586"
>> + height="32.16671"
>> + x="156.83217"
>> + y="842.91852" />
>> + <text
>> + xml:space="preserve"
>> + style="font-style:normal;font-weight:normal;font-size:12px;line-height:0%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
>> + x="188.58519"
>> + y="864.47125"
>> + id="text4138-6-2-8"><tspan
>> + sodipodi:role="line"
>> + id="tspan4140-1-9-0"
>> + x="188.58519"
>> + y="864.47125"
>> + style="font-size:15px;line-height:1.25;stroke-width:1px">MMU</tspan></text>
>> + <rect
>> + style="fill:#000000;stroke:#000000;stroke-width:0.6465112;filter:url(#filter4169-3-5-8-5)"
>> + id="rect4136-3-6-5-3-1"
>> + width="101.07784"
>> + height="31.998148"
>> + x="128.74678"
>> + y="80.648842"
>> + transform="matrix(2.1450556,0,0,1.0052679,1.87637,843.51696)" />
>> + <rect
>> + style="fill:url(#linearGradient5032-3-8-2);fill-opacity:1;stroke:#000000;stroke-width:0.94937181"
>> + id="rect4136-2-6-6-0"
>> + width="216.8176"
>> + height="32.16671"
>> + x="275.09283"
>> + y="922.5282" />
>> + <text
>> + xml:space="preserve"
>> + style="font-style:normal;font-weight:normal;font-size:12px;line-height:0%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
>> + x="347.81482"
>> + y="943.23291"
>> + id="text4138-6-2-8-8"><tspan
>> + sodipodi:role="line"
>> + id="tspan4140-1-9-0-5"
>> + x="347.81482"
>> + y="943.23291"
>> + style="font-size:15px;line-height:1.25;stroke-width:1px">Memory</tspan></text>
>> + <rect
>> + style="fill:#000000;stroke:#000000;stroke-width:0.6465112;filter:url(#filter4169-3-5-8-6)"
>> + id="rect4136-3-6-5-3-5"
>> + width="101.07784"
>> + height="31.998148"
>> + x="128.74678"
>> + y="80.648842"
>> + transform="matrix(1.0104673,0,0,1.0052679,330.22737,762.9602)" />
>> + <rect
>> + style="fill:url(#linearGradient5032-3-8-0);fill-opacity:1;stroke:#000000;stroke-width:0.65159565"
>> + id="rect4136-2-6-6-8"
>> + width="102.13586"
>> + height="32.16671"
>> + x="458.93091"
>> + y="841.9715" />
>> + <text
>> + xml:space="preserve"
>> + style="font-style:normal;font-weight:normal;font-size:12px;line-height:0%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
>> + x="490.68393"
>> + y="863.52423"
>> + id="text4138-6-2-8-6"><tspan
>> + sodipodi:role="line"
>> + id="tspan4140-1-9-0-2"
>> + x="490.68393"
>> + y="863.52423"
>> + style="font-size:15px;line-height:1.25;stroke-width:1px">IOMMU</tspan></text>
>> + <rect
>> + style="fill:#000000;stroke:#000000;stroke-width:0.6465112;filter:url(#filter4169-3-5-4)"
>> + id="rect4136-3-6-5-6"
>> + width="101.07784"
>> + height="31.998148"
>> + x="128.74678"
>> + y="80.648842"
>> + transform="matrix(1.9884947,0,0,0.94903537,167.19229,661.38193)" />
>> + <rect
>> + style="fill:url(#linearGradient5032-3-84);fill-opacity:1;stroke:#000000;stroke-width:0.88813609"
>> + id="rect4136-2-6-2"
>> + width="200.99274"
>> + height="30.367374"
>> + x="420.4675"
>> + y="735.97351" />
>> + <text
>> + xml:space="preserve"
>> + style="font-style:normal;font-weight:normal;font-size:12px;line-height:0%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
>> + x="441.95297"
>> + y="755.9068"
>> + id="text4138-6-2-9"><tspan
>> + sodipodi:role="line"
>> + id="tspan4140-1-9-9"
>> + x="441.95297"
>> + y="755.9068"
>> + style="font-size:15px;line-height:1.25;stroke-width:1px">Hardware Accelerator</tspan></text>
>> + <path
>> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;marker-end:url(#marker4825-6-0-0)"
>> + d="m 508.2914,766.55885 0.29362,76.03656"
>> + id="path4661-6-1"
>> + inkscape:connector-curvature="0"
>> + sodipodi:nodetypes="cc" />
>> + <path
>> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;marker-end:url(#marker4825-6-3)"
>> + d="M 499.70201,876.47297 361.38296,920.80258"
>> + id="path4661-1"
>> + inkscape:connector-curvature="0"
>> + sodipodi:nodetypes="cc" />
>> + </g>
>> +</svg>
>> diff --git a/Documentation/warpdrive/wd_q_addr_space.svg b/Documentation/warpdrive/wd_q_addr_space.svg
>> new file mode 100644
>> index 000000000000..5e6cf8e89908
>> --- /dev/null
>> +++ b/Documentation/warpdrive/wd_q_addr_space.svg
>> @@ -0,0 +1,359 @@
>> +<?xml version="1.0" encoding="UTF-8" standalone="no"?>
>> +<!-- Created with Inkscape (http://www.inkscape.org/) -->
>> +
>> +<svg
>> + xmlns:dc="http://purl.org/dc/elements/1.1/"
>> + xmlns:cc="http://creativecommons.org/ns#"
>> + xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
>> + xmlns:svg="http://www.w3.org/2000/svg"
>> + xmlns="http://www.w3.org/2000/svg"
>> + xmlns:sodipodi="http://sodipodi.sourceforge.net/DTD/sodipodi-0.dtd"
>> + xmlns:inkscape="http://www.inkscape.org/namespaces/inkscape"
>> + width="210mm"
>> + height="124mm"
>> + viewBox="0 0 210 124"
>> + version="1.1"
>> + id="svg8"
>> + inkscape:version="0.92.3 (2405546, 2018-03-11)"
>> + sodipodi:docname="wd_q_addr_space.svg">
>> + <defs
>> + id="defs2">
>> + <marker
>> + inkscape:stockid="Arrow1Mend"
>> + orient="auto"
>> + refY="0"
>> + refX="0"
>> + id="marker5428"
>> + style="overflow:visible"
>> + inkscape:isstock="true">
>> + <path
>> + id="path5426"
>> + d="M 0,0 5,-5 -12.5,0 5,5 Z"
>> + style="fill:#000000;fill-opacity:1;fill-rule:evenodd;stroke:#000000;stroke-width:1.00000003pt;stroke-opacity:1"
>> + transform="matrix(-0.4,0,0,-0.4,-4,0)"
>> + inkscape:connector-curvature="0" />
>> + </marker>
>> + <marker
>> + inkscape:isstock="true"
>> + style="overflow:visible"
>> + id="marker2922"
>> + refX="0"
>> + refY="0"
>> + orient="auto"
>> + inkscape:stockid="Arrow1Mend"
>> + inkscape:collect="always">
>> + <path
>> + transform="matrix(-0.4,0,0,-0.4,-4,0)"
>> + style="fill:#000000;fill-opacity:1;fill-rule:evenodd;stroke:#000000;stroke-width:1.00000003pt;stroke-opacity:1"
>> + d="M 0,0 5,-5 -12.5,0 5,5 Z"
>> + id="path2920"
>> + inkscape:connector-curvature="0" />
>> + </marker>
>> + <marker
>> + inkscape:stockid="Arrow1Mstart"
>> + orient="auto"
>> + refY="0"
>> + refX="0"
>> + id="Arrow1Mstart"
>> + style="overflow:visible"
>> + inkscape:isstock="true">
>> + <path
>> + id="path840"
>> + d="M 0,0 5,-5 -12.5,0 5,5 Z"
>> + style="fill:#000000;fill-opacity:1;fill-rule:evenodd;stroke:#000000;stroke-width:1.00000003pt;stroke-opacity:1"
>> + transform="matrix(0.4,0,0,0.4,4,0)"
>> + inkscape:connector-curvature="0" />
>> + </marker>
>> + <marker
>> + inkscape:stockid="Arrow1Mend"
>> + orient="auto"
>> + refY="0"
>> + refX="0"
>> + id="Arrow1Mend"
>> + style="overflow:visible"
>> + inkscape:isstock="true"
>> + inkscape:collect="always">
>> + <path
>> + id="path843"
>> + d="M 0,0 5,-5 -12.5,0 5,5 Z"
>> + style="fill:#000000;fill-opacity:1;fill-rule:evenodd;stroke:#000000;stroke-width:1.00000003pt;stroke-opacity:1"
>> + transform="matrix(-0.4,0,0,-0.4,-4,0)"
>> + inkscape:connector-curvature="0" />
>> + </marker>
>> + <marker
>> + inkscape:stockid="Arrow1Mstart"
>> + orient="auto"
>> + refY="0"
>> + refX="0"
>> + id="Arrow1Mstart-5"
>> + style="overflow:visible"
>> + inkscape:isstock="true">
>> + <path
>> + inkscape:connector-curvature="0"
>> + id="path840-1"
>> + d="M 0,0 5,-5 -12.5,0 5,5 Z"
>> + style="fill:#000000;fill-opacity:1;fill-rule:evenodd;stroke:#000000;stroke-width:1.00000003pt;stroke-opacity:1"
>> + transform="matrix(0.4,0,0,0.4,4,0)" />
>> + </marker>
>> + <marker
>> + inkscape:stockid="Arrow1Mend"
>> + orient="auto"
>> + refY="0"
>> + refX="0"
>> + id="Arrow1Mend-1"
>> + style="overflow:visible"
>> + inkscape:isstock="true">
>> + <path
>> + inkscape:connector-curvature="0"
>> + id="path843-0"
>> + d="M 0,0 5,-5 -12.5,0 5,5 Z"
>> + style="fill:#000000;fill-opacity:1;fill-rule:evenodd;stroke:#000000;stroke-width:1.00000003pt;stroke-opacity:1"
>> + transform="matrix(-0.4,0,0,-0.4,-4,0)" />
>> + </marker>
>> + <marker
>> + inkscape:isstock="true"
>> + style="overflow:visible"
>> + id="marker2922-2"
>> + refX="0"
>> + refY="0"
>> + orient="auto"
>> + inkscape:stockid="Arrow1Mend"
>> + inkscape:collect="always">
>> + <path
>> + transform="matrix(-0.4,0,0,-0.4,-4,0)"
>> + style="fill:#000000;fill-opacity:1;fill-rule:evenodd;stroke:#000000;stroke-width:1.00000003pt;stroke-opacity:1"
>> + d="M 0,0 5,-5 -12.5,0 5,5 Z"
>> + id="path2920-9"
>> + inkscape:connector-curvature="0" />
>> + </marker>
>> + <marker
>> + inkscape:isstock="true"
>> + style="overflow:visible"
>> + id="marker2922-27"
>> + refX="0"
>> + refY="0"
>> + orient="auto"
>> + inkscape:stockid="Arrow1Mend"
>> + inkscape:collect="always">
>> + <path
>> + transform="matrix(-0.4,0,0,-0.4,-4,0)"
>> + style="fill:#000000;fill-opacity:1;fill-rule:evenodd;stroke:#000000;stroke-width:1.00000003pt;stroke-opacity:1"
>> + d="M 0,0 5,-5 -12.5,0 5,5 Z"
>> + id="path2920-0"
>> + inkscape:connector-curvature="0" />
>> + </marker>
>> + <marker
>> + inkscape:isstock="true"
>> + style="overflow:visible"
>> + id="marker2922-27-8"
>> + refX="0"
>> + refY="0"
>> + orient="auto"
>> + inkscape:stockid="Arrow1Mend"
>> + inkscape:collect="always">
>> + <path
>> + transform="matrix(-0.4,0,0,-0.4,-4,0)"
>> + style="fill:#000000;fill-opacity:1;fill-rule:evenodd;stroke:#000000;stroke-width:1.00000003pt;stroke-opacity:1"
>> + d="M 0,0 5,-5 -12.5,0 5,5 Z"
>> + id="path2920-0-0"
>> + inkscape:connector-curvature="0" />
>> + </marker>
>> + </defs>
>> + <sodipodi:namedview
>> + id="base"
>> + pagecolor="#ffffff"
>> + bordercolor="#666666"
>> + borderopacity="1.0"
>> + inkscape:pageopacity="0.0"
>> + inkscape:pageshadow="2"
>> + inkscape:zoom="1.4"
>> + inkscape:cx="401.66654"
>> + inkscape:cy="218.12255"
>> + inkscape:document-units="mm"
>> + inkscape:current-layer="layer1"
>> + showgrid="false"
>> + inkscape:window-width="1916"
>> + inkscape:window-height="1033"
>> + inkscape:window-x="0"
>> + inkscape:window-y="22"
>> + inkscape:window-maximized="0" />
>> + <metadata
>> + id="metadata5">
>> + <rdf:RDF>
>> + <cc:Work
>> + rdf:about="">
>> + <dc:format>image/svg+xml</dc:format>
>> + <dc:type
>> + rdf:resource="http://purl.org/dc/dcmitype/StillImage" />
>> + <dc:title />
>> + </cc:Work>
>> + </rdf:RDF>
>> + </metadata>
>> + <g
>> + inkscape:label="Layer 1"
>> + inkscape:groupmode="layer"
>> + id="layer1"
>> + transform="translate(0,-173)">
>> + <rect
>> + style="opacity:0.82999998;fill:none;fill-opacity:1;fill-rule:evenodd;stroke:#000000;stroke-width:0.4;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:0.82745098"
>> + id="rect815"
>> + width="21.262758"
>> + height="40.350552"
>> + x="55.509361"
>> + y="195.00098"
>> + ry="0" />
>> + <rect
>> + style="opacity:0.82999998;fill:none;fill-opacity:1;fill-rule:evenodd;stroke:#000000;stroke-width:0.4;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:0.82745098"
>> + id="rect815-1"
>> + width="21.24276"
>> + height="43.732346"
>> + x="55.519352"
>> + y="235.26543"
>> + ry="0" />
>> + <text
>> + xml:space="preserve"
>> + style="font-style:normal;font-weight:normal;font-size:2.9765625px;line-height:125%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:0.26458332px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
>> + x="50.549229"
>> + y="190.6078"
>> + id="text1118"><tspan
>> + sodipodi:role="line"
>> + id="tspan1116"
>> + x="50.549229"
>> + y="190.6078"
>> + style="stroke-width:0.26458332px">queue file address space</tspan></text>
>> + <path
>> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:0.26458332px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
>> + d="M 76.818568,194.95453 H 97.229281"
>> + id="path1126"
>> + inkscape:connector-curvature="0"
>> + sodipodi:nodetypes="cc" />
>> + <path
>> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:0.26458332px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
>> + d="M 76.818568,235.20899 H 96.095361"
>> + id="path1126-8"
>> + inkscape:connector-curvature="0" />
>> + <path
>> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:0.26458332px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
>> + d="m 76.762111,278.99778 h 19.27678"
>> + id="path1126-0"
>> + inkscape:connector-curvature="0" />
>> + <path
>> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:0.26458332px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
>> + d="m 55.519355,265.20165 v 19.27678"
>> + id="path1126-2"
>> + inkscape:connector-curvature="0" />
>> + <path
>> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:0.26458332px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
>> + d="m 76.762111,265.20165 v 19.27678"
>> + id="path1126-2-1"
>> + inkscape:connector-curvature="0" />
>> + <path
>> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:0.26458332px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;marker-start:url(#Arrow1Mstart);marker-end:url(#Arrow1Mend)"
>> + d="m 87.590896,194.76554 0,39.87648"
>> + id="path1126-2-1-0"
>> + inkscape:connector-curvature="0"
>> + sodipodi:nodetypes="cc" />
>> + <path
>> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:0.26458332px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;marker-start:url(#Arrow1Mstart-5);marker-end:url(#Arrow1Mend-1)"
>> + d="m 82.48822,235.77596 v 42.90029"
>> + id="path1126-2-1-0-8"
>> + inkscape:connector-curvature="0"
>> + sodipodi:nodetypes="cc" />
>> + <path
>> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:0.26458332px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;marker-end:url(#marker2922)"
>> + d="M 44.123633,195.3325 H 55.651907"
>> + id="path2912"
>> + inkscape:connector-curvature="0" />
>> + <text
>> + xml:space="preserve"
>> + style="font-style:normal;font-weight:normal;font-size:2.9765625px;line-height:125%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:0.26458332px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
>> + x="32.217381"
>> + y="196.27745"
>> + id="text2968"><tspan
>> + sodipodi:role="line"
>> + id="tspan2966"
>> + x="32.217381"
>> + y="196.27745"
>> + style="stroke-width:0.26458332px">offset 0</tspan></text>
>> + <text
>> + xml:space="preserve"
>> + style="font-style:normal;font-weight:normal;font-size:2.9765625px;line-height:125%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:0.26458332px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
>> + x="91.199554"
>> + y="216.03946"
>> + id="text1118-5"><tspan
>> + sodipodi:role="line"
>> + id="tspan1116-0"
>> + x="91.199554"
>> + y="216.03946"
>> + style="stroke-width:0.26458332px">device region (mapped to device mmio or shared kernel driver memory)</tspan></text>
>> + <text
>> + xml:space="preserve"
>> + style="font-style:normal;font-weight:normal;font-size:2.9765625px;line-height:125%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:0.26458332px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
>> + x="86.188072"
>> + y="244.50081"
>> + id="text1118-5-6"><tspan
>> + sodipodi:role="line"
>> + id="tspan1116-0-4"
>> + x="86.188072"
>> + y="244.50081"
>> + style="stroke-width:0.26458332px">static share virtual memory region (for device without share virtual memory)</tspan></text>
>> + <flowRoot
>> + xml:space="preserve"
>> + id="flowRoot5699"
>> + style="font-style:normal;font-weight:normal;font-size:11.25px;line-height:125%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"><flowRegion
>> + id="flowRegion5701"><rect
>> + id="rect5703"
>> + width="5182.8569"
>> + height="385.71429"
>> + x="34.285713"
>> + y="71.09111" /></flowRegion><flowPara
>> + id="flowPara5705" /></flowRoot> <path
>> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:0.26458332px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;marker-end:url(#marker2922-2)"
>> + d="M 43.679028,206.85268 H 55.207302"
>> + id="path2912-1"
>> + inkscape:connector-curvature="0" />
>> + <path
>> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:0.26458332px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;marker-end:url(#marker2922-27)"
>> + d="M 44.057004,224.23959 H 55.585278"
>> + id="path2912-9"
>> + inkscape:connector-curvature="0" />
>> + <text
>> + xml:space="preserve"
>> + style="font-style:normal;font-weight:normal;font-size:2.9765625px;line-height:125%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:0.26458332px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
>> + x="24.139778"
>> + y="202.40636"
>> + id="text1118-5-3"><tspan
>> + sodipodi:role="line"
>> + id="tspan1116-0-6"
>> + x="24.139778"
>> + y="202.40636"
>> + style="stroke-width:0.26458332px">device mmio region</tspan></text>
>> + <text
>> + xml:space="preserve"
>> + style="font-style:normal;font-weight:normal;font-size:2.9765625px;line-height:125%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:0.26458332px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
>> + x="17.010948"
>> + y="216.73672"
>> + id="text1118-5-3-3"><tspan
>> + sodipodi:role="line"
>> + id="tspan1116-0-6-6"
>> + x="17.010948"
>> + y="216.73672"
>> + style="stroke-width:0.26458332px">device kernel only region</tspan></text>
>> + <path
>> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:0.26458332px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;marker-end:url(#marker2922-27-8)"
>> + d="M 43.981087,235.35153 H 55.509361"
>> + id="path2912-9-2"
>> + inkscape:connector-curvature="0" />
>> + <text
>> + xml:space="preserve"
>> + style="font-style:normal;font-weight:normal;font-size:2.9765625px;line-height:125%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:0.26458332px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
>> + x="17.575975"
>> + y="230.53285"
>> + id="text1118-5-3-3-0"><tspan
>> + sodipodi:role="line"
>> + id="tspan1116-0-6-6-5"
>> + x="17.575975"
>> + y="230.53285"
>> + style="stroke-width:0.26458332px">device user share region</tspan></text>
>> + </g>
>> +</svg>
>> --
>> 2.17.1
>>

2018-11-15 18:56:33

by Kenneth Lee

Subject: Re: [RFCv3 PATCH 1/6] uacce: Add documents for WarpDrive/uacce

On Wed, Nov 14, 2018 at 06:00:17PM +0200, Leon Romanovsky wrote:
>
> On Wed, Nov 14, 2018 at 10:58:09AM +0800, Kenneth Lee wrote:
> >
> > On 2018/11/13 at 8:23 AM, Leon Romanovsky wrote:
> > > On Mon, Nov 12, 2018 at 03:58:02PM +0800, Kenneth Lee wrote:
> > > > From: Kenneth Lee <[email protected]>
> > > >
> > > > WarpDrive is a general accelerator framework that lets user
> > > > applications access the hardware without going through the kernel in
> > > > the data path.
> > > >
> > > > The kernel component that provides the facility for drivers to expose
> > > > the user interface is called uacce. It is short for
> > > > "Unified/User-space-access-intended Accelerator Framework".
> > > >
> > > > This patch adds a document explaining how it works.
> > > + RDMA and netdev folks
> > >
> > > Sorry to be late to the game. I don't see the other patches, but from
> > > the description below it seems like you are reinventing the RDMA verbs
> > > model. I have a hard time seeing how the proposed framework differs
> > > from what is already implemented in drivers/infiniband/* for the kernel
> > > space and in https://github.com/linux-rdma/rdma-core/ for the user
> > > space parts.
> >
> > Thanks Leon,
> >
> > Yes, we are trying to solve a problem similar to the one RDMA solves. We also
> > learned a lot from the existing RDMA code. But we have to make a new framework
> > because we cannot register accelerators such as AI operation, encryption or
> > compression engines to the RDMA framework :)
>
> Assuming that you did everything right and still failed to use the RDMA
> framework, you were supposed to fix it and not reinvent exactly the
> same thing. This is how we develop the kernel: by reusing existing code.

Yes, but we don't force other systems such as NICs or GPUs into RDMA, do we?

I assume you would not agree to register a zip accelerator to infiniband? :)

Further, I don't think it is wise to break an existing system (RDMA) to fulfill a
totally new scenario. The better choice is to let them run in parallel for some
time and try to merge them accordingly.

>
> >
> > Another problem we tried to address is the way to pin memory for DMA
> > operations. The RDMA way of pinning memory cannot avoid losing pages to
> > copy-on-write while the memory is in use by the device. This may not be
> > important to the RDMA library. But it is important to accelerators.
>
> Such support exists in drivers/infiniband/ from late 2014 and
> it is called ODP (on demand paging).

I reviewed ODP and I think it is a solution bound to InfiniBand. It is part of
the MR semantics and requires an InfiniBand-specific hook
(ucontext->invalidate_range()). And the hook requires the device to be able to
stop using the page for a while during the copy. That is OK for InfiniBand
(actually, only mlx5 uses it). I don't think most accelerators can support
this mode. But WarpDrive works fully on top of the IOMMU interface, so it does
not have this limitation.
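
As a side note on the copy-on-write behaviour mentioned above, the following is
a minimal user-space sketch. It is an illustration only, not code from the
patchset, and cow_demo() is a hypothetical name. It shows that after fork(), a
write by the parent to a private page is not seen by the child, i.e. the two
processes stop sharing that page. It does not show which physical page a device
holding a pin would keep using.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns 0 if the child still sees the pre-fork
 * contents after the parent has written to the page, 1 if it sees the
 * parent's write, and a negative value or 2 on setup errors.
 */
int cow_demo(void)
{
	long page_size = sysconf(_SC_PAGESIZE);
	int sync_pipe[2];
	int status;
	char *page;
	char token;
	pid_t pid;

	assert(page_size > 0);
	page = mmap(NULL, (size_t)page_size, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (page == MAP_FAILED || pipe(sync_pipe) != 0)
		return -1;
	strcpy(page, "before");

	pid = fork();
	if (pid < 0)
		return -1;
	if (pid == 0) {
		/* Child: wait until the parent has written, then check the page. */
		close(sync_pipe[1]);
		if (read(sync_pipe[0], &token, 1) != 1)
			_exit(2);
		_exit(strcmp(page, "before") == 0 ? 0 : 1);
	}

	/* Parent: this write triggers copy-on-write for the shared page. */
	close(sync_pipe[0]);
	strcpy(page, "after");
	if (write(sync_pipe[1], "x", 1) != 1)
		return -1;
	close(sync_pipe[1]);
	if (waitpid(pid, &status, 0) != pid)
		return -1;
	munmap(page, (size_t)page_size);
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling cow_demo() from a small main() is enough to try it; a return value of 0
means the child kept the old contents while the parent saw its own write.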

>
> >
> > Hope this can help the understanding.
>
> Yes, it helped me a lot.
> Now, I'm more than before convinced that this whole patchset shouldn't
> exist in the first place.

Then maybe you can tell me how I can register my accelerator to the user space?

>
> To be clear, NAK.
>
> Thanks
>
> >
> > Cheers
> >
> > >
> > > Hard NAK from RDMA side.
> > >
> > > Thanks
> > >
> > > > Signed-off-by: Kenneth Lee <[email protected]>
> > > > ---
> > > > Documentation/warpdrive/warpdrive.rst | 260 +++++++
> > > > Documentation/warpdrive/wd-arch.svg | 764 ++++++++++++++++++++
> > > > Documentation/warpdrive/wd.svg | 526 ++++++++++++++
> > > > Documentation/warpdrive/wd_q_addr_space.svg | 359 +++++++++
> > > > 4 files changed, 1909 insertions(+)
> > > > create mode 100644 Documentation/warpdrive/warpdrive.rst
> > > > create mode 100644 Documentation/warpdrive/wd-arch.svg
> > > > create mode 100644 Documentation/warpdrive/wd.svg
> > > > create mode 100644 Documentation/warpdrive/wd_q_addr_space.svg
> > > >
> > > > diff --git a/Documentation/warpdrive/warpdrive.rst b/Documentation/warpdrive/warpdrive.rst
> > > > new file mode 100644
> > > > index 000000000000..ef84d3a2d462
> > > > --- /dev/null
> > > > +++ b/Documentation/warpdrive/warpdrive.rst
> > > > @@ -0,0 +1,260 @@
> > > > +Introduction of WarpDrive
> > > > +=========================
> > > > +
> > > > +*WarpDrive* is a general accelerator framework for the user application to
> > > > +access the hardware without going through the kernel in data path.
> > > > +
> > > > +It can be used as the quick channel for accelerators, network adaptors or
> > > > +other hardware for application in user space.
> > > > +
> > > > +This may make some implementation simpler. E.g. you can reuse most of the
> > > > +*netdev* driver in kernel and just share some ring buffer to the user space
> > > > +driver for *DPDK* [4] or *ODP* [5]. Or you can combine the RSA accelerator with
> > > > +the *netdev* in the user space as a https reversed proxy, etc.
> > > > +
> > > > +*WarpDrive* takes the hardware accelerator as a heterogeneous processor which
> > > > +can share particular load from the CPU:
> > > > +
> > > > +.. image:: wd.svg
> > > > + :alt: WarpDrive Concept
> > > > +
> > > > +The virtual concept, queue, is used to manage the requests sent to the
> > > > +accelerator. The application send requests to the queue by writing to some
> > > > +particular address, while the hardware takes the requests directly from the
> > > > +address and send feedback accordingly.
> > > > +
> > > > +The format of the queue may differ from hardware to hardware. But the
> > > > +application need not to make any system call for the communication.
> > > > +
> > > > +*WarpDrive* tries to create a shared virtual address space for all involved
> > > > +accelerators. Within this space, the requests sent to queue can refer to any
> > > > +virtual address, which will be valid to the application and all involved
> > > > +accelerators.
> > > > +
> > > > +The name *WarpDrive* is simply a cool and general name meaning the framework
> > > > +makes the application faster. It includes general user library, kernel
> > > > +management module and drivers for the hardware. In kernel, the management
> > > > +module is called *uacce*, meaning "Unified/User-space-access-intended
> > > > +Accelerator Framework".
> > > > +
> > > > +
> > > > +How does it work
> > > > +================
> > > > +
> > > > +*WarpDrive* uses *mmap* and *IOMMU* to play the trick.
> > > > +
> > > > +*Uacce* creates a chrdev for the device registered to it. A "queue" will be
> > > > +created when the chrdev is opened. The application access the queue by mmap
> > > > +different address region of the queue file.
> > > > +
> > > > +The following figure demonstrated the queue file address space:
> > > > +
> > > > +.. image:: wd_q_addr_space.svg
> > > > + :alt: WarpDrive Queue Address Space
> > > > +
> > > > +The first region of the space, device region, is used for the application to
> > > > +write request or read answer to or from the hardware.
> > > > +
> > > > +Normally, there can be three types of device regions mmio and memory regions.
> > > > +It is recommended to use common memory for request/answer descriptors and use
> > > > +the mmio space for device notification, such as doorbell. But of course, this
> > > > +is all up to the interface designer.
> > > > +
> > > > +There can be two types of device memory regions, kernel-only and user-shared.
> > > > +This will be explained in the "kernel APIs" section.
> > > > +
> > > > +The Static Share Virtual Memory region is necessary only when the device IOMMU
> > > > +does not support "Share Virtual Memory". This will be explained after the
> > > > +*IOMMU* idea.
> > > > +
> > > > +
> > > > +Architecture
> > > > +------------
> > > > +
> > > > +The full *WarpDrive* architecture is represented in the following class
> > > > +diagram:
> > > > +
> > > > +.. image:: wd-arch.svg
> > > > + :alt: WarpDrive Architecture
> > > > +
> > > > +
> > > > +The user API
> > > > +------------
> > > > +
> > > > +We adopt a polling style interface in the user space: ::
> > > > +
> > > > + int wd_request_queue(struct wd_queue *q);
> > > > + void wd_release_queue(struct wd_queue *q);
> > > > +
> > > > + int wd_send(struct wd_queue *q, void *req);
> > > > + int wd_recv(struct wd_queue *q, void **req);
> > > > + int wd_recv_sync(struct wd_queue *q, void **req);
> > > > + void wd_flush(struct wd_queue *q);
> > > > +
> > > > +wd_recv_sync() is a wrapper around its non-sync version. It traps into the
> > > > +kernel and waits until the queue becomes available.
> > > > +
> > > > +If the queue does not support SVA/SVM, the following helper function
> > > > +can be used to create Static Virtual Share Memory: ::
> > > > +
> > > > + void *wd_preserve_share_memory(struct wd_queue *q, size_t size);
> > > > +
> > > > +The user API is not mandatory. It is simply a suggestion and hint what the
> > > > +kernel interface is supposed to support.
> > > > +
> > > > +
> > > > +The user driver
> > > > +---------------
> > > > +
> > > > +The queue file mmap space will need a user driver to wrap the communication
> > > > +protocol. *UACCE* provides some attributes in sysfs for the user driver to
> > > > +match the right accelerator accordingly.
> > > > +
> > > > +The *UACCE* device attribute is under the following directory:
> > > > +
> > > > +/sys/class/uacce/<dev-name>/params
> > > > +
> > > > +The following attributes are supported:
> > > > +
> > > > +nr_queue_remained (ro)
> > > > + number of queue remained
> > > > +
> > > > +api_version (ro)
> > > > + a string to identify the queue mmap space format and its version
> > > > +
> > > > +device_attr (ro)
> > > > + attributes of the device, see UACCE_DEV_xxx flag defined in uacce.h
> > > > +
> > > > +numa_node (ro)
> > > > + id of numa node
> > > > +
> > > > +priority (rw)
> > > > + Priority of the device, bigger is higher
> > > > +
> > > > +(This is not yet implemented in RFC version)
> > > > +
> > > > +
> > > > +The kernel API
> > > > +--------------
> > > > +
> > > > +The *uacce* kernel API is defined in uacce.h. If the hardware support SVM/SVA,
> > > > +The driver need only the following API functions: ::
> > > > +
> > > > + int uacce_register(uacce);
> > > > + void uacce_unregister(uacce);
> > > > + void uacce_wake_up(q);
> > > > +
> > > > +*uacce_wake_up* is used to notify the process who epoll() on the queue file.
> > > > +
> > > > +According to the IOMMU capability, *uacce* categories the devices as follow:
> > > > +
> > > > +UACCE_DEV_NOIOMMU
> > > > + The device has no IOMMU. The user process cannot use VA on the hardware
> > > > + This mode is not recommended.
> > > > +
> > > > +UACCE_DEV_SVA (UACCE_DEV_PASID | UACCE_DEV_FAULT_FROM_DEV)
> > > > + The device has IOMMU which can share the same page table with user
> > > > + process
> > > > +
> > > > +UACCE_DEV_SHARE_DOMAIN
> > > > + The device has IOMMU which has no multiple page table and device page
> > > > + fault support
> > > > +
> > > > +If the device works in mode other than UACCE_DEV_NOIOMMU, *uacce* will set its
> > > > +IOMMU to IOMMU_DOMAIN_UNMANAGED. So the driver must not use any kernel
> > > > +DMA API but the following ones from *uacce* instead: ::
> > > > +
> > > > + uacce_dma_map(q, va, size, prot);
> > > > + uacce_dma_unmap(q, va, size, prot);
> > > > +
> > > > +*uacce_dma_map/unmap* is valid only for UACCE_DEV_SVA device. It creates a
> > > > +particular PASID and page table for the kernel in the IOMMU (Not yet
> > > > +implemented in the RFC)
> > > > +
> > > > +For the UACCE_DEV_SHARE_DOMAIN device, uacce_dma_map/unmap is not valid.
> > > > +*Uacce* call back start_queue only when the DUS and DKO region is mmapped. The
> > > > +accelerator driver must use those dma buffer, via uacce_queue->qfrs[], on
> > > > +start_queue call back. The size of the queue file region is defined by
> > > > +uacce->ops->qf_pg_start[].
> > > > +
> > > > +We have to do it this way because most of current IOMMU cannot support the
> > > > +kernel and user virtual address at the same time. So we have to let them both
> > > > +share the same user virtual address space.
> > > > +
> > > > +If the device have to support kernel and user at the same time, both kernel
> > > > +and the user should use these DMA API. This is not convenient. A better
> > > > +solution is to change the future DMA/IOMMU design to let them separate the
> > > > +address space between the user and kernel space. But it is not going to be in
> > > > +a short time.
> > > > +
> > > > +
> > > > +Multiple processes support
> > > > +==========================
> > > > +
> > > > +In the latest mainline kernel (4.19) when this document is written, the IOMMU
> > > > +subsystem do not support multiple process page tables yet.
> > > > +
> > > > +Most IOMMU hardware implementation support multi-process with the concept
> > > > +of PASID. But they may use different name, e.g. it is call sub-stream-id in
> > > > +SMMU of ARM. With PASID or similar design, multi page table can be added to
> > > > +the IOMMU and referred by its PASID.
> > > > +
> > > > +*JPB* has a patchset to enable this[1]_. We have tested it with our hardware
> > > > +(which is known as *D06*). It works well. *WarpDrive* rely on them to support
> > > > +UACCE_DEV_SVA. If it is not enabled, *WarpDrive* can still work. But it
> > > > +support only one process, the device will be set to UACCE_DEV_SHARE_DOMAIN
> > > > +even it is set to UACCE_DEV_SVA initially.
> > > > +
> > > > +Static Share Virtual Memory is mainly used by UACCE_DEV_SHARE_DOMAIN device.
> > > > +
> > > > +
> > > > +Legacy Mode Support
> > > > +===================
> > > > +For the hardware without IOMMU, WarpDrive can still work, the only problem is
> > > > +VA cannot be used in the device. The driver should adopt another strategy for
> > > > +the shared memory. It is only for testing, and not recommended.
> > > > +
> > > > +
> > > > +The Fork Scenario
> > > > +=================
> > > > +For a process with allocated queues and shared memory, what happens if it forks
> > > > +a child?
> > > > +
> > > > +The fd of the queue will be duplicated on fork, so the child can send requests
> > > > +to the same queue as its parent. But requests sent from any process
> > > > +other than the one that opened the queue will be blocked.
> > > > +
> > > > +It is recommended to add O_CLOEXEC to the queue file.
> > > > +
> > > > +The queue mmap space has VM_DONTCOPY set in its VMA. So the child will lose all
> > > > +those VMAs.
> > > > +
> > > > +This is why *WarpDrive* does not adopt the mode used in *VFIO* and *InfiniBand*.
> > > > +Both solutions can set any user pointer for hardware sharing. But they cannot
> > > > +support fork while DMA is in progress, or the "Copy-On-Write" procedure will
> > > > +make the parent process lose its physical pages.
> > > > +
> > > > +
> > > > +The Sample Code
> > > > +===============
> > > > +There is a sample user land implementation with a simple driver for Hisilicon
> > > > +Hi1620 ZIP Accelerator.
> > > > +
> > > > +To test, do the following in samples/warpdrive (for the case of PC host): ::
> > > > + ./autogen.sh
> > > > + ./conf.sh # or simply ./configure if you build on target system
> > > > + make
> > > > +
> > > > +Then you can get test_hisi_zip in the test subdirectory. Copy it to the target
> > > > +system and make sure the hisi_zip driver is enabled (the major and minor of
> > > > +the uacce chrdev can be gotten from the dmesg or sysfs), and run: ::
> > > > + mknod /dev/ua1 c <major> <minor>
> > > > + test/test_hisi_zip -z < data > data.zip
> > > > + test/test_hisi_zip -g < data > data.gzip
> > > > +
> > > > +
> > > > +References
> > > > +==========
> > > > +.. [1] https://patchwork.kernel.org/patch/10394851/
> > > > +
> > > > +.. vim: tw=78
[...]
> > > > --
> > > > 2.17.1
> > > >

2018-11-20 12:56:07

by Kenneth Lee

Subject: Re: [RFCv3 PATCH 1/6] uacce: Add documents for WarpDrive/uacce

On Mon, Nov 19, 2018 at 12:48:01PM +0200, Leon Romanovsky wrote:
> On Mon, Nov 19, 2018 at 05:19:10PM +0800, Kenneth Lee wrote:
> > On Mon, Nov 19, 2018 at 05:14:05PM +0800, Kenneth Lee wrote:
> > > On Thu, Nov 15, 2018 at 04:54:55PM +0200, Leon Romanovsky wrote:
> > > > On Thu, Nov 15, 2018 at 04:51:09PM +0800, Kenneth Lee wrote:
> > > > > On Wed, Nov 14, 2018 at 06:00:17PM +0200, Leon Romanovsky wrote:
> > > > > > On Wed, Nov 14, 2018 at 10:58:09AM +0800, Kenneth Lee wrote:
> > > > > > >
> > > > > > > On 2018/11/13 at 8:23 AM, Leon Romanovsky wrote:
> > > > > > > > On Mon, Nov 12, 2018 at 03:58:02PM +0800, Kenneth Lee wrote:
> > > > > > > > > From: Kenneth Lee <[email protected]>
> > > > > > > > >
> > > > > > > > > WarpDrive is a general accelerator framework for the user application to
> > > > > > > > > access the hardware without going through the kernel in data path.
> > > > > > > > >
> > > > > > > > > The kernel component to provide kernel facility to driver for expose the
> > > > > > > > > user interface is called uacce. It a short name for
> > > > > > > > > "Unified/User-space-access-intended Accelerator Framework".
> > > > > > > > >
> > > > > > > > > This patch add document to explain how it works.
> > > > > > > > + RDMA and netdev folks
> > > > > > > >
> > > > > > > > Sorry, to be late in the game, I don't see other patches, but from
> > > > > > > > the description below it seems like you are reinventing RDMA verbs
> > > > > > > > model. I have hard time to see the differences in the proposed
> > > > > > > > framework to already implemented in drivers/infiniband/* for the kernel
> > > > > > > > space and for the https://github.com/linux-rdma/rdma-core/ for the user
> > > > > > > > space parts.
> > > > > > >
> > > > > > > Thanks Leon,
> > > > > > >
> > > > > > > Yes, we are trying to solve a problem similar to the one RDMA solves. We also
> > > > > > > learned a lot from the existing RDMA code. But we have to make a new framework
> > > > > > > because we cannot register accelerators such as AI operation, encryption or
> > > > > > > compression engines to the RDMA framework :)
> > > > > >
> > > > > > Assuming that you did everything right and still failed to use the RDMA
> > > > > > framework, you were supposed to fix it and not reinvent exactly the
> > > > > > same thing. This is how we develop the kernel: by reusing existing code.
> > > > >
> > > > > Yes, but we don't force other systems such as NICs or GPUs into RDMA, do we?
> > > >
> > > > You are not introducing a new NIC or GPU, but proposing another interface to
> > > > directly access HW memory and bypass the kernel in the data path. This is the
> > > > whole idea of RDMA, and this is why it is already present in the kernel.
> > > >
> > > > The various hardware devices supported in our stack allow a ton of crazy
> > > > stuff, including GPU interconnects and NIC functionality.
> > >
> > > Yes. We don't want to reinvent the wheel. That is why we did it behind VFIO in
> > > RFC v1 and v2. But finally we were persuaded by Mr. Jerome Glisse that VFIO was
> > > not a good place to solve the problem.
>
> I saw a couple of his responses, he constantly said to you that you are
> reinventing the wheel.
> https://lore.kernel.org/lkml/[email protected]/
>

No. I think he asked me not to create trouble in VFIO but to just use the common
interfaces from dma_buf and the IOMMU itself. That is exactly what I am doing.

> > >
> > > And currently, as you see, IB is bound to devices doing RDMA. The registration
> > > function, ib_register_device(), hints that the device is a netdev (the
> > > get_netdev() callback), and it knows about gid, pkey, and Memory Window. IB is
> > > not simply an address space management framework. And the verbs are not
> > > transparent to IB. If we start to add compression/decompression, AI (RNN, CNN
> > > stuff) operations, and encryption/decryption to the verbs set, it will become
> > > very complex. Or maybe I misunderstand the IB idea? But I don't see any
> > > compression hardware integrated in the mainline kernel. Could you point out
> > > one that I can use as a reference?
> > >
>
> I strongly advise you to read the code, not all drivers are implementing
> gids, pkeys and get_netdev() callback.
>
> Yes, you are misunderstanding drivers/infiniband subsystem. We have
> plenty options to expose APIs to the user space applications, starting
> from standard verbs API and ending with private objects which are
> understandable by specific device/driver.
>
> IB stack provides secure FD to access device, by creating context,
> after that you can send direct commands to the FW (see mlx5 DEVX
> or hfi1) in sane way.
>
> So actually, you will need to register your device, declare your own
> set of objects (similar to mlx5 include/uapi/rdma/mlx5_user_ioctl_*.h).
>
> In regards to reference of compression hardware, I don't have.
> But there is an example of how T10-DIF can be implemented in verbs
> layer:
> https://www.openfabrics.org/images/2018workshop/presentations/307_TOved_T10-DIFOffload.pdf
> Or IPsec crypto:
> https://www.spinics.net/lists/linux-rdma/msg48906.html
>

OK. I will spend some time on it first. But according to the current discussion,
don't you think I should avoid all these complexities and simply use SVM/SVA on
the IOMMU, or let the user application use kernel-allocated VMAs and pages? It
does not create anything new. It is just a new user of the IOMMU and its SVM/SVA
capability.

> > > >
> > > > >
> > > > > I assume you would not agree to register a zip accelerator to infiniband? :)
> > > >
> > > > "infiniband" name in the "drivers/infiniband/" is legacy one and the
> > > > current code supports IB, RoCE, iWARP and OmniPath as transport layers.
> > > > For a long time, we wanted to rename that folder to be "drivers/rdma",
> > > > but didn't find enough brave men/women to do it, due to backport mess
> > > > for such move.
> > > >
> > > > The addition of zip accelerator to RDMA is possible and depends on how
> > > > you will model such new functionality - new driver, or maybe new ULP.
> > > >
> > > > >
> > > > > Further, I don't think it is wise to break an existing system (RDMA) to fulfill a
> > > > > totally new scenario. The better choice is to let them run in parallel for some
> > > > > time and try to merge them accordingly.
> > > >
> > > > Awesome, so please run your code out-of-tree for now and once you are ready
> > > > for submission let's try to merge it.
> > >
> > > Yes, yes. We know trust takes time to gain. But the fact is that currently no
> > > accelerator user driver can be added to the mainline kernel. We should raise the
> > > topic from time to time to help the communication and close the gap, right?
> > >
> > > We are also open to cooperating with IB to do it within the IB framework. But
> > > please let me know where to start. I feel it is quite weird to call
> > > ib_register_device() for a zip or RSA accelerator.
>
> Most of the ib_ prefixes in drivers/infiniband/ are legacy names. You can
> rename them to be rdma_register_device() if it helps.
>
> So from implementation point of view, as I wrote above.
> Create minimal driver to register, expose MR to user space, add your own
> objects and capabilities through our new KABI and implement user space part
> in github.com/linux-rdma/rdma-core.

I don't think it is just a name. But anyway, let me spend some time exploring
the possibility.

>
> > >
> > > >
> > > > >
> > > > > >
> > > > > > >
> > > > > > > Another problem we tried to address is the way to pin memory for DMA
> > > > > > > operations. The RDMA way of pinning memory cannot avoid losing pages to
> > > > > > > copy-on-write while the memory is in use by the device. This may not be
> > > > > > > important to the RDMA library. But it is important to accelerators.
> > > > > >
> > > > > > Such support exists in drivers/infiniband/ from late 2014 and
> > > > > > it is called ODP (on demand paging).
> > > > >
> > > > > I reviewed ODP and I think it is a solution bound to InfiniBand. It is part of
> > > > > the MR semantics and requires an InfiniBand-specific hook
> > > > > (ucontext->invalidate_range()). And the hook requires the device to be able to
> > > > > stop using the page for a while during the copy. That is OK for InfiniBand
> > > > > (actually, only mlx5 uses it). I don't think most accelerators can support
> > > > > this mode. But WarpDrive works fully on top of the IOMMU interface, so it does
> > > > > not have this limitation.
> > > >
> > > > 1. It has nothing to do with infiniband.
> > >
> > > But it must be a ib_dev first.
>
> It is just a name.
>
> > >
> > > > 2. MR and ucontext are verbs semantics and are needed to ensure that host
> > > > memory exposed to the user is properly protected from a security point of view.
> > > > 3. "stop using the page for a while for the copying" - I don't fully
> > > > understand this claim, maybe this article will help you to better
> > > > describe it: https://lwn.net/Articles/753027/
> > >
> > > This topic was discussed in RFCv2. The key problem here is this:
> > >
> > > The device needs to hold the memory for its own calculation, but the CPU/software
> > > wants to stop it for a while to synchronize with disk or for COW.
> > >
> > > If the hardware supports SVM/SVA (Shared Virtual Memory/Address), it is easy: the
> > > device shares the page table with the CPU, and the device will raise a page fault
> > > when the CPU downgrades the PTE to read-only.
> > >
> > > If the hardware cannot share the page table with the CPU, we then need to have
> > > some way to change the device page table. This is what happens in ODP. It
> > > invalidates the page table in the device upon the mmu_notifier callback. But this
> > > cannot solve the COW problem: suppose user process A shares a page P with the
> > > device, A forks a new process B, and A continues to write to the page. By COW,
> > > process B will keep the page P, while A will get a new page P'. But you have
> > > no way to let the device know it should use P' rather than P.
>
> I didn't hear about such issue and we supported fork for a long time.
>
> > >
> > > This may be OK for an RDMA application, because RDMA is a big thing and we can
> > > ask the programmer to avoid the situation. But for an accelerator, I don't think
> > > we can ask a programmer to care about this when using zlib.
> > >
> > > In WarpDrive/uacce, we make this simple. If you have an IOMMU and it supports
> > > SVM/SVA, everything will be fine, just like ODP implicit mode. And you don't need
> > > to write any code for that, because it is done by the IOMMU framework. If it
> > > does not, you have to use kernel-allocated memory which has the same IOVA as
> > > the VA in user space. So we can still maintain a unified address space among the
> > > devices and the application.
> > >
> > > > 4. mlx5 supports ODP not because of being partially IB device,
> > > > but because HW performance oriented implementation is not an easy task.
> > > >
> > > > >
> > > > > >
> > > > > > >
> > > > > > > Hope this can help the understanding.
> > > > > >
> > > > > > Yes, it helped me a lot.
> > > > > > Now, I'm more than before convinced that this whole patchset shouldn't
> > > > > > exist in the first place.
> > > > >
> > > > > Then maybe you can tell me how I can register my accelerator to the user space?
> > > >
> > > > Write kernel driver and write user space part of it.
> > > > https://github.com/linux-rdma/rdma-core/
> > > >
> > > > I have no doubts that your colleagues who wrote and maintain
> > > > drivers/infiniband/hw/hns driver know best how to do it.
> > > > They did it very successfully.
> > > >
> > > > Thanks
> > > >
> > > > >
> > > > > >
> > > > > > To be clear, NAK.
> > > > > >
> > > > > > Thanks
> > > > > >
> > > > > > >
> > > > > > > Cheers
> > > > > > >
> > > > > > > >
> > > > > > > > Hard NAK from RDMA side.
> > > > > > > >
> > > > > > > > Thanks
> > > > > > > >
> > > > > > > > > Signed-off-by: Kenneth Lee <[email protected]>
> > > > > > > > > ---
> > > > > > > > > Documentation/warpdrive/warpdrive.rst | 260 +++++++
> > > > > > > > > Documentation/warpdrive/wd-arch.svg | 764 ++++++++++++++++++++
> > > > > > > > > Documentation/warpdrive/wd.svg | 526 ++++++++++++++
> > > > > > > > > Documentation/warpdrive/wd_q_addr_space.svg | 359 +++++++++
> > > > > > > > > 4 files changed, 1909 insertions(+)
> > > > > > > > > create mode 100644 Documentation/warpdrive/warpdrive.rst
> > > > > > > > > create mode 100644 Documentation/warpdrive/wd-arch.svg
> > > > > > > > > create mode 100644 Documentation/warpdrive/wd.svg
> > > > > > > > > create mode 100644 Documentation/warpdrive/wd_q_addr_space.svg
> > > > > > > > >
> > > > > > > > > diff --git a/Documentation/warpdrive/warpdrive.rst b/Documentation/warpdrive/warpdrive.rst
> > > > > > > > > new file mode 100644
> > > > > > > > > index 000000000000..ef84d3a2d462
> > > > > > > > > --- /dev/null
> > > > > > > > > +++ b/Documentation/warpdrive/warpdrive.rst
> > > > > > > > > @@ -0,0 +1,260 @@
> > > > > > > > > +Introduction of WarpDrive
> > > > > > > > > +=========================
> > > > > > > > > +
> > > > > > > > > +*WarpDrive* is a general accelerator framework for the user application to
> > > > > > > > > +access the hardware without going through the kernel in data path.
> > > > > > > > > +
> > > > > > > > > +It can be used as the quick channel for accelerators, network adaptors or
> > > > > > > > > +other hardware for application in user space.
> > > > > > > > > +
> > > > > > > > > +This may make some implementation simpler. E.g. you can reuse most of the
> > > > > > > > > +*netdev* driver in kernel and just share some ring buffer to the user space
> > > > > > > > > +driver for *DPDK* [4] or *ODP* [5]. Or you can combine the RSA accelerator with
> > > > > > > > > +the *netdev* in the user space as a https reversed proxy, etc.
> > > > > > > > > +
> > > > > > > > > +*WarpDrive* takes the hardware accelerator as a heterogeneous processor which
> > > > > > > > > +can share particular load from the CPU:
> > > > > > > > > +
> > > > > > > > > +.. image:: wd.svg
> > > > > > > > > + :alt: WarpDrive Concept
> > > > > > > > > +
> > > > > > > > > > +The virtual concept, queue, is used to manage the requests sent to the
> > > > > > > > > > +accelerator. The application sends requests to the queue by writing to some
> > > > > > > > > > +particular address, while the hardware takes the requests directly from the
> > > > > > > > > > +address and sends feedback accordingly.
> > > > > > > > > > +
> > > > > > > > > > +The format of the queue may differ from hardware to hardware. But the
> > > > > > > > > > +application need not make any system call for the communication.
> > > > > > > > > +
> > > > > > > > > +*WarpDrive* tries to create a shared virtual address space for all involved
> > > > > > > > > +accelerators. Within this space, the requests sent to queue can refer to any
> > > > > > > > > +virtual address, which will be valid to the application and all involved
> > > > > > > > > +accelerators.
> > > > > > > > > +
> > > > > > > > > +The name *WarpDrive* is simply a cool and general name meaning the framework
> > > > > > > > > +makes the application faster. It includes general user library, kernel
> > > > > > > > > +management module and drivers for the hardware. In kernel, the management
> > > > > > > > > +module is called *uacce*, meaning "Unified/User-space-access-intended
> > > > > > > > > +Accelerator Framework".
> > > > > > > > > +
> > > > > > > > > +
> > > > > > > > > +How does it work
> > > > > > > > > +================
> > > > > > > > > +
> > > > > > > > > +*WarpDrive* uses *mmap* and *IOMMU* to play the trick.
> > > > > > > > > +
> > > > > > > > > > +*Uacce* creates a chrdev for the device registered to it. A "queue" will be
> > > > > > > > > > +created when the chrdev is opened. The application accesses the queue by
> > > > > > > > > > +mmapping different address regions of the queue file.
> > > > > > > > > > +
> > > > > > > > > > +The following figure demonstrates the queue file address space:
> > > > > > > > > +
> > > > > > > > > +.. image:: wd_q_addr_space.svg
> > > > > > > > > + :alt: WarpDrive Queue Address Space
> > > > > > > > > +
> > > > > > > > > > +The first region of the space, device region, is used for the application to
> > > > > > > > > > +write requests to or read answers from the hardware.
> > > > > > > > > > +
> > > > > > > > > > +Normally, there can be two types of device regions: mmio and memory regions.
> > > > > > > > > +It is recommended to use common memory for request/answer descriptors and use
> > > > > > > > > +the mmio space for device notification, such as doorbell. But of course, this
> > > > > > > > > +is all up to the interface designer.
> > > > > > > > > +
> > > > > > > > > +There can be two types of device memory regions, kernel-only and user-shared.
> > > > > > > > > +This will be explained in the "kernel APIs" section.
> > > > > > > > > +
> > > > > > > > > +The Static Share Virtual Memory region is necessary only when the device IOMMU
> > > > > > > > > +does not support "Share Virtual Memory". This will be explained after the
> > > > > > > > > +*IOMMU* idea.
> > > > > > > > > +
> > > > > > > > > +
> > > > > > > > > +Architecture
> > > > > > > > > +------------
> > > > > > > > > +
> > > > > > > > > +The full *WarpDrive* architecture is represented in the following class
> > > > > > > > > +diagram:
> > > > > > > > > +
> > > > > > > > > +.. image:: wd-arch.svg
> > > > > > > > > + :alt: WarpDrive Architecture
> > > > > > > > > +
> > > > > > > > > +
> > > > > > > > > +The user API
> > > > > > > > > +------------
> > > > > > > > > +
> > > > > > > > > +We adopt a polling style interface in the user space: ::
> > > > > > > > > +
> > > > > > > > > + int wd_request_queue(struct wd_queue *q);
> > > > > > > > > + void wd_release_queue(struct wd_queue *q);
> > > > > > > > > +
> > > > > > > > > + int wd_send(struct wd_queue *q, void *req);
> > > > > > > > > + int wd_recv(struct wd_queue *q, void **req);
> > > > > > > > > + int wd_recv_sync(struct wd_queue *q, void **req);
> > > > > > > > > + void wd_flush(struct wd_queue *q);
> > > > > > > > > +
> > > > > > > > > > +wd_recv_sync() is a wrapper of its non-sync version. It traps into the
> > > > > > > > > > +kernel and waits until the queue becomes available.
> > > > > > > > > > +
> > > > > > > > > > +If the queue does not support SVA/SVM, the following helper function
> > > > > > > > > > +can be used to create Static Virtual Share Memory: ::
> > > > > > > > > +
> > > > > > > > > + void *wd_preserve_share_memory(struct wd_queue *q, size_t size);
> > > > > > > > > +
> > > > > > > > > > +The user API is not mandatory. It is simply a suggestion and a hint of what the
> > > > > > > > > > +kernel interface is supposed to support.
> > > > > > > > > +
> > > > > > > > > +
> > > > > > > > > +The user driver
> > > > > > > > > +---------------
> > > > > > > > > +
> > > > > > > > > +The queue file mmap space will need a user driver to wrap the communication
> > > > > > > > > +protocol. *UACCE* provides some attributes in sysfs for the user driver to
> > > > > > > > > +match the right accelerator accordingly.
> > > > > > > > > +
> > > > > > > > > +The *UACCE* device attribute is under the following directory:
> > > > > > > > > +
> > > > > > > > > +/sys/class/uacce/<dev-name>/params
> > > > > > > > > +
> > > > > > > > > > +The following attributes are supported:
> > > > > > > > > > +
> > > > > > > > > > +nr_queue_remained (ro)
> > > > > > > > > > + number of queues remaining
> > > > > > > > > +
> > > > > > > > > +api_version (ro)
> > > > > > > > > + a string to identify the queue mmap space format and its version
> > > > > > > > > +
> > > > > > > > > +device_attr (ro)
> > > > > > > > > + attributes of the device, see UACCE_DEV_xxx flag defined in uacce.h
> > > > > > > > > +
> > > > > > > > > +numa_node (ro)
> > > > > > > > > + id of numa node
> > > > > > > > > +
> > > > > > > > > +priority (rw)
> > > > > > > > > > + Priority of the device, bigger is higher
> > > > > > > > > +
> > > > > > > > > +(This is not yet implemented in RFC version)
> > > > > > > > > +
> > > > > > > > > +
> > > > > > > > > +The kernel API
> > > > > > > > > +--------------
> > > > > > > > > +
> > > > > > > > > > +The *uacce* kernel API is defined in uacce.h. If the hardware supports SVM/SVA,
> > > > > > > > > > +the driver needs only the following API functions: ::
> > > > > > > > > +
> > > > > > > > > + int uacce_register(uacce);
> > > > > > > > > + void uacce_unregister(uacce);
> > > > > > > > > + void uacce_wake_up(q);
> > > > > > > > > +
> > > > > > > > > > +*uacce_wake_up* is used to notify the process which epoll()s on the queue file.
> > > > > > > > > > +
> > > > > > > > > > +According to the IOMMU capability, *uacce* categorizes the devices as follows:
> > > > > > > > > +
> > > > > > > > > +UACCE_DEV_NOIOMMU
> > > > > > > > > > + The device has no IOMMU. The user process cannot use VA on the hardware.
> > > > > > > > > > + This mode is not recommended.
> > > > > > > > > +
> > > > > > > > > +UACCE_DEV_SVA (UACCE_DEV_PASID | UACCE_DEV_FAULT_FROM_DEV)
> > > > > > > > > + The device has IOMMU which can share the same page table with user
> > > > > > > > > + process
> > > > > > > > > +
> > > > > > > > > +UACCE_DEV_SHARE_DOMAIN
> > > > > > > > > + The device has IOMMU which has no multiple page table and device page
> > > > > > > > > + fault support
> > > > > > > > > +
> > > > > > > > > +If the device works in mode other than UACCE_DEV_NOIOMMU, *uacce* will set its
> > > > > > > > > +IOMMU to IOMMU_DOMAIN_UNMANAGED. So the driver must not use any kernel
> > > > > > > > > +DMA API but the following ones from *uacce* instead: ::
> > > > > > > > > +
> > > > > > > > > + uacce_dma_map(q, va, size, prot);
> > > > > > > > > + uacce_dma_unmap(q, va, size, prot);
> > > > > > > > > +
> > > > > > > > > +*uacce_dma_map/unmap* is valid only for UACCE_DEV_SVA device. It creates a
> > > > > > > > > +particular PASID and page table for the kernel in the IOMMU (Not yet
> > > > > > > > > +implemented in the RFC)
> > > > > > > > > +
> > > > > > > > > > +For the UACCE_DEV_SHARE_DOMAIN device, uacce_dma_map/unmap is not valid.
> > > > > > > > > > +*Uacce* calls back start_queue only when the DUS and DKO regions are mmapped. The
> > > > > > > > > > +accelerator driver must use those dma buffers, via uacce_queue->qfrs[], on
> > > > > > > > > > +start_queue call back. The size of the queue file region is defined by
> > > > > > > > > > +uacce->ops->qf_pg_start[].
> > > > > > > > > > +
> > > > > > > > > > +We have to do it this way because most current IOMMUs cannot support
> > > > > > > > > > +kernel and user virtual addresses at the same time. So we have to let them both
> > > > > > > > > > +share the same user virtual address space.
> > > > > > > > > > +
> > > > > > > > > > +If the device has to support kernel and user at the same time, both kernel
> > > > > > > > > > +and the user should use these DMA APIs. This is not convenient. A better
> > > > > > > > > > +solution is to change the future DMA/IOMMU design to let them separate the
> > > > > > > > > > +address space between the user and kernel space. But that is not going to
> > > > > > > > > > +happen any time soon.
> > > > > > > > > +
> > > > > > > > > +
> > > > > > > > > +Multiple processes support
> > > > > > > > > +==========================
> > > > > > > > > +
> > > > > > > > > > +In the latest mainline kernel (4.19) when this document is written, the IOMMU
> > > > > > > > > > +subsystem does not support multiple process page tables yet.
> > > > > > > > > > +
> > > > > > > > > > +Most IOMMU hardware implementations support multi-process with the concept
> > > > > > > > > > +of PASID. But they may use a different name, e.g. it is called sub-stream-id in
> > > > > > > > > > +SMMU of ARM. With PASID or a similar design, multiple page tables can be added to
> > > > > > > > > > +the IOMMU and referred to by their PASID.
> > > > > > > > > > +
> > > > > > > > > > +*JPB* has a patchset to enable this[1]_. We have tested it with our hardware
> > > > > > > > > > +(which is known as *D06*). It works well. *WarpDrive* relies on it to support
> > > > > > > > > > +UACCE_DEV_SVA. If it is not enabled, *WarpDrive* can still work. But it
> > > > > > > > > > +supports only one process, and the device will be set to UACCE_DEV_SHARE_DOMAIN
> > > > > > > > > > +even if it is set to UACCE_DEV_SVA initially.
> > > > > > > > > +
> > > > > > > > > +Static Share Virtual Memory is mainly used by UACCE_DEV_SHARE_DOMAIN device.
> > > > > > > > > +
> > > > > > > > > +
> > > > > > > > > +Legacy Mode Support
> > > > > > > > > +===================
> > > > > > > > > > +For hardware without IOMMU, WarpDrive can still work. The only problem is that
> > > > > > > > > > +VA cannot be used in the device. The driver should adopt another strategy for
> > > > > > > > > > +the shared memory. It is only for testing, and not recommended.
> > > > > > > > > +
> > > > > > > > > +
> > > > > > > > > > +The Fork Scenario
> > > > > > > > > > +=================
> > > > > > > > > > +For a process with allocated queues and shared memory, what happens if it forks
> > > > > > > > > > +a child?
> > > > > > > > > > +
> > > > > > > > > > +The fd of the queue will be duplicated on fork, so the child can send requests
> > > > > > > > > > +to the same queue as its parent. But requests sent from any process
> > > > > > > > > > +other than the one which opened the queue will be blocked.
> > > > > > > > > > +
> > > > > > > > > > +It is recommended to add O_CLOEXEC to the queue file.
> > > > > > > > > > +
> > > > > > > > > > +The queue mmap space has VM_DONTCOPY set in its VMA. So the child will lose all
> > > > > > > > > > +those VMAs.
> > > > > > > > > > +
> > > > > > > > > > +This is why *WarpDrive* does not adopt the mode used in *VFIO* and *InfiniBand*.
> > > > > > > > > > +Both solutions can set any user pointer for hardware sharing. But they cannot
> > > > > > > > > > +support fork when the dma is in progress. Otherwise the "Copy-On-Write" procedure
> > > > > > > > > > +will make the parent process lose its physical pages.
> > > > > > > > > +
> > > > > > > > > +
> > > > > > > > > +The Sample Code
> > > > > > > > > +===============
> > > > > > > > > +There is a sample user land implementation with a simple driver for Hisilicon
> > > > > > > > > +Hi1620 ZIP Accelerator.
> > > > > > > > > +
> > > > > > > > > +To test, do the following in samples/warpdrive (for the case of PC host): ::
> > > > > > > > > + ./autogen.sh
> > > > > > > > > + ./conf.sh # or simply ./configure if you build on target system
> > > > > > > > > + make
> > > > > > > > > +
> > > > > > > > > +Then you can get test_hisi_zip in the test subdirectory. Copy it to the target
> > > > > > > > > +system and make sure the hisi_zip driver is enabled (the major and minor of
> > > > > > > > > +the uacce chrdev can be gotten from the dmesg or sysfs), and run: ::
> > > > > > > > > > + mknod /dev/ua1 c <major> <minor>
> > > > > > > > > + test/test_hisi_zip -z < data > data.zip
> > > > > > > > > + test/test_hisi_zip -g < data > data.gzip
> > > > > > > > > +
> > > > > > > > > +
> > > > > > > > > +References
> > > > > > > > > +==========
> > > > > > > > > +.. [1] https://patchwork.kernel.org/patch/10394851/
> > > > > > > > > +
> > > > > > > > > +.. vim: tw=78
> > > > > [...]
> > > > > > > > > --
> > > > > > > > > 2.17.1
> > > > > > > > >
> >
> > I don't know if Mr. Jerome Glisse is on the list. I think I should cc him out of
> > respect for his help on the last RFC.
> >
> > - Kenneth

2018-11-20 06:51:39

by Jerome Glisse

Subject: Re: [RFCv3 PATCH 1/6] uacce: Add documents for WarpDrive/uacce

On Mon, Nov 19, 2018 at 01:11:56PM -0700, Jason Gunthorpe wrote:
> On Mon, Nov 19, 2018 at 02:46:32PM -0500, Jerome Glisse wrote:
>
> > > ?? How can O_DIRECT be fine but RDMA not? They use exactly the same
> > > get_user_pages flow, right? Can we do what O_DIRECT does in RDMA and
> > > be fine too?
> > >
> > > AFAIK the only difference is the length of the race window. You'd have
> > > to fork and fault during the shorter time O_DIRECT has get_user_pages
> > > open.
> >
> > Well in O_DIRECT case there is only one page table, the CPU
> > page table and it gets updated during fork() so there is an
> > ordering there and the race window is small.
>
> Not really, in O_DIRECT case there is another 'page table', we just
> call it a DMA scatter/gather list and it is sent directly to the block
> device's DMA HW. The sgl plays exactly the same role as the various HW
> page list data structures that underly RDMA MRs.
>
> It is not a page table that matters here, it is if the DMA address of
> the page is active for DMA on HW.
>
> Like you say, the only difference is that the race is hopefully small
> with O_DIRECT (though that is not really small, NVMeof for instance
> has windows as large as connection timeouts, if you try hard enough)
>
> So we probably can trigger this trouble with O_DIRECT and fork(), and
> I would call it a bug :(

I cannot think of any scenario that would be a bug with O_DIRECT.
Do you have one in mind? When you fork() and do other syscalls that
affect the memory of your process in another thread, you should
expect inconsistent results. The kernel is not here to provide a fully
safe environment to the user; the user can shoot itself in the foot and
that's fine as long as it only affects the process itself and no one
else. We should not be in the business of making everything baby
proof :)

>
> > > Why? Keep track in each mm if there are any active get_user_pages
> > > FOLL_WRITE pages in the mm, if yes then sweep the VMAs and fix the
> > > issue for the FOLL_WRITE pages.
> >
> > This has a cost and you don't want to do it for O_DIRECT. I am pretty
> > sure that any such patch to modify fork() code path would be rejected.
> > At least i would not like it and vote against.
>
> I was thinking the incremental cost on top of what John is already
> doing would be very small in the common case and only be triggered in
> cases that matter (which apps should avoid anyhow).

What John is addressing has nothing to do with fork(); it has to do with
GUP and filesystem pages. More specifically, after page_mkclean() all
filesystems expect that the page content is stable (i.e. no one writes to
the page); with GUP and hardware (DIRECT_IO too) this is not necessarily
the case.

So John is trying to fix that. Not trying to make fork() baby proof
AFAICT :)

I would rather keep saying that you should expect weird things with RDMA and
VFIO when doing fork() than try to work around this in the kernel.

Better behavior through hardware is what we should aim for (CAPI, ODP,
...).

Jérôme

2018-11-20 06:11:47

by Jerome Glisse

Subject: Re: [RFCv3 PATCH 1/6] uacce: Add documents for WarpDrive/uacce

On Mon, Nov 19, 2018 at 12:27:02PM -0700, Jason Gunthorpe wrote:
> On Mon, Nov 19, 2018 at 02:17:21PM -0500, Jerome Glisse wrote:
> > On Mon, Nov 19, 2018 at 11:53:33AM -0700, Jason Gunthorpe wrote:
> > > On Mon, Nov 19, 2018 at 01:42:16PM -0500, Jerome Glisse wrote:
> > > > On Mon, Nov 19, 2018 at 11:27:52AM -0700, Jason Gunthorpe wrote:
> > > > > On Mon, Nov 19, 2018 at 11:48:54AM -0500, Jerome Glisse wrote:
> > > > >
> > > > > > Just to comment on this, any infiniband driver which use umem and do
> > > > > > not have ODP (here ODP for me means listening to mmu notifier so all
> > > > > > infiniband driver except mlx5) will be affected by same issue AFAICT.
> > > > > >
> > > > > > AFAICT there is no special thing happening after fork() inside any of
> > > > > > those driver. So if parent create a umem mr before fork() and program
> > > > > > hardware with it then after fork() the parent might start using new
> > > > > > page for the umem range while the old memory is use by the child. The
> > > > > > reverse is also true (parent using old memory and child new memory)
> > > > > > bottom line you can not predict which memory the child or the parent
> > > > > > will use for the range after fork().
> > > > > >
> > > > > > So no matter what you consider the child or the parent, what the hw
> > > > > > will use for the mr is unlikely to match what the CPU use for the
> > > > > > same virtual address. In other word:
> > > > > >
> > > > > > Before fork:
> > > > > > CPU parent: virtual addr ptr1 -> physical address = 0xCAFE
> > > > > > HARDWARE: virtual addr ptr1 -> physical address = 0xCAFE
> > > > > >
> > > > > > Case 1:
> > > > > > CPU parent: virtual addr ptr1 -> physical address = 0xCAFE
> > > > > > CPU child: virtual addr ptr1 -> physical address = 0xDEAD
> > > > > > HARDWARE: virtual addr ptr1 -> physical address = 0xCAFE
> > > > > >
> > > > > > Case 2:
> > > > > > CPU parent: virtual addr ptr1 -> physical address = 0xBEEF
> > > > > > CPU child: virtual addr ptr1 -> physical address = 0xCAFE
> > > > > > HARDWARE: virtual addr ptr1 -> physical address = 0xCAFE
> > > > >
> > > > > IIRC this is solved in IB by automatically calling
> > > > > madvise(MADV_DONTFORK) before creating the MR.
> > > > >
> > > > > MADV_DONTFORK
> > > > > .. This is useful to prevent copy-on-write semantics from changing the
> > > > > physical location of a page if the parent writes to it after a
> > > > > fork(2) ..
> > > >
> > > > This would work around the issue but this is not transparent ie
> > > > range marked with DONTFORK no longer behave as expected from the
> > > > application point of view.
> > >
> > > Do you know what the difference is? The man page really gives no
> > > hint..
> > >
> > > Does it sometimes unmap the pages during fork?
> >
> > It is handled in kernel/fork.c, look for DONTCOPY; basically it just
> > leaves an empty page table in the child process so the child will have to
> > fault in new pages. This also means that the child will get 0 as initial
> > value for all memory addresses under DONTCOPY/DONTFORK, which breaks
> > the application's expectation of what fork() does.
>
> Hum, I wonder why this API was selected then..

Because there is nothing else ? :)

>
> > > I actually wonder if the kernel is a bit broken here, we have the same
> > > problem with O_DIRECT and other stuff, right?
> >
> > No it is not, O_DIRECT is fine. The only corner case i can think
> > of with O_DIRECT is one thread launching an O_DIRECT that writes
> > to private anonymous memory (other O_DIRECT cases do not matter)
> > while another thread calls fork(); then what the child gets can be
> > undefined, i.e. either it gets the data from before the O_DIRECT finishes
> > or it gets the result of the O_DIRECT. But this is really what
> > you should expect when doing such things without synchronization.
> >
> > So O_DIRECT is fine.
>
> ?? How can O_DIRECT be fine but RDMA not? They use exactly the same
> get_user_pages flow, right? Can we do what O_DIRECT does in RDMA and
> be fine too?
>
> AFAIK the only difference is the length of the race window. You'd have
> to fork and fault during the shorter time O_DIRECT has get_user_pages
> open.

Well in O_DIRECT case there is only one page table, the CPU
page table and it gets updated during fork() so there is an
ordering there and the race window is small.

Moreover, programmers know that they can get in trouble if they
do things like fork() without synchronizing their threads
with each other. So while some weird things can happen with
O_DIRECT, it is unlikely (very small race window) and if
it happens it is well within the expected behavior.

For hardware the race window is the same as the process
lifetime, so it can be days, months, years ... Once the
hardware has programmed its page table it will never
see any update (again mlx5 ODP is the exception here).

This is where the weird behavior can arise. Because
you use DONTFORK, you never see weird things happening.
If you were to comment out DONTFORK then RDMA in the parent
might change data in the child (or the other way around, i.e.
RDMA in the child might change data in the parent).
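
To make the CPU side of this concrete, here is a minimal user-space sketch
(no device involved, and the function name is made up for illustration). It
only shows that after fork() the same virtual address stops referring to the
same data once one side writes to it. A device that was programmed with the
old physical page before the fork() would keep using that page, whichever
process ends up owning it.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper, for illustration only.
 * Returns 1 if a write done by the parent after fork() is NOT visible to
 * the child at the same virtual address (COW split the mapping), 0 if the
 * child saw the parent's write, and -1 on error.
 */
int cow_diverges_after_fork(void)
{
	int to_child[2];
	int status;
	char *buf;
	pid_t pid;

	/* Private anonymous memory: this is what fork() marks copy-on-write. */
	buf = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return -1;
	if (pipe(to_child) < 0) {
		munmap(buf, 4096);
		return -1;
	}

	strcpy(buf, "before");

	pid = fork();
	if (pid < 0) {
		close(to_child[0]);
		close(to_child[1]);
		munmap(buf, 4096);
		return -1;
	}
	if (pid == 0) {
		char c;

		close(to_child[1]);
		/* Wait until the parent has written to the page. */
		if (read(to_child[0], &c, 1) != 1)
			_exit(2);
		/* Same virtual address, but the child still sees old data. */
		_exit(strcmp(buf, "before") == 0 ? 1 : 0);
	}

	close(to_child[0]);
	strcpy(buf, "after");		/* this write breaks COW */
	if (write(to_child[1], "x", 1) != 1)
		status = -1;
	close(to_child[1]);

	if (waitpid(pid, &status, 0) < 0) {
		munmap(buf, 4096);
		return -1;
	}
	munmap(buf, 4096);

	if (!WIFEXITED(status) || WEXITSTATUS(status) == 2)
		return -1;
	return WEXITSTATUS(status);
}
```

Calling cow_diverges_after_fork() from a small main() should report that the
two processes diverged. Which of the two keeps the original physical page is
not something the sketch (or the application) can tell.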


> > > Really, if I have a get_user_pages FOLL_WRITE on a page and we fork,
> > > then shouldn't the COW immediately be broken during the fork?
> > >
> > > The kernel can't guarantee that an ongoing DMA will not write to those
> > > pages, and it breaks the fork semantic to write to both processes.
> >
> > Fixing that would incur a high cost: need to grow struct page, need
> > to copy potentially gigabytes of memory during fork() ... this would be
> > a serious performance regression for many folks just to work around an
> > abuse by device drivers. So i don't think anything on that front would
> > be welcome.
>
> Why? Keep track in each mm if there are any active get_user_pages
> FOLL_WRITE pages in the mm, if yes then sweep the VMAs and fix the
> issue for the FOLL_WRITE pages.

This has a cost and you don't want to do it for O_DIRECT. I am pretty
sure that any such patch to modify fork() code path would be rejected.
At least i would not like it and vote against.

>
> John is already working on being able to detect pages under GUP, so it
> seems like a small step..

John is trying to fix serious bugs which can result in filesystem
corruption. It has a performance cost and thus i don't see that as
something we should pursue as a default solution. I posted patches
to remove get_user_page() from GPU driver and i intend to remove
as many GUP as i can (for hardware that can do the right thing).

To me it sounds better to reward good hardware rather than punish
everyone :)

>
> Since nearly all cases of fork don't have a GUP FOLL_WRITE active
> there would be no performance hit.
>
> > umem without proper ODP and VFIO are the only bad user i know of (for
> > VFIO you can argue that it is part of the API contract and thus that
> > it is not an abuse but it is not spell out loud in documentation). I
> > have been trying to push back on any people trying to push thing that
> > would make the same mistake or at least making sure they understand
> > what is happening.
>
> It is something we have to live with and support for the foreseeable
> future.

Yes for RDMA and VFIO, but i want to avoid any more new users, which is
why i push back on any solution that has the same issues.

>
> > What really need to happen is people fixing their hardware and do the
> > right thing (good software engineer versus evil hardware engineer ;))
>
> Even ODP is no panacea, there are performance problems. What we really
> need is CAPI like stuff, so you will tell Intel to redesign the CPU??
> :)

I agree

Jérôme

2018-11-20 07:52:14

by Jason Gunthorpe

Subject: Re: [RFCv3 PATCH 1/6] uacce: Add documents for WarpDrive/uacce

On Mon, Nov 19, 2018 at 03:26:15PM -0500, Jerome Glisse wrote:
> On Mon, Nov 19, 2018 at 01:11:56PM -0700, Jason Gunthorpe wrote:
> > On Mon, Nov 19, 2018 at 02:46:32PM -0500, Jerome Glisse wrote:
> >
> > > > ?? How can O_DIRECT be fine but RDMA not? They use exactly the same
> > > > get_user_pages flow, right? Can we do what O_DIRECT does in RDMA and
> > > > be fine too?
> > > >
> > > > AFAIK the only difference is the length of the race window. You'd have
> > > > to fork and fault during the shorter time O_DIRECT has get_user_pages
> > > > open.
> > >
> > > Well in O_DIRECT case there is only one page table, the CPU
> > > page table and it gets updated during fork() so there is an
> > > ordering there and the race window is small.
> >
> > Not really, in O_DIRECT case there is another 'page table', we just
> > call it a DMA scatter/gather list and it is sent directly to the block
> > device's DMA HW. The sgl plays exactly the same role as the various HW
> > page list data structures that underly RDMA MRs.
> >
> > It is not a page table that matters here, it is if the DMA address of
> > the page is active for DMA on HW.
> >
> > Like you say, the only difference is that the race is hopefully small
> > with O_DIRECT (though that is not really small, NVMeof for instance
> > has windows as large as connection timeouts, if you try hard enough)
> >
> > So we probably can trigger this trouble with O_DIRECT and fork(), and
> > I would call it a bug :(
>
> I cannot think of any scenario that would be a bug with O_DIRECT.
> Do you have one in mind? When you fork() and do other syscalls that
> affect the memory of your process in another thread, you should
> expect inconsistent results. The kernel is not here to provide a fully
> safe environment to the user; the user can shoot itself in the foot and
> that's fine as long as it only affects the process itself and no one
> else. We should not be in the business of making everything baby
> proof :)

Sure, I set up AIO with O_DIRECT and launch a read.

Then I fork and dirty the READ target memory using the CPU in the
child.

As you described in this case the fork will retain the physical page
that is undergoing O_DIRECT DMA, and the parent gets a new copy'd page.

The DMA completes, and the child gets the DMA'd to page. The parent
gets an unchanged copy'd page.

The parent gets the AIO completion, but can't see the data.

I'd call that a bug with O_DIRECT. The only correct outcome is that
the parent will always see the O_DIRECT data. Fork should not cause
the *parent* to malfunction. I agree the child cannot make any
prediction what memory it will see.

I assume the same flow is possible using threads and read()..

It is really no different than the RDMA bug with fork.

Jason

2018-11-20 22:48:24

by Jean-Philippe Brucker

Subject: Re: [RFCv3 PATCH 1/6] uacce: Add documents for WarpDrive/uacce

On 20/11/2018 09:16, Jonathan Cameron wrote:
> +CC Jean-Phillipe and iommu list.

Thanks for the Cc, sorry I don't have enough bandwidth to follow this
thread at the moment.

>>>>> In WarpDrive/uacce, we make this simple. If you have an IOMMU and it supports
>>>>> SVM/SVA, everything will be fine, just like ODP implicit mode, and you don't need
>>>>> to write any code for that because it has been done by the IOMMU framework. If it
>>>>
>>>> Looks like the IOMMU code uses mmu_notifier, so it is identical to
>>>> IB's ODP. The only difference is that IB tends to have the IOMMU page
>>>> table in the device, not in the CPU.
>>>>
>>>> The only case I know of that is different is the new-fangled CAPI
>>>> stuff where the IOMMU can directly use the CPU's page table and the
>>>> IOMMU page table (in device or CPU) is eliminated.
>>>
>>> Yes. We are not focusing on the current implementation. As mentioned in the
>>> cover letter, we are expecting Jean-Philippe's SVA patch:
>>> git://linux-arm.org/linux-jpb.
>>
>> This SVA stuff does not look comparable to CAPI as it still requires
>> maintaining separate IOMMU page tables.

With SVA, we use the same page tables in the IOMMU and CPU. It's the
same pgd pointer, there is no mirroring of mappings. We bind the process
page tables with the device using a PASID (Process Address Space ID).

After fork(), the child's mm is different from the parent's one, and is
not automatically bound to the device. The device driver will have to
issue a new bind() request, and the child mm will be bound with a
different PASID.

There could be a problem if the child inherits the parent's device
handle. Then depending on the device, the child could be able to program
DMA and possibly access the parent's address space. The parent needs to
be aware of that when using the bind() API, and close the device fd in
the child after fork().
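
As a rough user-space sketch of that last point (not taken from the patchset:
the function name is made up and /dev/null merely stands in for the real
device node), the parent can open the device with O_CLOEXEC and have the
child drop the inherited descriptor right after fork(). Note that O_CLOEXEC
only covers exec(); a child that keeps running without exec() has to close
the fd itself.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper, for illustration only.
 * Opens dev_path, forks, and makes the child close the inherited fd.
 * Returns 0 if the child no longer holds the descriptor while the
 * parent's copy is still usable, -1 otherwise.
 */
int fork_without_device_fd(const char *dev_path)
{
	int fd, status, parent_ok;
	pid_t pid;

	/* O_CLOEXEC: the fd is also dropped if the child calls exec(). */
	fd = open(dev_path, O_RDWR | O_CLOEXEC);
	if (fd < 0)
		return -1;

	pid = fork();
	if (pid < 0) {
		close(fd);
		return -1;
	}
	if (pid == 0) {
		/* Child: give up the handle before doing anything else. */
		close(fd);
		/* The descriptor number must now be invalid in the child. */
		_exit(fcntl(fd, F_GETFD) == -1 && errno == EBADF ? 0 : 1);
	}

	if (waitpid(pid, &status, 0) < 0) {
		close(fd);
		return -1;
	}

	/* Closing in the child does not affect the parent's descriptor. */
	parent_ok = fcntl(fd, F_GETFD) != -1;
	close(fd);

	if (!WIFEXITED(status) || WEXITSTATUS(status) != 0 || !parent_ok)
		return -1;
	return 0;
}
```

For a quick check without any accelerator present,
fork_without_device_fd("/dev/null") exercises the same open/fork/close flow.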

We use MMU notifiers for some address space changes:

* The IOTLB needs to be invalidated after any unmap() to the process
address space. On Arm systems the SMMU IOTLBs can be invalidated by the
CPU TLBI instructions, but we still need to invalidate TLBs private to
devices that are arch-agnostic (Address Translation Cache in PCI ATS).

* When the process mm exits, we need to remove the associated PASID
configuration in the IOMMU and invalidate the TLBs.

Thanks,
Jean

2018-11-20 15:45:04

by Leon Romanovsky

Subject: Re: [RFCv3 PATCH 1/6] uacce: Add documents for WarpDrive/uacce

On Tue, Nov 20, 2018 at 11:07:02AM +0800, Kenneth Lee wrote:
> On Mon, Nov 19, 2018 at 11:49:54AM -0700, Jason Gunthorpe wrote:
> >
> > On Mon, Nov 19, 2018 at 05:14:05PM +0800, Kenneth Lee wrote:
> >
> > > If the hardware cannot share the page table with the CPU, we then need to have
> > > some way to change the device page table. This is what happens in ODP. It
> > > invalidates the page table in the device upon the mmu_notifier callback. But this cannot
> > > solve the COW problem: if the user process A shares a page P with the device, and A
> > > forks a new process B, and it continues to write to the page. By COW, the
> > > process B will keep the page P, while A will get a new page P'. But you have
> > > no way to let the device know it should use P' rather than P.
> >
> > Is this true? I thought mmu_notifiers covered all these cases.
> >
> > The mmu_notifier for A should fire if B causes the physical address of
> > A's pages to change via COW.
> >
> > And this causes the device page tables to re-synchronize.
>
> I don't see such code. The current do_cow_fault() implementation has nothing to
> do with mmu_notifier.
>
> >
> > > In WarpDrive/uacce, we make this simple. If you support IOMMU and it supports
> > > SVM/SVA, everything will be fine, just like ODP implicit mode. And you don't need
> > > to write any code for that, because it has been done by the IOMMU framework. If it
> >
> > Looks like the IOMMU code uses mmu_notifier, so it is identical to
> > IB's ODP. The only difference is that IB tends to have the IOMMU page
> > table in the device, not in the CPU.
> >
> > The only case I know of that is different is the new-fangled CAPI
> > stuff where the IOMMU can directly use the CPU's page table and the
> > IOMMU page table (in device or CPU) is eliminated.
> >
>
> Yes. We are not focusing on the current implementation. As mentioned in the
> cover letter, we are expecting Jean-Philippe's SVA patch:
> git://linux-arm.org/linux-jpb.
>
> > Anyhow, I don't think a single instance of hardware should justify an
> > entire new subsystem. Subsystems are hard to make and without multiple
> > hardware examples there is no way to expect that it would cover any
> > future use cases.
>
> Yes. That's our first expectation. We can keep it with our driver. But there is
> no user driver support for any accelerator in the mainline kernel. Even the
> well-known QuickAssist has to be maintained out of tree. So we are trying to see if
> people are interested in working together to solve the problem.
>
> >
> > If all your driver needs is to mmap some PCI bar space, route
> > interrupts and do DMA mapping then mediated VFIO is probably a good
> > choice.
>
> Yes. That is what is done in our RFCv1/v2. But we accepted Jerome's opinion and
> try not to add complexity to the mm subsystem.
>
> >
> > If it needs to do a bunch of other stuff, not related to PCI bar
> > space, interrupts and DMA mapping (ie special code for compression,
> > crypto, AI, whatever) then you should probably do what Jerome said and
> > make a drivers/char/hisillicon_foo_bar.c that exposes just what your
> > hardware does.
>
> Yes. If no other accelerator driver writer is interested. That is the
> expectation:)
>
> But we would really like to have a public solution here. Consider this scenario:
>
> You create some connections (queues) to the NIC, RSA, and AI engines. Then you get
> data directly from the NIC and pass the pointer to the RSA engine for decryption. The
> CPU then finishes some data handling or operation and passes it through to the AI
> engine for CNN calculation... This will need a place to maintain the same
> address space by some means.

You are using NIC terminology. In the documentation, you wrote that it is needed
for DPDK use, and I don't really understand why we need another shiny new
interface for DPDK.

>
> It is not complex, but it is helpful.
>
> >
> > If you have networking involved in here then consider RDMA,
> > particularly if this functionality is already part of the same
> > hardware that the hns infiniband driver is servicing.
> >
> > 'computational MRs' are a reasonable approach to a side-car offload of
> > already existing RDMA support.
>
> OK. Thanks. I will spend some time on it. But personally, I really don't like
> RDMA's complexity. I cannot even try one single function without... some
> expensive hardware and complex connections in the lab. This is not like an
> open source way.

That is not quite accurate. We have the RXE driver, a virtual RDMA device
implemented purely in software. It suffers from poor performance and
sporadic failures, but it is enough to try RDMA on your laptop in a VM.

Thanks

>
> >
> > Jason
>



2018-11-12 17:52:57

by Kenneth Lee

Subject: [RFCv3 PATCH 5/6] crypto: add uacce support to Hisilicon qm

From: Kenneth Lee <[email protected]>

This patch adds uacce support to the Hisilicon QM driver, so that any
accelerator that uses QM can share its queues with user space.

Signed-off-by: Kenneth Lee <[email protected]>
Signed-off-by: Zhou Wang <[email protected]>
Signed-off-by: Hao Fang <[email protected]>
Signed-off-by: Zaibo Xu <[email protected]>
---
drivers/crypto/hisilicon/Kconfig | 7 +
drivers/crypto/hisilicon/qm.c | 227 +++++++++++++++++++++---
drivers/crypto/hisilicon/qm.h | 16 +-
drivers/crypto/hisilicon/zip/zip_main.c | 27 ++-
4 files changed, 249 insertions(+), 28 deletions(-)

diff --git a/drivers/crypto/hisilicon/Kconfig b/drivers/crypto/hisilicon/Kconfig
index ce9deefbf037..819e4995f361 100644
--- a/drivers/crypto/hisilicon/Kconfig
+++ b/drivers/crypto/hisilicon/Kconfig
@@ -16,6 +16,13 @@ config CRYPTO_DEV_HISI_QM
tristate
depends on ARM64 && PCI

+config CRYPTO_QM_UACCE
+ bool "enable UACCE support for all accelerators with Hisi QM"
+ depends on CRYPTO_DEV_HISI_QM
+ select UACCE
+ help
+ Support UACCE interface in Hisi QM.
+
config CRYPTO_DEV_HISI_ZIP
tristate "Support for HISI ZIP Driver"
depends on ARM64
diff --git a/drivers/crypto/hisilicon/qm.c b/drivers/crypto/hisilicon/qm.c
index 5b810a6f4dd5..750d8c069d92 100644
--- a/drivers/crypto/hisilicon/qm.c
+++ b/drivers/crypto/hisilicon/qm.c
@@ -5,6 +5,7 @@
#include <linux/io.h>
#include <linux/irqreturn.h>
#include <linux/log2.h>
+#include <linux/uacce.h>
#include "qm.h"

#define QM_DEF_Q_NUM 128
@@ -435,17 +436,29 @@ int hisi_qm_start_qp(struct hisi_qp *qp, unsigned long arg)
qp->sqc_dma = qm->sqc_dma + qp_index * sizeof(struct sqc);
qp->cqc_dma = qm->cqc_dma + qp_index * sizeof(struct cqc);

- qp->qdma.size = qm->sqe_size * QM_Q_DEPTH +
- sizeof(struct cqe) * QM_Q_DEPTH,
- qp->qdma.va = dma_alloc_coherent(dev, qp->qdma.size,
- &qp->qdma.dma,
- GFP_KERNEL | __GFP_ZERO);
- dev_dbg(dev, "allocate qp dma buf(va=%p, dma=%pad, size=%lx)\n",
- qp->qdma.va, &qp->qdma.dma, qp->qdma.size);
+ if (qm->uacce_mode) {
+ dev_dbg(dev, "User shared DMA Buffer used: (%lx/%x)\n",
+ off, QM_DUS_PAGE_NR << PAGE_SHIFT);
+ if (off > (QM_DUS_PAGE_NR << PAGE_SHIFT))
+ return -EINVAL;
+ } else {
+
+ /*
+ * todo: we are using dma api here. it should be updated to
+ * uacce api for user and kernel mode working at the same time
+ */
+ qp->qdma.size = qm->sqe_size * QM_Q_DEPTH +
+ sizeof(struct cqe) * QM_Q_DEPTH,
+ qp->qdma.va = dma_alloc_coherent(dev, qp->qdma.size,
+ &qp->qdma.dma,
+ GFP_KERNEL | __GFP_ZERO);
+ dev_dbg(dev, "allocate qp dma buf(va=%p, dma=%pad, size=%lx)\n",
+ qp->qdma.va, &qp->qdma.dma, qp->qdma.size);
+ }

if (!qp->qdma.va) {
dev_err(dev, "cannot get qm dma buffer\n");
- return -ENOMEM;
+ return qm->uacce_mode ? -EINVAL : -ENOMEM;
}

QP_INIT_BUF(qp, sqe, qm->sqe_size * QM_Q_DEPTH);
@@ -491,7 +504,8 @@ void hisi_qm_release_qp(struct hisi_qp *qp)
bitmap_clear(qm->qp_bitmap, qid, 1);
write_unlock(&qm->qps_lock);

- dma_free_coherent(dev, qdma->size, qdma->va, qdma->dma);
+ if (!qm->uacce_mode)
+ dma_free_coherent(dev, qdma->size, qdma->va, qdma->dma);

kfree(qp);
}
@@ -535,6 +549,149 @@ int hisi_qp_send(struct hisi_qp *qp, void *msg)
}
EXPORT_SYMBOL_GPL(hisi_qp_send);

+#ifdef CONFIG_CRYPTO_QM_UACCE
+static void qm_qp_event_notifier(struct hisi_qp *qp)
+{
+ uacce_wake_up(qp->uacce_q);
+}
+
+static int hisi_qm_get_queue(struct uacce *uacce, unsigned long arg,
+ struct uacce_queue **q)
+{
+ struct qm_info *qm = uacce->priv;
+ struct hisi_qp *qp = NULL;
+ struct uacce_queue *wd_q;
+ u8 alg_type = 0; /* fix me here */
+ int ret;
+
+ qp = hisi_qm_create_qp(qm, alg_type);
+ if (IS_ERR(qp))
+ return PTR_ERR(qp);
+
+ wd_q = kzalloc(sizeof(struct uacce_queue), GFP_KERNEL);
+ if (!wd_q) {
+ ret = -ENOMEM;
+ goto err_with_qp;
+ }
+
+ wd_q->priv = qp;
+ wd_q->uacce = uacce;
+ *q = wd_q;
+ qp->uacce_q = wd_q;
+ qp->event_cb = qm_qp_event_notifier;
+ qp->pasid = arg;
+
+ return 0;
+
+err_with_qp:
+ hisi_qm_release_qp(qp);
+ return ret;
+}
+
+static void hisi_qm_put_queue(struct uacce_queue *q)
+{
+ struct hisi_qp *qp = q->priv;
+
+ /* need to stop hardware, but can not support in v1 */
+ hisi_qm_release_qp(qp);
+ kfree(q);
+}
+
+/* map sq/cq/doorbell to user space */
+static int hisi_qm_mmap(struct uacce_queue *q,
+ struct vm_area_struct *vma)
+{
+ struct hisi_qp *qp = (struct hisi_qp *)q->priv;
+ struct qm_info *qm = qp->qm;
+ size_t sz = vma->vm_end - vma->vm_start;
+ u8 region;
+
+ region = vma->vm_pgoff;
+
+ switch (region) {
+ case 0:
+ if (sz > PAGE_SIZE)
+ return -EINVAL;
+
+ vma->vm_flags |= VM_IO;
+ /*
+ * Warning: This is not safe as multiple queues use the same
+ * doorbell, v1 hardware interface problem. will fix it in v2
+ */
+ return remap_pfn_range(vma, vma->vm_start,
+ qm->phys_base >> PAGE_SHIFT,
+ sz, pgprot_noncached(vma->vm_page_prot));
+
+ default:
+ return -EINVAL;
+ }
+}
+
+static int hisi_qm_start_queue(struct uacce_queue *q)
+{
+ int ret;
+ struct qm_info *qm = q->uacce->priv;
+ struct hisi_qp *qp = (struct hisi_qp *)q->priv;
+
+ /* todo: we don't need to start qm here in SVA version */
+ qm->qdma.dma = q->qfrs[UACCE_QFRT_DKO]->iova;
+ qm->qdma.va = q->qfrs[UACCE_QFRT_DKO]->kaddr;
+
+ ret = hisi_qm_start(qm);
+ if (ret)
+ return ret;
+
+ qp->qdma.dma = q->qfrs[UACCE_QFRT_DUS]->iova;
+ qp->qdma.va = q->qfrs[UACCE_QFRT_DUS]->kaddr;
+ ret = hisi_qm_start_qp(qp, qp->pasid);
+ if (ret)
+ hisi_qm_stop(qm);
+
+ return ret;
+}
+
+static void hisi_qm_stop_queue(struct uacce_queue *q)
+{
+ struct qm_info *qm = q->uacce->priv;
+
+ /* todo: we don't need to stop qm in SVA version */
+ hisi_qm_stop(qm);
+}
+
+/*
+ * the device is set the UACCE_DEV_SVA, but it will be cut if SVA patch is not
+ * available
+ */
+static struct uacce_ops uacce_qm_ops = {
+ .owner = THIS_MODULE,
+ .flags = UACCE_DEV_SVA | UACCE_DEV_KMAP_DUS,
+ .api_ver = "hisi_qm_v1",
+ .qf_pg_start = {QM_DOORBELL_PAGE_NR,
+ QM_DOORBELL_PAGE_NR + QM_DKO_PAGE_NR,
+ QM_DOORBELL_PAGE_NR + QM_DKO_PAGE_NR + QM_DUS_PAGE_NR},
+
+ .get_queue = hisi_qm_get_queue,
+ .put_queue = hisi_qm_put_queue,
+ .start_queue = hisi_qm_start_queue,
+ .stop_queue = hisi_qm_stop_queue,
+ .mmap = hisi_qm_mmap,
+};
+
+static int qm_register_uacce(struct qm_info *qm)
+{
+ struct pci_dev *pdev = qm->pdev;
+ struct uacce *uacce = &qm->uacce;
+
+ uacce->name = dev_name(&pdev->dev);
+ uacce->dev = &pdev->dev;
+ uacce->is_vf = pdev->is_virtfn;
+ uacce->priv = qm;
+ uacce->ops = &uacce_qm_ops;
+
+ return uacce_register(uacce);
+}
+#endif
+
static irqreturn_t qm_irq(int irq, void *data)
{
struct qm_info *qm = data;
@@ -635,21 +792,34 @@ int hisi_qm_init(struct qm_info *qm)
}
}

- qm->qdma.size = max_t(size_t, sizeof(struct eqc),
- sizeof(struct aeqc)) +
- sizeof(struct eqe) * QM_Q_DEPTH +
- sizeof(struct sqc) * qm->qp_num +
- sizeof(struct cqc) * qm->qp_num;
- qm->qdma.va = dma_alloc_coherent(dev, qm->qdma.size,
- &qm->qdma.dma,
- GFP_KERNEL | __GFP_ZERO);
- dev_dbg(dev, "allocate qm dma buf(va=%p, dma=%pad, size=%lx)\n",
- qm->qdma.va, &qm->qdma.dma, qm->qdma.size);
- ret = qm->qdma.va ? 0 : -ENOMEM;
+ if (qm->uacce_mode) {
+#ifdef CONFIG_CRYPTO_QM_UACCE
+ ret = qm_register_uacce(qm);
+#else
+ dev_err(dev, "qm uacce feature is not enabled\n");
+ ret = -EINVAL;
+#endif
+
+ } else {
+ qm->qdma.size = max_t(size_t, sizeof(struct eqc),
+ sizeof(struct aeqc)) +
+ sizeof(struct eqe) * QM_Q_DEPTH +
+ sizeof(struct sqc) * qm->qp_num +
+ sizeof(struct cqc) * qm->qp_num;
+ qm->qdma.va = dma_alloc_coherent(dev, qm->qdma.size,
+ &qm->qdma.dma,
+ GFP_KERNEL | __GFP_ZERO);
+ dev_dbg(dev, "allocate qm dma buf(va=%p, dma=%pad, size=%lx)\n",
+ qm->qdma.va, &qm->qdma.dma, qm->qdma.size);
+ ret = qm->qdma.va ? 0 : -ENOMEM;
+ }

if (ret)
goto err_with_irq;

+ dev_dbg(dev, "init qm %s to %s mode\n", pdev->is_physfn ? "pf" : "vf",
+ qm->uacce_mode ? "uacce" : "crypto");
+
return 0;

err_with_irq:
@@ -669,7 +839,13 @@ void hisi_qm_uninit(struct qm_info *qm)
{
struct pci_dev *pdev = qm->pdev;

- dma_free_coherent(&pdev->dev, qm->qdma.size, qm->qdma.va, qm->qdma.dma);
+ if (qm->uacce_mode) {
+#ifdef CONFIG_CRYPTO_QM_UACCE
+ uacce_unregister(&qm->uacce);
+#endif
+ } else
+ dma_free_coherent(&pdev->dev, qm->qdma.size, qm->qdma.va,
+ qm->qdma.dma);

devm_free_irq(&pdev->dev, pci_irq_vector(pdev, 0), qm);
pci_free_irq_vectors(pdev);
@@ -690,7 +866,7 @@ int hisi_qm_start(struct qm_info *qm)
} while (0)

if (!qm->qdma.va)
- return -EINVAL;
+ return qm->uacce_mode ? 0 : -EINVAL;

if (qm->pdev->is_physfn)
qm->ops->vft_config(qm, qm->qp_base, qm->qp_num);
@@ -705,6 +881,13 @@ int hisi_qm_start(struct qm_info *qm)
QM_INIT_BUF(qm, eqc,
max_t(size_t, sizeof(struct eqc), sizeof(struct aeqc)));

+ if (qm->uacce_mode) {
+ dev_dbg(&qm->pdev->dev, "kernel-only buffer used (0x%lx/0x%x)\n",
+ off, QM_DKO_PAGE_NR << PAGE_SHIFT);
+ if (off > (QM_DKO_PAGE_NR << PAGE_SHIFT))
+ return -EINVAL;
+ }
+
qm->eqc->base_l = lower_32_bits(qm->eqe_dma);
qm->eqc->base_h = upper_32_bits(qm->eqe_dma);
qm->eqc->dw3 = 2 << MB_EQC_EQE_SHIFT;
diff --git a/drivers/crypto/hisilicon/qm.h b/drivers/crypto/hisilicon/qm.h
index 6d124d948738..81b0b8c1f0b0 100644
--- a/drivers/crypto/hisilicon/qm.h
+++ b/drivers/crypto/hisilicon/qm.h
@@ -9,6 +9,10 @@
#include <linux/slab.h>
#include "qm_usr_if.h"

+#ifdef CONFIG_CRYPTO_QM_UACCE
+#include <linux/uacce.h>
+#endif
+
/* qm user domain */
#define QM_ARUSER_M_CFG_1 0x100088
#define QM_ARUSER_M_CFG_ENABLE 0x100090
@@ -146,6 +150,12 @@ struct qm_info {
struct mutex mailbox_lock;

struct hisi_acc_qm_hw_ops *ops;
+
+ bool uacce_mode;
+
+#ifdef CONFIG_CRYPTO_QM_UACCE
+ struct uacce uacce;
+#endif
};
#define QM_ADDR(qm, off) ((qm)->io_base + off)

@@ -186,6 +196,10 @@ struct hisi_qp {

struct qm_info *qm;

+#ifdef CONFIG_CRYPTO_QM_UACCE
+ struct uacce_queue *uacce_q;
+#endif
+
/* for crypto sync API */
struct completion completion;

@@ -197,7 +211,7 @@ struct hisi_qp {

/* QM external interface for accelerator driver.
* To use qm:
- * 1. Set qm with pdev, and sqe_size set accordingly
+ * 1. Set qm with pdev, uacce_mode, and sqe_size set accordingly
* 2. hisi_qm_init()
* 3. config the accelerator hardware
* 4. hisi_qm_start()
diff --git a/drivers/crypto/hisilicon/zip/zip_main.c b/drivers/crypto/hisilicon/zip/zip_main.c
index f4f3b6d89340..f5fcd0f4b836 100644
--- a/drivers/crypto/hisilicon/zip/zip_main.c
+++ b/drivers/crypto/hisilicon/zip/zip_main.c
@@ -5,6 +5,7 @@
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/pci.h>
+#include <linux/uacce.h>
#include "zip.h"
#include "zip_crypto.h"

@@ -28,6 +29,9 @@
LIST_HEAD(hisi_zip_list);
DEFINE_MUTEX(hisi_zip_list_lock);

+static bool uacce_mode;
+module_param(uacce_mode, bool, 0);
+
static const char hisi_zip_name[] = "hisi_zip";

static const struct pci_device_id hisi_zip_dev_ids[] = {
@@ -96,6 +100,7 @@ static int hisi_zip_probe(struct pci_dev *pdev, const struct pci_device_id *id)

qm = &hisi_zip->qm;
qm->pdev = pdev;
+ qm->uacce_mode = uacce_mode;
qm->qp_base = HZIP_PF_DEF_Q_BASE;
qm->qp_num = HZIP_PF_DEF_Q_NUM;
qm->sqe_size = HZIP_SQE_SIZE;
@@ -150,10 +155,21 @@ static int __init hisi_zip_init(void)
return ret;
}

- ret = hisi_zip_register_to_crypto();
- if (ret < 0) {
- pci_unregister_driver(&hisi_zip_pci_driver);
- return ret;
+ /* todo:
+ *
+ * Before JPB's SVA patch is enabled, SMMU/IOMMU cannot support PASID.
+ * When it is accepted in the mainline kernel, we can add an
+ * IOMMU_DOMAIN_DAUL mode to IOMMU, then the dma and iommu API can
+ * work together. We can then let crypto and uacce modes work at the
+ * same time.
+ */
+ if (!uacce_mode) {
+ pr_debug("hisi_zip: init crypto mode\n");
+ ret = hisi_zip_register_to_crypto();
+ if (ret < 0) {
+ pci_unregister_driver(&hisi_zip_pci_driver);
+ return ret;
+ }
}

return 0;
@@ -161,7 +177,8 @@ static int __init hisi_zip_init(void)

static void __exit hisi_zip_exit(void)
{
- hisi_zip_unregister_from_crypto();
+ if (!uacce_mode)
+ hisi_zip_unregister_from_crypto();
pci_unregister_driver(&hisi_zip_pci_driver);
}

--
2.17.1

2018-11-23 18:44:17

by Kenneth Lee

Subject: Re: [RFCv3 PATCH 1/6] uacce: Add documents for WarpDrive/uacce

On Wed, Nov 21, 2018 at 07:58:40PM -0700, Jason Gunthorpe wrote:
>
> On Wed, Nov 21, 2018 at 02:08:05PM +0800, Kenneth Lee wrote:
>
> > > But considering Jean's SVA stuff seems based on mmu notifiers, I have
> > > a hard time believing that it has any different behavior from RDMA's
> > > ODP, and if it does have different behavior, then it is probably just
> > > a bug in the ODP implementation.
> >
> > As Jean has explained, his solution is based on page table sharing. I think ODP
> > should also consider this new feature.
>
> Shared page tables would require the HW to walk the page table format
> of the CPU directly, not sure how that would be possible for ODP?
>
> Presumably the implementation for ARM relies on the IOMMU hardware
> doing this?

Yes, that is the idea. And since Jean is merging the AMD and Intel solutions
together, I assume they can do the same. This is also the reason I want to solve
my problem on top of IOMMU directly. But anyway, let me try to see if I can
merge the logic with ODP.

>
> > > > > If all your driver needs is to mmap some PCI bar space, route
> > > > > interrupts and do DMA mapping then mediated VFIO is probably a good
> > > > > choice.
> > > >
> > > > Yes. That is what is done in our RFCv1/v2. But we accepted Jerome's opinion and
> > > > try not to add complexity to the mm subsystem.
> > >
> > > Why would a mediated VFIO driver touch the mm subsystem? Sounds like
> > > you don't have a VFIO driver if it needs to do stuff like that...
> >
> > VFIO has no ODP-like solution, and if we want to solve the fork problem, we have
> > to make some changes to iommu and the fork procedure. Further, VFIO takes every
> > queue as an independent device. This creates a lot of trouble for resource
> > management. For example, you will need a manager process to withdraw the unused
> > device and you need to let the user process know about the PASID of the queue, and
> > so on.
>
> Well, I would think you'd add SVA support to the VFIO driver as a
> generic capability - it seems pretty useful for any VFIO user as it
> avoids all the kernel upcalls to do memory pinning and DMA address
> translation.

It is already part of Jean's patchset. And that's why I built my solution on
VFIO in the first place. But I think the concept of SVA and PASID is not
compatible with the original VFIO concept space. You would not share your whole
address space with a device in a virtual machine manager, would you? And
if you can manage to have a separate mdev for your virtual machine, why bother
to set a PASID for it? The answer to those problems, I think, will be Intel's
Scalable IO Virtualization. For an accelerator, the requirement is simply: getting
a handle to the device, attaching the process's mm to the handle by sharing the
process's page table with its iommu, indexed by PASID, and starting the
communication...

>
> Once the VFIO driver knows about this as a generic capability then the
> device it exposes to userspace would use CPU addresses instead of DMA
> addresses.
>
> The question is if your driver needs much more than the device
> agnostic generic services VFIO provides.
>
> I'm not sure what you have in mind with resource management.. It is
> hard to revoke resources from userspace, unless you are doing
> kernel syscalls, but then why do all this?

Say I have 1024 queues in my accelerator. I can get one by opening the device
and attaching it to the fd. If the process exits by any means, the queue can be
returned with the release of the fd. But if it is an mdev, it will still be there
and someone has to tell the allocator it is available again. This is not easy
to design in user space.
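
The lifetime rule described above can be modelled in a few lines of plain C. This is only a user-space sketch of the idea, not the uacce implementation: `queue_pool`, `file_ctx`, `ctx_open()` and `ctx_release()` are hypothetical names, and `ctx_release()` stands in for the file release callback that the kernel runs on the last close of the fd, including when the process dies.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NR_QUEUES	1024
#define BITS_PER_WORD	64

/* One bit per hardware queue; a set bit means the queue is in use. */
struct queue_pool {
	uint64_t used[NR_QUEUES / BITS_PER_WORD];
};

/* Per-open state: the queue owned by this file descriptor. */
struct file_ctx {
	struct queue_pool *pool;
	int qid;
};

void pool_init(struct queue_pool *p)
{
	memset(p, 0, sizeof(*p));
}

/* Claim the lowest free queue; returns its index or -1 if none is free. */
int queue_get(struct queue_pool *p)
{
	for (int i = 0; i < NR_QUEUES; i++) {
		uint64_t mask = 1ULL << (i % BITS_PER_WORD);

		if (!(p->used[i / BITS_PER_WORD] & mask)) {
			p->used[i / BITS_PER_WORD] |= mask;
			return i;
		}
	}
	return -1;
}

void queue_put(struct queue_pool *p, int qid)
{
	p->used[qid / BITS_PER_WORD] &= ~(1ULL << (qid % BITS_PER_WORD));
}

/* Models open(): the queue is bound to the file context. */
int ctx_open(struct file_ctx *c, struct queue_pool *p)
{
	c->pool = p;
	c->qid = queue_get(p);
	return c->qid < 0 ? -1 : 0;
}

/* Models release(): the queue goes back to the pool with the fd. */
void ctx_release(struct file_ctx *c)
{
	if (c->qid >= 0) {
		queue_put(c->pool, c->qid);
		c->qid = -1;
	}
}
```

Because the queue is returned in the release path, no separate manager process is needed to reclaim queues from processes that exit unexpectedly.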

>
> Jason

--

2018-11-12 17:52:16

by Kenneth Lee

Subject: [RFCv3 PATCH 3/6] crypto/hisilicon: add hisilicon Queue Manager driver

From: Kenneth Lee <[email protected]>

Hisilicon QM is a general IP used by some Hisilicon accelerators. It
provides a general PCIE device interface for the CPU and the accelerator
to share a group of queues.

QM is implemented as an End Point of the virtual PCI-E bus in Hisilicon
SoC. It is actually an interface wrapper of the ARM SMMU and GIC standards.

A QM provides 1024 channels, namely queue pairs, to the CPU. It also
supports SR-IOV with a maximum of 64 VFs. The queue pairs can be distributed
to the PF or VFs according to the configuration of the PF. The hardware
configuration and notification are done in the VF/PF BAR space, while the
queue elements are stored in memory. A software application sends
requests as messages to the accelerator via the queue protocol, which is
a FIFO ring buffer protocol based on shared memory.
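
As an illustration of such a ring, the driver code in this patch compares a phase bit in each completion entry with the phase kept in the queue context and flips that phase when the head wraps (see qm_cq_head_update() and qm_poll_qp()). The following is a minimal user-space sketch of that general technique, not the QM hardware layout: the structure names, the 4-entry depth, and the `pending` counter (standing in for what the producer learns from the head doorbell) are all assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define Q_DEPTH 4

struct entry {
	uint16_t data;
	uint16_t phase;		/* written last by the producer */
};

struct ring {
	struct entry e[Q_DEPTH];
	unsigned int head;	/* consumer index */
	unsigned int tail;	/* producer index */
	unsigned int pending;	/* entries produced but not yet consumed */
	uint16_t cons_phase;	/* phase the consumer expects next */
	uint16_t prod_phase;	/* phase the producer writes next */
};

void ring_init(struct ring *r)
{
	/* All entries start with phase 0, so nothing looks valid yet. */
	memset(r, 0, sizeof(*r));
	r->cons_phase = 1;
	r->prod_phase = 1;
}

/* Producer side: returns 0 on success, -1 if the ring is full. */
int ring_produce(struct ring *r, uint16_t data)
{
	if (r->pending == Q_DEPTH)
		return -1;

	r->e[r->tail].data = data;
	r->e[r->tail].phase = r->prod_phase;
	if (++r->tail == Q_DEPTH) {
		r->tail = 0;
		r->prod_phase ^= 1;	/* flip phase on wrap */
	}
	r->pending++;
	return 0;
}

/* Consumer side: returns 0 and stores the data, or -1 if nothing is ready. */
int ring_consume(struct ring *r, uint16_t *data)
{
	struct entry *e = &r->e[r->head];

	/* An entry is valid only if its phase matches the expected one. */
	if (e->phase != r->cons_phase)
		return -1;

	*data = e->data;
	if (++r->head == Q_DEPTH) {
		r->head = 0;
		r->cons_phase ^= 1;	/* flip phase on wrap */
	}
	r->pending--;
	return 0;
}
```

The phase bit lets the consumer tell fresh entries from stale ones of the previous lap without reading the producer's tail index. Real shared-memory rings also need memory barriers between writing the payload and publishing the phase, which this single-threaded sketch leaves out.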

An accelerator integrating a QM has its own vendor ID, device ID, and
message format. The QM just forwards the messages accordingly. Every QM
has its own SMMU (Unit) for address translation. It does not support
ATS/PRI, but it supports SMMU stall mode, so it can handle page faults from
the device side.

This patch includes a library used by the accelerator driver to access
the QM hardware.

Signed-off-by: Kenneth Lee <[email protected]>
Signed-off-by: Zhou Wang <[email protected]>
Signed-off-by: Hao Fang <[email protected]>
---
drivers/crypto/hisilicon/Kconfig | 5 +-
drivers/crypto/hisilicon/Makefile | 1 +
drivers/crypto/hisilicon/qm.c | 743 +++++++++++++++++++++++++++
drivers/crypto/hisilicon/qm.h | 213 ++++++++
drivers/crypto/hisilicon/qm_usr_if.h | 32 ++
5 files changed, 993 insertions(+), 1 deletion(-)
create mode 100644 drivers/crypto/hisilicon/qm.c
create mode 100644 drivers/crypto/hisilicon/qm.h
create mode 100644 drivers/crypto/hisilicon/qm_usr_if.h

diff --git a/drivers/crypto/hisilicon/Kconfig b/drivers/crypto/hisilicon/Kconfig
index 8ca9c503bcb0..0e40f4a6666b 100644
--- a/drivers/crypto/hisilicon/Kconfig
+++ b/drivers/crypto/hisilicon/Kconfig
@@ -1,5 +1,4 @@
# SPDX-License-Identifier: GPL-2.0
-
config CRYPTO_DEV_HISI_SEC
tristate "Support for Hisilicon SEC crypto block cipher accelerator"
select CRYPTO_BLKCIPHER
@@ -12,3 +11,7 @@ config CRYPTO_DEV_HISI_SEC

To compile this as a module, choose M here: the module
will be called hisi_sec.
+
+config CRYPTO_DEV_HISI_QM
+ tristate
+ depends on ARM64 && PCI
diff --git a/drivers/crypto/hisilicon/Makefile b/drivers/crypto/hisilicon/Makefile
index 463f46ace182..05e9052e0f52 100644
--- a/drivers/crypto/hisilicon/Makefile
+++ b/drivers/crypto/hisilicon/Makefile
@@ -1,2 +1,3 @@
# SPDX-License-Identifier: GPL-2.0
obj-$(CONFIG_CRYPTO_DEV_HISI_SEC) += sec/
+obj-$(CONFIG_CRYPTO_DEV_HISI_QM) += qm.o
diff --git a/drivers/crypto/hisilicon/qm.c b/drivers/crypto/hisilicon/qm.c
new file mode 100644
index 000000000000..5b810a6f4dd5
--- /dev/null
+++ b/drivers/crypto/hisilicon/qm.c
@@ -0,0 +1,743 @@
+// SPDX-License-Identifier: GPL-2.0+
+#include <asm/page.h>
+#include <linux/bitmap.h>
+#include <linux/dma-mapping.h>
+#include <linux/io.h>
+#include <linux/irqreturn.h>
+#include <linux/log2.h>
+#include "qm.h"
+
+#define QM_DEF_Q_NUM 128
+
+/* eq/aeq irq enable */
+#define QM_VF_AEQ_INT_SOURCE 0x0
+#define QM_VF_AEQ_INT_MASK 0x4
+#define QM_VF_EQ_INT_SOURCE 0x8
+#define QM_VF_EQ_INT_MASK 0xc
+
+/* mailbox */
+#define MAILBOX_CMD_SQC 0x0
+#define MAILBOX_CMD_CQC 0x1
+#define MAILBOX_CMD_EQC 0x2
+#define MAILBOX_CMD_AEQC 0x3
+#define MAILBOX_CMD_SQC_BT 0x4
+#define MAILBOX_CMD_CQC_BT 0x5
+
+#define MAILBOX_CMD_SEND_BASE 0x300
+#define MAILBOX_EVENT_SHIFT 8
+#define MAILBOX_STATUS_SHIFT 9
+#define MAILBOX_BUSY_SHIFT 13
+#define MAILBOX_OP_SHIFT 14
+#define MAILBOX_QUEUE_SHIFT 16
+
+/* sqc shift */
+#define SQ_HEAD_SHIFT 0
+#define SQ_TAIL_SHIFT 16
+#define SQ_HOP_NUM_SHIFT 0
+#define SQ_PAGE_SIZE_SHIFT 4
+#define SQ_BUF_SIZE_SHIFT 8
+#define SQ_SQE_SIZE_SHIFT 12
+#define SQ_HEAD_IDX_SIG_SHIFT 0
+#define SQ_TAIL_IDX_SIG_SHIFT 0
+#define SQ_CQN_SHIFT 0
+#define SQ_PRIORITY_SHIFT 0
+#define SQ_ORDERS_SHIFT 4
+#define SQ_TYPE_SHIFT 8
+
+#define SQ_TYPE_MASK 0xf
+
+/* cqc shift */
+#define CQ_HEAD_SHIFT 0
+#define CQ_TAIL_SHIFT 16
+#define CQ_HOP_NUM_SHIFT 0
+#define CQ_PAGE_SIZE_SHIFT 4
+#define CQ_BUF_SIZE_SHIFT 8
+#define CQ_SQE_SIZE_SHIFT 12
+#define CQ_PASID 0
+#define CQ_HEAD_IDX_SIG_SHIFT 0
+#define CQ_TAIL_IDX_SIG_SHIFT 0
+#define CQ_CQN_SHIFT 0
+#define CQ_PRIORITY_SHIFT 16
+#define CQ_ORDERS_SHIFT 0
+#define CQ_TYPE_SHIFT 0
+#define CQ_PHASE_SHIFT 0
+#define CQ_FLAG_SHIFT 1
+
+#define CQC_HEAD_INDEX(cqc) ((cqc)->cq_head)
+#define CQC_PHASE(cqc) (((cqc)->dw6) & 0x1)
+#define CQC_CQ_ADDRESS(cqc) (((u64)((cqc)->cq_base_h) << 32) | \
+ ((cqc)->cq_base_l))
+#define CQC_PHASE_BIT 0x1
+
+/* eqc shift */
+#define MB_EQC_EQE_SHIFT 12
+#define MB_EQC_PHASE_SHIFT 16
+
+#define EQC_HEAD_INDEX(eqc) ((eqc)->eq_head)
+#define EQC_TAIL_INDEX(eqc) ((eqc)->eq_tail)
+#define EQC_PHASE(eqc) ((((eqc)->dw6) >> 16) & 0x1)
+
+#define EQC_PHASE_BIT 0x00010000
+
+/* aeqc shift */
+#define MB_AEQC_AEQE_SHIFT 12
+#define MB_AEQC_PHASE_SHIFT 16
+
+/* cqe shift */
+#define CQE_PHASE(cqe) ((cqe)->w7 & 0x1)
+
+/* eqe shift */
+#define EQE_PHASE(eqe) (((eqe)->dw0 >> 16) & 0x1)
+#define EQE_CQN(eqe) (((eqe)->dw0) & 0xffff)
+
+#define QM_EQE_CQN_MASK 0xffff
+
+/* doorbell */
+#define DOORBELL_CMD_SQ 0
+#define DOORBELL_CMD_CQ 1
+#define DOORBELL_CMD_EQ 2
+#define DOORBELL_CMD_AEQ 3
+
+#define DOORBELL_CMD_SEND_BASE 0x340
+
+#define QM_MEM_START_INIT 0x100040
+#define QM_MEM_INIT_DONE 0x100044
+#define QM_VFT_CFG_RDY 0x10006c
+#define QM_VFT_CFG_OP_WR 0x100058
+#define QM_VFT_CFG_TYPE 0x10005c
+#define QM_SQC_VFT 0x0
+#define QM_CQC_VFT 0x1
+#define QM_VFT_CFG_ADDRESS 0x100060
+#define QM_VFT_CFG_OP_ENABLE 0x100054
+
+#define QM_VFT_CFG_DATA_L 0x100064
+#define QM_VFT_CFG_DATA_H 0x100068
+#define QM_SQC_VFT_BUF_SIZE (7ULL << 8)
+#define QM_SQC_VFT_SQC_SIZE (5ULL << 12)
+#define QM_SQC_VFT_INDEX_NUMBER (1ULL << 16)
+#define QM_SQC_VFT_BT_INDEX_SHIFT 22
+#define QM_SQC_VFT_START_SQN_SHIFT 28
+#define QM_SQC_VFT_VALID (1ULL << 44)
+#define QM_CQC_VFT_BUF_SIZE (7ULL << 8)
+#define QM_CQC_VFT_SQC_SIZE (5ULL << 12)
+#define QM_CQC_VFT_INDEX_NUMBER (1ULL << 16)
+#define QM_CQC_VFT_BT_INDEX_SHIFT 22
+#define QM_CQC_VFT_VALID (1ULL << 28)
+
+
+#define QM_MK_SQC_DW3(hop_num, page_sz, buf_sz, sqe_sz) \
+ ((hop_num << SQ_HOP_NUM_SHIFT) | \
+ (page_sz << SQ_PAGE_SIZE_SHIFT) | \
+ (buf_sz << SQ_BUF_SIZE_SHIFT) | \
+ (sqe_sz << SQ_SQE_SIZE_SHIFT))
+#define QM_MK_SQC_W13(priority, orders, type) \
+ ((priority << SQ_PRIORITY_SHIFT) | \
+ (orders << SQ_ORDERS_SHIFT) | \
+ ((type & SQ_TYPE_MASK) << SQ_TYPE_SHIFT))
+#define QM_MK_CQC_DW3(hop_num, page_sz, buf_sz, sqe_sz) \
+ ((hop_num << CQ_HOP_NUM_SHIFT) | \
+ (page_sz << CQ_PAGE_SIZE_SHIFT) | \
+ (buf_sz << CQ_BUF_SIZE_SHIFT) | \
+ (sqe_sz << CQ_SQE_SIZE_SHIFT))
+#define QM_MK_CQC_DW6(phase, flag) \
+ ((phase << CQ_PHASE_SHIFT) | (flag << CQ_FLAG_SHIFT))
+
+static inline void qm_writel(struct qm_info *qm, u32 val, u32 offset)
+{
+ writel(val, qm->io_base + offset);
+}
+
+struct qm_info;
+
+struct hisi_acc_qm_hw_ops {
+ int (*vft_config)(struct qm_info *qm, u16 base, u32 number);
+};
+
+static inline int hacc_qm_mb_is_busy(struct qm_info *qm)
+{
+ u32 val;
+
+ return readl_relaxed_poll_timeout(QM_ADDR(qm, MAILBOX_CMD_SEND_BASE),
+ val, !((val >> MAILBOX_BUSY_SHIFT) & 0x1), 10, 1000);
+}
+
+static inline void qm_mb_write(struct qm_info *qm, void *src)
+{
+ void __iomem *fun_base = QM_ADDR(qm, MAILBOX_CMD_SEND_BASE);
+ unsigned long tmp0 = 0, tmp1 = 0;
+
+ asm volatile("ldp %0, %1, %3\n"
+ "stp %0, %1, %2\n"
+ "dsb sy\n"
+ : "=&r" (tmp0),
+ "=&r" (tmp1),
+ "+Q" (*((char *)fun_base))
+ : "Q" (*((char *)src))
+ : "memory");
+}
+
+static int qm_mb(struct qm_info *qm, u8 cmd, phys_addr_t phys_addr, u16 queue,
+ bool op, bool event)
+{
+ struct mailbox mailbox;
+ int ret;
+
+ dev_dbg(&qm->pdev->dev, "QM HW request to q-%u: %d-%llx\n", queue, cmd,
+ phys_addr);
+
+ mailbox.w0 = cmd |
+ (event ? 0x1 << MAILBOX_EVENT_SHIFT : 0) |
+ (op ? 0x1 << MAILBOX_OP_SHIFT : 0) |
+ (0x1 << MAILBOX_BUSY_SHIFT);
+ mailbox.queue_num = queue;
+ mailbox.base_l = lower_32_bits(phys_addr);
+ mailbox.base_h = upper_32_bits(phys_addr);
+ mailbox.rsvd = 0;
+
+ mutex_lock(&qm->mailbox_lock);
+
+ ret = hacc_qm_mb_is_busy(qm);
+ if (unlikely(ret))
+ goto out_with_lock;
+
+ qm_mb_write(qm, &mailbox);
+ ret = hacc_qm_mb_is_busy(qm);
+ if (unlikely(ret))
+ goto out_with_lock;
+
+out_with_lock:
+ mutex_unlock(&qm->mailbox_lock);
+ return ret;
+}
+
+static void qm_db(struct qm_info *qm, u16 qn, u8 cmd, u16 index, u8 priority)
+{
+ u64 doorbell = 0;
+
+ dev_dbg(&qm->pdev->dev, "doorbell(qn=%d, cmd=%d, index=%d, pri=%d)\n",
+ qn, cmd, index, priority);
+
+ doorbell = qn | (cmd << 16) | ((u64)((index | (priority << 16)))) << 32;
+ writeq(doorbell, QM_ADDR(qm, DOORBELL_CMD_SEND_BASE));
+}
+
+/* @return 0 - cq/eq event, 1 - async event, 2 - abnormal error */
+static u32 qm_get_irq_source(struct qm_info *qm)
+{
+ return readl(QM_ADDR(qm, QM_VF_EQ_INT_SOURCE));
+}
+
+static inline struct hisi_qp *to_hisi_qp(struct qm_info *qm, struct eqe *eqe)
+{
+ u16 cqn = eqe->dw0 & QM_EQE_CQN_MASK;
+ struct hisi_qp *qp;
+
+ read_lock(&qm->qps_lock);
+ qp = qm->qp_array[cqn];
+ read_unlock(&qm->qps_lock);
+
+ return qp;
+}
+
+static inline void qm_cq_head_update(struct hisi_qp *qp)
+{
+ if (qp->qp_status.cq_head == QM_Q_DEPTH - 1) {
+ qp->cqc->dw6 = qp->cqc->dw6 ^ CQC_PHASE_BIT;
+ qp->qp_status.cq_head = 0;
+ } else {
+ qp->qp_status.cq_head++;
+ }
+}
+
+static inline void qm_poll_qp(struct hisi_qp *qp, struct qm_info *qm)
+{
+ struct cqe *cqe;
+
+ cqe = qp->cqe + qp->qp_status.cq_head;
+
+ if (qp->req_cb) {
+ while (CQE_PHASE(cqe) == CQC_PHASE(qp->cqc)) {
+ dma_rmb();
+ qp->req_cb(qp, (unsigned long)(qp->sqe +
+ qm->sqe_size * cqe->sq_head));
+ qm_cq_head_update(qp);
+ cqe = qp->cqe + qp->qp_status.cq_head;
+ }
+ } else if (qp->event_cb) {
+ qp->event_cb(qp);
+ qm_cq_head_update(qp);
+ cqe = qp->cqe + qp->qp_status.cq_head;
+ }
+
+ qm_db(qm, qp->queue_id, DOORBELL_CMD_CQ, qp->qp_status.cq_head, 0);
+ qm_db(qm, qp->queue_id, DOORBELL_CMD_CQ, qp->qp_status.cq_head, 1);
+}
+
+static irqreturn_t qm_irq_thread(int irq, void *data)
+{
+ struct qm_info *qm = data;
+ struct eqe *eqe = qm->eqe + qm->eq_head;
+ struct eqc *eqc = qm->eqc;
+ struct hisi_qp *qp;
+
+ while (EQE_PHASE(eqe) == EQC_PHASE(eqc)) {
+ qp = to_hisi_qp(qm, eqe);
+ if (qp)
+ qm_poll_qp(qp, qm);
+
+ if (qm->eq_head == QM_Q_DEPTH - 1) {
+ eqc->dw6 = eqc->dw6 ^ EQC_PHASE_BIT;
+ eqe = qm->eqe;
+ qm->eq_head = 0;
+ } else {
+ eqe++;
+ qm->eq_head++;
+ }
+ }
+
+ qm_db(qm, 0, DOORBELL_CMD_EQ, qm->eq_head, 0);
+
+ return IRQ_HANDLED;
+}
+
+static void qm_init_qp_status(struct hisi_qp *qp)
+{
+ struct hisi_acc_qp_status *qp_status = &qp->qp_status;
+
+ qp_status->sq_tail = 0;
+ qp_status->sq_head = 0;
+ qp_status->cq_head = 0;
+ qp_status->sqn = 0;
+ qp_status->cqc_phase = 1;
+ qp_status->is_sq_full = 0;
+}
+
+/* check if bit in regs is 1 */
+static inline int qm_reg_wait_bit(struct qm_info *qm, u32 offset, u32 bit)
+{
+ int val;
+
+ return readl_relaxed_poll_timeout(QM_ADDR(qm, offset), val,
+ val & BIT(bit), 10, 1000);
+}
+
+/* the config should be conducted after hisi_acc_init_qm_mem() */
+static int qm_vft_common_config(struct qm_info *qm, u16 base, u32 number)
+{
+ u64 tmp;
+ int ret;
+
+ ret = qm_reg_wait_bit(qm, QM_VFT_CFG_RDY, 0);
+ if (ret)
+ return ret;
+ qm_writel(qm, 0x0, QM_VFT_CFG_OP_WR);
+ qm_writel(qm, QM_SQC_VFT, QM_VFT_CFG_TYPE);
+ qm_writel(qm, qm->pdev->devfn, QM_VFT_CFG_ADDRESS);
+
+ tmp = QM_SQC_VFT_BUF_SIZE |
+ QM_SQC_VFT_SQC_SIZE |
+ QM_SQC_VFT_INDEX_NUMBER |
+ QM_SQC_VFT_VALID |
+ (u64)base << QM_SQC_VFT_START_SQN_SHIFT;
+
+ qm_writel(qm, tmp & 0xffffffff, QM_VFT_CFG_DATA_L);
+ qm_writel(qm, tmp >> 32, QM_VFT_CFG_DATA_H);
+
+ qm_writel(qm, 0x0, QM_VFT_CFG_RDY);
+ qm_writel(qm, 0x1, QM_VFT_CFG_OP_ENABLE);
+ ret = qm_reg_wait_bit(qm, QM_VFT_CFG_RDY, 0);
+ if (ret)
+ return ret;
+ tmp = 0;
+
+ qm_writel(qm, 0x0, QM_VFT_CFG_OP_WR);
+ qm_writel(qm, QM_CQC_VFT, QM_VFT_CFG_TYPE);
+ qm_writel(qm, qm->pdev->devfn, QM_VFT_CFG_ADDRESS);
+
+ tmp = QM_CQC_VFT_BUF_SIZE |
+ QM_CQC_VFT_SQC_SIZE |
+ QM_CQC_VFT_INDEX_NUMBER |
+ QM_CQC_VFT_VALID;
+
+ qm_writel(qm, tmp & 0xffffffff, QM_VFT_CFG_DATA_L);
+ qm_writel(qm, tmp >> 32, QM_VFT_CFG_DATA_H);
+
+ qm_writel(qm, 0x0, QM_VFT_CFG_RDY);
+ qm_writel(qm, 0x1, QM_VFT_CFG_OP_ENABLE);
+ ret = qm_reg_wait_bit(qm, QM_VFT_CFG_RDY, 0);
+ if (ret)
+ return ret;
+ return 0;
+}
+
+/*
+ * v1: For Hi1620ES
+ * v2: For Hi1620CS (Not implemented yet)
+ */
+static struct hisi_acc_qm_hw_ops qm_hw_ops_v1 = {
+ .vft_config = qm_vft_common_config,
+};
+
+struct hisi_qp *hisi_qm_create_qp(struct qm_info *qm, u8 alg_type)
+{
+ struct hisi_qp *qp;
+ int qp_index;
+ int ret;
+
+ write_lock(&qm->qps_lock);
+ qp_index = find_first_zero_bit(qm->qp_bitmap, qm->qp_num);
+ if (qp_index >= qm->qp_num) {
+ write_unlock(&qm->qps_lock);
+ return ERR_PTR(-EBUSY);
+ }
+
+ qp = kzalloc(sizeof(*qp), GFP_ATOMIC);
+ if (!qp) {
+ ret = -ENOMEM;
+ /* qps_lock is dropped at err_with_bitset */
+ goto err_with_bitset;
+ }
+
+ qp->queue_id = qp_index;
+ qp->qm = qm;
+ qp->alg_type = alg_type;
+ qm_init_qp_status(qp);
+ set_bit(qp_index, qm->qp_bitmap);
+
+ write_unlock(&qm->qps_lock);
+ return qp;
+
+err_with_bitset:
+ write_unlock(&qm->qps_lock);
+ return ERR_PTR(ret);
+}
+EXPORT_SYMBOL_GPL(hisi_qm_create_qp);
+
+int hisi_qm_start_qp(struct hisi_qp *qp, unsigned long arg)
+{
+ struct qm_info *qm = qp->qm;
+ struct device *dev = &qm->pdev->dev;
+ int ret;
+ struct sqc *sqc;
+ struct cqc *cqc;
+ int qp_index = qp->queue_id;
+ int pasid = arg;
+ size_t off = 0;
+
+#define QP_INIT_BUF(qp, type, size) do { \
+ (qp)->type = (struct type *)((void *)(qp)->qdma.va + (off)); \
+ (qp)->type##_dma = (qp)->qdma.dma + (off); \
+ off += size; \
+} while (0)
+
+ sqc = qp->sqc = qm->sqc + qp_index;
+ cqc = qp->cqc = qm->cqc + qp_index;
+ qp->sqc_dma = qm->sqc_dma + qp_index * sizeof(struct sqc);
+ qp->cqc_dma = qm->cqc_dma + qp_index * sizeof(struct cqc);
+
+ qp->qdma.size = qm->sqe_size * QM_Q_DEPTH +
+ sizeof(struct cqe) * QM_Q_DEPTH;
+ qp->qdma.va = dma_alloc_coherent(dev, qp->qdma.size,
+ &qp->qdma.dma,
+ GFP_KERNEL | __GFP_ZERO);
+ dev_dbg(dev, "allocate qp dma buf(va=%p, dma=%pad, size=%lx)\n",
+ qp->qdma.va, &qp->qdma.dma, qp->qdma.size);
+
+ if (!qp->qdma.va) {
+ dev_err(dev, "cannot get qm dma buffer\n");
+ return -ENOMEM;
+ }
+
+ QP_INIT_BUF(qp, sqe, qm->sqe_size * QM_Q_DEPTH);
+ QP_INIT_BUF(qp, cqe, sizeof(struct cqe) * QM_Q_DEPTH);
+
+ INIT_QC(sqc, qp->sqe_dma);
+ sqc->pasid = pasid;
+ sqc->dw3 = QM_MK_SQC_DW3(0, 0, 0, ilog2(qm->sqe_size));
+ sqc->cq_num = qp_index;
+ sqc->w13 = QM_MK_SQC_W13(0, 1, qp->alg_type);
+
+ ret = qm_mb(qm, MAILBOX_CMD_SQC, qp->sqc_dma, qp_index, 0, 0);
+ if (ret)
+ return ret;
+
+ INIT_QC(cqc, qp->cqe_dma);
+ cqc->dw3 = QM_MK_CQC_DW3(0, 0, 0, 4);
+ cqc->dw6 = QM_MK_CQC_DW6(1, 1);
+ ret = qm_mb(qm, MAILBOX_CMD_CQC, (u64)qp->cqc_dma, qp_index, 0, 0);
+ if (ret)
+ return ret;
+
+ write_lock(&qm->qps_lock);
+ qm->qp_array[qp_index] = qp;
+ init_completion(&qp->completion);
+ write_unlock(&qm->qps_lock);
+
+ dev_dbg(&qm->pdev->dev, "qp %d started\n", qp_index);
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(hisi_qm_start_qp);
+
+void hisi_qm_release_qp(struct hisi_qp *qp)
+{
+ struct qm_info *qm = qp->qm;
+ struct qm_dma *qdma = &qp->qdma;
+ struct device *dev = &qm->pdev->dev;
+ int qid = qp->queue_id;
+
+ write_lock(&qm->qps_lock);
+ qm->qp_array[qp->queue_id] = NULL;
+ bitmap_clear(qm->qp_bitmap, qid, 1);
+ write_unlock(&qm->qps_lock);
+
+ dma_free_coherent(dev, qdma->size, qdma->va, qdma->dma);
+
+ kfree(qp);
+}
+EXPORT_SYMBOL_GPL(hisi_qm_release_qp);
+
+static void *qm_get_avail_sqe(struct hisi_qp *qp)
+{
+ struct hisi_acc_qp_status *qp_status = &qp->qp_status;
+ u16 sq_tail = qp_status->sq_tail;
+
+ if (qp_status->is_sq_full == 1)
+ return NULL;
+
+ return qp->sqe + sq_tail * qp->qm->sqe_size;
+}
+
+int hisi_qp_send(struct hisi_qp *qp, void *msg)
+{
+ struct hisi_acc_qp_status *qp_status = &qp->qp_status;
+ u16 sq_tail = qp_status->sq_tail;
+ u16 sq_tail_next = (sq_tail + 1) % QM_Q_DEPTH;
+ unsigned long timeout = 100;
+ void *sqe = qm_get_avail_sqe(qp);
+
+ if (!sqe)
+ return -EBUSY;
+
+ memcpy(sqe, msg, qp->qm->sqe_size);
+
+ qm_db(qp->qm, qp->queue_id, DOORBELL_CMD_SQ, sq_tail_next, 0);
+
+ /* wait until job finished */
+ wait_for_completion_timeout(&qp->completion, timeout);
+
+ qp_status->sq_tail = sq_tail_next;
+
+ if (qp_status->sq_tail == qp_status->sq_head)
+ qp_status->is_sq_full = 1;
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(hisi_qp_send);
+
+static irqreturn_t qm_irq(int irq, void *data)
+{
+ struct qm_info *qm = data;
+ u32 int_source;
+
+ int_source = qm_get_irq_source(qm);
+ if (int_source)
+ return IRQ_WAKE_THREAD;
+
+ dev_err(&qm->pdev->dev, "invalid int source %d\n", int_source);
+
+ return IRQ_HANDLED;
+}
+
+/* put qm into init state, so that the accelerator config becomes available */
+static int hisi_qm_mem_start(struct qm_info *qm)
+{
+ u32 val;
+
+ qm_writel(qm, 0x1, QM_MEM_START_INIT);
+ return readl_relaxed_poll_timeout(QM_ADDR(qm, QM_MEM_INIT_DONE), val,
+ val & BIT(0), 10, 1000);
+}
+
+/* todo: The VF case is not considered carefully */
+int hisi_qm_init(struct qm_info *qm)
+{
+ int ret;
+ u16 ecam_val16;
+ struct pci_dev *pdev = qm->pdev;
+ struct device *dev = &pdev->dev;
+
+ pci_set_power_state(pdev, PCI_D0);
+ ecam_val16 = PCI_COMMAND_MASTER | PCI_COMMAND_MEMORY;
+ pci_write_config_word(pdev, PCI_COMMAND, ecam_val16);
+
+ ret = pci_enable_device_mem(pdev);
+ if (ret < 0) {
+ dev_err(dev, "Can't enable device mem!\n");
+ return ret;
+ }
+
+ ret = pci_request_mem_regions(pdev, dev_name(dev));
+ if (ret < 0) {
+ dev_err(dev, "Can't request mem regions!\n");
+ goto err_with_pcidev;
+ }
+
+ qm->phys_base = pci_resource_start(pdev, 2);
+ qm->size = pci_resource_len(qm->pdev, 2);
+ qm->io_base = devm_ioremap(dev, qm->phys_base, qm->size);
+ if (!qm->io_base) {
+ dev_err(dev, "Map IO space fail!\n");
+ ret = -EIO;
+ goto err_with_mem_regions;
+ }
+
+ dma_set_mask_and_coherent(dev, DMA_BIT_MASK(64));
+ pci_set_master(pdev);
+
+ ret = pci_alloc_irq_vectors(pdev, 1, 1, PCI_IRQ_MSI);
+ if (ret < 0) {
+ dev_err(dev, "Enable MSI vectors fail!\n");
+ goto err_with_mem_regions;
+ }
+
+ qm->eq_head = 0;
+ mutex_init(&qm->mailbox_lock);
+ rwlock_init(&qm->qps_lock);
+
+ if (qm->ver != 1) {
+ dev_err(dev, "qm version %d not supported\n", qm->ver);
+ ret = -EINVAL;
+ goto err_with_irq_vec;
+ }
+ qm->ops = &qm_hw_ops_v1;
+
+ ret = devm_request_threaded_irq(dev, pci_irq_vector(pdev, 0),
+ qm_irq, qm_irq_thread, IRQF_SHARED,
+ dev_name(dev), qm);
+ if (ret)
+ goto err_with_irq_vec;
+
+ qm->qp_bitmap = devm_kcalloc(dev, BITS_TO_LONGS(qm->qp_num),
+ sizeof(long), GFP_KERNEL);
+ qm->qp_array = devm_kcalloc(dev, qm->qp_num,
+ sizeof(struct hisi_qp *), GFP_KERNEL);
+ if (!qm->qp_bitmap || !qm->qp_array) {
+ ret = -ENOMEM;
+ goto err_with_irq;
+ }
+
+ if (pdev->is_physfn) {
+ ret = hisi_qm_mem_start(qm);
+ if (ret) {
+ dev_err(dev, "mem start fail\n");
+ goto err_with_irq;
+ }
+ }
+
+ qm->qdma.size = max_t(size_t, sizeof(struct eqc),
+ sizeof(struct aeqc)) +
+ sizeof(struct eqe) * QM_Q_DEPTH +
+ sizeof(struct sqc) * qm->qp_num +
+ sizeof(struct cqc) * qm->qp_num;
+ qm->qdma.va = dma_alloc_coherent(dev, qm->qdma.size,
+ &qm->qdma.dma,
+ GFP_KERNEL | __GFP_ZERO);
+ dev_dbg(dev, "allocate qm dma buf(va=%p, dma=%pad, size=%lx)\n",
+ qm->qdma.va, &qm->qdma.dma, qm->qdma.size);
+ ret = qm->qdma.va ? 0 : -ENOMEM;
+
+ if (ret)
+ goto err_with_irq;
+
+ return 0;
+
+err_with_irq:
+ /* even with devm, the irq must be freed before the irq vec is freed */
+ devm_free_irq(dev, pci_irq_vector(pdev, 0), qm);
+err_with_irq_vec:
+ pci_free_irq_vectors(pdev);
+err_with_mem_regions:
+ pci_release_mem_regions(pdev);
+err_with_pcidev:
+ pci_disable_device(pdev);
+ return ret;
+}
+EXPORT_SYMBOL_GPL(hisi_qm_init);
+
+void hisi_qm_uninit(struct qm_info *qm)
+{
+ struct pci_dev *pdev = qm->pdev;
+
+ dma_free_coherent(&pdev->dev, qm->qdma.size, qm->qdma.va, qm->qdma.dma);
+
+ devm_free_irq(&pdev->dev, pci_irq_vector(pdev, 0), qm);
+ pci_free_irq_vectors(pdev);
+ pci_release_mem_regions(pdev);
+ pci_disable_device(pdev);
+}
+EXPORT_SYMBOL_GPL(hisi_qm_uninit);
+
+int hisi_qm_start(struct qm_info *qm)
+{
+ size_t off = 0;
+ int ret;
+
+#define QM_INIT_BUF(qm, type, size) do { \
+ (qm)->type = (struct type *)((void *)(qm)->qdma.va + (off)); \
+ (qm)->type##_dma = (qm)->qdma.dma + (off); \
+ off += size; \
+} while (0)
+
+ if (!qm->qdma.va)
+ return -EINVAL;
+
+ if (qm->pdev->is_physfn)
+ qm->ops->vft_config(qm, qm->qp_base, qm->qp_num);
+
+ /*
+ * note: the order is important because the buffers should stay on
+ * alignment boundaries
+ */
+ QM_INIT_BUF(qm, eqe, sizeof(struct eqe) * QM_Q_DEPTH);
+ QM_INIT_BUF(qm, sqc, sizeof(struct sqc) * qm->qp_num);
+ QM_INIT_BUF(qm, cqc, sizeof(struct cqc) * qm->qp_num);
+ QM_INIT_BUF(qm, eqc,
+ max_t(size_t, sizeof(struct eqc), sizeof(struct aeqc)));
+
+ qm->eqc->base_l = lower_32_bits(qm->eqe_dma);
+ qm->eqc->base_h = upper_32_bits(qm->eqe_dma);
+ qm->eqc->dw3 = 2 << MB_EQC_EQE_SHIFT;
+ qm->eqc->dw6 = (QM_Q_DEPTH - 1) | (1 << MB_EQC_PHASE_SHIFT);
+ ret = qm_mb(qm, MAILBOX_CMD_EQC, qm->eqc_dma, 0, 0, 0);
+ if (ret)
+ return ret;
+
+ ret = qm_mb(qm, MAILBOX_CMD_SQC_BT, qm->sqc_dma, 0, 0, 0);
+ if (ret)
+ return ret;
+
+ ret = qm_mb(qm, MAILBOX_CMD_CQC_BT, qm->cqc_dma, 0, 0, 0);
+ if (ret)
+ return ret;
+
+ writel(0x0, QM_ADDR(qm, QM_VF_EQ_INT_MASK));
+
+ dev_dbg(&qm->pdev->dev, "qm started\n");
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(hisi_qm_start);
+
+
+void hisi_qm_stop(struct qm_info *qm)
+{
+ /* todo: recheck if this is the right way to disable the hw irq */
+ writel(0x1, QM_ADDR(qm, QM_VF_EQ_INT_MASK));
+
+}
+EXPORT_SYMBOL_GPL(hisi_qm_stop);
+
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR("Zhou Wang <[email protected]>");
+MODULE_DESCRIPTION("HiSilicon Accelerator queue manager driver");
diff --git a/drivers/crypto/hisilicon/qm.h b/drivers/crypto/hisilicon/qm.h
new file mode 100644
index 000000000000..6d124d948738
--- /dev/null
+++ b/drivers/crypto/hisilicon/qm.h
@@ -0,0 +1,213 @@
+/* SPDX-License-Identifier: GPL-2.0+ */
+#ifndef HISI_ACC_QM_H
+#define HISI_ACC_QM_H
+
+#include <linux/dmapool.h>
+#include <linux/iopoll.h>
+#include <linux/module.h>
+#include <linux/pci.h>
+#include <linux/slab.h>
+#include "qm_usr_if.h"
+
+/* qm user domain */
+#define QM_ARUSER_M_CFG_1 0x100088
+#define QM_ARUSER_M_CFG_ENABLE 0x100090
+#define QM_AWUSER_M_CFG_1 0x100098
+#define QM_AWUSER_M_CFG_ENABLE 0x1000a0
+#define QM_WUSER_M_CFG_ENABLE 0x1000a8
+
+/* qm cache */
+#define QM_CACHE_CTL 0x100050
+#define QM_AXI_M_CFG 0x1000ac
+#define QM_AXI_M_CFG_ENABLE 0x1000b0
+#define QM_PEH_AXUSER_CFG 0x1000cc
+#define QM_PEH_AXUSER_CFG_ENABLE 0x1000d0
+
+struct eqe {
+ __le32 dw0;
+};
+
+struct aeqe {
+ __le32 dw0;
+};
+
+struct sqc {
+ __le16 head;
+ __le16 tail;
+ __le32 base_l;
+ __le32 base_h;
+ __le32 dw3;
+ __le16 qes;
+ __le16 rsvd0;
+ __le16 pasid;
+ __le16 w11;
+ __le16 cq_num;
+ __le16 w13;
+ __le32 rsvd1;
+};
+
+struct cqc {
+ __le16 head;
+ __le16 tail;
+ __le32 base_l;
+ __le32 base_h;
+ __le32 dw3;
+ __le16 qes;
+ __le16 rsvd0;
+ __le16 pasid;
+ __le16 w11;
+ __le32 dw6;
+ __le32 rsvd1;
+};
+
+#define INIT_QC(qc, base) do { \
+ (qc)->head = 0; \
+ (qc)->tail = 0; \
+ (qc)->base_l = lower_32_bits((unsigned long)base); \
+ (qc)->base_h = upper_32_bits((unsigned long)base); \
+ (qc)->pasid = 0; \
+ (qc)->w11 = 0; \
+ (qc)->rsvd1 = 0; \
+ (qc)->qes = QM_Q_DEPTH - 1; \
+} while (0)
+
+struct eqc {
+ __le16 head;
+ __le16 tail;
+ __le32 base_l;
+ __le32 base_h;
+ __le32 dw3;
+ __le32 rsvd[2];
+ __le32 dw6;
+};
+
+struct aeqc {
+ __le16 head;
+ __le16 tail;
+ __le32 base_l;
+ __le32 base_h;
+ __le32 rsvd[3];
+ __le32 dw6;
+};
+
+struct mailbox {
+ __le16 w0;
+ __le16 queue_num;
+ __le32 base_l;
+ __le32 base_h;
+ __le32 rsvd;
+};
+
+struct doorbell {
+ __le16 queue_num;
+ __le16 cmd;
+ __le16 index;
+ __le16 priority;
+};
+
+struct qm_dma {
+ void *va;
+ dma_addr_t dma;
+ size_t size;
+};
+
+struct qm_info {
+ int ver;
+ struct pci_dev *pdev;
+
+ resource_size_t phys_base;
+ resource_size_t size;
+ void __iomem *io_base;
+
+ u32 sqe_size;
+ u32 qp_base;
+ u32 qp_num;
+
+ struct qm_dma qdma;
+ struct sqc *sqc;
+ struct cqc *cqc;
+ struct eqc *eqc;
+ struct eqe *eqe;
+ struct aeqc *aeqc;
+ struct aeqe *aeqe;
+ unsigned long sqc_dma,
+ cqc_dma,
+ eqc_dma,
+ eqe_dma,
+ aeqc_dma,
+ aeqe_dma;
+
+ u32 eq_head;
+
+ rwlock_t qps_lock;
+ unsigned long *qp_bitmap;
+ struct hisi_qp **qp_array;
+
+ struct mutex mailbox_lock;
+
+ struct hisi_acc_qm_hw_ops *ops;
+};
+#define QM_ADDR(qm, off) ((qm)->io_base + off)
+
+struct hisi_acc_qp_status {
+ u16 sq_tail;
+ u16 sq_head;
+ u16 cq_head;
+ u16 sqn;
+ bool cqc_phase;
+ int is_sq_full;
+};
+
+struct hisi_qp;
+
+struct hisi_qp_ops {
+ int (*fill_sqe)(void *sqe, void *q_parm, void *d_parm);
+};
+
+struct hisi_qp {
+ /* sq number in this function */
+ u32 queue_id;
+ u8 alg_type;
+ u8 req_type;
+ int pasid;
+
+ struct qm_dma qdma;
+ struct sqc *sqc;
+ struct cqc *cqc;
+ void *sqe;
+ struct cqe *cqe;
+
+ unsigned long sqc_dma,
+ cqc_dma,
+ sqe_dma,
+ cqe_dma;
+
+ struct hisi_acc_qp_status qp_status;
+
+ struct qm_info *qm;
+
+ /* for crypto sync API */
+ struct completion completion;
+
+ struct hisi_qp_ops *hw_ops;
+ void *qp_ctx;
+ void (*event_cb)(struct hisi_qp *qp);
+ void (*req_cb)(struct hisi_qp *qp, unsigned long data);
+};
+
+/* QM external interface for accelerator driver.
+ * To use qm:
+ * 1. Set qm with pdev, and sqe_size set accordingly
+ * 2. hisi_qm_init()
+ * 3. config the accelerator hardware
+ * 4. hisi_qm_start()
+ */
+extern int hisi_qm_init(struct qm_info *qm);
+extern void hisi_qm_uninit(struct qm_info *qm);
+extern int hisi_qm_start(struct qm_info *qm);
+extern void hisi_qm_stop(struct qm_info *qm);
+extern struct hisi_qp *hisi_qm_create_qp(struct qm_info *qm, u8 alg_type);
+extern int hisi_qm_start_qp(struct hisi_qp *qp, unsigned long arg);
+extern void hisi_qm_release_qp(struct hisi_qp *qp);
+extern int hisi_qp_send(struct hisi_qp *qp, void *msg);
+#endif
diff --git a/drivers/crypto/hisilicon/qm_usr_if.h b/drivers/crypto/hisilicon/qm_usr_if.h
new file mode 100644
index 000000000000..13aab94adb05
--- /dev/null
+++ b/drivers/crypto/hisilicon/qm_usr_if.h
@@ -0,0 +1,32 @@
+/* SPDX-License-Identifier: GPL-2.0+ */
+#ifndef HISI_QM_USR_IF_H
+#define HISI_QM_USR_IF_H
+
+#define QM_CQE_SIZE 16
+
+/* default queue depth for sq/cq/eq */
+#define QM_Q_DEPTH 1024
+
+/* page number for queue file region */
+#define QM_DOORBELL_PAGE_NR 1
+#define QM_DKO_PAGE_NR 3
+#define QM_DUS_PAGE_NR 36
+
+#define QM_DOORBELL_PG_START 0
+#define QM_DKO_PAGE_START (QM_DOORBELL_PG_START + QM_DOORBELL_PAGE_NR)
+#define QM_DUS_PAGE_START (QM_DKO_PAGE_START + QM_DKO_PAGE_NR)
+#define QM_SS_PAGE_START (QM_DUS_PAGE_START + QM_DUS_PAGE_NR)
+
+#define QM_DOORBELL_OFFSET 0x340
+
+struct cqe {
+ __le32 rsvd0;
+ __le16 cmd_id;
+ __le16 rsvd1;
+ __le16 sq_head;
+ __le16 sq_num;
+ __le16 rsvd2;
+ __le16 w7;
+};
+
+#endif
--
2.17.1
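
For readers following qm_cq_head_update() and qm_poll_qp() in the patch
above, the phase-bit convention can be sketched in plain user-space C.
This is not code from the patch: ring, ring_entry, ring_produce() and
ring_consume() are hypothetical names, the depth is shrunk to 4, and
ordinary memory stands in for the DMA-coherent queue. The producer stamps
each entry with its current phase and flips that phase when it wraps; the
consumer accepts an entry only while the entry's phase matches the phase it
expects, and flips its own expectation when it wraps.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define RING_DEPTH 4 /* hypothetical; the patch uses QM_Q_DEPTH = 1024 */

struct ring_entry {
	uint16_t payload;
	uint8_t phase; /* stamped by the producer */
};

struct ring {
	struct ring_entry e[RING_DEPTH];
	unsigned int prod_tail; /* next slot the producer fills */
	uint8_t prod_phase;     /* phase the producer stamps */
	unsigned int cons_head; /* next slot the consumer inspects */
	uint8_t cons_phase;     /* phase the consumer expects */
};

void ring_init(struct ring *r)
{
	memset(r, 0, sizeof(*r)); /* every entry starts with phase 0 */
	r->prod_phase = 1;
	r->cons_phase = 1;
}

/* Stand-in for the hardware writing one completion entry. */
void ring_produce(struct ring *r, uint16_t payload)
{
	struct ring_entry *e = &r->e[r->prod_tail];

	e->payload = payload;
	e->phase = r->prod_phase;
	if (++r->prod_tail == RING_DEPTH) {
		r->prod_tail = 0;
		r->prod_phase ^= 1; /* flip on wrap */
	}
}

/* Returns true and stores the payload if a new entry is available. */
bool ring_consume(struct ring *r, uint16_t *payload)
{
	struct ring_entry *e = &r->e[r->cons_head];

	if (e->phase != r->cons_phase)
		return false; /* stale entry from the previous lap */

	*payload = e->payload;
	if (r->cons_head == RING_DEPTH - 1) {
		r->cons_head = 0;
		r->cons_phase ^= 1; /* flip on wrap, as qm_cq_head_update() does */
	} else {
		r->cons_head++;
	}
	return true;
}
```

The sketch has no overflow protection: it assumes the producer never laps
the consumer, which real hardware would enforce through the head value
reported by the doorbell.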

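The 64-bit value that qm_db() writes can also be checked in isolation. The
sketch below is a user-space restatement, not code from the patch:
doorbell_pack() and doorbell_field() are hypothetical helpers. It assumes
the four 16-bit slots correspond, in order, to the queue_num, cmd, index
and priority members of struct doorbell in qm.h when the value is stored
little-endian.

```c
#include <assert.h>
#include <stdint.h>

/* Pack the doorbell fields into one 64-bit word, mirroring qm_db(). */
uint64_t doorbell_pack(uint16_t qn, uint8_t cmd, uint16_t index,
		       uint8_t priority)
{
	uint64_t low = (uint64_t)qn | ((uint64_t)cmd << 16);
	uint64_t high = (uint64_t)index | ((uint64_t)priority << 16);

	return low | (high << 32);
}

/* Extract 16-bit slot n (0 = queue number, 1 = cmd, 2 = index, 3 = priority). */
uint16_t doorbell_field(uint64_t db, unsigned int n)
{
	return (uint16_t)(db >> (16 * n));
}
```

Widening each field to 64 bits before shifting keeps the arithmetic
independent of integer promotion rules, which is a design choice of this
sketch rather than something the patch does.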
2018-11-19 21:11:23

by Leon Romanovsky

Subject: Re: [RFCv3 PATCH 1/6] uacce: Add documents for WarpDrive/uacce

On Mon, Nov 19, 2018 at 05:19:10PM +0800, Kenneth Lee wrote:
> On Mon, Nov 19, 2018 at 05:14:05PM +0800, Kenneth Lee wrote:
> >
> > On Thu, Nov 15, 2018 at 04:54:55PM +0200, Leon Romanovsky wrote:
> > >
> > > On Thu, Nov 15, 2018 at 04:51:09PM +0800, Kenneth Lee wrote:
> > > > On Wed, Nov 14, 2018 at 06:00:17PM +0200, Leon Romanovsky wrote:
> > > > >
> > > > > On Wed, Nov 14, 2018 at 10:58:09AM +0800, Kenneth Lee wrote:
> > > > > >
> > > > > > > On 2018/11/13 at 8:23 AM, Leon Romanovsky wrote:
> > > > > > > On Mon, Nov 12, 2018 at 03:58:02PM +0800, Kenneth Lee wrote:
> > > > > > > > From: Kenneth Lee <[email protected]>
> > > > > > > >
> > > > > > > > WarpDrive is a general accelerator framework for the user application to
> > > > > > > > access the hardware without going through the kernel in data path.
> > > > > > > >
> > > > > > > > The kernel component that provides drivers with the facilities to expose the
> > > > > > > > user interface is called uacce. It is a short name for
> > > > > > > > "Unified/User-space-access-intended Accelerator Framework".
> > > > > > > >
> > > > > > > > This patch adds a document to explain how it works.
> > > > > > > + RDMA and netdev folks
> > > > > > >
> > > > > > > Sorry to be late to the game. I don't see the other patches, but from
> > > > > > > the description below it seems like you are reinventing the RDMA verbs
> > > > > > > model. I have a hard time seeing how the proposed framework differs
> > > > > > > from what is already implemented in drivers/infiniband/* for the kernel
> > > > > > > space and in https://github.com/linux-rdma/rdma-core/ for the user
> > > > > > > space parts.
> > > > > >
> > > > > > Thanks Leon,
> > > > > >
> > > > > > Yes, we are trying to solve a problem similar to RDMA's. We also learned
> > > > > > a lot from the existing RDMA code. But we have to make a new framework
> > > > > > because we cannot register accelerators such as AI operation, encryption
> > > > > > or compression engines with the RDMA framework :)
> > > > >
> > > > > Assuming that you did everything right and still failed to use the RDMA
> > > > > framework, you were supposed to fix it, not to reinvent exactly the
> > > > > same thing. That is how we develop the kernel: by reusing existing code.
> > > >
> > > > Yes, but we don't force other systems such as NICs or GPUs into RDMA, do we?
> > >
> > > You are not introducing a new NIC or GPU; you are proposing another interface to
> > > directly access HW memory and bypass the kernel for the data path. This is
> > > the whole idea of RDMA, and this is why it is already present in the kernel.
> > >
> > > The various hardware devices supported in our stack allow a ton of crazy
> > > stuff, including GPU interconnects and NIC functionality.
> >
> > Yes. We don't want to reinvent the wheel. That is why we did it behind VFIO in RFC
> > v1 and v2. But finally we were persuaded by Mr. Jerome Glisse that VFIO was not
> > a good place to solve the problem.

I saw a couple of his responses; he kept telling you that you are
reinventing the wheel.
https://lore.kernel.org/lkml/[email protected]/

> >
> > And currently, as you see, IB is bound to devices doing RDMA. The registration
> > function, ib_register_device(), hints that the device is a netdev (get_netdev() callback), and it knows
> > about gid, pkey, and Memory Window. IB is not simply an address space management
> > framework. And the verbs are not transparent to IB. If we start to add
> > compression/decompression, AI (RNN, CNN stuff) operations, and encryption/decryption
> > to the verbs set, it will become very complex. Or maybe I misunderstand the
> > IB idea? But I don't see any compression hardware integrated in the mainline
> > kernel. Could you point me directly to one I can use as a reference?
> >

I strongly advise you to read the code; not all drivers implement
gids, pkeys and the get_netdev() callback.

Yes, you are misunderstanding the drivers/infiniband subsystem. We have
plenty of options to expose APIs to user space applications, starting
from the standard verbs API and ending with private objects which are
understood only by a specific device/driver.

The IB stack provides a secure FD to access the device by creating a context;
after that you can send direct commands to the FW (see mlx5 DEVX
or hfi1) in a sane way.

So actually, you will need to register your device and declare your own
set of objects (similar to mlx5 include/uapi/rdma/mlx5_user_ioctl_*.h).

As for a reference for compression hardware, I don't have one.
But there is an example of how T10-DIF can be implemented in the verbs
layer:
https://www.openfabrics.org/images/2018workshop/presentations/307_TOved_T10-DIFOffload.pdf
Or IPsec crypto:
https://www.spinics.net/lists/linux-rdma/msg48906.html

> > >
> > > >
> > > > I assume you would not agree to register a zip accelerator to infiniband? :)
> > >
> > > The "infiniband" name in "drivers/infiniband/" is a legacy one, and the
> > > current code supports IB, RoCE, iWARP and OmniPath as transport layers.
> > > For a long time we have wanted to rename that folder to "drivers/rdma",
> > > but didn't find enough brave men/women to do it, due to the backport mess
> > > such a move would cause.
> > >
> > > The addition of a zip accelerator to RDMA is possible and depends on how
> > > you model such new functionality - new driver, or maybe new ULP.
> > >
> > > >
> > > > Further, I don't think it is wise to break an existing system (RDMA) to fulfill a
> > > > totally new scenario. The better choice is to let them run in parallel for some
> > > > time and try to merge them accordingly.
> > >
> > > Awesome, so please run your code out-of-tree for now and once you are ready
> > > for submission let's try to merge it.
> >
> > Yes, yes. We know trust needs time to build. But the fact is that no
> > accelerator user driver can be added to the mainline kernel today. We should raise the
> > topic from time to time to help the communication and close the gap, right?
> >
> > We are also open to cooperating with IB to do it within the IB framework. But
> > please let me know where to start. I feel it is quite weird to call
> > ib_register_device for a zip or RSA accelerator.

Most of the ib_ prefixes in drivers/infiniband/ are legacy names. You can
rename them to rdma_register_device() if it helps.

So from the implementation point of view, as I wrote above:
create a minimal driver to register, expose MR to user space, add your own
objects and capabilities through our new KABI and implement the user space part
in github.com/linux-rdma/rdma-core.

> >
> > >
> > > >
> > > > >
> > > > > >
> > > > > > Another problem we tried to address is the way to pin memory for DMA
> > > > > > operations. The RDMA way of pinning memory cannot avoid losing the page to a
> > > > > > copy-on-write operation while the memory is in use by the device. This may
> > > > > > not be important to an RDMA library. But it is important to an accelerator.
> > > > >
> > > > > Such support has existed in drivers/infiniband/ since late 2014 and
> > > > > it is called ODP (on demand paging).
> > > >
> > > > I reviewed ODP and I think it is a solution bound to infiniband. It is part of
> > > > the MR semantics and requires an infiniband-specific hook
> > > > (ucontext->invalidate_range()). And the hook requires the device to be able to
> > > > stop using the page for a while during the copy. That is OK for infiniband
> > > > (actually, only mlx5 uses it). I don't think most accelerators can support
> > > > this mode. But WarpDrive works fully on top of the IOMMU interface, so it does
> > > > not have this limitation.
> > >
> > > 1. It has nothing to do with infiniband.
> >
> > But it must be an ib_dev first.

It is just a name.

> >
> > > 2. MR and ucontext are verbs semantics and are needed to ensure that host
> > > memory exposed to the user is properly protected from a security point of view.
> > > 3. "stop using the page for a while for the copying" - I don't fully
> > > understand this claim; maybe this article will help you to better
> > > describe it: https://lwn.net/Articles/753027/
> >
> > This topic was discussed in RFCv2. The key problem here is this:
> >
> > The device needs to hold the memory for its own calculation, but the CPU/software
> > wants to stop it for a while to synchronize with disk or to do COW.
> >
> > If the hardware supports SVM/SVA (Shared Virtual Memory/Address), it is easy: the
> > device shares the page table with the CPU, and the device will raise a page fault when the
> > CPU downgrades the PTE to read-only.
> >
> > If the hardware cannot share the page table with the CPU, we then need to have
> > some way to change the device page table. This is what happens in ODP. It
> > invalidates the page table in the device upon the mmu_notifier callback. But this cannot
> > solve the COW problem: if the user process A shares a page P with the device, and A
> > forks a new process B, and A continues to write to the page, then by COW
> > process B will keep the page P, while A will get a new page P'. But you have
> > no way to let the device know it should use P' rather than P.

I haven't heard of such an issue, and we have supported fork for a long time.
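
As a minimal user-space illustration of the fork scenario described in the
quoted text above, the sketch below (not code from this thread;
child_view_after_parent_write() is a hypothetical helper) has a parent
write to a buffer after fork() and asks the child what it sees. It only
demonstrates that the two processes stop sharing the page contents once one
of them writes. It does not show which process keeps the original physical
page; that is kernel-internal behaviour and is the point under discussion.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Returns the byte the child observes after the parent has overwritten it,
 * or -1 on error. 'before' must be below 255 because the value travels back
 * in the child's exit status and 255 is used to signal a child-side error.
 */
int child_view_after_parent_write(unsigned char before, unsigned char after)
{
	int go[2], status;
	ssize_t n;
	pid_t pid;
	unsigned char *page = malloc(4096);

	if (!page)
		return -1;
	if (pipe(go) < 0) {
		free(page);
		return -1;
	}
	page[0] = before;

	pid = fork();
	if (pid < 0) {
		close(go[0]);
		close(go[1]);
		free(page);
		return -1;
	}
	if (pid == 0) {
		char token;

		close(go[1]);
		/* Block until the parent has finished its write. */
		if (read(go[0], &token, 1) != 1)
			_exit(255);
		_exit(page[0]);
	}

	close(go[0]);
	page[0] = after; /* write after fork: private to the parent */
	n = write(go[1], "x", 1);
	close(go[1]);

	if (waitpid(pid, &status, 0) < 0 || n != 1 || !WIFEXITED(status)) {
		free(page);
		return -1;
	}
	free(page);
	return WEXITSTATUS(status);
}
```

Calling child_view_after_parent_write(1, 2) is expected to return 1: the
child still sees the value from before the fork.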

> >
> > This may be OK for an RDMA application, because RDMA is a big thing and we can ask
> > the programmer to avoid the situation. But for an accelerator, I don't think we
> > can ask a programmer to care about this when using zlib.
> >
> > In WarpDrive/uacce, we make this simple. If you support IOMMU and it supports
> > SVM/SVA, everything will be fine, just like ODP implicit mode, and you don't need
> > to write any code for that, because it has been done by the IOMMU framework. If it
> > does not, you have to use kernel-allocated memory which has the same IOVA as
> > the VA in user space. So we can still maintain a unified address space among the
> > devices and the application.
> >
> > > 4. mlx5 supports ODP not because of being partially IB device,
> > > but because HW performance oriented implementation is not an easy task.
> > >
> > > >
> > > > >
> > > > > >
> > > > > > Hope this can help the understanding.
> > > > >
> > > > > Yes, it helped me a lot.
> > > > > Now I'm more convinced than before that this whole patchset shouldn't
> > > > > exist in the first place.
> > > >
> > > > Then maybe you can tell me how I can register my accelerator to the user space?
> > >
> > > Write kernel driver and write user space part of it.
> > > https://github.com/linux-rdma/rdma-core/
> > >
> > > I have no doubts that your colleagues who wrote and maintain
> > > drivers/infiniband/hw/hns driver know best how to do it.
> > > They did it very successfully.
> > >
> > > Thanks
> > >
> > > >
> > > > >
> > > > > To be clear, NAK.
> > > > >
> > > > > Thanks
> > > > >
> > > > > >
> > > > > > Cheers
> > > > > >
> > > > > > >
> > > > > > > Hard NAK from RDMA side.
> > > > > > >
> > > > > > > Thanks
> > > > > > >
> > > > > > > > Signed-off-by: Kenneth Lee <[email protected]>
> > > > > > > > ---
> > > > > > > > Documentation/warpdrive/warpdrive.rst | 260 +++++++
> > > > > > > > Documentation/warpdrive/wd-arch.svg | 764 ++++++++++++++++++++
> > > > > > > > Documentation/warpdrive/wd.svg | 526 ++++++++++++++
> > > > > > > > Documentation/warpdrive/wd_q_addr_space.svg | 359 +++++++++
> > > > > > > > 4 files changed, 1909 insertions(+)
> > > > > > > > create mode 100644 Documentation/warpdrive/warpdrive.rst
> > > > > > > > create mode 100644 Documentation/warpdrive/wd-arch.svg
> > > > > > > > create mode 100644 Documentation/warpdrive/wd.svg
> > > > > > > > create mode 100644 Documentation/warpdrive/wd_q_addr_space.svg
> > > > > > > >
> > > > > > > > diff --git a/Documentation/warpdrive/warpdrive.rst b/Documentation/warpdrive/warpdrive.rst
> > > > > > > > new file mode 100644
> > > > > > > > index 000000000000..ef84d3a2d462
> > > > > > > > --- /dev/null
> > > > > > > > +++ b/Documentation/warpdrive/warpdrive.rst
> > > > > > > > @@ -0,0 +1,260 @@
> > > > > > > > +Introduction of WarpDrive
> > > > > > > > +=========================
> > > > > > > > +
> > > > > > > > +*WarpDrive* is a general accelerator framework for the user application to
> > > > > > > > +access the hardware without going through the kernel in data path.
> > > > > > > > +
> > > > > > > > +It can be used as the quick channel for accelerators, network adaptors or
> > > > > > > > +other hardware for application in user space.
> > > > > > > > +
> > > > > > > > > +This may make some implementations simpler. E.g. you can reuse most of the
> > > > > > > > > +*netdev* driver in kernel and just share some ring buffer to the user space
> > > > > > > > > +driver for *DPDK* [4] or *ODP* [5]. Or you can combine the RSA accelerator with
> > > > > > > > > +the *netdev* in the user space as an HTTPS reverse proxy, etc.
> > > > > > > > +
> > > > > > > > +*WarpDrive* takes the hardware accelerator as a heterogeneous processor which
> > > > > > > > +can share particular load from the CPU:
> > > > > > > > +
> > > > > > > > +.. image:: wd.svg
> > > > > > > > + :alt: WarpDrive Concept
> > > > > > > > +
> > > > > > > > > +The virtual concept, queue, is used to manage the requests sent to the
> > > > > > > > > +accelerator. The application sends requests to the queue by writing to some
> > > > > > > > > +particular address, while the hardware takes the requests directly from the
> > > > > > > > > +address and sends feedback accordingly.
> > > > > > > > > +
> > > > > > > > > +The format of the queue may differ from hardware to hardware. But the
> > > > > > > > > +application does not need to make any system call for the communication.
> > > > > > > > +
> > > > > > > > +*WarpDrive* tries to create a shared virtual address space for all involved
> > > > > > > > +accelerators. Within this space, the requests sent to queue can refer to any
> > > > > > > > +virtual address, which will be valid to the application and all involved
> > > > > > > > +accelerators.
> > > > > > > > +
> > > > > > > > +The name *WarpDrive* is simply a cool and general name meaning the framework
> > > > > > > > +makes the application faster. It includes general user library, kernel
> > > > > > > > +management module and drivers for the hardware. In kernel, the management
> > > > > > > > +module is called *uacce*, meaning "Unified/User-space-access-intended
> > > > > > > > +Accelerator Framework".
> > > > > > > > +
> > > > > > > > +
> > > > > > > > +How does it work
> > > > > > > > +================
> > > > > > > > +
> > > > > > > > +*WarpDrive* uses *mmap* and *IOMMU* to play the trick.
> > > > > > > > +
> > > > > > > > > +*Uacce* creates a chrdev for the device registered to it. A "queue" will be
> > > > > > > > > +created when the chrdev is opened. The application accesses the queue by
> > > > > > > > > +mmapping different address regions of the queue file.
> > > > > > > > > +
> > > > > > > > > +The following figure shows the queue file address space:
> > > > > > > > +
> > > > > > > > +.. image:: wd_q_addr_space.svg
> > > > > > > > + :alt: WarpDrive Queue Address Space
> > > > > > > > +
> > > > > > > > > +The first region of the space, device region, is used for the application to
> > > > > > > > > +write requests to, or read answers from, the hardware.
> > > > > > > > > +
> > > > > > > > > +Normally, there can be two types of device regions: mmio and memory regions.
> > > > > > > > +It is recommended to use common memory for request/answer descriptors and use
> > > > > > > > +the mmio space for device notification, such as doorbell. But of course, this
> > > > > > > > +is all up to the interface designer.
> > > > > > > > +
> > > > > > > > +There can be two types of device memory regions, kernel-only and user-shared.
> > > > > > > > +This will be explained in the "kernel APIs" section.
> > > > > > > > +
> > > > > > > > +The Static Share Virtual Memory region is necessary only when the device IOMMU
> > > > > > > > +does not support "Share Virtual Memory". This will be explained after the
> > > > > > > > +*IOMMU* idea.
> > > > > > > > +
> > > > > > > > +
> > > > > > > > +Architecture
> > > > > > > > +------------
> > > > > > > > +
> > > > > > > > +The full *WarpDrive* architecture is represented in the following class
> > > > > > > > +diagram:
> > > > > > > > +
> > > > > > > > +.. image:: wd-arch.svg
> > > > > > > > + :alt: WarpDrive Architecture
> > > > > > > > +
> > > > > > > > +
> > > > > > > > +The user API
> > > > > > > > +------------
> > > > > > > > +
> > > > > > > > +We adopt a polling style interface in the user space: ::
> > > > > > > > +
> > > > > > > > + int wd_request_queue(struct wd_queue *q);
> > > > > > > > + void wd_release_queue(struct wd_queue *q);
> > > > > > > > +
> > > > > > > > + int wd_send(struct wd_queue *q, void *req);
> > > > > > > > + int wd_recv(struct wd_queue *q, void **req);
> > > > > > > > + int wd_recv_sync(struct wd_queue *q, void **req);
> > > > > > > > + void wd_flush(struct wd_queue *q);
> > > > > > > > +
> > > > > > > > > +wd_recv_sync() is a wrapper of its non-sync version. It traps into the
> > > > > > > > > +kernel and waits until the queue becomes available.
> > > > > > > > > +
> > > > > > > > > +If the queue does not support SVA/SVM, the following helper function
> > > > > > > > > +can be used to create Static Virtual Share Memory: ::
> > > > > > > > +
> > > > > > > > + void *wd_preserve_share_memory(struct wd_queue *q, size_t size);
> > > > > > > > +
> > > > > > > > > +The user API is not mandatory. It is simply a suggestion and a hint of what
> > > > > > > > > +the kernel interface is supposed to support.
> > > > > > > > +
> > > > > > > > +
> > > > > > > > +The user driver
> > > > > > > > +---------------
> > > > > > > > +
> > > > > > > > +The queue file mmap space will need a user driver to wrap the communication
> > > > > > > > +protocol. *UACCE* provides some attributes in sysfs for the user driver to
> > > > > > > > +match the right accelerator accordingly.
> > > > > > > > +
> > > > > > > > +The *UACCE* device attribute is under the following directory:
> > > > > > > > +
> > > > > > > > +/sys/class/uacce/<dev-name>/params
> > > > > > > > +
> > > > > > > > > +The following attributes are supported:
> > > > > > > > > +
> > > > > > > > > +nr_queue_remained (ro)
> > > > > > > > > + number of queues remaining
> > > > > > > > +
> > > > > > > > +api_version (ro)
> > > > > > > > + a string to identify the queue mmap space format and its version
> > > > > > > > +
> > > > > > > > +device_attr (ro)
> > > > > > > > + attributes of the device, see UACCE_DEV_xxx flag defined in uacce.h
> > > > > > > > +
> > > > > > > > +numa_node (ro)
> > > > > > > > + id of numa node
> > > > > > > > +
> > > > > > > > +priority (rw)
> > > > > > > > > + Priority of the device, bigger is higher
> > > > > > > > +
> > > > > > > > +(This is not yet implemented in RFC version)
> > > > > > > > +
> > > > > > > > +
> > > > > > > > +The kernel API
> > > > > > > > +--------------
> > > > > > > > +
> > > > > > > > > +The *uacce* kernel API is defined in uacce.h. If the hardware supports SVM/SVA,
> > > > > > > > > +the driver needs only the following API functions: ::
> > > > > > > > > +
> > > > > > > > > + int uacce_register(uacce);
> > > > > > > > > + void uacce_unregister(uacce);
> > > > > > > > > + void uacce_wake_up(q);
> > > > > > > > > +
> > > > > > > > > +*uacce_wake_up* is used to notify the process that waits in epoll() on the queue file.
> > > > > > > > > +
> > > > > > > > > +According to the IOMMU capability, *uacce* categorizes the devices as follows:
> > > > > > > > +
> > > > > > > > +UACCE_DEV_NOIOMMU
> > > > > > > > > + The device has no IOMMU. The user process cannot use VA on the hardware.
> > > > > > > > > + This mode is not recommended.
> > > > > > > > > +
> > > > > > > > > +UACCE_DEV_SVA (UACCE_DEV_PASID | UACCE_DEV_FAULT_FROM_DEV)
> > > > > > > > > + The device has an IOMMU which can share the same page table with the user
> > > > > > > > > + process
> > > > > > > > > +
> > > > > > > > > +UACCE_DEV_SHARE_DOMAIN
> > > > > > > > > + The device has an IOMMU without multiple page table and device page
> > > > > > > > > + fault support
> > > > > > > > +
> > > > > > > > +If the device works in mode other than UACCE_DEV_NOIOMMU, *uacce* will set its
> > > > > > > > +IOMMU to IOMMU_DOMAIN_UNMANAGED. So the driver must not use any kernel
> > > > > > > > +DMA API but the following ones from *uacce* instead: ::
> > > > > > > > +
> > > > > > > > + uacce_dma_map(q, va, size, prot);
> > > > > > > > + uacce_dma_unmap(q, va, size, prot);
> > > > > > > > +
> > > > > > > > > +*uacce_dma_map/unmap* is valid only for UACCE_DEV_SVA devices. It creates a
> > > > > > > > > +particular PASID and page table for the kernel in the IOMMU (not yet
> > > > > > > > > +implemented in the RFC).
> > > > > > > > > +
> > > > > > > > > +For UACCE_DEV_SHARE_DOMAIN devices, uacce_dma_map/unmap is not valid.
> > > > > > > > > +*Uacce* calls back start_queue only when the DUS and DKO regions are mmapped. The
> > > > > > > > > +accelerator driver must use those DMA buffers, via uacce_queue->qfrs[], in the
> > > > > > > > > +start_queue callback. The size of the queue file region is defined by
> > > > > > > > > +uacce->ops->qf_pg_start[].
> > > > > > > > +
> > > > > > > > > +We have to do it this way because most current IOMMUs cannot support
> > > > > > > > > +kernel and user virtual addresses at the same time. So we have to let them both
> > > > > > > > > +share the same user virtual address space.
> > > > > > > > > +
> > > > > > > > > +If the device has to support kernel and user at the same time, both the kernel
> > > > > > > > > +and the user should use these DMA APIs. This is not convenient. A better
> > > > > > > > > +solution is to change the future DMA/IOMMU design to let them separate the
> > > > > > > > > +address space between the user and kernel space. But that is not going to
> > > > > > > > > +happen in the short term.
> > > > > > > > +
> > > > > > > > +
> > > > > > > > +Multiple processes support
> > > > > > > > +==========================
> > > > > > > > +
> > > > > > > > > +In the latest mainline kernel (4.19) when this document is written, the IOMMU
> > > > > > > > > +subsystem does not support multiple process page tables yet.
> > > > > > > > > +
> > > > > > > > > +Most IOMMU hardware implementations support multi-process with the concept
> > > > > > > > > +of PASID. But they may use different names, e.g. it is called sub-stream-id in
> > > > > > > > > +the SMMU of ARM. With PASID or a similar design, multiple page tables can be
> > > > > > > > > +added to the IOMMU and referred to by their PASID.
> > > > > > > > > +
> > > > > > > > > +*JPB* has a patchset to enable this[1]_. We have tested it with our hardware
> > > > > > > > > +(which is known as *D06*). It works well. *WarpDrive* relies on it to support
> > > > > > > > > +UACCE_DEV_SVA. If it is not enabled, *WarpDrive* can still work. But it
> > > > > > > > > +supports only one process, and the device will be set to
> > > > > > > > > +UACCE_DEV_SHARE_DOMAIN even if it is set to UACCE_DEV_SVA initially.
> > > > > > > > +
> > > > > > > > +Static Share Virtual Memory is mainly used by UACCE_DEV_SHARE_DOMAIN device.
> > > > > > > > +
> > > > > > > > +
> > > > > > > > +Legacy Mode Support
> > > > > > > > +===================
> > > > > > > > +For the hardware without IOMMU, WarpDrive can still work, the only problem is
> > > > > > > > +VA cannot be used in the device. The driver should adopt another strategy for
> > > > > > > > +the shared memory. It is only for testing, and not recommended.
> > > > > > > > +
> > > > > > > > +
> > > > > > > > > +The Fork Scenario
> > > > > > > > > +=================
> > > > > > > > > +For a process with allocated queues and shared memory, what happens if it forks
> > > > > > > > > +a child?
> > > > > > > > > +
> > > > > > > > > +The fd of the queue will be duplicated on fork, so the child can send requests
> > > > > > > > > +to the same queue as its parent. But requests sent from any process other
> > > > > > > > > +than the one that opened the queue will be blocked.
> > > > > > > > > +
> > > > > > > > > +It is recommended to add O_CLOEXEC to the queue file.
> > > > > > > > > +
> > > > > > > > > +The queue mmap space has VM_DONTCOPY in its VMA. So the child will lose all
> > > > > > > > > +those VMAs.
> > > > > > > > > +
> > > > > > > > > +This is why *WarpDrive* does not adopt the mode used in *VFIO* and *InfiniBand*.
> > > > > > > > > +Both solutions can set any user pointer for hardware sharing. But they cannot
> > > > > > > > > +support fork while the DMA is in progress, or the "Copy-On-Write" procedure
> > > > > > > > > +will make the parent process lose its physical pages.
> > > > > > > > +
> > > > > > > > +
> > > > > > > > +The Sample Code
> > > > > > > > +===============
> > > > > > > > +There is a sample user land implementation with a simple driver for Hisilicon
> > > > > > > > +Hi1620 ZIP Accelerator.
> > > > > > > > +
> > > > > > > > +To test, do the following in samples/warpdrive (for the case of PC host): ::
> > > > > > > > + ./autogen.sh
> > > > > > > > + ./conf.sh # or simply ./configure if you build on target system
> > > > > > > > + make
> > > > > > > > +
> > > > > > > > > +Then you can get test_hisi_zip in the test subdirectory. Copy it to the target
> > > > > > > > > +system and make sure the hisi_zip driver is enabled (the major and minor of
> > > > > > > > > +the uacce chrdev can be obtained from dmesg or sysfs), and run: ::
> > > > > > > > > + mknod /dev/ua1 c <major> <minor>
> > > > > > > > + test/test_hisi_zip -z < data > data.zip
> > > > > > > > + test/test_hisi_zip -g < data > data.gzip
> > > > > > > > +
> > > > > > > > +
> > > > > > > > +References
> > > > > > > > +==========
> > > > > > > > +.. [1] https://patchwork.kernel.org/patch/10394851/
> > > > > > > > +
> > > > > > > > +.. vim: tw=78
> > > > [...]
> > > > > > > > --
> > > > > > > > 2.17.1
> > > > > > > >
>
> I don't know if Mr. Jerome Glisse is on the list. I think I should cc him out of
> respect for his help on the last RFC.
>
> - Kenneth



2018-11-20 03:13:16

by Jerome Glisse

Subject: Re: [RFCv3 PATCH 1/6] uacce: Add documents for WarpDrive/uacce

On Mon, Nov 19, 2018 at 12:48:01PM +0200, Leon Romanovsky wrote:
> On Mon, Nov 19, 2018 at 05:19:10PM +0800, Kenneth Lee wrote:
> > On Mon, Nov 19, 2018 at 05:14:05PM +0800, Kenneth Lee wrote:
> > > On Thu, Nov 15, 2018 at 04:54:55PM +0200, Leon Romanovsky wrote:
> > > > On Thu, Nov 15, 2018 at 04:51:09PM +0800, Kenneth Lee wrote:
> > > > > On Wed, Nov 14, 2018 at 06:00:17PM +0200, Leon Romanovsky wrote:
> > > > > > On Wed, Nov 14, 2018 at 10:58:09AM +0800, Kenneth Lee wrote:
> > > > > > > > On Mon, Nov 12, 2018 at 03:58:02PM +0800, Kenneth Lee wrote:

[...]

> > > > memory exposed to user is properly protected from security point of view.
> > > > 3. "stop using the page for a while for the copying" - I don't fully
> > > > understand this claim, maybe this article will help you to better
> > > > describe it: https://lwn.net/Articles/753027/
> > >
> > > This topic was being discussed in RFCv2. The key problem here is that:
> > >
> > > The device needs to hold the memory for its own calculation, but the CPU/software
> > > wants to stop it for a while to synchronize with disk or to handle COW.
> > >
> > > If the hardware supports SVM/SVA (Shared Virtual Memory/Address), it is easy: the
> > > device shares the page table with the CPU, and the device will raise a page fault
> > > when the CPU downgrades the PTE to read-only.
> > >
> > > If the hardware cannot share the page table with the CPU, we need some way to
> > > change the device page table. This is what happens in ODP. It invalidates the
> > > page table in the device upon the mmu_notifier callback. But this cannot solve
> > > the COW problem: if the user process A shares a page P with the device, and A
> > > forks a new process B, and A continues to write to the page, then by COW the
> > > process B will keep the page P, while A will get a new page P'. But you have
> > > no way to let the device know it should use P' rather than P.
>
> I haven't heard of such an issue, and we have supported fork for a long time.
>

Just to comment on this: any infiniband driver which uses umem and does
not have ODP (here ODP for me means listening to the mmu notifier, so all
infiniband drivers except mlx5) will be affected by the same issue AFAICT.

AFAICT there is nothing special happening after fork() inside any of
those drivers. So if the parent creates a umem MR before fork() and programs
the hardware with it, then after fork() the parent might start using a new
page for the umem range while the old memory is used by the child. The
reverse is also true (parent using the old memory and child the new memory).
Bottom line: you cannot predict which memory the child or the parent
will use for the range after fork().

So no matter which you consider, the child or the parent, what the hw
will use for the MR is unlikely to match what the CPU uses for the
same virtual address. In other words:

Before fork:
CPU parent: virtual addr ptr1 -> physical address = 0xCAFE
HARDWARE: virtual addr ptr1 -> physical address = 0xCAFE

Case 1:
CPU parent: virtual addr ptr1 -> physical address = 0xCAFE
CPU child: virtual addr ptr1 -> physical address = 0xDEAD
HARDWARE: virtual addr ptr1 -> physical address = 0xCAFE

Case 2:
CPU parent: virtual addr ptr1 -> physical address = 0xBEEF
CPU child: virtual addr ptr1 -> physical address = 0xCAFE
HARDWARE: virtual addr ptr1 -> physical address = 0xCAFE


This applies to every single page and is not predictable. It only
applies to private memory (mmap() with MAP_PRIVATE).

I am not familiar enough with RDMA user space API contract to know
if this is an issue or not.

Note that this cannot be fixed; no one should have done umem without
ODP like mlx5. For this to work properly you need sane hardware like
mlx5.

Cheers,
Jérôme

2018-11-20 05:14:48

by Jason Gunthorpe

Subject: Re: [RFCv3 PATCH 1/6] uacce: Add documents for WarpDrive/uacce

On Mon, Nov 19, 2018 at 05:14:05PM +0800, Kenneth Lee wrote:

> If the hardware cannot share the page table with the CPU, we need some way to
> change the device page table. This is what happens in ODP. It invalidates the
> page table in the device upon the mmu_notifier callback. But this cannot solve
> the COW problem: if the user process A shares a page P with the device, and A
> forks a new process B, and A continues to write to the page, then by COW the
> process B will keep the page P, while A will get a new page P'. But you have
> no way to let the device know it should use P' rather than P.

Is this true? I thought mmu_notifiers covered all these cases.

The mmu_notifier for A should fire if B causes the physical address of
A's pages to change via COW.

And this causes the device page tables to re-synchronize.

> In WarpDrive/uacce, we make this simple. If you support IOMMU and it supports
> SVM/SVA, everything will be fine, just like ODP implicit mode. And you don't need
> to write any code for that, because it has been done by the IOMMU framework. If it

Looks like the IOMMU code uses mmu_notifier, so it is identical to
IB's ODP. The only difference is that IB tends to have the IOMMU page
table in the device, not in the CPU.

The only case I know of that is different is the new-fangled CAPI
stuff where the IOMMU can directly use the CPU's page table and the
IOMMU page table (in device or CPU) is eliminated.

Anyhow, I don't think a single instance of hardware should justify an
entire new subsystem. Subsystems are hard to make and without multiple
hardware examples there is no way to expect that it would cover any
future use cases.

If all your driver needs is to mmap some PCI bar space, route
interrupts and do DMA mapping then mediated VFIO is probably a good
choice.

If it needs to do a bunch of other stuff, not related to PCI bar
space, interrupts and DMA mapping (ie special code for compression,
crypto, AI, whatever) then you should probably do what Jerome said and
make a drivers/char/hisilicon_foo_bar.c that exposes just what your
hardware does.

If you have networking involved in here then consider RDMA,
particularly if this functionality is already part of the same
hardware that the hns infiniband driver is servicing.

'computational MRs' are a reasonable approach to a side-car offload of
already existing RDMA support.

Jason

2018-11-19 19:40:37

by Kenneth Lee

Subject: Re: [RFCv3 PATCH 1/6] uacce: Add documents for WarpDrive/uacce

On Mon, Nov 19, 2018 at 05:14:05PM +0800, Kenneth Lee wrote:
>
> On Thu, Nov 15, 2018 at 04:54:55PM +0200, Leon Romanovsky wrote:
> >
> > On Thu, Nov 15, 2018 at 04:51:09PM +0800, Kenneth Lee wrote:
> > > On Wed, Nov 14, 2018 at 06:00:17PM +0200, Leon Romanovsky wrote:
> > > >
> > > > On Wed, Nov 14, 2018 at 10:58:09AM +0800, Kenneth Lee wrote:
> > > > >
> > > > > On 2018/11/13 8:23 AM, Leon Romanovsky wrote:
> > > > > > On Mon, Nov 12, 2018 at 03:58:02PM +0800, Kenneth Lee wrote:
> > > > > > > From: Kenneth Lee <[email protected]>
> > > > > > >
> > > > > > > WarpDrive is a general accelerator framework for the user application to
> > > > > > > access the hardware without going through the kernel in data path.
> > > > > > >
> > > > > > > The kernel component that provides the kernel facility for drivers to expose
> > > > > > > the user interface is called uacce. It is a short name for
> > > > > > > "Unified/User-space-access-intended Accelerator Framework".
> > > > > > >
> > > > > > > This patch adds a document to explain how it works.
> > > > > > + RDMA and netdev folks
> > > > > >
> > > > > > Sorry to be late in the game. I don't see the other patches, but from
> > > > > > the description below it seems like you are reinventing the RDMA verbs
> > > > > > model. I have a hard time seeing the differences between the proposed
> > > > > > framework and what is already implemented in drivers/infiniband/* for the
> > > > > > kernel space and in https://github.com/linux-rdma/rdma-core/ for the user
> > > > > > space parts.
> > > > >
> > > > > Thanks Leon,
> > > > >
> > > > > Yes, we tried to solve a problem similar to RDMA's. We also learned a lot from
> > > > > the existing RDMA code. But we have to make a new one because we cannot
> > > > > register accelerators such as AI operation, encryption or compression to the
> > > > > RDMA framework:)
> > > >
> > > > Assuming that you did everything right and still failed to use the RDMA
> > > > framework, you were supposed to fix it and not to reinvent a new, exactly
> > > > identical one. This is how we develop the kernel: by reusing existing code.
> > >
> > > Yes, but we don't force other systems such as NICs or GPUs into RDMA, do we?
> >
> > You are not introducing a new NIC or GPU, but proposing another interface to
> > directly access HW memory and bypass the kernel for the data path. This is the
> > whole idea of RDMA and this is why it is already present in the kernel.
> >
> > The various hardware devices supported in our stack allow a ton of crazy
> > stuff, including GPU interconnects and NIC functionality.
>
> Yes. We don't want to reinvent the wheel. That is why we did it behind VFIO in RFC
> v1 and v2. But finally we were persuaded by Mr. Jerome Glisse that VFIO was not
> a good place to solve the problem.
>
> And currently, as you see, IB is bound to devices doing RDMA. The register
> function, ib_register_device(), hints that it is a netdev (get_netdev() callback); it
> knows about gid, pkey, and Memory Window. IB is not simply an address space
> management framework. And verbs to IB are not transparent. If we start to add
> compression/decompression, AI (RNN, CNN stuff) operations, and encryption/decryption
> to the verbs set, it will become very complex. Or maybe I misunderstand the
> IB idea? But I don't see any compression hardware integrated in the mainline
> kernel. Could you point out directly which one I can use as a reference?
>
> >
> > >
> > > I assume you would not agree to register a zip accelerator to infiniband? :)
> >
> > "infiniband" name in the "drivers/infiniband/" is legacy one and the
> > current code supports IB, RoCE, iWARP and OmniPath as transport layers.
> > For a long time, we wanted to rename that folder to be "drivers/rdma",
> > but didn't find enough brave men/women to do it, due to the backport mess
> > for such a move.
> >
> > The addition of zip accelerator to RDMA is possible and depends on how
> > you will model such new functionality - new driver, or maybe new ULP.
> >
> > >
> > > Further, I don't think it is wise to break an existing system (RDMA) to fulfill a
> > > totally new scenario. The better choice is to let them run in parallel for some
> > > time and try to merge them accordingly.
> >
> > Awesome, so please run your code out-of-tree for now and once you are ready
> > for submission let's try to merge it.
>
> Yes, yes. We know trust needs time to gain. But the fact is that no
> accelerator user driver can be added to the mainline kernel. We should raise the
> topic from time to time, to help the communication and close the gap, right?
>
> We are also open to cooperating with IB to do it within the IB framework. But
> please let me know where to start. I feel it is quite weird to call
> ib_register_device for a zip or RSA accelerator.
>
> >
> > >
> > > >
> > > > >
> > > > > Another problem we tried to address is the way to pin the memory for DMA
> > > > > operation. The RDMA way to pin the memory cannot avoid losing the page due to
> > > > > a copy-on-write operation while the memory is being used by the device. This
> > > > > may not be important to the RDMA library, but it is important to accelerators.
> > > >
> > > > Such support exists in drivers/infiniband/ from late 2014 and
> > > > it is called ODP (on demand paging).
> > >
> > > I reviewed ODP and I think it is a solution bound to infiniband. It is part of
> > > MR semantics and requires an infiniband-specific hook
> > > (ucontext->invalidate_range()). And the hook requires the device to be able to
> > > stop using the page for a while for the copying. It is ok for infiniband
> > > (actually, only mlx5 uses it). I don't think most accelerators can support
> > > this mode. But WarpDrive works fully on top of the IOMMU interface, so it does
> > > not have this limitation.
> >
> > 1. It has nothing to do with infiniband.
>
> But it must be an ib_dev first.
>
> > 2. MR and ucontext are verbs semantics and needed to ensure that host
> > memory exposed to user is properly protected from security point of view.
> > 3. "stop using the page for a while for the copying" - I don't fully
> > understand this claim, maybe this article will help you to better
> > describe it: https://lwn.net/Articles/753027/
>
> This topic was discussed in RFCv2. The key problem here is this:
>
> The device needs to hold the memory for its own calculation, but the CPU/software
> wants to stop it for a while to synchronize with disk or to do COW.
>
> If the hardware supports SVM/SVA (Shared Virtual Memory/Address), it is easy: the
> device shares the page table with the CPU, and the device will raise a page fault
> when the CPU downgrades the PTE to read-only.
>
> If the hardware cannot share the page table with the CPU, we then need to have
> some way to change the device page table. This is what happens in ODP. It
> invalidates the page table in the device upon the mmu_notifier callback. But this
> cannot solve the COW problem: suppose user process A shares a page P with the
> device, A forks a new process B, and A continues to write to the page. By COW,
> process B will keep page P, while A will get a new page P'. But you have
> no way to let the device know it should use P' rather than P.
>
> This may be OK for an RDMA application, because RDMA is a big thing and we can ask
> the programmer to avoid the situation. But for an accelerator, I don't think we
> can ask a programmer to care about this when using zlib.
>
> In WarpDrive/uacce, we make this simple. If you support IOMMU and it supports
> SVM/SVA, everything will be fine, just like ODP implicit mode, and you don't need
> to write any code for that because it has been done by the IOMMU framework. If it
> does not, you have to use kernel-allocated memory which has the same IOVA as
> the VA in user space. So we can still maintain a unified address space among the
> devices and the application.
>
> > 4. mlx5 supports ODP not because of being partially IB device,
> > but because HW performance oriented implementation is not an easy task.
> >
> > >
> > > >
> > > > >
> > > > > Hope this can help the understanding.
> > > >
> > > > Yes, it helped me a lot.
> > > > Now, I'm more convinced than before that this whole patchset shouldn't
> > > > exist in the first place.
> > >
> > > Then maybe you can tell me how I can register my accelerator to the user space?
> >
> > Write kernel driver and write user space part of it.
> > https://github.com/linux-rdma/rdma-core/
> >
> > I have no doubts that your colleagues who wrote and maintain
> > drivers/infiniband/hw/hns driver know best how to do it.
> > They did it very successfully.
> >
> > Thanks
> >
> > >
> > > >
> > > > To be clear, NAK.
> > > >
> > > > Thanks
> > > >
> > > > >
> > > > > Cheers
> > > > >
> > > > > >
> > > > > > Hard NAK from RDMA side.
> > > > > >
> > > > > > Thanks
> > > > > >
> > > > > > > Signed-off-by: Kenneth Lee <[email protected]>
> > > > > > > ---
> > > > > > > Documentation/warpdrive/warpdrive.rst | 260 +++++++
> > > > > > > Documentation/warpdrive/wd-arch.svg | 764 ++++++++++++++++++++
> > > > > > > Documentation/warpdrive/wd.svg | 526 ++++++++++++++
> > > > > > > Documentation/warpdrive/wd_q_addr_space.svg | 359 +++++++++
> > > > > > > 4 files changed, 1909 insertions(+)
> > > > > > > create mode 100644 Documentation/warpdrive/warpdrive.rst
> > > > > > > create mode 100644 Documentation/warpdrive/wd-arch.svg
> > > > > > > create mode 100644 Documentation/warpdrive/wd.svg
> > > > > > > create mode 100644 Documentation/warpdrive/wd_q_addr_space.svg
> > > > > > >
> > > > > > > diff --git a/Documentation/warpdrive/warpdrive.rst b/Documentation/warpdrive/warpdrive.rst
> > > > > > > new file mode 100644
> > > > > > > index 000000000000..ef84d3a2d462
> > > > > > > --- /dev/null
> > > > > > > +++ b/Documentation/warpdrive/warpdrive.rst
> > > > > > > @@ -0,0 +1,260 @@
> > > > > > > +Introduction of WarpDrive
> > > > > > > +=========================
> > > > > > > +
> > > > > > > +*WarpDrive* is a general accelerator framework for the user application to
> > > > > > > +access the hardware without going through the kernel in data path.
> > > > > > > +
> > > > > > > +It can be used as the quick channel for accelerators, network adaptors or
> > > > > > > +other hardware for application in user space.
> > > > > > > +
> > > > > > > +This may make some implementation simpler. E.g. you can reuse most of the
> > > > > > > +*netdev* driver in kernel and just share some ring buffer to the user space
> > > > > > > +driver for *DPDK* [4] or *ODP* [5]. Or you can combine the RSA accelerator with
> > > > > > > +the *netdev* in the user space as a https reversed proxy, etc.
> > > > > > > +
> > > > > > > +*WarpDrive* takes the hardware accelerator as a heterogeneous processor which
> > > > > > > +can share particular load from the CPU:
> > > > > > > +
> > > > > > > +.. image:: wd.svg
> > > > > > > + :alt: WarpDrive Concept
> > > > > > > +
> > > > > > > +The virtual concept, queue, is used to manage the requests sent to the
> > > > > > > +accelerator. The application send requests to the queue by writing to some
> > > > > > > +particular address, while the hardware takes the requests directly from the
> > > > > > > +address and send feedback accordingly.
> > > > > > > +
> > > > > > > +The format of the queue may differ from hardware to hardware. But the
> > > > > > > +application need not to make any system call for the communication.
> > > > > > > +
> > > > > > > +*WarpDrive* tries to create a shared virtual address space for all involved
> > > > > > > +accelerators. Within this space, the requests sent to queue can refer to any
> > > > > > > +virtual address, which will be valid to the application and all involved
> > > > > > > +accelerators.
> > > > > > > +
> > > > > > > +The name *WarpDrive* is simply a cool and general name meaning the framework
> > > > > > > +makes the application faster. It includes general user library, kernel
> > > > > > > +management module and drivers for the hardware. In kernel, the management
> > > > > > > +module is called *uacce*, meaning "Unified/User-space-access-intended
> > > > > > > +Accelerator Framework".
> > > > > > > +
> > > > > > > +
> > > > > > > +How does it work
> > > > > > > +================
> > > > > > > +
> > > > > > > +*WarpDrive* uses *mmap* and *IOMMU* to play the trick.
> > > > > > > +
> > > > > > > +*Uacce* creates a chrdev for the device registered to it. A "queue" will be
> > > > > > > +created when the chrdev is opened. The application access the queue by mmap
> > > > > > > +different address region of the queue file.
> > > > > > > +
> > > > > > > +The following figure demonstrated the queue file address space:
> > > > > > > +
> > > > > > > +.. image:: wd_q_addr_space.svg
> > > > > > > + :alt: WarpDrive Queue Address Space
> > > > > > > +
> > > > > > > +The first region of the space, device region, is used for the application to
> > > > > > > +write request or read answer to or from the hardware.
> > > > > > > +
> > > > > > > +Normally, there can be three types of device regions mmio and memory regions.
> > > > > > > +It is recommended to use common memory for request/answer descriptors and use
> > > > > > > +the mmio space for device notification, such as doorbell. But of course, this
> > > > > > > +is all up to the interface designer.
> > > > > > > +
> > > > > > > +There can be two types of device memory regions, kernel-only and user-shared.
> > > > > > > +This will be explained in the "kernel APIs" section.
> > > > > > > +
> > > > > > > +The Static Share Virtual Memory region is necessary only when the device IOMMU
> > > > > > > +does not support "Share Virtual Memory". This will be explained after the
> > > > > > > +*IOMMU* idea.
> > > > > > > +
> > > > > > > +
> > > > > > > +Architecture
> > > > > > > +------------
> > > > > > > +
> > > > > > > +The full *WarpDrive* architecture is represented in the following class
> > > > > > > +diagram:
> > > > > > > +
> > > > > > > +.. image:: wd-arch.svg
> > > > > > > + :alt: WarpDrive Architecture
> > > > > > > +
> > > > > > > +
> > > > > > > +The user API
> > > > > > > +------------
> > > > > > > +
> > > > > > > +We adopt a polling style interface in the user space: ::
> > > > > > > +
> > > > > > > + int wd_request_queue(struct wd_queue *q);
> > > > > > > + void wd_release_queue(struct wd_queue *q);
> > > > > > > +
> > > > > > > + int wd_send(struct wd_queue *q, void *req);
> > > > > > > + int wd_recv(struct wd_queue *q, void **req);
> > > > > > > + int wd_recv_sync(struct wd_queue *q, void **req);
> > > > > > > + void wd_flush(struct wd_queue *q);
> > > > > > > +
> > > > > > > +wd_recv_sync() is a wrapper to its non-sync version. It will trapped into
> > > > > > > +kernel and waits until the queue become available.
> > > > > > > +
> > > > > > > +If the queue do not support SVA/SVM. The following helper function
> > > > > > > +can be used to create Static Virtual Share Memory: ::
> > > > > > > +
> > > > > > > + void *wd_preserve_share_memory(struct wd_queue *q, size_t size);
> > > > > > > +
> > > > > > > +The user API is not mandatory. It is simply a suggestion and hint what the
> > > > > > > +kernel interface is supposed to support.
> > > > > > > +
> > > > > > > +
> > > > > > > +The user driver
> > > > > > > +---------------
> > > > > > > +
> > > > > > > +The queue file mmap space will need a user driver to wrap the communication
> > > > > > > +protocol. *UACCE* provides some attributes in sysfs for the user driver to
> > > > > > > +match the right accelerator accordingly.
> > > > > > > +
> > > > > > > +The *UACCE* device attribute is under the following directory:
> > > > > > > +
> > > > > > > +/sys/class/uacce/<dev-name>/params
> > > > > > > +
> > > > > > > +The following attributes is supported:
> > > > > > > +
> > > > > > > +nr_queue_remained (ro)
> > > > > > > + number of queue remained
> > > > > > > +
> > > > > > > +api_version (ro)
> > > > > > > + a string to identify the queue mmap space format and its version
> > > > > > > +
> > > > > > > +device_attr (ro)
> > > > > > > + attributes of the device, see UACCE_DEV_xxx flag defined in uacce.h
> > > > > > > +
> > > > > > > +numa_node (ro)
> > > > > > > + id of numa node
> > > > > > > +
> > > > > > > +priority (rw)
> > > > > > > + Priority or the device, bigger is higher
> > > > > > > +
> > > > > > > +(This is not yet implemented in RFC version)
> > > > > > > +
> > > > > > > +
> > > > > > > +The kernel API
> > > > > > > +--------------
> > > > > > > +
> > > > > > > +The *uacce* kernel API is defined in uacce.h. If the hardware support SVM/SVA,
> > > > > > > +The driver need only the following API functions: ::
> > > > > > > +
> > > > > > > + int uacce_register(uacce);
> > > > > > > + void uacce_unregister(uacce);
> > > > > > > + void uacce_wake_up(q);
> > > > > > > +
> > > > > > > +*uacce_wake_up* is used to notify the process who epoll() on the queue file.
> > > > > > > +
> > > > > > > +According to the IOMMU capability, *uacce* categories the devices as follow:
> > > > > > > +
> > > > > > > +UACCE_DEV_NOIOMMU
> > > > > > > + The device has no IOMMU. The user process cannot use VA on the hardware
> > > > > > > + This mode is not recommended.
> > > > > > > +
> > > > > > > +UACCE_DEV_SVA (UACCE_DEV_PASID | UACCE_DEV_FAULT_FROM_DEV)
> > > > > > > + The device has IOMMU which can share the same page table with user
> > > > > > > + process
> > > > > > > +
> > > > > > > +UACCE_DEV_SHARE_DOMAIN
> > > > > > > + The device has IOMMU which has no multiple page table and device page
> > > > > > > + fault support
> > > > > > > +
> > > > > > > +If the device works in mode other than UACCE_DEV_NOIOMMU, *uacce* will set its
> > > > > > > +IOMMU to IOMMU_DOMAIN_UNMANAGED. So the driver must not use any kernel
> > > > > > > +DMA API but the following ones from *uacce* instead: ::
> > > > > > > +
> > > > > > > + uacce_dma_map(q, va, size, prot);
> > > > > > > + uacce_dma_unmap(q, va, size, prot);
> > > > > > > +
> > > > > > > +*uacce_dma_map/unmap* is valid only for UACCE_DEV_SVA device. It creates a
> > > > > > > +particular PASID and page table for the kernel in the IOMMU (Not yet
> > > > > > > +implemented in the RFC)
> > > > > > > +
> > > > > > > +For the UACCE_DEV_SHARE_DOMAIN device, uacce_dma_map/unmap is not valid.
> > > > > > > +*Uacce* call back start_queue only when the DUS and DKO region is mmapped. The
> > > > > > > +accelerator driver must use those dma buffer, via uacce_queue->qfrs[], on
> > > > > > > +start_queue call back. The size of the queue file region is defined by
> > > > > > > +uacce->ops->qf_pg_start[].
> > > > > > > +
> > > > > > > +We have to do it this way because most of current IOMMU cannot support the
> > > > > > > +kernel and user virtual address at the same time. So we have to let them both
> > > > > > > +share the same user virtual address space.
> > > > > > > +
> > > > > > > +If the device have to support kernel and user at the same time, both kernel
> > > > > > > +and the user should use these DMA API. This is not convenient. A better
> > > > > > > +solution is to change the future DMA/IOMMU design to let them separate the
> > > > > > > +address space between the user and kernel space. But it is not going to be in
> > > > > > > +a short time.
> > > > > > > +
> > > > > > > +
> > > > > > > +Multiple processes support
> > > > > > > +==========================
> > > > > > > +
> > > > > > > +In the latest mainline kernel (4.19) when this document is written, the IOMMU
> > > > > > > +subsystem do not support multiple process page tables yet.
> > > > > > > +
> > > > > > > +Most IOMMU hardware implementation support multi-process with the concept
> > > > > > > +of PASID. But they may use different name, e.g. it is call sub-stream-id in
> > > > > > > +SMMU of ARM. With PASID or similar design, multi page table can be added to
> > > > > > > +the IOMMU and referred by its PASID.
> > > > > > > +
> > > > > > > +*JPB* has a patchset to enable this[1]_. We have tested it with our hardware
> > > > > > > +(which is known as *D06*). It works well. *WarpDrive* rely on them to support
> > > > > > > +UACCE_DEV_SVA. If it is not enabled, *WarpDrive* can still work. But it
> > > > > > > +support only one process, the device will be set to UACCE_DEV_SHARE_DOMAIN
> > > > > > > +even it is set to UACCE_DEV_SVA initially.
> > > > > > > +
> > > > > > > +Static Share Virtual Memory is mainly used by UACCE_DEV_SHARE_DOMAIN device.
> > > > > > > +
> > > > > > > +
> > > > > > > +Legacy Mode Support
> > > > > > > +===================
> > > > > > > +For the hardware without IOMMU, WarpDrive can still work, the only problem is
> > > > > > > +VA cannot be used in the device. The driver should adopt another strategy for
> > > > > > > +the shared memory. It is only for testing, and not recommended.
> > > > > > > +
> > > > > > > +
> > > > > > > +The Folk Scenario
> > > > > > > +=================
> > > > > > > +For a process with allocated queues and shared memory, what happen if it forks
> > > > > > > +a child?
> > > > > > > +
> > > > > > > +The fd of the queue will be duplicated on folk, so the child can send request
> > > > > > > +to the same queue as its parent. But the requests which is sent from processes
> > > > > > > +except for the one who open the queue will be blocked.
> > > > > > > +
> > > > > > > +It is recommended to add O_CLOEXEC to the queue file.
> > > > > > > +
> > > > > > > +The queue mmap space has a VM_DONTCOPY in its VMA. So the child will lost all
> > > > > > > +those VMAs.
> > > > > > > +
> > > > > > > +This is why *WarpDrive* does not adopt the mode used in *VFIO* and *InfiniBand*.
> > > > > > > +Both solutions can set any user pointer for hardware sharing. But they cannot
> > > > > > > +support fork when the dma is in process. Or the "Copy-On-Write" procedure will
> > > > > > > +make the parent process lost its physical pages.
> > > > > > > +
> > > > > > > +
> > > > > > > +The Sample Code
> > > > > > > +===============
> > > > > > > +There is a sample user land implementation with a simple driver for Hisilicon
> > > > > > > +Hi1620 ZIP Accelerator.
> > > > > > > +
> > > > > > > +To test, do the following in samples/warpdrive (for the case of PC host): ::
> > > > > > > + ./autogen.sh
> > > > > > > + ./conf.sh # or simply ./configure if you build on target system
> > > > > > > + make
> > > > > > > +
> > > > > > > +Then you can get test_hisi_zip in the test subdirectory. Copy it to the target
> > > > > > > +system and make sure the hisi_zip driver is enabled (the major and minor of
> > > > > > > +the uacce chrdev can be gotten from the dmesg or sysfs), and run: ::
> > > > > > > + mknod /dev/ua1 c <major> <minior>
> > > > > > > + test/test_hisi_zip -z < data > data.zip
> > > > > > > + test/test_hisi_zip -g < data > data.gzip
> > > > > > > +
> > > > > > > +
> > > > > > > +References
> > > > > > > +==========
> > > > > > > +.. [1] https://patchwork.kernel.org/patch/10394851/
> > > > > > > +
> > > > > > > +.. vim: tw=78
> > > [...]
> > > > > > > --
> > > > > > > 2.17.1
> > > > > > >

I don't know whether Mr. Jerome Glisse is on the list. I think I should cc him
out of respect for his help on the last RFC.

- Kenneth

2018-11-20 08:06:49

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [RFCv3 PATCH 1/6] uacce: Add documents for WarpDrive/uacce

On Mon, Nov 19, 2018 at 04:33:20PM -0500, Jerome Glisse wrote:
> On Mon, Nov 19, 2018 at 02:26:38PM -0700, Jason Gunthorpe wrote:
> > On Mon, Nov 19, 2018 at 03:26:15PM -0500, Jerome Glisse wrote:
> > > On Mon, Nov 19, 2018 at 01:11:56PM -0700, Jason Gunthorpe wrote:
> > > > On Mon, Nov 19, 2018 at 02:46:32PM -0500, Jerome Glisse wrote:
> > > >
> > > > > > ?? How can O_DIRECT be fine but RDMA not? They use exactly the same
> > > > > > get_user_pages flow, right? Can we do what O_DIRECT does in RDMA and
> > > > > > be fine too?
> > > > > >
> > > > > > AFAIK the only difference is the length of the race window. You'd have
> > > > > > to fork and fault during the shorter time O_DIRECT has get_user_pages
> > > > > > open.
> > > > >
> > > > > Well in O_DIRECT case there is only one page table, the CPU
> > > > > page table and it gets updated during fork() so there is an
> > > > > ordering there and the race window is small.
> > > >
> > > > Not really, in O_DIRECT case there is another 'page table', we just
> > > > call it a DMA scatter/gather list and it is sent directly to the block
> > > > device's DMA HW. The sgl plays exactly the same role as the various HW
> > > > page list data structures that underlie RDMA MRs.
> > > >
> > > > It is not a page table that matters here, it is if the DMA address of
> > > > the page is active for DMA on HW.
> > > >
> > > > Like you say, the only difference is that the race is hopefully small
> > > > with O_DIRECT (though that is not really small, NVMeof for instance
> > > > has windows as large as connection timeouts, if you try hard enough)
> > > >
> > > > So we probably can trigger this trouble with O_DIRECT and fork(), and
> > > > I would call it a bug :(
> > >
> > > I can not think of any scenario that would be a bug with O_DIRECT.
> > > Do you have one in mind? When you fork() and do other syscalls that
> > > affect the memory of your process in another thread you should
> > > expect inconsistent results. The kernel is not here to provide a fully
> > > safe environment to the user; the user can shoot itself in the foot and
> > > that's fine as long as it only affects the process itself and no one
> > > else. We should not be in the business of making everything baby
> > > proof :)
> >
> > Sure, I setup AIO with O_DIRECT and launch a read.
> >
> > Then I fork and dirty the READ target memory using the CPU in the
> > child.
> >
> > As you described in this case the fork will retain the physical page
> > that is undergoing O_DIRECT DMA, and the parent gets a new copy'd page.
> >
> > The DMA completes, and the child gets the DMA'd to page. The parent
> > gets an unchanged copy'd page.
> >
> > The parent gets the AIO completion, but can't see the data.
> >
> > I'd call that a bug with O_DIRECT. The only correct outcome is that
> > the parent will always see the O_DIRECT data. Fork should not cause
> > the *parent* to malfunction. I agree the child cannot make any
> > prediction what memory it will see.
> >
> > I assume the same flow is possible using threads and read()..
> >
> > It is really no different than the RDMA bug with fork.
> >
>
> Yes and that's expected behavior :) If you fork() and have anything
> still in flight at time of fork that can change your process address
> space (including data in it) then all bets are off.
>
> At least this is my reading of fork() syscall.

Not mine.. I can't think of anything else that would have this
behavior.

All traditional syscalls will properly dirty the pages of the
parent. ie if I call read() in a thread and do fork in another thread,
then not seeing the data after read() completes is clearly a bug. All
other syscalls are the same.
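
The baseline for a buffered read() can be sketched as below. This is only a
simplified, single-threaded illustration: the fork happens before the read
instead of racing with it, a pipe stands in for the file, and the helper name
parent_sees_read_after_fork() is made up for this example. It does not
reproduce the O_DIRECT case, it only shows what the parent is entitled to
expect from an ordinary read().

```c
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper, not taken from this thread.
 * Returns 1 if the parent sees the bytes delivered by a buffered read()
 * after a forked child has dirtied its own copy of the same buffer,
 * 0 if the data is missing, and -1 on a setup error.
 */
int parent_sees_read_after_fork(void)
{
    static char buf[4096];
    static const char msg[] = "payload";
    int fds[2];
    int status;
    pid_t pid;
    ssize_t n;

    /* Touch the buffer so the page is mapped before fork(). */
    memset(buf, 0, sizeof(buf));

    if (pipe(fds) < 0)
        return -1;
    if (write(fds[1], msg, sizeof(msg)) != (ssize_t)sizeof(msg)) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: this write only changes the child's view of the buffer. */
        memset(buf, 'X', sizeof(buf));
        _exit(0);
    }
    if (waitpid(pid, &status, 0) < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    /* Parent: the kernel copies the data through the parent's page table. */
    n = read(fds[0], buf, sizeof(buf));
    close(fds[0]);
    close(fds[1]);

    return n == (ssize_t)sizeof(msg) && memcmp(buf, msg, sizeof(msg)) == 0;
}
```

Whatever the child did to its copy, the parent finds the data in its buffer
once read() returns, because the copy is made through the parent's own
mapping rather than to a physical page pinned before the fork.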

It is bonkers that opening the file with O_DIRECT would change this
basic behavior. I'm calling it a bug :)

Jason

2018-11-12 17:51:15

by Kenneth Lee

[permalink] [raw]
Subject: [RFCv3 PATCH 1/6] uacce: Add documents for WarpDrive/uacce

From: Kenneth Lee <[email protected]>

WarpDrive is a general accelerator framework for the user application to
access the hardware without going through the kernel in data path.

The kernel component that provides the facilities a driver needs to expose the
user interface is called uacce. It is a short name for
"Unified/User-space-access-intended Accelerator Framework".

This patch adds a document to explain how it works.

Signed-off-by: Kenneth Lee <[email protected]>
---
Documentation/warpdrive/warpdrive.rst | 260 +++++++
Documentation/warpdrive/wd-arch.svg | 764 ++++++++++++++++++++
Documentation/warpdrive/wd.svg | 526 ++++++++++++++
Documentation/warpdrive/wd_q_addr_space.svg | 359 +++++++++
4 files changed, 1909 insertions(+)
create mode 100644 Documentation/warpdrive/warpdrive.rst
create mode 100644 Documentation/warpdrive/wd-arch.svg
create mode 100644 Documentation/warpdrive/wd.svg
create mode 100644 Documentation/warpdrive/wd_q_addr_space.svg

diff --git a/Documentation/warpdrive/warpdrive.rst b/Documentation/warpdrive/warpdrive.rst
new file mode 100644
index 000000000000..ef84d3a2d462
--- /dev/null
+++ b/Documentation/warpdrive/warpdrive.rst
@@ -0,0 +1,260 @@
+Introduction of WarpDrive
+=========================
+
+*WarpDrive* is a general accelerator framework for the user application to
+access the hardware without going through the kernel in the data path.
+
+It can be used as a fast channel to accelerators, network adaptors or
+other hardware for applications in user space.
+
+This may make some implementations simpler. E.g. you can reuse most of the
+*netdev* driver in the kernel and just share some ring buffers with the user
+space driver for *DPDK* [4] or *ODP* [5]. Or you can combine the RSA accelerator
+with the *netdev* in user space as an HTTPS reverse proxy, etc.
+
+*WarpDrive* takes the hardware accelerator as a heterogeneous processor which
+can share particular load from the CPU:
+
+.. image:: wd.svg
+ :alt: WarpDrive Concept
+
+The virtual concept, queue, is used to manage the requests sent to the
+accelerator. The application sends requests to the queue by writing to some
+particular address, while the hardware takes the requests directly from the
+address and sends feedback accordingly.
+
+The format of the queue may differ from hardware to hardware. But the
+application need not make any system call for the communication.
+
+*WarpDrive* tries to create a shared virtual address space for all involved
+accelerators. Within this space, the requests sent to queue can refer to any
+virtual address, which will be valid to the application and all involved
+accelerators.
+
+The name *WarpDrive* is simply a cool and general name meaning the framework
+makes the application faster. It includes a general user library, a kernel
+management module and drivers for the hardware. In the kernel, the management
+module is called *uacce*, meaning "Unified/User-space-access-intended
+Accelerator Framework".
+
+
+How does it work
+================
+
+*WarpDrive* uses *mmap* and *IOMMU* to play the trick.
+
+*Uacce* creates a chrdev for the device registered to it. A "queue" will be
+created when the chrdev is opened. The application accesses the queue by
+mmapping different address regions of the queue file.
+
+The following figure shows the queue file address space:
+
+.. image:: wd_q_addr_space.svg
+ :alt: WarpDrive Queue Address Space
+
+The first region of the space, the device region, is used by the application
+to write requests to, or read answers from, the hardware.
+
+Normally, there can be three types of device regions: mmio and two memory
+regions. It is recommended to use common memory for request/answer descriptors
+and use the mmio space for device notification, such as a doorbell. But of
+course, this is all up to the interface designer.
+
+There can be two types of device memory regions, kernel-only and user-shared.
+This will be explained in "The kernel API" section.
+
+The Static Share Virtual Memory region is necessary only when the device IOMMU
+does not support "Share Virtual Memory". This will be explained after the
+*IOMMU* idea.
+
+
+Architecture
+------------
+
+The full *WarpDrive* architecture is represented in the following class
+diagram:
+
+.. image:: wd-arch.svg
+ :alt: WarpDrive Architecture
+
+
+The user API
+------------
+
+We adopt a polling style interface in the user space: ::
+
+ int wd_request_queue(struct wd_queue *q);
+ void wd_release_queue(struct wd_queue *q);
+
+ int wd_send(struct wd_queue *q, void *req);
+ int wd_recv(struct wd_queue *q, void **req);
+ int wd_recv_sync(struct wd_queue *q, void **req);
+ void wd_flush(struct wd_queue *q);
+
+wd_recv_sync() is a wrapper around its non-sync version. It traps into the
+kernel and waits until the queue becomes available.
+
+If the queue does not support SVA/SVM, the following helper function
+can be used to create Static Share Virtual Memory: ::
+
+ void *wd_preserve_share_memory(struct wd_queue *q, size_t size);
+
+The user API is not mandatory. It is simply a suggestion and a hint of what the
+kernel interface is supposed to support.
+
+
+The user driver
+---------------
+
+The queue file mmap space will need a user driver to wrap the communication
+protocol. *UACCE* provides some attributes in sysfs for the user driver to
+match the right accelerator accordingly.
+
+The *UACCE* device attributes are under the following directory:
+
+/sys/class/uacce/<dev-name>/params
+
+The following attributes are supported:
+
+nr_queue_remained (ro)
+ number of queues remaining
+
+api_version (ro)
+ a string to identify the queue mmap space format and its version
+
+device_attr (ro)
+ attributes of the device, see UACCE_DEV_xxx flag defined in uacce.h
+
+numa_node (ro)
+ id of numa node
+
+priority (rw)
+ priority of the device, bigger is higher
+
+(This is not yet implemented in RFC version)
+
+
+The kernel API
+--------------
+
+The *uacce* kernel API is defined in uacce.h. If the hardware supports SVM/SVA,
+the driver needs only the following API functions: ::
+
+ int uacce_register(uacce);
+ void uacce_unregister(uacce);
+ void uacce_wake_up(q);
+
+*uacce_wake_up* notifies the process waiting in epoll() on the queue file.
+
+According to the IOMMU capability, *uacce* categorizes the devices as follows:
+
+UACCE_DEV_NOIOMMU
+ The device has no IOMMU. The user process cannot use VA on the hardware.
+ This mode is not recommended.
+
+UACCE_DEV_SVA (UACCE_DEV_PASID | UACCE_DEV_FAULT_FROM_DEV)
+ The device has an IOMMU which can share the same page table with the user
+ process
+
+UACCE_DEV_SHARE_DOMAIN
+ The device has an IOMMU which supports neither multiple page tables nor
+ device page faults
+
+If the device works in a mode other than UACCE_DEV_NOIOMMU, *uacce* will set
+its IOMMU to IOMMU_DOMAIN_UNMANAGED. So the driver must not use any kernel
+DMA API but the following ones from *uacce* instead: ::
+
+ uacce_dma_map(q, va, size, prot);
+ uacce_dma_unmap(q, va, size, prot);
+
+*uacce_dma_map/unmap* is valid only for a UACCE_DEV_SVA device. It creates a
+particular PASID and page table for the kernel in the IOMMU (not yet
+implemented in the RFC).
+
+For a UACCE_DEV_SHARE_DOMAIN device, uacce_dma_map/unmap is not valid.
+*Uacce* calls back start_queue only when the DUS and DKO regions are mmapped.
+The accelerator driver must use those dma buffers, via uacce_queue->qfrs[], in
+the start_queue callback. The size of the queue file regions is defined by
+uacce->ops->qf_pg_start[].
+
+We have to do it this way because most current IOMMUs cannot support
+kernel and user virtual addresses at the same time. So we have to let them both
+share the same user virtual address space.
+
+If the device has to support kernel and user at the same time, both the kernel
+and the user should use these DMA APIs. This is not convenient. A better
+solution is to change the future DMA/IOMMU design to let them separate the
+address space between the user and kernel space. But that is not going to
+happen in a short time.
+
+
+Multiple processes support
+==========================
+
+At the time of writing, the latest mainline kernel (4.19) IOMMU subsystem
+does not support multiple process page tables yet.
+
+Most IOMMU hardware implementations support multi-process with the concept
+of PASID. But they may use a different name, e.g. it is called sub-stream-id in
+the SMMU of ARM. With PASID or a similar design, multiple page tables can be
+added to the IOMMU and referred to by their PASIDs.
+
+*JPB* has a patchset to enable this [1]_. We have tested it with our hardware
+(which is known as *D06*). It works well. *WarpDrive* relies on it to support
+UACCE_DEV_SVA. If it is not enabled, *WarpDrive* can still work. But it
+supports only one process; the device will be set to UACCE_DEV_SHARE_DOMAIN
+even if it is set to UACCE_DEV_SVA initially.
+
+Static Share Virtual Memory is mainly used by UACCE_DEV_SHARE_DOMAIN devices.
+
+
+Legacy Mode Support
+===================
+For hardware without an IOMMU, WarpDrive can still work; the only problem is
+that VA cannot be used in the device. The driver should adopt another strategy
+for the shared memory. It is only for testing, and not recommended.
+
+
+The Fork Scenario
+=================
+For a process with allocated queues and shared memory, what happens if it forks
+a child?
+
+The fd of the queue will be duplicated on fork, so the child can send requests
+to the same queue as its parent. But requests sent from any process other
+than the one that opened the queue will be blocked.
+
+It is recommended to add O_CLOEXEC to the queue file.
+
+The queue mmap space has VM_DONTCOPY set in its VMA. So the child will lose all
+those VMAs.
+
+This is why *WarpDrive* does not adopt the mode used in *VFIO* and *InfiniBand*.
+Both solutions can set any user pointer for hardware sharing. But they cannot
+support fork while DMA is in progress: the "Copy-On-Write" procedure would
+make the parent process lose its physical pages.
+
+
+The Sample Code
+===============
+There is a sample userland implementation with a simple driver for the
+Hisilicon Hi1620 ZIP Accelerator.
+
+To test, do the following in samples/warpdrive (for a PC host): ::
+ ./autogen.sh
+ ./conf.sh # or simply ./configure if you build on the target system
+ make
+
+This produces test_hisi_zip in the test subdirectory. Copy it to the target
+system and make sure the hisi_zip driver is enabled (the major and minor
+numbers of the uacce chrdev can be obtained from dmesg or sysfs), then run: ::
+ mknod /dev/ua1 c <major> <minor>
+ test/test_hisi_zip -z < data > data.zip
+ test/test_hisi_zip -g < data > data.gzip
+
+
+References
+==========
+.. [1] https://patchwork.kernel.org/patch/10394851/
+
+.. vim: tw=78
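
On the O_CLOEXEC recommendation in "The Fork Scenario" above: O_CLOEXEC takes
effect at exec() time, so the descriptor is still inherited across a plain
fork() and is only dropped once the child executes a new program. The
following is a minimal sketch in C, not code from this patchset. The helper
names wd_open_queue_cloexec() and wd_fd_is_cloexec() are hypothetical, and the
device node path (for example the /dev/ua1 node created in the sample
instructions) is assumed to be supplied by the caller.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper: open a queue device node with the close-on-exec
 * flag set atomically at open() time.
 * Returns the new fd, or -1 with errno set by open().
 */
int wd_open_queue_cloexec(const char *path)
{
	assert(path != NULL);
	return open(path, O_RDWR | O_CLOEXEC);
}

/*
 * Hypothetical helper: report whether fd carries FD_CLOEXEC.
 * Returns 1 if set, 0 if clear, -1 if fcntl() fails (e.g. bad fd).
 */
int wd_fd_is_cloexec(int fd)
{
	int flags = fcntl(fd, F_GETFD);

	if (flags < 0)
		return -1;
	return (flags & FD_CLOEXEC) ? 1 : 0;
}
```

A caller might use wd_open_queue_cloexec("/dev/ua1") and verify the result
with wd_fd_is_cloexec(). On a machine without the device, any path that can
be opened read-write, such as /dev/null, exercises the same flag logic.
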
diff --git a/Documentation/warpdrive/wd-arch.svg b/Documentation/warpdrive/wd-arch.svg
new file mode 100644
index 000000000000..e59934188443
--- /dev/null
+++ b/Documentation/warpdrive/wd-arch.svg
@@ -0,0 +1,764 @@
+<?xml version="1.0" encoding="UTF-8" standalone="no"?>
+<!-- Created with Inkscape (http://www.inkscape.org/) -->
+
+<svg
+ xmlns:dc="http://purl.org/dc/elements/1.1/"
+ xmlns:cc="http://creativecommons.org/ns#"
+ xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
+ xmlns:svg="http://www.w3.org/2000/svg"
+ xmlns="http://www.w3.org/2000/svg"
+ xmlns:xlink="http://www.w3.org/1999/xlink"
+ xmlns:sodipodi="http://sodipodi.sourceforge.net/DTD/sodipodi-0.dtd"
+ xmlns:inkscape="http://www.inkscape.org/namespaces/inkscape"
+ width="210mm"
+ height="193mm"
+ viewBox="0 0 744.09449 683.85823"
+ id="svg2"
+ version="1.1"
+ inkscape:version="0.92.3 (2405546, 2018-03-11)"
+ sodipodi:docname="wd-arch.svg">
+ <defs
+ id="defs4">
+ <linearGradient
+ inkscape:collect="always"
+ id="linearGradient6830">
+ <stop
+ style="stop-color:#000000;stop-opacity:1;"
+ offset="0"
+ id="stop6832" />
+ <stop
+ style="stop-color:#000000;stop-opacity:0;"
+ offset="1"
+ id="stop6834" />
+ </linearGradient>
+ <linearGradient
+ inkscape:collect="always"
+ xlink:href="#linearGradient5026"
+ id="linearGradient5032"
+ x1="353"
+ y1="211.3622"
+ x2="565.5"
+ y2="174.8622"
+ gradientUnits="userSpaceOnUse"
+ gradientTransform="translate(-89.949614,405.94594)" />
+ <linearGradient
+ inkscape:collect="always"
+ id="linearGradient5026">
+ <stop
+ style="stop-color:#f2f2f2;stop-opacity:1;"
+ offset="0"
+ id="stop5028" />
+ <stop
+ style="stop-color:#f2f2f2;stop-opacity:0;"
+ offset="1"
+ id="stop5030" />
+ </linearGradient>
+ <filter
+ inkscape:collect="always"
+ style="color-interpolation-filters:sRGB"
+ id="filter4169-3"
+ x="-0.031597666"
+ width="1.0631953"
+ y="-0.099812768"
+ height="1.1996255">
+ <feGaussianBlur
+ inkscape:collect="always"
+ stdDeviation="1.3307599"
+ id="feGaussianBlur4171-6" />
+ </filter>
+ <linearGradient
+ inkscape:collect="always"
+ xlink:href="#linearGradient5026"
+ id="linearGradient5032-1"
+ x1="353"
+ y1="211.3622"
+ x2="565.5"
+ y2="174.8622"
+ gradientUnits="userSpaceOnUse"
+ gradientTransform="translate(175.77842,400.29111)" />
+ <filter
+ inkscape:collect="always"
+ style="color-interpolation-filters:sRGB"
+ id="filter4169-3-0"
+ x="-0.031597666"
+ width="1.0631953"
+ y="-0.099812768"
+ height="1.1996255">
+ <feGaussianBlur
+ inkscape:collect="always"
+ stdDeviation="1.3307599"
+ id="feGaussianBlur4171-6-9" />
+ </filter>
+ <marker
+ markerWidth="18.960653"
+ markerHeight="11.194658"
+ refX="9.4803267"
+ refY="5.5973287"
+ orient="auto"
+ id="marker4613">
+ <rect
+ y="-5.1589785"
+ x="5.8504119"
+ height="10.317957"
+ width="10.317957"
+ id="rect4212"
+ style="fill:#ffffff;stroke:#000000;stroke-width:0.69143367;stroke-miterlimit:4;stroke-dasharray:none"
+ transform="matrix(0.86111274,0.50841405,-0.86111274,0.50841405,0,0)">
+ <title
+ id="title4262">generation</title>
+ </rect>
+ </marker>
+ <marker
+ markerWidth="11.227358"
+ markerHeight="12.355258"
+ refX="10"
+ refY="6.177629"
+ orient="auto"
+ id="marker4825">
+ <path
+ inkscape:connector-curvature="0"
+ id="path4757"
+ d="M 0.42024733,0.42806444 10.231357,6.3500844 0.24347733,11.918544"
+ style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
+ </marker>
+ <marker
+ markerWidth="11.227358"
+ markerHeight="12.355258"
+ refX="10"
+ refY="6.177629"
+ orient="auto"
+ id="marker4825-6">
+ <path
+ inkscape:connector-curvature="0"
+ id="path4757-1"
+ d="M 0.42024733,0.42806444 10.231357,6.3500844 0.24347733,11.918544"
+ style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
+ </marker>
+ <linearGradient
+ inkscape:collect="always"
+ xlink:href="#linearGradient5026"
+ id="linearGradient5032-3-9"
+ x1="353"
+ y1="211.3622"
+ x2="565.5"
+ y2="174.8622"
+ gradientUnits="userSpaceOnUse"
+ gradientTransform="matrix(1.2452511,0,0,0.98513016,-190.95632,540.33156)" />
+ <filter
+ inkscape:collect="always"
+ style="color-interpolation-filters:sRGB"
+ id="filter4169-3-5-8"
+ x="-0.031597666"
+ width="1.0631953"
+ y="-0.099812768"
+ height="1.1996255">
+ <feGaussianBlur
+ inkscape:collect="always"
+ stdDeviation="1.3307599"
+ id="feGaussianBlur4171-6-3-9" />
+ </filter>
+ <marker
+ markerWidth="11.227358"
+ markerHeight="12.355258"
+ refX="10"
+ refY="6.177629"
+ orient="auto"
+ id="marker4825-6-2">
+ <path
+ inkscape:connector-curvature="0"
+ id="path4757-1-9"
+ d="M 0.42024733,0.42806444 10.231357,6.3500844 0.24347733,11.918544"
+ style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
+ </marker>
+ <marker
+ markerWidth="11.227358"
+ markerHeight="12.355258"
+ refX="10"
+ refY="6.177629"
+ orient="auto"
+ id="marker4825-6-2-1">
+ <path
+ inkscape:connector-curvature="0"
+ id="path4757-1-9-9"
+ d="M 0.42024733,0.42806444 10.231357,6.3500844 0.24347733,11.918544"
+ style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
+ </marker>
+ <linearGradient
+ inkscape:collect="always"
+ xlink:href="#linearGradient5026"
+ id="linearGradient5032-3-9-7"
+ x1="353"
+ y1="211.3622"
+ x2="565.5"
+ y2="174.8622"
+ gradientUnits="userSpaceOnUse"
+ gradientTransform="matrix(1.3742742,0,0,0.97786398,-234.52617,654.63367)" />
+ <filter
+ inkscape:collect="always"
+ style="color-interpolation-filters:sRGB"
+ id="filter4169-3-5-8-5"
+ x="-0.031597666"
+ width="1.0631953"
+ y="-0.099812768"
+ height="1.1996255">
+ <feGaussianBlur
+ inkscape:collect="always"
+ stdDeviation="1.3307599"
+ id="feGaussianBlur4171-6-3-9-0" />
+ </filter>
+ <marker
+ markerWidth="11.227358"
+ markerHeight="12.355258"
+ refX="10"
+ refY="6.177629"
+ orient="auto"
+ id="marker4825-6-2-6">
+ <path
+ inkscape:connector-curvature="0"
+ id="path4757-1-9-1"
+ d="M 0.42024733,0.42806444 10.231357,6.3500844 0.24347733,11.918544"
+ style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
+ </marker>
+ <linearGradient
+ inkscape:collect="always"
+ xlink:href="#linearGradient5026"
+ id="linearGradient5032-3-9-4"
+ x1="353"
+ y1="211.3622"
+ x2="565.5"
+ y2="174.8622"
+ gradientUnits="userSpaceOnUse"
+ gradientTransform="matrix(1.3742912,0,0,2.0035845,-468.34428,342.56603)" />
+ <filter
+ inkscape:collect="always"
+ style="color-interpolation-filters:sRGB"
+ id="filter4169-3-5-8-54"
+ x="-0.031597666"
+ width="1.0631953"
+ y="-0.099812768"
+ height="1.1996255">
+ <feGaussianBlur
+ inkscape:collect="always"
+ stdDeviation="1.3307599"
+ id="feGaussianBlur4171-6-3-9-7" />
+ </filter>
+ <marker
+ markerWidth="11.227358"
+ markerHeight="12.355258"
+ refX="10"
+ refY="6.177629"
+ orient="auto"
+ id="marker4825-6-2-1-8">
+ <path
+ inkscape:connector-curvature="0"
+ id="path4757-1-9-9-6"
+ d="M 0.42024733,0.42806444 10.231357,6.3500844 0.24347733,11.918544"
+ style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
+ </marker>
+ <marker
+ markerWidth="11.227358"
+ markerHeight="12.355258"
+ refX="10"
+ refY="6.177629"
+ orient="auto"
+ id="marker4825-6-2-1-8-8">
+ <path
+ inkscape:connector-curvature="0"
+ id="path4757-1-9-9-6-9"
+ d="M 0.42024733,0.42806444 10.231357,6.3500844 0.24347733,11.918544"
+ style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
+ </marker>
+ <marker
+ markerWidth="11.227358"
+ markerHeight="12.355258"
+ refX="10"
+ refY="6.177629"
+ orient="auto"
+ id="marker4825-6-0">
+ <path
+ inkscape:connector-curvature="0"
+ id="path4757-1-93"
+ d="M 0.42024733,0.42806444 10.231357,6.3500844 0.24347733,11.918544"
+ style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
+ </marker>
+ <marker
+ markerWidth="11.227358"
+ markerHeight="12.355258"
+ refX="10"
+ refY="6.177629"
+ orient="auto"
+ id="marker4825-6-0-2">
+ <path
+ inkscape:connector-curvature="0"
+ id="path4757-1-93-6"
+ d="M 0.42024733,0.42806444 10.231357,6.3500844 0.24347733,11.918544"
+ style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
+ </marker>
+ <filter
+ inkscape:collect="always"
+ style="color-interpolation-filters:sRGB"
+ id="filter5382"
+ x="-0.089695387"
+ width="1.1793908"
+ y="-0.10052069"
+ height="1.2010413">
+ <feGaussianBlur
+ inkscape:collect="always"
+ stdDeviation="0.86758925"
+ id="feGaussianBlur5384" />
+ </filter>
+ <linearGradient
+ inkscape:collect="always"
+ xlink:href="#linearGradient6830"
+ id="linearGradient6836"
+ x1="362.73923"
+ y1="700.04059"
+ x2="340.4751"
+ y2="678.25488"
+ gradientUnits="userSpaceOnUse"
+ gradientTransform="translate(-23.771026,-135.76835)" />
+ <marker
+ markerWidth="11.227358"
+ markerHeight="12.355258"
+ refX="10"
+ refY="6.177629"
+ orient="auto"
+ id="marker4825-6-2-6-2">
+ <path
+ inkscape:connector-curvature="0"
+ id="path4757-1-9-1-9"
+ d="M 0.42024733,0.42806444 10.231357,6.3500844 0.24347733,11.918544"
+ style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
+ </marker>
+ <linearGradient
+ inkscape:collect="always"
+ xlink:href="#linearGradient5026"
+ id="linearGradient5032-3-9-7-3"
+ x1="353"
+ y1="211.3622"
+ x2="565.5"
+ y2="174.8622"
+ gradientUnits="userSpaceOnUse"
+ gradientTransform="matrix(1.3742742,0,0,0.97786395,-57.357186,649.55786)" />
+ <filter
+ inkscape:collect="always"
+ style="color-interpolation-filters:sRGB"
+ id="filter4169-3-5-8-5-0"
+ x="-0.031597666"
+ width="1.0631953"
+ y="-0.099812768"
+ height="1.1996255">
+ <feGaussianBlur
+ inkscape:collect="always"
+ stdDeviation="1.3307599"
+ id="feGaussianBlur4171-6-3-9-0-2" />
+ </filter>
+ <marker
+ markerWidth="11.227358"
+ markerHeight="12.355258"
+ refX="10"
+ refY="6.177629"
+ orient="auto"
+ id="marker4825-6-2-1-1">
+ <path
+ inkscape:connector-curvature="0"
+ id="path4757-1-9-9-0"
+ d="M 0.42024733,0.42806444 10.231357,6.3500844 0.24347733,11.918544"
+ style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
+ </marker>
+ </defs>
+ <sodipodi:namedview
+ id="base"
+ pagecolor="#ffffff"
+ bordercolor="#666666"
+ borderopacity="1.0"
+ inkscape:pageopacity="0.0"
+ inkscape:pageshadow="2"
+ inkscape:zoom="0.98994949"
+ inkscape:cx="222.32868"
+ inkscape:cy="370.44492"
+ inkscape:document-units="px"
+ inkscape:current-layer="layer1"
+ showgrid="false"
+ inkscape:window-width="1916"
+ inkscape:window-height="1033"
+ inkscape:window-x="0"
+ inkscape:window-y="22"
+ inkscape:window-maximized="0"
+ fit-margin-right="0.3"
+ inkscape:snap-global="false" />
+ <metadata
+ id="metadata7">
+ <rdf:RDF>
+ <cc:Work
+ rdf:about="">
+ <dc:format>image/svg+xml</dc:format>
+ <dc:type
+ rdf:resource="http://purl.org/dc/dcmitype/StillImage" />
+ <dc:title />
+ </cc:Work>
+ </rdf:RDF>
+ </metadata>
+ <g
+ inkscape:label="Layer 1"
+ inkscape:groupmode="layer"
+ id="layer1"
+ transform="translate(0,-368.50374)">
+ <rect
+ style="fill:#000000;stroke:#000000;stroke-width:0.6465112;filter:url(#filter4169-3)"
+ id="rect4136-3-6"
+ width="101.07784"
+ height="31.998148"
+ x="283.01144"
+ y="588.80896" />
+ <rect
+ style="fill:url(#linearGradient5032);fill-opacity:1;stroke:#000000;stroke-width:0.6465112"
+ id="rect4136-2"
+ width="101.07784"
+ height="31.998148"
+ x="281.63498"
+ y="586.75739" />
+ <text
+ xml:space="preserve"
+ style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+ x="294.21747"
+ y="612.50073"
+ id="text4138-6"><tspan
+ sodipodi:role="line"
+ id="tspan4140-1"
+ x="294.21747"
+ y="612.50073"
+ style="font-size:15px;line-height:1.25">WarpDrive</tspan></text>
+ <rect
+ style="fill:#000000;stroke:#000000;stroke-width:0.6465112;filter:url(#filter4169-3-0)"
+ id="rect4136-3-6-3"
+ width="101.07784"
+ height="31.998148"
+ x="548.7395"
+ y="583.15417" />
+ <rect
+ style="fill:url(#linearGradient5032-1);fill-opacity:1;stroke:#000000;stroke-width:0.6465112"
+ id="rect4136-2-60"
+ width="101.07784"
+ height="31.998148"
+ x="547.36304"
+ y="581.1026" />
+ <text
+ xml:space="preserve"
+ style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+ x="557.83484"
+ y="602.32745"
+ id="text4138-6-6"><tspan
+ sodipodi:role="line"
+ id="tspan4140-1-2"
+ x="557.83484"
+ y="602.32745"
+ style="font-size:15px;line-height:1.25">user_driver</tspan></text>
+ <path
+ style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;marker-end:url(#marker4613)"
+ d="m 547.36304,600.78954 -156.58203,0.0691"
+ id="path4855"
+ inkscape:connector-curvature="0"
+ sodipodi:nodetypes="cc" />
+ <rect
+ style="fill:#000000;stroke:#000000;stroke-width:0.6465112;filter:url(#filter4169-3-5-8)"
+ id="rect4136-3-6-5-7"
+ width="101.07784"
+ height="31.998148"
+ x="128.74678"
+ y="80.648842"
+ transform="matrix(1.2452511,0,0,0.98513016,113.15182,641.02594)" />
+ <rect
+ style="fill:url(#linearGradient5032-3-9);fill-opacity:1;stroke:#000000;stroke-width:0.71606314"
+ id="rect4136-2-6-3"
+ width="125.86729"
+ height="31.522341"
+ x="271.75983"
+ y="718.45435" />
+ <text
+ xml:space="preserve"
+ style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+ x="309.13705"
+ y="745.55371"
+ id="text4138-6-2-6"><tspan
+ sodipodi:role="line"
+ id="tspan4140-1-9-1"
+ x="309.13705"
+ y="745.55371"
+ style="font-size:15px;line-height:1.25">uacce</tspan></text>
+ <path
+ style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;marker-end:url(#marker4825-6-2)"
+ d="m 329.57309,619.72453 5.0373,97.14447"
+ id="path4661-3"
+ inkscape:connector-curvature="0"
+ sodipodi:nodetypes="cc" />
+ <path
+ style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;marker-end:url(#marker4825-6-2-1)"
+ d="m 342.57219,830.63108 -5.67699,-79.2841"
+ id="path4661-3-4"
+ inkscape:connector-curvature="0"
+ sodipodi:nodetypes="cc" />
+ <rect
+ style="fill:#000000;stroke:#000000;stroke-width:0.6465112;filter:url(#filter4169-3-5-8-5)"
+ id="rect4136-3-6-5-7-3"
+ width="101.07784"
+ height="31.998148"
+ x="128.74678"
+ y="80.648842"
+ transform="matrix(1.3742742,0,0,0.97786398,101.09126,754.58534)" />
+ <rect
+ style="fill:url(#linearGradient5032-3-9-7);fill-opacity:1;stroke:#000000;stroke-width:0.74946606"
+ id="rect4136-2-6-3-6"
+ width="138.90866"
+ height="31.289837"
+ x="276.13297"
+ y="831.44263" />
+ <text
+ xml:space="preserve"
+ style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+ x="295.67819"
+ y="852.98224"
+ id="text4138-6-2-6-1"><tspan
+ sodipodi:role="line"
+ id="tspan4140-1-9-1-0"
+ x="295.67819"
+ y="852.98224"
+ style="font-size:15px;line-height:1.25">Device Driver</tspan></text>
+ <path
+ style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;marker-end:url(#marker4825-6-2-6)"
+ d="m 623.05084,615.00104 0.51369,333.80219"
+ id="path4661-3-5"
+ inkscape:connector-curvature="0"
+ sodipodi:nodetypes="cc" />
+ <text
+ xml:space="preserve"
+ style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+ x="392.63568"
+ y="660.83667"
+ id="text4138-6-2-6-1-6-2-5"><tspan
+ sodipodi:role="line"
+ x="392.63568"
+ y="660.83667"
+ id="tspan4305"
+ style="font-size:15px;line-height:1.25">&lt;&lt;anom_file&gt;&gt;</tspan><tspan
+ sodipodi:role="line"
+ x="392.63568"
+ y="679.58667"
+ style="font-size:15px;line-height:1.25"
+ id="tspan1139">Queue FD</tspan></text>
+ <text
+ xml:space="preserve"
+ style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+ x="389.92969"
+ y="587.44836"
+ id="text4138-6-2-6-1-6-2-56"><tspan
+ sodipodi:role="line"
+ id="tspan4140-1-9-1-0-3-0-9"
+ x="389.92969"
+ y="587.44836"
+ style="font-size:15px;line-height:1.25">1</tspan></text>
+ <text
+ xml:space="preserve"
+ style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+ x="528.64813"
+ y="600.08429"
+ id="text4138-6-2-6-1-6-3"><tspan
+ sodipodi:role="line"
+ id="tspan4140-1-9-1-0-3-7"
+ x="528.64813"
+ y="600.08429"
+ style="font-size:15px;line-height:1.25">*</tspan></text>
+ <rect
+ style="fill:#000000;stroke:#000000;stroke-width:0.6465112;filter:url(#filter4169-3-5-8-54)"
+ id="rect4136-3-6-5-7-4"
+ width="101.07784"
+ height="31.998148"
+ x="128.74678"
+ y="80.648842"
+ transform="matrix(1.3745874,0,0,1.8929066,-132.7754,556.04505)" />
+ <rect
+ style="fill:url(#linearGradient5032-3-9-4);fill-opacity:1;stroke:#000000;stroke-width:1.07280123"
+ id="rect4136-2-6-3-4"
+ width="138.91039"
+ height="64.111"
+ x="42.321312"
+ y="704.8371" />
+ <text
+ xml:space="preserve"
+ style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+ x="110.30745"
+ y="722.94025"
+ id="text4138-6-2-6-3"><tspan
+ sodipodi:role="line"
+ x="111.99202"
+ y="722.94025"
+ id="tspan4366"
+ style="font-size:15px;line-height:1.25;text-align:center;text-anchor:middle">other standard </tspan><tspan
+ sodipodi:role="line"
+ x="110.30745"
+ y="741.69025"
+ id="tspan4368"
+ style="font-size:15px;line-height:1.25;text-align:center;text-anchor:middle">framework</tspan><tspan
+ sodipodi:role="line"
+ x="110.30745"
+ y="760.44025"
+ style="font-size:15px;line-height:1.25;text-align:center;text-anchor:middle"
+ id="tspan6840">(crypto/nic/others)</tspan></text>
+ <path
+ style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;marker-end:url(#marker4825-6-2-1-8)"
+ d="M 276.29661,849.04109 134.04449,771.90853"
+ id="path4661-3-4-8"
+ inkscape:connector-curvature="0"
+ sodipodi:nodetypes="cc" />
+ <text
+ xml:space="preserve"
+ style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+ x="313.70813"
+ y="730.06366"
+ id="text4138-6-2-6-36"><tspan
+ sodipodi:role="line"
+ id="tspan4140-1-9-1-7"
+ x="313.70813"
+ y="730.06366"
+ style="font-size:10px;line-height:1.25">&lt;&lt;lkm&gt;&gt;</tspan></text>
+ <text
+ xml:space="preserve"
+ style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;text-align:start;letter-spacing:0px;word-spacing:0px;text-anchor:start;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+ x="259.53165"
+ y="797.8056"
+ id="text4138-6-2-6-1-6-2-5-7-5"><tspan
+ sodipodi:role="line"
+ x="259.53165"
+ y="797.8056"
+ style="font-size:15px;line-height:1.25;text-align:start;text-anchor:start"
+ id="tspan2357">uacce register api</tspan></text>
+ <text
+ xml:space="preserve"
+ style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+ x="29.145819"
+ y="833.44244"
+ id="text4138-6-2-6-1-6-2-5-7-5-2"><tspan
+ sodipodi:role="line"
+ x="29.145819"
+ y="833.44244"
+ id="tspan4301"
+ style="font-size:15px;line-height:1.25">register to other subsystem</tspan></text>
+ <text
+ xml:space="preserve"
+ style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+ x="301.20813"
+ y="597.29437"
+ id="text4138-6-2-6-36-1"><tspan
+ sodipodi:role="line"
+ id="tspan4140-1-9-1-7-2"
+ x="301.20813"
+ y="597.29437"
+ style="font-size:10px;line-height:1.25">&lt;&lt;user_lib&gt;&gt;</tspan></text>
+ <text
+ xml:space="preserve"
+ style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+ x="615.9505"
+ y="739.44012"
+ id="text4138-6-2-6-1-6-2-5-3"><tspan
+ sodipodi:role="line"
+ x="615.9505"
+ y="739.44012"
+ id="tspan4274-7"
+ style="font-size:15px;line-height:1.25">mmapped memory r/w interface</tspan></text>
+ <text
+ xml:space="preserve"
+ style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+ x="371.01291"
+ y="529.23682"
+ id="text4138-6-2-6-1-6-2-5-36"><tspan
+ sodipodi:role="line"
+ x="371.01291"
+ y="529.23682"
+ id="tspan4305-3"
+ style="font-size:15px;line-height:1.25">wd user api</tspan></text>
+ <path
+ style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+ d="m 328.19325,585.87943 0,-23.57142"
+ id="path4348"
+ inkscape:connector-curvature="0" />
+ <ellipse
+ style="opacity:1;fill:#ffffff;fill-opacity:1;fill-rule:evenodd;stroke:#000000;stroke-width:1;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0"
+ id="path4350"
+ cx="328.01468"
+ cy="551.95081"
+ rx="11.607142"
+ ry="10.357142" />
+ <path
+ style="opacity:0.444;fill:url(#linearGradient6836);fill-opacity:1;fill-rule:evenodd;stroke:none;stroke-width:1;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;filter:url(#filter5382)"
+ id="path4350-2"
+ sodipodi:type="arc"
+ sodipodi:cx="329.44327"
+ sodipodi:cy="553.37933"
+ sodipodi:rx="11.607142"
+ sodipodi:ry="10.357142"
+ sodipodi:start="0"
+ sodipodi:end="6.2509098"
+ d="m 341.05041,553.37933 a 11.607142,10.357142 0 0 1 -11.51349,10.35681 11.607142,10.357142 0 0 1 -11.69928,-10.18967 11.607142,10.357142 0 0 1 11.32469,-10.52124 11.607142,10.357142 0 0 1 11.88204,10.01988"
+ sodipodi:open="true" />
+ <text
+ xml:space="preserve"
+ style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+ x="619.67596"
+ y="978.22363"
+ id="text4138-6-2-6-1-6-2-5-36-3"><tspan
+ sodipodi:role="line"
+ x="619.67596"
+ y="978.22363"
+ id="tspan4305-3-67"
+ style="font-size:15px;line-height:1.25">Device(Hardware)</tspan></text>
+ <path
+ style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;marker-end:url(#marker4825-6-2-6-2)"
+ d="m 347.51164,865.4527 193.91929,99.10053"
+ id="path4661-3-5-1"
+ inkscape:connector-curvature="0"
+ sodipodi:nodetypes="cc" />
+ <rect
+ style="fill:#000000;stroke:#000000;stroke-width:0.6465112;filter:url(#filter4169-3-5-8-5-0)"
+ id="rect4136-3-6-5-7-3-1"
+ width="101.07784"
+ height="31.998148"
+ x="128.74678"
+ y="80.648842"
+ transform="matrix(1.3742742,0,0,0.97786395,278.26025,749.50952)" />
+ <rect
+ style="fill:url(#linearGradient5032-3-9-7-3);fill-opacity:1;stroke:#000000;stroke-width:0.74946606"
+ id="rect4136-2-6-3-6-0"
+ width="138.90868"
+ height="31.289839"
+ x="453.30197"
+ y="826.36682" />
+ <text
+ xml:space="preserve"
+ style="font-style:normal;font-weight:normal;font-size:12px;line-height:0%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+ x="493.68158"
+ y="847.90643"
+ id="text4138-6-2-6-1-5"><tspan
+ sodipodi:role="line"
+ id="tspan4140-1-9-1-0-1"
+ x="493.68158"
+ y="847.90643"
+ style="font-size:15px;line-height:1.25;stroke-width:1px">IOMMU</tspan></text>
+ <path
+ style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;marker-end:url(#marker4825-6-2-1-1)"
+ d="m 389.49372,755.46667 111.75324,68.4507"
+ id="path4661-3-4-85"
+ inkscape:connector-curvature="0"
+ sodipodi:nodetypes="cc" />
+ <text
+ xml:space="preserve"
+ style="font-style:normal;font-weight:normal;font-size:12px;line-height:0%;font-family:sans-serif;text-align:start;letter-spacing:0px;word-spacing:0px;text-anchor:start;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+ x="427.70282"
+ y="776.91418"
+ id="text4138-6-2-6-1-6-2-5-7-5-0"><tspan
+ sodipodi:role="line"
+ x="427.70282"
+ y="776.91418"
+ style="font-size:15px;line-height:1.25;text-align:start;text-anchor:start;stroke-width:1px"
+ id="tspan2357-6">manage the driver iommu state</tspan></text>
+ </g>
+</svg>
diff --git a/Documentation/warpdrive/wd.svg b/Documentation/warpdrive/wd.svg
new file mode 100644
index 000000000000..87ab92ebfbc6
--- /dev/null
+++ b/Documentation/warpdrive/wd.svg
@@ -0,0 +1,526 @@
+<?xml version="1.0" encoding="UTF-8" standalone="no"?>
+<!-- Created with Inkscape (http://www.inkscape.org/) -->
+
+<svg
+ xmlns:dc="http://purl.org/dc/elements/1.1/"
+ xmlns:cc="http://creativecommons.org/ns#"
+ xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
+ xmlns:svg="http://www.w3.org/2000/svg"
+ xmlns="http://www.w3.org/2000/svg"
+ xmlns:xlink="http://www.w3.org/1999/xlink"
+ xmlns:sodipodi="http://sodipodi.sourceforge.net/DTD/sodipodi-0.dtd"
+ xmlns:inkscape="http://www.inkscape.org/namespaces/inkscape"
+ width="210mm"
+ height="116mm"
+ viewBox="0 0 744.09449 411.02338"
+ id="svg2"
+ version="1.1"
+ inkscape:version="0.92.3 (2405546, 2018-03-11)"
+ sodipodi:docname="wd.svg">
+ <defs
+ id="defs4">
+ <linearGradient
+ inkscape:collect="always"
+ id="linearGradient5026">
+ <stop
+ style="stop-color:#f2f2f2;stop-opacity:1;"
+ offset="0"
+ id="stop5028" />
+ <stop
+ style="stop-color:#f2f2f2;stop-opacity:0;"
+ offset="1"
+ id="stop5030" />
+ </linearGradient>
+ <linearGradient
+ inkscape:collect="always"
+ xlink:href="#linearGradient5026"
+ id="linearGradient5032-3"
+ x1="353"
+ y1="211.3622"
+ x2="565.5"
+ y2="174.8622"
+ gradientUnits="userSpaceOnUse"
+ gradientTransform="matrix(2.7384117,0,0,0.91666329,-952.8283,571.10143)" />
+ <filter
+ inkscape:collect="always"
+ style="color-interpolation-filters:sRGB"
+ id="filter4169-3-5"
+ x="-0.031597666"
+ width="1.0631953"
+ y="-0.099812768"
+ height="1.1996255">
+ <feGaussianBlur
+ inkscape:collect="always"
+ stdDeviation="1.3307599"
+ id="feGaussianBlur4171-6-3" />
+ </filter>
+ <marker
+ markerWidth="18.960653"
+ markerHeight="11.194658"
+ refX="9.4803267"
+ refY="5.5973287"
+ orient="auto"
+ id="marker4613">
+ <rect
+ y="-5.1589785"
+ x="5.8504119"
+ height="10.317957"
+ width="10.317957"
+ id="rect4212"
+ style="fill:#ffffff;stroke:#000000;stroke-width:0.69143367;stroke-miterlimit:4;stroke-dasharray:none"
+ transform="matrix(0.86111274,0.50841405,-0.86111274,0.50841405,0,0)">
+ <title
+ id="title4262">generation</title>
+ </rect>
+ </marker>
+ <marker
+ markerWidth="11.227358"
+ markerHeight="12.355258"
+ refX="10"
+ refY="6.177629"
+ orient="auto"
+ id="marker4825">
+ <path
+ inkscape:connector-curvature="0"
+ id="path4757"
+ d="M 0.42024733,0.42806444 10.231357,6.3500844 0.24347733,11.918544"
+ style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
+ </marker>
+ <marker
+ markerWidth="11.227358"
+ markerHeight="12.355258"
+ refX="10"
+ refY="6.177629"
+ orient="auto"
+ id="marker4825-6">
+ <path
+ inkscape:connector-curvature="0"
+ id="path4757-1"
+ d="M 0.42024733,0.42806444 10.231357,6.3500844 0.24347733,11.918544"
+ style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
+ </marker>
+ <linearGradient
+ inkscape:collect="always"
+ xlink:href="#linearGradient5026"
+ id="linearGradient5032-3-9"
+ x1="353"
+ y1="211.3622"
+ x2="565.5"
+ y2="174.8622"
+ gradientUnits="userSpaceOnUse"
+ gradientTransform="matrix(1.2452511,0,0,0.98513016,-190.95632,540.33156)" />
+ <marker
+ markerWidth="11.227358"
+ markerHeight="12.355258"
+ refX="10"
+ refY="6.177629"
+ orient="auto"
+ id="marker4825-6-2">
+ <path
+ inkscape:connector-curvature="0"
+ id="path4757-1-9"
+ d="M 0.42024733,0.42806444 10.231357,6.3500844 0.24347733,11.918544"
+ style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
+ </marker>
+ <marker
+ markerWidth="11.227358"
+ markerHeight="12.355258"
+ refX="10"
+ refY="6.177629"
+ orient="auto"
+ id="marker4825-6-2-1">
+ <path
+ inkscape:connector-curvature="0"
+ id="path4757-1-9-9"
+ d="M 0.42024733,0.42806444 10.231357,6.3500844 0.24347733,11.918544"
+ style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
+ </marker>
+ <marker
+ markerWidth="11.227358"
+ markerHeight="12.355258"
+ refX="10"
+ refY="6.177629"
+ orient="auto"
+ id="marker4825-6-2-6">
+ <path
+ inkscape:connector-curvature="0"
+ id="path4757-1-9-1"
+ d="M 0.42024733,0.42806444 10.231357,6.3500844 0.24347733,11.918544"
+ style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
+ </marker>
+ <marker
+ markerWidth="11.227358"
+ markerHeight="12.355258"
+ refX="10"
+ refY="6.177629"
+ orient="auto"
+ id="marker4825-6-2-1-8">
+ <path
+ inkscape:connector-curvature="0"
+ id="path4757-1-9-9-6"
+ d="M 0.42024733,0.42806444 10.231357,6.3500844 0.24347733,11.918544"
+ style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
+ </marker>
+ <marker
+ markerWidth="11.227358"
+ markerHeight="12.355258"
+ refX="10"
+ refY="6.177629"
+ orient="auto"
+ id="marker4825-6-2-1-8-8">
+ <path
+ inkscape:connector-curvature="0"
+ id="path4757-1-9-9-6-9"
+ d="M 0.42024733,0.42806444 10.231357,6.3500844 0.24347733,11.918544"
+ style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
+ </marker>
+ <marker
+ markerWidth="11.227358"
+ markerHeight="12.355258"
+ refX="10"
+ refY="6.177629"
+ orient="auto"
+ id="marker4825-6-0">
+ <path
+ inkscape:connector-curvature="0"
+ id="path4757-1-93"
+ d="M 0.42024733,0.42806444 10.231357,6.3500844 0.24347733,11.918544"
+ style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
+ </marker>
+ <marker
+ markerWidth="11.227358"
+ markerHeight="12.355258"
+ refX="10"
+ refY="6.177629"
+ orient="auto"
+ id="marker4825-6-0-2">
+ <path
+ inkscape:connector-curvature="0"
+ id="path4757-1-93-6"
+ d="M 0.42024733,0.42806444 10.231357,6.3500844 0.24347733,11.918544"
+ style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
+ </marker>
+ <marker
+ markerWidth="11.227358"
+ markerHeight="12.355258"
+ refX="10"
+ refY="6.177629"
+ orient="auto"
+ id="marker4825-6-2-6-2">
+ <path
+ inkscape:connector-curvature="0"
+ id="path4757-1-9-1-9"
+ d="M 0.42024733,0.42806444 10.231357,6.3500844 0.24347733,11.918544"
+ style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
+ </marker>
+ <linearGradient
+ inkscape:collect="always"
+ xlink:href="#linearGradient5026"
+ id="linearGradient5032-3-8"
+ x1="353"
+ y1="211.3622"
+ x2="565.5"
+ y2="174.8622"
+ gradientUnits="userSpaceOnUse"
+ gradientTransform="matrix(1.0104674,0,0,1.0052679,-218.642,661.15448)" />
+ <filter
+ inkscape:collect="always"
+ style="color-interpolation-filters:sRGB"
+ id="filter4169-3-5-8"
+ x="-0.031597666"
+ width="1.0631953"
+ y="-0.099812768"
+ height="1.1996255">
+ <feGaussianBlur
+ inkscape:collect="always"
+ stdDeviation="1.3307599"
+ id="feGaussianBlur4171-6-3-9" />
+ </filter>
+ <linearGradient
+ inkscape:collect="always"
+ xlink:href="#linearGradient5026"
+ id="linearGradient5032-3-8-2"
+ x1="353"
+ y1="211.3622"
+ x2="565.5"
+ y2="174.8622"
+ gradientUnits="userSpaceOnUse"
+ gradientTransform="matrix(2.1450559,0,0,1.0052679,-521.97704,740.76422)" />
+ <filter
+ inkscape:collect="always"
+ style="color-interpolation-filters:sRGB"
+ id="filter4169-3-5-8-5"
+ x="-0.031597666"
+ width="1.0631953"
+ y="-0.099812768"
+ height="1.1996255">
+ <feGaussianBlur
+ inkscape:collect="always"
+ stdDeviation="1.3307599"
+ id="feGaussianBlur4171-6-3-9-1" />
+ </filter>
+ <linearGradient
+ inkscape:collect="always"
+ xlink:href="#linearGradient5026"
+ id="linearGradient5032-3-8-0"
+ x1="353"
+ y1="211.3622"
+ x2="565.5"
+ y2="174.8622"
+ gradientUnits="userSpaceOnUse"
+ gradientTransform="matrix(1.0104674,0,0,1.0052679,83.456748,660.20747)" />
+ <filter
+ inkscape:collect="always"
+ style="color-interpolation-filters:sRGB"
+ id="filter4169-3-5-8-6"
+ x="-0.031597666"
+ width="1.0631953"
+ y="-0.099812768"
+ height="1.1996255">
+ <feGaussianBlur
+ inkscape:collect="always"
+ stdDeviation="1.3307599"
+ id="feGaussianBlur4171-6-3-9-2" />
+ </filter>
+ <linearGradient
+ inkscape:collect="always"
+ xlink:href="#linearGradient5026"
+ id="linearGradient5032-3-84"
+ x1="353"
+ y1="211.3622"
+ x2="565.5"
+ y2="174.8622"
+ gradientUnits="userSpaceOnUse"
+ gradientTransform="matrix(1.9884948,0,0,0.94903536,-318.42665,564.37696)" />
+ <filter
+ inkscape:collect="always"
+ style="color-interpolation-filters:sRGB"
+ id="filter4169-3-5-4"
+ x="-0.031597666"
+ width="1.0631953"
+ y="-0.099812768"
+ height="1.1996255">
+ <feGaussianBlur
+ inkscape:collect="always"
+ stdDeviation="1.3307599"
+ id="feGaussianBlur4171-6-3-0" />
+ </filter>
+ <marker
+ markerWidth="11.227358"
+ markerHeight="12.355258"
+ refX="10"
+ refY="6.177629"
+ orient="auto"
+ id="marker4825-6-0-0">
+ <path
+ inkscape:connector-curvature="0"
+ id="path4757-1-93-8"
+ d="M 0.42024733,0.42806444 10.231357,6.3500844 0.24347733,11.918544"
+ style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
+ </marker>
+ <marker
+ markerWidth="11.227358"
+ markerHeight="12.355258"
+ refX="10"
+ refY="6.177629"
+ orient="auto"
+ id="marker4825-6-3">
+ <path
+ inkscape:connector-curvature="0"
+ id="path4757-1-1"
+ d="M 0.42024733,0.42806444 10.231357,6.3500844 0.24347733,11.918544"
+ style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
+ </marker>
+ </defs>
+ <sodipodi:namedview
+ id="base"
+ pagecolor="#ffffff"
+ bordercolor="#666666"
+ borderopacity="1.0"
+ inkscape:pageopacity="0.0"
+ inkscape:pageshadow="2"
+ inkscape:zoom="0.98994949"
+ inkscape:cx="457.47339"
+ inkscape:cy="250.14781"
+ inkscape:document-units="px"
+ inkscape:current-layer="layer1"
+ showgrid="false"
+ inkscape:window-width="1916"
+ inkscape:window-height="1033"
+ inkscape:window-x="0"
+ inkscape:window-y="22"
+ inkscape:window-maximized="0"
+ fit-margin-right="0.3" />
+ <metadata
+ id="metadata7">
+ <rdf:RDF>
+ <cc:Work
+ rdf:about="">
+ <dc:format>image/svg+xml</dc:format>
+ <dc:type
+ rdf:resource="http://purl.org/dc/dcmitype/StillImage" />
+ <dc:title></dc:title>
+ </cc:Work>
+ </rdf:RDF>
+ </metadata>
+ <g
+ inkscape:label="Layer 1"
+ inkscape:groupmode="layer"
+ id="layer1"
+ transform="translate(0,-641.33861)">
+ <rect
+ style="fill:#000000;stroke:#000000;stroke-width:0.6465112;filter:url(#filter4169-3-5)"
+ id="rect4136-3-6-5"
+ width="101.07784"
+ height="31.998148"
+ x="128.74678"
+ y="80.648842"
+ transform="matrix(2.7384116,0,0,0.91666328,-284.06895,664.79751)" />
+ <rect
+ style="fill:url(#linearGradient5032-3);fill-opacity:1;stroke:#000000;stroke-width:1.02430749"
+ id="rect4136-2-6"
+ width="276.79272"
+ height="29.331528"
+ x="64.723419"
+ y="736.84473" />
+ <text
+ xml:space="preserve"
+ style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+ x="78.223282"
+ y="756.79803"
+ id="text4138-6-2"><tspan
+ sodipodi:role="line"
+ id="tspan4140-1-9"
+ x="78.223282"
+ y="756.79803"
+ style="font-size:15px;line-height:1.25">user application (running by the CPU)</tspan></text>
+ <path
+ style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;marker-end:url(#marker4825-6)"
+ d="m 217.67507,876.6738 113.40331,45.0758"
+ id="path4661"
+ inkscape:connector-curvature="0"
+ sodipodi:nodetypes="cc" />
+ <path
+ style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;marker-end:url(#marker4825-6-0)"
+ d="m 208.10197,767.69811 0.29362,76.03656"
+ id="path4661-6"
+ inkscape:connector-curvature="0"
+ sodipodi:nodetypes="cc" />
+ <rect
+ style="fill:#000000;stroke:#000000;stroke-width:0.6465112;filter:url(#filter4169-3-5-8)"
+ id="rect4136-3-6-5-3"
+ width="101.07784"
+ height="31.998148"
+ x="128.74678"
+ y="80.648842"
+ transform="matrix(1.0104673,0,0,1.0052679,28.128628,763.90722)" />
+ <rect
+ style="fill:url(#linearGradient5032-3-8);fill-opacity:1;stroke:#000000;stroke-width:0.65159565"
+ id="rect4136-2-6-6"
+ width="102.13586"
+ height="32.16671"
+ x="156.83217"
+ y="842.91852" />
+ <text
+ xml:space="preserve"
+ style="font-style:normal;font-weight:normal;font-size:12px;line-height:0%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+ x="188.58519"
+ y="864.47125"
+ id="text4138-6-2-8"><tspan
+ sodipodi:role="line"
+ id="tspan4140-1-9-0"
+ x="188.58519"
+ y="864.47125"
+ style="font-size:15px;line-height:1.25;stroke-width:1px">MMU</tspan></text>
+ <rect
+ style="fill:#000000;stroke:#000000;stroke-width:0.6465112;filter:url(#filter4169-3-5-8-5)"
+ id="rect4136-3-6-5-3-1"
+ width="101.07784"
+ height="31.998148"
+ x="128.74678"
+ y="80.648842"
+ transform="matrix(2.1450556,0,0,1.0052679,1.87637,843.51696)" />
+ <rect
+ style="fill:url(#linearGradient5032-3-8-2);fill-opacity:1;stroke:#000000;stroke-width:0.94937181"
+ id="rect4136-2-6-6-0"
+ width="216.8176"
+ height="32.16671"
+ x="275.09283"
+ y="922.5282" />
+ <text
+ xml:space="preserve"
+ style="font-style:normal;font-weight:normal;font-size:12px;line-height:0%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+ x="347.81482"
+ y="943.23291"
+ id="text4138-6-2-8-8"><tspan
+ sodipodi:role="line"
+ id="tspan4140-1-9-0-5"
+ x="347.81482"
+ y="943.23291"
+ style="font-size:15px;line-height:1.25;stroke-width:1px">Memory</tspan></text>
+ <rect
+ style="fill:#000000;stroke:#000000;stroke-width:0.6465112;filter:url(#filter4169-3-5-8-6)"
+ id="rect4136-3-6-5-3-5"
+ width="101.07784"
+ height="31.998148"
+ x="128.74678"
+ y="80.648842"
+ transform="matrix(1.0104673,0,0,1.0052679,330.22737,762.9602)" />
+ <rect
+ style="fill:url(#linearGradient5032-3-8-0);fill-opacity:1;stroke:#000000;stroke-width:0.65159565"
+ id="rect4136-2-6-6-8"
+ width="102.13586"
+ height="32.16671"
+ x="458.93091"
+ y="841.9715" />
+ <text
+ xml:space="preserve"
+ style="font-style:normal;font-weight:normal;font-size:12px;line-height:0%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+ x="490.68393"
+ y="863.52423"
+ id="text4138-6-2-8-6"><tspan
+ sodipodi:role="line"
+ id="tspan4140-1-9-0-2"
+ x="490.68393"
+ y="863.52423"
+ style="font-size:15px;line-height:1.25;stroke-width:1px">IOMMU</tspan></text>
+ <rect
+ style="fill:#000000;stroke:#000000;stroke-width:0.6465112;filter:url(#filter4169-3-5-4)"
+ id="rect4136-3-6-5-6"
+ width="101.07784"
+ height="31.998148"
+ x="128.74678"
+ y="80.648842"
+ transform="matrix(1.9884947,0,0,0.94903537,167.19229,661.38193)" />
+ <rect
+ style="fill:url(#linearGradient5032-3-84);fill-opacity:1;stroke:#000000;stroke-width:0.88813609"
+ id="rect4136-2-6-2"
+ width="200.99274"
+ height="30.367374"
+ x="420.4675"
+ y="735.97351" />
+ <text
+ xml:space="preserve"
+ style="font-style:normal;font-weight:normal;font-size:12px;line-height:0%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+ x="441.95297"
+ y="755.9068"
+ id="text4138-6-2-9"><tspan
+ sodipodi:role="line"
+ id="tspan4140-1-9-9"
+ x="441.95297"
+ y="755.9068"
+ style="font-size:15px;line-height:1.25;stroke-width:1px">Hardware Accelerator</tspan></text>
+ <path
+ style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;marker-end:url(#marker4825-6-0-0)"
+ d="m 508.2914,766.55885 0.29362,76.03656"
+ id="path4661-6-1"
+ inkscape:connector-curvature="0"
+ sodipodi:nodetypes="cc" />
+ <path
+ style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;marker-end:url(#marker4825-6-3)"
+ d="M 499.70201,876.47297 361.38296,920.80258"
+ id="path4661-1"
+ inkscape:connector-curvature="0"
+ sodipodi:nodetypes="cc" />
+ </g>
+</svg>
diff --git a/Documentation/warpdrive/wd_q_addr_space.svg b/Documentation/warpdrive/wd_q_addr_space.svg
new file mode 100644
index 000000000000..5e6cf8e89908
--- /dev/null
+++ b/Documentation/warpdrive/wd_q_addr_space.svg
@@ -0,0 +1,359 @@
+<?xml version="1.0" encoding="UTF-8" standalone="no"?>
+<!-- Created with Inkscape (http://www.inkscape.org/) -->
+
+<svg
+ xmlns:dc="http://purl.org/dc/elements/1.1/"
+ xmlns:cc="http://creativecommons.org/ns#"
+ xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
+ xmlns:svg="http://www.w3.org/2000/svg"
+ xmlns="http://www.w3.org/2000/svg"
+ xmlns:sodipodi="http://sodipodi.sourceforge.net/DTD/sodipodi-0.dtd"
+ xmlns:inkscape="http://www.inkscape.org/namespaces/inkscape"
+ width="210mm"
+ height="124mm"
+ viewBox="0 0 210 124"
+ version="1.1"
+ id="svg8"
+ inkscape:version="0.92.3 (2405546, 2018-03-11)"
+ sodipodi:docname="wd_q_addr_space.svg">
+ <defs
+ id="defs2">
+ <marker
+ inkscape:stockid="Arrow1Mend"
+ orient="auto"
+ refY="0"
+ refX="0"
+ id="marker5428"
+ style="overflow:visible"
+ inkscape:isstock="true">
+ <path
+ id="path5426"
+ d="M 0,0 5,-5 -12.5,0 5,5 Z"
+ style="fill:#000000;fill-opacity:1;fill-rule:evenodd;stroke:#000000;stroke-width:1.00000003pt;stroke-opacity:1"
+ transform="matrix(-0.4,0,0,-0.4,-4,0)"
+ inkscape:connector-curvature="0" />
+ </marker>
+ <marker
+ inkscape:isstock="true"
+ style="overflow:visible"
+ id="marker2922"
+ refX="0"
+ refY="0"
+ orient="auto"
+ inkscape:stockid="Arrow1Mend"
+ inkscape:collect="always">
+ <path
+ transform="matrix(-0.4,0,0,-0.4,-4,0)"
+ style="fill:#000000;fill-opacity:1;fill-rule:evenodd;stroke:#000000;stroke-width:1.00000003pt;stroke-opacity:1"
+ d="M 0,0 5,-5 -12.5,0 5,5 Z"
+ id="path2920"
+ inkscape:connector-curvature="0" />
+ </marker>
+ <marker
+ inkscape:stockid="Arrow1Mstart"
+ orient="auto"
+ refY="0"
+ refX="0"
+ id="Arrow1Mstart"
+ style="overflow:visible"
+ inkscape:isstock="true">
+ <path
+ id="path840"
+ d="M 0,0 5,-5 -12.5,0 5,5 Z"
+ style="fill:#000000;fill-opacity:1;fill-rule:evenodd;stroke:#000000;stroke-width:1.00000003pt;stroke-opacity:1"
+ transform="matrix(0.4,0,0,0.4,4,0)"
+ inkscape:connector-curvature="0" />
+ </marker>
+ <marker
+ inkscape:stockid="Arrow1Mend"
+ orient="auto"
+ refY="0"
+ refX="0"
+ id="Arrow1Mend"
+ style="overflow:visible"
+ inkscape:isstock="true"
+ inkscape:collect="always">
+ <path
+ id="path843"
+ d="M 0,0 5,-5 -12.5,0 5,5 Z"
+ style="fill:#000000;fill-opacity:1;fill-rule:evenodd;stroke:#000000;stroke-width:1.00000003pt;stroke-opacity:1"
+ transform="matrix(-0.4,0,0,-0.4,-4,0)"
+ inkscape:connector-curvature="0" />
+ </marker>
+ <marker
+ inkscape:stockid="Arrow1Mstart"
+ orient="auto"
+ refY="0"
+ refX="0"
+ id="Arrow1Mstart-5"
+ style="overflow:visible"
+ inkscape:isstock="true">
+ <path
+ inkscape:connector-curvature="0"
+ id="path840-1"
+ d="M 0,0 5,-5 -12.5,0 5,5 Z"
+ style="fill:#000000;fill-opacity:1;fill-rule:evenodd;stroke:#000000;stroke-width:1.00000003pt;stroke-opacity:1"
+ transform="matrix(0.4,0,0,0.4,4,0)" />
+ </marker>
+ <marker
+ inkscape:stockid="Arrow1Mend"
+ orient="auto"
+ refY="0"
+ refX="0"
+ id="Arrow1Mend-1"
+ style="overflow:visible"
+ inkscape:isstock="true">
+ <path
+ inkscape:connector-curvature="0"
+ id="path843-0"
+ d="M 0,0 5,-5 -12.5,0 5,5 Z"
+ style="fill:#000000;fill-opacity:1;fill-rule:evenodd;stroke:#000000;stroke-width:1.00000003pt;stroke-opacity:1"
+ transform="matrix(-0.4,0,0,-0.4,-4,0)" />
+ </marker>
+ <marker
+ inkscape:isstock="true"
+ style="overflow:visible"
+ id="marker2922-2"
+ refX="0"
+ refY="0"
+ orient="auto"
+ inkscape:stockid="Arrow1Mend"
+ inkscape:collect="always">
+ <path
+ transform="matrix(-0.4,0,0,-0.4,-4,0)"
+ style="fill:#000000;fill-opacity:1;fill-rule:evenodd;stroke:#000000;stroke-width:1.00000003pt;stroke-opacity:1"
+ d="M 0,0 5,-5 -12.5,0 5,5 Z"
+ id="path2920-9"
+ inkscape:connector-curvature="0" />
+ </marker>
+ <marker
+ inkscape:isstock="true"
+ style="overflow:visible"
+ id="marker2922-27"
+ refX="0"
+ refY="0"
+ orient="auto"
+ inkscape:stockid="Arrow1Mend"
+ inkscape:collect="always">
+ <path
+ transform="matrix(-0.4,0,0,-0.4,-4,0)"
+ style="fill:#000000;fill-opacity:1;fill-rule:evenodd;stroke:#000000;stroke-width:1.00000003pt;stroke-opacity:1"
+ d="M 0,0 5,-5 -12.5,0 5,5 Z"
+ id="path2920-0"
+ inkscape:connector-curvature="0" />
+ </marker>
+ <marker
+ inkscape:isstock="true"
+ style="overflow:visible"
+ id="marker2922-27-8"
+ refX="0"
+ refY="0"
+ orient="auto"
+ inkscape:stockid="Arrow1Mend"
+ inkscape:collect="always">
+ <path
+ transform="matrix(-0.4,0,0,-0.4,-4,0)"
+ style="fill:#000000;fill-opacity:1;fill-rule:evenodd;stroke:#000000;stroke-width:1.00000003pt;stroke-opacity:1"
+ d="M 0,0 5,-5 -12.5,0 5,5 Z"
+ id="path2920-0-0"
+ inkscape:connector-curvature="0" />
+ </marker>
+ </defs>
+ <sodipodi:namedview
+ id="base"
+ pagecolor="#ffffff"
+ bordercolor="#666666"
+ borderopacity="1.0"
+ inkscape:pageopacity="0.0"
+ inkscape:pageshadow="2"
+ inkscape:zoom="1.4"
+ inkscape:cx="401.66654"
+ inkscape:cy="218.12255"
+ inkscape:document-units="mm"
+ inkscape:current-layer="layer1"
+ showgrid="false"
+ inkscape:window-width="1916"
+ inkscape:window-height="1033"
+ inkscape:window-x="0"
+ inkscape:window-y="22"
+ inkscape:window-maximized="0" />
+ <metadata
+ id="metadata5">
+ <rdf:RDF>
+ <cc:Work
+ rdf:about="">
+ <dc:format>image/svg+xml</dc:format>
+ <dc:type
+ rdf:resource="http://purl.org/dc/dcmitype/StillImage" />
+ <dc:title />
+ </cc:Work>
+ </rdf:RDF>
+ </metadata>
+ <g
+ inkscape:label="Layer 1"
+ inkscape:groupmode="layer"
+ id="layer1"
+ transform="translate(0,-173)">
+ <rect
+ style="opacity:0.82999998;fill:none;fill-opacity:1;fill-rule:evenodd;stroke:#000000;stroke-width:0.4;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:0.82745098"
+ id="rect815"
+ width="21.262758"
+ height="40.350552"
+ x="55.509361"
+ y="195.00098"
+ ry="0" />
+ <rect
+ style="opacity:0.82999998;fill:none;fill-opacity:1;fill-rule:evenodd;stroke:#000000;stroke-width:0.4;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:0.82745098"
+ id="rect815-1"
+ width="21.24276"
+ height="43.732346"
+ x="55.519352"
+ y="235.26543"
+ ry="0" />
+ <text
+ xml:space="preserve"
+ style="font-style:normal;font-weight:normal;font-size:2.9765625px;line-height:125%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:0.26458332px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+ x="50.549229"
+ y="190.6078"
+ id="text1118"><tspan
+ sodipodi:role="line"
+ id="tspan1116"
+ x="50.549229"
+ y="190.6078"
+ style="stroke-width:0.26458332px">queue file address space</tspan></text>
+ <path
+ style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:0.26458332px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+ d="M 76.818568,194.95453 H 97.229281"
+ id="path1126"
+ inkscape:connector-curvature="0"
+ sodipodi:nodetypes="cc" />
+ <path
+ style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:0.26458332px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+ d="M 76.818568,235.20899 H 96.095361"
+ id="path1126-8"
+ inkscape:connector-curvature="0" />
+ <path
+ style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:0.26458332px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+ d="m 76.762111,278.99778 h 19.27678"
+ id="path1126-0"
+ inkscape:connector-curvature="0" />
+ <path
+ style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:0.26458332px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+ d="m 55.519355,265.20165 v 19.27678"
+ id="path1126-2"
+ inkscape:connector-curvature="0" />
+ <path
+ style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:0.26458332px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+ d="m 76.762111,265.20165 v 19.27678"
+ id="path1126-2-1"
+ inkscape:connector-curvature="0" />
+ <path
+ style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:0.26458332px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;marker-start:url(#Arrow1Mstart);marker-end:url(#Arrow1Mend)"
+ d="m 87.590896,194.76554 0,39.87648"
+ id="path1126-2-1-0"
+ inkscape:connector-curvature="0"
+ sodipodi:nodetypes="cc" />
+ <path
+ style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:0.26458332px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;marker-start:url(#Arrow1Mstart-5);marker-end:url(#Arrow1Mend-1)"
+ d="m 82.48822,235.77596 v 42.90029"
+ id="path1126-2-1-0-8"
+ inkscape:connector-curvature="0"
+ sodipodi:nodetypes="cc" />
+ <path
+ style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:0.26458332px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;marker-end:url(#marker2922)"
+ d="M 44.123633,195.3325 H 55.651907"
+ id="path2912"
+ inkscape:connector-curvature="0" />
+ <text
+ xml:space="preserve"
+ style="font-style:normal;font-weight:normal;font-size:2.9765625px;line-height:125%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:0.26458332px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+ x="32.217381"
+ y="196.27745"
+ id="text2968"><tspan
+ sodipodi:role="line"
+ id="tspan2966"
+ x="32.217381"
+ y="196.27745"
+ style="stroke-width:0.26458332px">offset 0</tspan></text>
+ <text
+ xml:space="preserve"
+ style="font-style:normal;font-weight:normal;font-size:2.9765625px;line-height:125%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:0.26458332px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+ x="91.199554"
+ y="216.03946"
+ id="text1118-5"><tspan
+ sodipodi:role="line"
+ id="tspan1116-0"
+ x="91.199554"
+ y="216.03946"
+ style="stroke-width:0.26458332px">device region (mapped to device mmio or shared kernel driver memory)</tspan></text>
+ <text
+ xml:space="preserve"
+ style="font-style:normal;font-weight:normal;font-size:2.9765625px;line-height:125%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:0.26458332px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+ x="86.188072"
+ y="244.50081"
+ id="text1118-5-6"><tspan
+ sodipodi:role="line"
+ id="tspan1116-0-4"
+ x="86.188072"
+ y="244.50081"
+ style="stroke-width:0.26458332px">static share virtual memory region (for device without share virtual memory)</tspan></text>
+ <flowRoot
+ xml:space="preserve"
+ id="flowRoot5699"
+ style="font-style:normal;font-weight:normal;font-size:11.25px;line-height:125%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"><flowRegion
+ id="flowRegion5701"><rect
+ id="rect5703"
+ width="5182.8569"
+ height="385.71429"
+ x="34.285713"
+ y="71.09111" /></flowRegion><flowPara
+ id="flowPara5705" /></flowRoot> <path
+ style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:0.26458332px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;marker-end:url(#marker2922-2)"
+ d="M 43.679028,206.85268 H 55.207302"
+ id="path2912-1"
+ inkscape:connector-curvature="0" />
+ <path
+ style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:0.26458332px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;marker-end:url(#marker2922-27)"
+ d="M 44.057004,224.23959 H 55.585278"
+ id="path2912-9"
+ inkscape:connector-curvature="0" />
+ <text
+ xml:space="preserve"
+ style="font-style:normal;font-weight:normal;font-size:2.9765625px;line-height:125%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:0.26458332px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+ x="24.139778"
+ y="202.40636"
+ id="text1118-5-3"><tspan
+ sodipodi:role="line"
+ id="tspan1116-0-6"
+ x="24.139778"
+ y="202.40636"
+ style="stroke-width:0.26458332px">device mmio region</tspan></text>
+ <text
+ xml:space="preserve"
+ style="font-style:normal;font-weight:normal;font-size:2.9765625px;line-height:125%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:0.26458332px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+ x="17.010948"
+ y="216.73672"
+ id="text1118-5-3-3"><tspan
+ sodipodi:role="line"
+ id="tspan1116-0-6-6"
+ x="17.010948"
+ y="216.73672"
+ style="stroke-width:0.26458332px">device kernel only region</tspan></text>
+ <path
+ style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:0.26458332px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;marker-end:url(#marker2922-27-8)"
+ d="M 43.981087,235.35153 H 55.509361"
+ id="path2912-9-2"
+ inkscape:connector-curvature="0" />
+ <text
+ xml:space="preserve"
+ style="font-style:normal;font-weight:normal;font-size:2.9765625px;line-height:125%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:0.26458332px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+ x="17.575975"
+ y="230.53285"
+ id="text1118-5-3-3-0"><tspan
+ sodipodi:role="line"
+ id="tspan1116-0-6-6-5"
+ x="17.575975"
+ y="230.53285"
+ style="stroke-width:0.26458332px">device user share region</tspan></text>
+ </g>
+</svg>
--
2.17.1

2018-11-20 13:56:40

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [RFCv3 PATCH 1/6] uacce: Add documents for WarpDrive/uacce

On Tue, Nov 20, 2018 at 11:07:02AM +0800, Kenneth Lee wrote:
> On Mon, Nov 19, 2018 at 11:49:54AM -0700, Jason Gunthorpe wrote:
> >
> > On Mon, Nov 19, 2018 at 05:14:05PM +0800, Kenneth Lee wrote:
> >
> > > If the hardware cannot share a page table with the CPU, we then need to have
> > > some way to change the device page table. This is what happens in ODP. It
> > > invalidates the page table in the device upon mmu_notifier callback. But this cannot
> > > solve the COW problem: if the user process A shares a page P with the device, and A
> > > forks a new process B and then continues to write to the page, then by COW
> > > process B will keep the page P, while A will get a new page P'. But you have
> > > no way to let the device know it should use P' rather than P.
> >
> > Is this true? I thought mmu_notifiers covered all these cases.
> >
> > The mmu_notifier for A should fire if B causes the physical address of
> > A's pages to change via COW.
> >
> > And this causes the device page tables to re-synchronize.
>
> I don't see such code. The current do_cow_fault() implementation has nothing to
> do with mmu_notifier.

Well, that sure sounds like it would be a bug in mmu_notifiers..

But considering Jean's SVA stuff seems based on mmu notifiers, I have
a hard time believing that it has any different behavior from RDMA's
ODP, and if it does have different behavior, then it is probably just
a bug in the ODP implementation.
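
For reference, the CPU-side half of the COW behaviour under discussion can
be observed from user space. The sketch below is only an illustration, not
code from this patchset or from any driver: the helper name
child_sees_old_after_parent_write() is made up, and no device, IOMMU or
mmu_notifier is involved. It maps one private anonymous page, forks, lets
the parent write to the page, and then checks what the child reads at the
same virtual address.

```c
#define _DEFAULT_SOURCE
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper, for illustration only.
 * Returns 1 if the child still reads the pre-fork contents after the
 * parent has written to the page, 0 if it reads the parent's new
 * contents, and -1 on error.
 */
static int child_sees_old_after_parent_write(void)
{
	long page = sysconf(_SC_PAGESIZE);
	int go[2];
	int status;
	int wrote;
	char token = 'x';
	char *buf;
	pid_t pid;

	if (page < 0)
		return -1;

	buf = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return -1;
	strcpy(buf, "old");

	if (pipe(go) < 0) {
		munmap(buf, (size_t)page);
		return -1;
	}

	pid = fork();
	if (pid < 0) {
		close(go[0]);
		close(go[1]);
		munmap(buf, (size_t)page);
		return -1;
	}

	if (pid == 0) {
		/* Child: wait until the parent has written, then look. */
		close(go[1]);
		if (read(go[0], &token, 1) != 1)
			_exit(2);
		_exit(strcmp(buf, "old") == 0 ? 0 : 1);
	}

	/*
	 * Parent: after fork() the page is shared copy-on-write, so this
	 * write forces the two processes onto different physical pages.
	 */
	close(go[0]);
	strcpy(buf, "new");
	wrote = (write(go[1], &token, 1) == 1);
	close(go[1]);

	if (waitpid(pid, &status, 0) < 0) {
		munmap(buf, (size_t)page);
		return -1;
	}
	munmap(buf, (size_t)page);

	if (!wrote || !WIFEXITED(status) || WEXITSTATUS(status) > 1)
		return -1;
	return WEXITSTATUS(status) == 0 ? 1 : 0;
}
```

Called from a small main(), the helper is expected to return 1: once the
parent has written, the same virtual address is backed by different
physical pages in the two processes. A device that was handed the pre-fork
physical address and is never told about the change could therefore end up
looking at a different page than the parent, which is the case the
notifier question above is about.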

> > > In WarpDrive/uacce, we make this simple. If you support IOMMU and it supports
> > > SVM/SVA, everything will be fine just like ODP implicit mode. And you don't need
> > > to write any code for that, because it has been done by the IOMMU framework. If it
> >
> > Looks like the IOMMU code uses mmu_notifier, so it is identical to
> > IB's ODP. The only difference is that IB tends to have the IOMMU page
> > table in the device, not in the CPU.
> >
> > The only case I know of that is different is the new-fangled CAPI
> > stuff where the IOMMU can directly use the CPU's page table and the
> > IOMMU page table (in device or CPU) is eliminated.
>
> Yes. We are not focusing on the current implementation. As mentioned in the
> cover letter, we are expecting Jean Philippe's SVA patch:
> git://linux-arm.org/linux-jpb.

This SVA stuff does not look comparable to CAPI as it still requires
maintaining separate IOMMU page tables.

Also, those patches from Jean have a lot of references to
mmu_notifiers (ie look at iommu_mmu_notifier).

Are you really sure it is actually any different at all?

> > Anyhow, I don't think a single instance of hardware should justify an
> > entire new subsystem. Subsystems are hard to make and without multiple
> > hardware examples there is no way to expect that it would cover any
> > future use cases.
>
> Yes. That's our first expectation. We can keep it with our driver. But
> there is no user driver support for any accelerator in the mainline kernel. Even the
> well known QuickAssist has to be maintained out of tree. So we are trying to see if
> people are interested in working together to solve the problem.

Well, you should come with patches ack'ed by these other groups.

> > If all your driver needs is to mmap some PCI bar space, route
> > interrupts and do DMA mapping then mediated VFIO is probably a good
> > choice.
>
> Yes. That is what is done in our RFCv1/v2. But we accepted Jerome's opinion and
> are trying not to add complexity to the mm subsystem.

Why would a mediated VFIO driver touch the mm subsystem? Sounds like
you don't have a VFIO driver if it needs to do stuff like that...

> > If it needs to do a bunch of other stuff, not related to PCI bar
> > space, interrupts and DMA mapping (ie special code for compression,
> > crypto, AI, whatever) then you should probably do what Jerome said and
> > make a drivers/char/hisillicon_foo_bar.c that exposes just what your
> > hardware does.
>
> Yes. If no other accelerator driver writer is interested, that is the
> expectation :)

I don't think it matters what other drivers do.

If your driver does not need any other kernel code then VFIO is
sensible. In this kind of world you will probably have an RDMA-like
userspace driver that can bring this to a common user space API, even
if one driver uses VFIO and a different driver uses something else.

> You create some connections (queues) to the NIC, RSA, and AI engines. Then you get
> data directly from the NIC and pass the pointer to the RSA engine for decryption. The
> CPU then finishes some data taking or operation and then passes it through to the AI
> engine for CNN calculation.... This will need a place to maintain the same
> address space by some means.

How is this any different from what we have today?

SVA is not something even remotely new, IB has been doing various
versions of it for 20 years.

Jason

2018-11-20 05:42:28

by Jerome Glisse

[permalink] [raw]
Subject: Re: [RFCv3 PATCH 1/6] uacce: Add documents for WarpDrive/uacce

On Mon, Nov 19, 2018 at 11:53:33AM -0700, Jason Gunthorpe wrote:
> On Mon, Nov 19, 2018 at 01:42:16PM -0500, Jerome Glisse wrote:
> > On Mon, Nov 19, 2018 at 11:27:52AM -0700, Jason Gunthorpe wrote:
> > > On Mon, Nov 19, 2018 at 11:48:54AM -0500, Jerome Glisse wrote:
> > >
> > > > Just to comment on this, any infiniband driver which uses umem and does
> > > > not have ODP (here ODP for me means listening to mmu notifier, so all
> > > > infiniband drivers except mlx5) will be affected by the same issue AFAICT.
> > > >
> > > > AFAICT there is nothing special happening after fork() inside any of
> > > > those drivers. So if the parent creates a umem mr before fork() and programs
> > > > the hardware with it, then after fork() the parent might start using a new
> > > > page for the umem range while the old memory is used by the child. The
> > > > reverse is also true (parent using old memory and child new memory).
> > > > Bottom line: you cannot predict which memory the child or the parent
> > > > will use for the range after fork().
> > > >
> > > > So no matter whether you consider the child or the parent, what the hw
> > > > will use for the mr is unlikely to match what the CPU uses for the
> > > > same virtual address. In other words:
> > > >
> > > > Before fork:
> > > > CPU parent: virtual addr ptr1 -> physical address = 0xCAFE
> > > > HARDWARE: virtual addr ptr1 -> physical address = 0xCAFE
> > > >
> > > > Case 1:
> > > > CPU parent: virtual addr ptr1 -> physical address = 0xCAFE
> > > > CPU child: virtual addr ptr1 -> physical address = 0xDEAD
> > > > HARDWARE: virtual addr ptr1 -> physical address = 0xCAFE
> > > >
> > > > Case 2:
> > > > CPU parent: virtual addr ptr1 -> physical address = 0xBEEF
> > > > CPU child: virtual addr ptr1 -> physical address = 0xCAFE
> > > > HARDWARE: virtual addr ptr1 -> physical address = 0xCAFE
> > >
> > > IIRC this is solved in IB by automatically calling
> > > madvise(MADV_DONTFORK) before creating the MR.
> > >
> > > MADV_DONTFORK
> > > .. This is useful to prevent copy-on-write semantics from changing the
> > > physical location of a page if the parent writes to it after a
> > > fork(2) ..
> >
> > This would work around the issue but this is not transparent ie
> > range marked with DONTFORK no longer behave as expected from the
> > application point of view.
>
> Do you know what the difference is? The man page really gives no
> hint..
>
> Does it sometimes unmap the pages during fork?

It is handled in kernel/fork.c, look for DONTCOPY. Basically it just
leaves the page table empty in the child process, so the child will have
to fault in new pages. This also means that the child will get 0 as the
initial value for all memory addresses under DONTCOPY/DONTFORK, which
breaks the application's expectation of what fork() does.
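
For readers following along, here is a minimal sketch of the workaround under
discussion: marking a buffer MADV_DONTFORK before it is handed to hardware.
The helper name mark_dontfork() is hypothetical and this is not the IB code
path; it only shows the madvise(2) call with the page rounding that call
requires. What the child then observes in that range is the transparency
problem described above and is not exercised here.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: round [addr, addr + len) out to page boundaries and
 * mark the range MADV_DONTFORK, so a later fork() does not set up
 * copy-on-write for it in the parent.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int mark_dontfork(void *addr, size_t len)
{
	long page = sysconf(_SC_PAGESIZE);
	uintptr_t mask, start, end;

	if (page <= 0 || len == 0) {
		errno = EINVAL;
		return -1;
	}

	mask = (uintptr_t)page - 1;
	start = (uintptr_t)addr & ~mask;		/* round down */
	end = ((uintptr_t)addr + len + mask) & ~mask;	/* round up */

	return madvise((void *)start, end - start, MADV_DONTFORK);
}
```

The call has to happen before the buffer is registered with the device, and
MADV_DOFORK undoes it once the registration is gone.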

>
> I actually wonder if the kernel is a bit broken here, we have the same
> problem with O_DIRECT and other stuff, right?

No it is not, O_DIRECT is fine. The only corner case I can think
of with O_DIRECT is one thread launching an O_DIRECT that writes
to private anonymous memory (other O_DIRECT cases do not matter)
while another thread calls fork(). Then what the child gets is
undefined, i.e. either it gets the data from before the O_DIRECT
finished or it gets the result of the O_DIRECT. But this is really
what you should expect when doing such a thing without synchronization.

So O_DIRECT is fine.

>
> Really, if I have a get_user_pages FOLL_WRITE on a page and we fork,
> then shouldn't the COW immediately be broken during the fork?
>
> The kernel can't guarentee that an ongoing DMA will not write to those
> pages, and it breaks the fork semantic to write to both processes.

Fixing that would incur a high cost: we would need to grow struct page
and to copy potentially gigabytes of memory during fork() ... This would
be a serious performance regression for many folks just to work around
an abuse by device drivers. So I don't think anything on that front
would be welcome.
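
As a reference point, the copy-on-write behaviour being argued about can be
observed from user space without any device. The sketch below is only an
illustration (the function name is made up): the parent writes to a private
buffer after fork() and the child reports what it still sees. It shows that
the two processes diverge for the same virtual address; it cannot show which
side kept the original physical page, which is exactly the part a pinned DMA
mapping depends on.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork, let the parent overwrite a private byte, then have the child report
 * the value it still sees through its exit status.
 * Returns the child's view (0..255), or -1 on error.
 */
static int child_view_after_parent_write(unsigned char before,
					 unsigned char after)
{
	unsigned char *buf = malloc(1);
	int go[2];
	int status;
	char token = 0;
	ssize_t n;
	pid_t pid;

	if (!buf)
		return -1;
	if (pipe(go) < 0) {
		free(buf);
		return -1;
	}
	buf[0] = before;

	pid = fork();
	if (pid < 0) {
		close(go[0]);
		close(go[1]);
		free(buf);
		return -1;
	}
	if (pid == 0) {
		/* Child: block until the parent has written, then report. */
		close(go[1]);
		n = read(go[0], &token, 1);
		(void)n;
		_exit(buf[0]);
	}

	/* Parent: this write triggers COW; the child's view is unaffected. */
	close(go[0]);
	buf[0] = after;
	n = write(go[1], &token, 1);
	(void)n;
	close(go[1]);

	if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status)) {
		free(buf);
		return -1;
	}
	free(buf);
	return WEXITSTATUS(status);
}
```

Calling child_view_after_parent_write(0x11, 0x22) returns 0x11: the child
keeps the pre-fork contents while the parent sees its own write.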

umem without proper ODP and VFIO are the only bad users I know of (for
VFIO you can argue that it is part of the API contract and thus that
it is not an abuse, but it is not spelled out in the documentation). I
have been trying to push back on people trying to push things that
would make the same mistake, or at least making sure they understand
what is happening.

What really needs to happen is people fixing their hardware and doing
the right thing (good software engineer versus evil hardware engineer ;))

Cheers,
Jérôme

2018-11-20 07:59:02

by Jerome Glisse

Subject: Re: [RFCv3 PATCH 1/6] uacce: Add documents for WarpDrive/uacce

On Mon, Nov 19, 2018 at 02:26:38PM -0700, Jason Gunthorpe wrote:
> On Mon, Nov 19, 2018 at 03:26:15PM -0500, Jerome Glisse wrote:
> > On Mon, Nov 19, 2018 at 01:11:56PM -0700, Jason Gunthorpe wrote:
> > > On Mon, Nov 19, 2018 at 02:46:32PM -0500, Jerome Glisse wrote:
> > >
> > > > > ?? How can O_DIRECT be fine but RDMA not? They use exactly the same
> > > > > get_user_pages flow, right? Can we do what O_DIRECT does in RDMA and
> > > > > be fine too?
> > > > >
> > > > > AFAIK the only difference is the length of the race window. You'd have
> > > > > to fork and fault during the shorter time O_DIRECT has get_user_pages
> > > > > open.
> > > >
> > > > Well in O_DIRECT case there is only one page table, the CPU
> > > > page table and it gets updated during fork() so there is an
> > > > ordering there and the race window is small.
> > >
> > > Not really, in O_DIRECT case there is another 'page table', we just
> > > call it a DMA scatter/gather list and it is sent directly to the block
> > > device's DMA HW. The sgl plays exactly the same role as the various HW
> > > page list data structures that underly RDMA MRs.
> > >
> > > It is not a page table that matters here, it is if the DMA address of
> > > the page is active for DMA on HW.
> > >
> > > Like you say, the only difference is that the race is hopefully small
> > > with O_DIRECT (though that is not really small, NVMeof for instance
> > > has windows as large as connection timeouts, if you try hard enough)
> > >
> > > So we probably can trigger this trouble with O_DIRECT and fork(), and
> > > I would call it a bug :(
> >
> > I can not think of any scenario that would be a bug with O_DIRECT.
> > Do you have one in mind ? When you fork() and do other syscall that
> > affect the memory of your process in another thread you should
> > expect non consistant results. Kernel is not here to provide a fully
> > safe environement to user, user can shoot itself in the foot and
> > that's fine as long as it only affect the process itself and no one
> > else. We should not be in the business of making everything baby
> > proof :)
>
> Sure, I setup AIO with O_DIRECT and launch a read.
>
> Then I fork and dirty the READ target memory using the CPU in the
> child.
>
> As you described in this case the fork will retain the physical page
> that is undergoing O_DIRECT DMA, and the parent gets a new copy'd page.
>
> The DMA completes, and the child gets the DMA'd to page. The parent
> gets an unchanged copy'd page.
>
> The parent gets the AIO completion, but can't see the data.
>
> I'd call that a bug with O_DIRECT. The only correct outcome is that
> the parent will always see the O_DIRECT data. Fork should not cause
> the *parent* to malfunction. I agree the child cannot make any
> prediction what memory it will see.
>
> I assume the same flow is possible using threads and read()..
>
> It is really no different than the RDMA bug with fork.
>

Yes and that's expected behavior :) If you fork() and have anything
still in flight at the time of the fork that can change your process
address space (including the data in it) then all bets are off.

At least this is my reading of the fork() syscall.

Cheers,
Jérôme

2018-11-24 14:58:48

by Kenneth Lee

Subject: Re: [RFCv3 PATCH 1/6] uacce: Add documents for WarpDrive/uacce

On Fri, Nov 23, 2018 at 11:05:04AM -0700, Jason Gunthorpe wrote:
>
> On Fri, Nov 23, 2018 at 04:02:42PM +0800, Kenneth Lee wrote:
>
> > It is already part of Jean's patchset. And that's why I built my solution on
> > VFIO in the first place. But I think the concept of SVA and PASID is not
> > compatible with the original VFIO concept space. You would not share your whole
> > address space to a device at all in a virtual machine manager,
> > wouldn't you?
>
> Why not? That seems to fit VFIO's space just fine to me.. You might
> need a new upcall to create a full MM registration, but that doesn't
> seem unsuited.

Because the VM manager (such as qemu) does not want to share its whole address
space with the device. It is a security problem.

>
> Part of the point here is you should try to make sensible revisions to
> existing subsystems before just inventing a new thing...
>
> VFIO is deeply connected to the IOMMU, so enabling more general IOMMU
> based approache seems perfectly fine to me..
>
> > > Once the VFIO driver knows about this as a generic capability then the
> > > device it exposes to userspace would use CPU addresses instead of DMA
> > > addresses.
> > >
> > > The question is if your driver needs much more than the device
> > > agnostic generic services VFIO provides.
> > >
> > > I'm not sure what you have in mind with resource management.. It is
> > > hard to revoke resources from userspace, unless you are doing
> > > kernel syscalls, but then why do all this?
> >
> > Say, I have 1024 queues in my accelerator. I can get one by opening the device
> > and attach it with the fd. If the process exit by any means, the queue can be
> > returned with the release of the fd. But if it is mdev, it will still be there
> > and some one should tell the allocator it is available again. This is not easy
> > to design in user space.
>
> ?? why wouldn't the mdev track the queues assigned using the existing
> open/close/ioctl callbacks?
>
> That is basic flow I would expect:
>
> open(/dev/vfio)
> ioctl(unity map entire process MM to mdev with IOMMU)
>
> // Create a HQ queue and link the PASID in the HW to this HW queue
> struct hw queue[..];
> ioctl(create HW queue)
>
> // Get BAR doorbell memory for the queue
> bar = mmap()
>
> // Submit work to the queue using CPU addresses
> queue[0] = ...
> writel(bar [..], &queue);
>
> // Queue, SVA, etc is cleaned up when the VFIO closes
> close()

This is not the way that you can use mdev. To use mdev, you have to:

1. unbind the kernel driver from the device, and rebind it to the vfio driver
2. for 0 to 1024: uuid > /sys/.../the_dev/mdev/create to create all the mdevs
3. a virtual iommu_group will be created in /dev/vfio/* for every mdev

now you can do this in your application (even without considering the pasid):

container = open(/dev/vfio);
ioctl(container, setting);
group = open(/dev/vfio/my_group_for_particular_mdev);
ioctl(container, attach_group, group);
device = ioctl(group, get_device);
mmap(device);
ioctl(container, set_dma_operation);
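
Spelled out with the actual VFIO ioctls, the steps above look roughly like the
sketch below. It follows the sequence in the kernel's VFIO documentation (type1
IOMMU) as far as obtaining the device fd; the struct vfio_handles type and the
function names are made up for illustration, and the paths and the device name
are left to the caller (for an mdev the name is, as far as I know, the UUID
used at creation). The DMA mapping step (VFIO_IOMMU_MAP_DMA) and the mmap of
the device regions are left out.

```c
#include <errno.h>
#include <fcntl.h>
#include <linux/vfio.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Hypothetical bundle of the three fds an application has to keep open. */
struct vfio_handles {
	int container;
	int group;
	int device;
};

static void vfio_close_all(struct vfio_handles *h)
{
	if (h->device >= 0)
		close(h->device);
	if (h->group >= 0)
		close(h->group);
	if (h->container >= 0)
		close(h->container);
	h->container = h->group = h->device = -1;
}

/* Returns 0 on success, a negative errno value on failure. */
static int vfio_open_device(const char *container_path, const char *group_path,
			    const char *dev_name, struct vfio_handles *h)
{
	struct vfio_group_status status = { .argsz = sizeof(status) };
	int err;

	h->container = h->group = h->device = -1;

	h->container = open(container_path, O_RDWR);
	if (h->container < 0)
		goto fail;
	if (ioctl(h->container, VFIO_GET_API_VERSION) != VFIO_API_VERSION ||
	    ioctl(h->container, VFIO_CHECK_EXTENSION, VFIO_TYPE1_IOMMU) <= 0) {
		errno = ENOTSUP;
		goto fail;
	}

	h->group = open(group_path, O_RDWR);
	if (h->group < 0)
		goto fail;
	if (ioctl(h->group, VFIO_GROUP_GET_STATUS, &status) < 0)
		goto fail;
	if (!(status.flags & VFIO_GROUP_FLAGS_VIABLE)) {
		errno = EBUSY;
		goto fail;
	}

	/* The group must join the container before an IOMMU model is set. */
	if (ioctl(h->group, VFIO_GROUP_SET_CONTAINER, &h->container) < 0 ||
	    ioctl(h->container, VFIO_SET_IOMMU, VFIO_TYPE1_IOMMU) < 0)
		goto fail;

	h->device = ioctl(h->group, VFIO_GROUP_GET_DEVICE_FD, dev_name);
	if (h->device < 0)
		goto fail;
	return 0;

fail:
	err = errno;
	vfio_close_all(h);
	return -err;
}
```

A caller would pass "/dev/vfio/vfio" as the container path and the group node
of the chosen mdev as the group path; nothing here decides which mdev is free,
which is the allocation problem described next.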

Then you have to make a decision: how do you find an available mdev to use, and
how do you return it?

We have considered creating only one mdev and allocating a queue when the device
is opened. But the VFIO maintainer, Alex, did not agree and said it broke the
original idea of VFIO.

-Kenneth
>
> Presumably the kernel has to handle the PASID and related for security
> reasons, so they shouldn't go to userspace?
>
> If there is something missing in vfio to do this is it looks pretty
> small to me..
>
> Jason

--
-Kenneth(Hisilicon)

================================================================================
This e-mail and its attachments contain confidential information from HUAWEI,
which is intended only for the person or entity whose address is listed above.
Any use of the information contained herein in any way (including, but not
limited to, total or partial disclosure, reproduction, or dissemination) by
persons other than the intended recipient(s) is prohibited. If you receive this
e-mail in error, please notify the sender by phone or email immediately and
delete it!

2018-11-20 04:52:37

by Jason Gunthorpe

Subject: Re: [RFCv3 PATCH 1/6] uacce: Add documents for WarpDrive/uacce

On Mon, Nov 19, 2018 at 11:48:54AM -0500, Jerome Glisse wrote:

> Just to comment on this, any infiniband driver which use umem and do
> not have ODP (here ODP for me means listening to mmu notifier so all
> infiniband driver except mlx5) will be affected by same issue AFAICT.
>
> AFAICT there is no special thing happening after fork() inside any of
> those driver. So if parent create a umem mr before fork() and program
> hardware with it then after fork() the parent might start using new
> page for the umem range while the old memory is use by the child. The
> reverse is also true (parent using old memory and child new memory)
> bottom line you can not predict which memory the child or the parent
> will use for the range after fork().
>
> So no matter what you consider the child or the parent, what the hw
> will use for the mr is unlikely to match what the CPU use for the
> same virtual address. In other word:
>
> Before fork:
> CPU parent: virtual addr ptr1 -> physical address = 0xCAFE
> HARDWARE: virtual addr ptr1 -> physical address = 0xCAFE
>
> Case 1:
> CPU parent: virtual addr ptr1 -> physical address = 0xCAFE
> CPU child: virtual addr ptr1 -> physical address = 0xDEAD
> HARDWARE: virtual addr ptr1 -> physical address = 0xCAFE
>
> Case 2:
> CPU parent: virtual addr ptr1 -> physical address = 0xBEEF
> CPU child: virtual addr ptr1 -> physical address = 0xCAFE
> HARDWARE: virtual addr ptr1 -> physical address = 0xCAFE

IIRC this is solved in IB by automatically calling
madvise(MADV_DONTFORK) before creating the MR.

MADV_DONTFORK
.. This is useful to prevent copy-on-write semantics from changing the
physical location of a page if the parent writes to it after a
fork(2) ..

Jason

2018-11-21 13:33:37

by Kenneth Lee

Subject: Re: [RFCv3 PATCH 1/6] uacce: Add documents for WarpDrive/uacce

On Tue, Nov 20, 2018 at 07:17:44AM +0200, Leon Romanovsky wrote:
>
> On Tue, Nov 20, 2018 at 11:07:02AM +0800, Kenneth Lee wrote:
> > On Mon, Nov 19, 2018 at 11:49:54AM -0700, Jason Gunthorpe wrote:
> > >
> > > On Mon, Nov 19, 2018 at 05:14:05PM +0800, Kenneth Lee wrote:
> > >
> > > > If the hardware cannot share page table with the CPU, we then need to have
> > > > some way to change the device page table. This is what happen in ODP. It
> > > > invalidates the page table in device upon mmu_notifier call back. But this cannot
> > > > solve the COW problem: if the user process A share a page P with device, and A
> > > > forks a new process B, and it continue to write to the page. By COW, the
> > > > process B will keep the page P, while A will get a new page P'. But you have
> > > > no way to let the device know it should use P' rather than P.
> > >
> > > Is this true? I thought mmu_notifiers covered all these cases.
> > >
> > > The mm_notifier for A should fire if B causes the physical address of
> > > A's pages to change via COW.
> > >
> > > And this causes the device page tables to re-synchronize.
> >
> > I don't see such code. The current do_cow_fault() implemenation has nothing to
> > do with mm_notifer.
> >
> > >
> > > > In WarpDrive/uacce, we make this simple. If you support IOMMU and it support
> > > > SVM/SVA. Everything will be fine just like ODP implicit mode. And you don't need
> > > > to write any code for that. Because it has been done by IOMMU framework. If it
> > >
> > > Looks like the IOMMU code uses mmu_notifier, so it is identical to
> > > IB's ODP. The only difference is that IB tends to have the IOMMU page
> > > table in the device, not in the CPU.
> > >
> > > The only case I know if that is different is the new-fangled CAPI
> > > stuff where the IOMMU can directly use the CPU's page table and the
> > > IOMMU page table (in device or CPU) is eliminated.
> > >
> >
> > Yes. We are not focusing on the current implementation. As mentioned in the
> > cover letter. We are expecting Jean Philips' SVA patch:
> > git://linux-arm.org/linux-jpb.
> >
> > > Anyhow, I don't think a single instance of hardware should justify an
> > > entire new subsystem. Subsystems are hard to make and without multiple
> > > hardware examples there is no way to expect that it would cover any
> > > future use cases.
> >
> > Yes. That's our first expectation. We can keep it with our driver. But because
> > there is no user driver support for any accelerator in mainline kernel. Even the
> > well known QuickAssit has to be maintained out of tree. So we try to see if
> > people is interested in working together to solve the problem.
> >
> > >
> > > If all your driver needs is to mmap some PCI bar space, route
> > > interrupts and do DMA mapping then mediated VFIO is probably a good
> > > choice.
> >
> > Yes. That is what is done in our RFCv1/v2. But we accepted Jerome's opinion and
> > try not to add complexity to the mm subsystem.
> >
> > >
> > > If it needs to do a bunch of other stuff, not related to PCI bar
> > > space, interrupts and DMA mapping (ie special code for compression,
> > > crypto, AI, whatever) then you should probably do what Jerome said and
> > > make a drivers/char/hisillicon_foo_bar.c that exposes just what your
> > > hardware does.
> >
> > Yes. If no other accelerator driver writer is interested. That is the
> > expectation:)
> >
> > But we really like to have a public solution here. Consider this scenario:
> >
> > You create some connections (queues) to NIC, RSA, and AI engine. Then you got
> > data direct from the NIC and pass the pointer to RSA engine for decryption. The
> > CPU then finish some data taking or operation and then pass through to the AI
> > engine for CNN calculation....This will need a place to maintain the same
> > address space by some means.
>
> You are using NIC terminology, in the documentation, you wrote that it is needed
> for DPDK use and I don't really understand, why do we need another shiny new
> interface for DPDK.
>

I'm not a DPDK expert. But we had some discussion with LNG of Linaro. They were
considering creating something similar to simplify the user driver. In most
cases, we use DPDK or ODP (open data plane) just for a faster data plane flow.
But much of the logic, such as setting the hardware mode, MAC address and so on,
is not necessary there. So they were looking for a way to keep the driver in the
kernel and expose just the ring buffers of some queues to user space. This may
simplify the user space design.
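
To make the idea concrete, here is a small sketch of what "just the ring
buffer" could look like from the user side. Everything in it is an assumption
for illustration: the struct layout, the depth, the descriptor size and the
function names are invented and are not taken from DPDK, ODP or uacce. Both
ends run in software here; with real hardware the consumer would be the device
and the index updates would need the appropriate memory barriers.

```c
#include <stdint.h>
#include <string.h>

#define RING_DEPTH 8u	/* illustrative; must be a power of two */
#define DESC_SIZE  16u	/* illustrative descriptor size in bytes */

/* Hypothetical layout of a queue region mapped into the process. */
struct ring {
	uint8_t desc[RING_DEPTH][DESC_SIZE];
	uint16_t head;	/* next slot to consume, free-running */
	uint16_t tail;	/* next slot to fill, free-running */
};

/* Producer side: copy one descriptor in. Returns 0, or -1 if full. */
static int ring_push(struct ring *r, const void *d)
{
	if ((uint16_t)(r->tail - r->head) == RING_DEPTH)
		return -1;
	memcpy(r->desc[r->tail % RING_DEPTH], d, DESC_SIZE);
	r->tail++;
	return 0;
}

/* Consumer side: copy one descriptor out. Returns 0, or -1 if empty. */
static int ring_pop(struct ring *r, void *d)
{
	if (r->head == r->tail)
		return -1;
	memcpy(d, r->desc[r->head % RING_DEPTH], DESC_SIZE);
	r->head++;
	return 0;
}
```

Mode setting, MAC configuration and the like would stay behind ordinary
syscalls into the kernel driver; only this region would be touched in the
data path.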

> >
> > It is not complex, but it is helpful.
> >
> > >
> > > If you have networking involved in here then consider RDMA,
> > > particularly if this functionality is already part of the same
> > > hardware that the hns infiniband driver is servicing.
> > >
> > > 'computational MRs' are a reasonable approach to a side-car offload of
> > > already existing RDMA support.
> >
> > OK. Thanks. I will spend some time on it. But personally, I really don't like
> > RDMA's complexity. I cannot even try one single function without a...some
> > expensive hardwares and complexity connection in the lab. This is not like a
> > open source way.
>
> It is not very accurate. We have RXE driver which is virtual RDMA device
> which is implemented purely in SW. It struggles from bad performance and
> sporadic failures, but it is enough to try RDMA on your laptop in VM.

Woo. This will be helpful. Thank you very much.

>
> Thanks
>
> >
> > >
> > > Jason
> >



--
-Kenneth(Hisilicon)


2018-11-12 17:53:29

by Kenneth Lee

Subject: [RFCv3 PATCH 6/6] uacce: add user sample for uacce/warpdrive

From: Kenneth Lee <[email protected]>

This is the sample code to demonstrate what a WarpDrive user application
should look like. (Uacce is the kernel component for WarpDrive.)

It contains:

1. wd.[ch]: the common library to provide the WarpDrive interface
2. wd_adapter.[ch]: the adapter for wd to call different user drivers
3. drv/*: the user driver to access the hardware space
4. test/*: the test application

The Hisilicon HIP08 ZIP accelerator is used in this sample.

Signed-off-by: Zaibo Xu <[email protected]>
Signed-off-by: Kenneth Lee <[email protected]>
Signed-off-by: Hao Fang <[email protected]>
Signed-off-by: Zhou Wang <[email protected]>
---
samples/warpdrive/AUTHORS | 3 +
samples/warpdrive/ChangeLog | 1 +
samples/warpdrive/Makefile.am | 9 +
samples/warpdrive/NEWS | 1 +
samples/warpdrive/README | 32 ++++
samples/warpdrive/autogen.sh | 3 +
samples/warpdrive/cleanup.sh | 13 ++
samples/warpdrive/conf.sh | 4 +
samples/warpdrive/configure.ac | 52 ++++++
samples/warpdrive/drv/hisi_qm_udrv.c | 228 +++++++++++++++++++++++++
samples/warpdrive/drv/hisi_qm_udrv.h | 57 +++++++
samples/warpdrive/drv/wd_drv.h | 19 +++
samples/warpdrive/test/Makefile.am | 7 +
samples/warpdrive/test/test_hisi_zip.c | 150 ++++++++++++++++
samples/warpdrive/wd.c | 96 +++++++++++
samples/warpdrive/wd.h | 97 +++++++++++
samples/warpdrive/wd_adapter.c | 71 ++++++++
samples/warpdrive/wd_adapter.h | 36 ++++
18 files changed, 879 insertions(+)
create mode 100644 samples/warpdrive/AUTHORS
create mode 100644 samples/warpdrive/ChangeLog
create mode 100644 samples/warpdrive/Makefile.am
create mode 100644 samples/warpdrive/NEWS
create mode 100644 samples/warpdrive/README
create mode 100755 samples/warpdrive/autogen.sh
create mode 100755 samples/warpdrive/cleanup.sh
create mode 100755 samples/warpdrive/conf.sh
create mode 100644 samples/warpdrive/configure.ac
create mode 100644 samples/warpdrive/drv/hisi_qm_udrv.c
create mode 100644 samples/warpdrive/drv/hisi_qm_udrv.h
create mode 100644 samples/warpdrive/drv/wd_drv.h
create mode 100644 samples/warpdrive/test/Makefile.am
create mode 100644 samples/warpdrive/test/test_hisi_zip.c
create mode 100644 samples/warpdrive/wd.c
create mode 100644 samples/warpdrive/wd.h
create mode 100644 samples/warpdrive/wd_adapter.c
create mode 100644 samples/warpdrive/wd_adapter.h
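
Reading aid for drv/hisi_qm_udrv.c: the sketch below restates, as standalone C,
the bit layout implied by hacc_db() and the CQE_* macros in this patch. The
function names are hypothetical and are not part of the patch, and nothing is
assumed about the hardware beyond what those expressions show: the queue number
in bits 0-15 of the doorbell word, the command from bit 16, the index in bits
32-47 and the priority from bit 48; the CQE is read as four 32-bit words.

```c
#include <stdint.h>

/* Compose the 64-bit doorbell word the same way hacc_db() does. */
static uint64_t qm_doorbell_word(uint16_t sqn, uint8_t cmd, uint16_t index,
				 uint8_t priority)
{
	uint64_t low = (uint64_t)sqn | ((uint64_t)cmd << 16);
	uint64_t high = (uint64_t)index | ((uint64_t)priority << 16);

	return low | (high << 32);
}

/* Same field extraction as CQE_PHASE(): bit 16 of word 3. */
static unsigned int qm_cqe_phase(const uint32_t cqe[4])
{
	return (cqe[3] >> 16) & 0x1;
}

/* Same field extraction as CQE_SQ_NUM(): upper half of word 2. */
static unsigned int qm_cqe_sq_num(const uint32_t cqe[4])
{
	return cqe[2] >> 16;
}

/* Same field extraction as CQE_SQ_HEAD_INDEX(): lower half of word 2. */
static unsigned int qm_cqe_sq_head(const uint32_t cqe[4])
{
	return cqe[2] & 0xffff;
}
```

For example, queue 3 with command 1 (DOORBELL_CMD_CQ), index 0x10 and priority
0 gives the word 0x0000001000010003.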

diff --git a/samples/warpdrive/AUTHORS b/samples/warpdrive/AUTHORS
new file mode 100644
index 000000000000..bb55d2769147
--- /dev/null
+++ b/samples/warpdrive/AUTHORS
@@ -0,0 +1,3 @@
+Kenneth Lee <[email protected]>
+Zaibo Xu <[email protected]>
+Zhou Wang <[email protected]>
diff --git a/samples/warpdrive/ChangeLog b/samples/warpdrive/ChangeLog
new file mode 100644
index 000000000000..b1b716105590
--- /dev/null
+++ b/samples/warpdrive/ChangeLog
@@ -0,0 +1 @@
+init
diff --git a/samples/warpdrive/Makefile.am b/samples/warpdrive/Makefile.am
new file mode 100644
index 000000000000..41154a880a97
--- /dev/null
+++ b/samples/warpdrive/Makefile.am
@@ -0,0 +1,9 @@
+ACLOCAL_AMFLAGS = -I m4
+AUTOMAKE_OPTIONS = foreign subdir-objects
+AM_CFLAGS=-Wall -O0 -fno-strict-aliasing
+
+lib_LTLIBRARIES=libwd.la
+libwd_la_SOURCES=wd.c wd_adapter.c wd.h wd_adapter.h \
+ drv/hisi_qm_udrv.c drv/hisi_qm_udrv.h
+
+SUBDIRS=. test
diff --git a/samples/warpdrive/NEWS b/samples/warpdrive/NEWS
new file mode 100644
index 000000000000..b1b716105590
--- /dev/null
+++ b/samples/warpdrive/NEWS
@@ -0,0 +1 @@
+init
diff --git a/samples/warpdrive/README b/samples/warpdrive/README
new file mode 100644
index 000000000000..3adf66b112fc
--- /dev/null
+++ b/samples/warpdrive/README
@@ -0,0 +1,32 @@
+WD User Land Demonstration
+==========================
+
+This directory contains some applications and libraries to demonstrate how a
+
+WarpDrive application can be constructed.
+
+
+As a demo, we try to make it simple and clear for understanding. It is not
+
+supposed to be used in business scenario.
+
+
+The directory contains the following elements:
+
+wd.[ch]
+ A demonstration WarpDrive fundamental library which wraps the basic
+ operations to the WarpDrive-ed device.
+
+wd_adapter.[ch]
+ User driver adapter for wd.[ch]
+
+wd_utils.[ch]
+ Some utility functions used by WD and its drivers
+
+drv/*
+ User drivers. It helps to fulfill the semantic of wd.[ch] for
+ particular hardware
+
+test/*
+ Test applications to use the WarpDrive library
+
diff --git a/samples/warpdrive/autogen.sh b/samples/warpdrive/autogen.sh
new file mode 100755
index 000000000000..58deaf49de2a
--- /dev/null
+++ b/samples/warpdrive/autogen.sh
@@ -0,0 +1,3 @@
+#!/bin/sh -x
+
+autoreconf -i -f -v
diff --git a/samples/warpdrive/cleanup.sh b/samples/warpdrive/cleanup.sh
new file mode 100755
index 000000000000..c5f3d21e5dc1
--- /dev/null
+++ b/samples/warpdrive/cleanup.sh
@@ -0,0 +1,13 @@
+#!/bin/sh
+
+if [ -r Makefile ]; then
+ make distclean
+fi
+
+FILES="aclocal.m4 autom4te.cache compile config.guess config.h.in config.log \
+ config.status config.sub configure cscope.out depcomp install-sh \
+ libsrc/Makefile libsrc/Makefile.in libtool ltmain.sh Makefile \
+ ar-lib m4 \
+ Makefile.in missing src/Makefile src/Makefile.in test/Makefile.in"
+
+rm -vRf $FILES
diff --git a/samples/warpdrive/conf.sh b/samples/warpdrive/conf.sh
new file mode 100755
index 000000000000..2af8a54c5126
--- /dev/null
+++ b/samples/warpdrive/conf.sh
@@ -0,0 +1,4 @@
+ac_cv_func_malloc_0_nonnull=yes ac_cv_func_realloc_0_nonnull=yes ./configure \
+ --host aarch64-linux-gnu \
+ --target aarch64-linux-gnu \
+ --program-prefix aarch64-linux-gnu-
diff --git a/samples/warpdrive/configure.ac b/samples/warpdrive/configure.ac
new file mode 100644
index 000000000000..53262f3197c2
--- /dev/null
+++ b/samples/warpdrive/configure.ac
@@ -0,0 +1,52 @@
+AC_PREREQ([2.69])
+AC_INIT([wrapdrive], [0.1], [[email protected]])
+AC_CONFIG_SRCDIR([wd.c])
+AM_INIT_AUTOMAKE([1.10 no-define])
+
+AC_CONFIG_MACRO_DIR([m4])
+AC_CONFIG_HEADERS([config.h])
+
+# Checks for programs.
+AC_PROG_CXX
+AC_PROG_AWK
+AC_PROG_CC
+AC_PROG_CPP
+AC_PROG_INSTALL
+AC_PROG_LN_S
+AC_PROG_MAKE_SET
+AC_PROG_RANLIB
+
+AM_PROG_AR
+AC_PROG_LIBTOOL
+AM_PROG_LIBTOOL
+LT_INIT
+AM_PROG_CC_C_O
+
+AC_DEFINE([HAVE_SVA], [0], [enable SVA support])
+AC_ARG_ENABLE([sva],
+ [ --enable-sva enable to support sva feature],
+ AC_DEFINE([HAVE_SVA], [1]))
+
+# Checks for libraries.
+
+# Checks for header files.
+AC_CHECK_HEADERS([fcntl.h stdint.h stdlib.h string.h sys/ioctl.h sys/time.h unistd.h])
+
+# Checks for typedefs, structures, and compiler characteristics.
+AC_CHECK_HEADER_STDBOOL
+AC_C_INLINE
+AC_TYPE_OFF_T
+AC_TYPE_SIZE_T
+AC_TYPE_UINT16_T
+AC_TYPE_UINT32_T
+AC_TYPE_UINT64_T
+AC_TYPE_UINT8_T
+
+# Checks for library functions.
+AC_FUNC_MALLOC
+AC_FUNC_MMAP
+AC_CHECK_FUNCS([memset munmap])
+
+AC_CONFIG_FILES([Makefile
+ test/Makefile])
+AC_OUTPUT
diff --git a/samples/warpdrive/drv/hisi_qm_udrv.c b/samples/warpdrive/drv/hisi_qm_udrv.c
new file mode 100644
index 000000000000..5e623f31e2cb
--- /dev/null
+++ b/samples/warpdrive/drv/hisi_qm_udrv.c
@@ -0,0 +1,228 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#include <stdlib.h>
+#include <unistd.h>
+#include <stdio.h>
+#include <sys/mman.h>
+#include <assert.h>
+#include <string.h>
+#include <stdint.h>
+#include <fcntl.h>
+#include <sys/stat.h>
+#include <sys/ioctl.h>
+#include <sys/epoll.h>
+#include <sys/eventfd.h>
+
+#include "wd_drv.h"
+#include "hisi_qm_udrv.h"
+
+#define QM_SQE_SIZE 128 /* todo: get it from sysfs */
+#define QM_CQE_SIZE 16
+
+#define DOORBELL_CMD_SQ 0
+#define DOORBELL_CMD_CQ 1
+
+/* cqe shift */
+#define CQE_PHASE(cq) (((*((__u32 *)(cq) + 3)) >> 16) & 0x1)
+#define CQE_SQ_NUM(cq) ((*((__u32 *)(cq) + 2)) >> 16)
+#define CQE_SQ_HEAD_INDEX(cq) ((*((__u32 *)(cq) + 2)) & 0xffff)
+
+struct hisi_acc_qm_sqc {
+ __u16 sqn;
+};
+
+struct hisi_qm_queue_info {
+ void *sq_base;
+ void *cq_base;
+ void *doorbell_base;
+ void *dko_base;
+ __u16 sq_tail_index;
+ __u16 sq_head_index;
+ __u16 cq_head_index;
+ __u16 sqn;
+ bool cqc_phase;
+ void *req_cache[QM_Q_DEPTH];
+ int is_sq_full;
+};
+
+int hacc_db(struct hisi_qm_queue_info *q, __u8 cmd, __u16 index, __u8 priority)
+{
+ void *base = q->doorbell_base;
+ __u16 sqn = q->sqn;
+ __u64 doorbell = 0;
+
+ doorbell = (__u64)sqn | ((__u64)cmd << 16);
+ doorbell |= ((__u64)index | ((__u64)priority << 16)) << 32;
+
+ *((__u64 *)base) = doorbell;
+
+ return 0;
+}
+
+static int hisi_qm_fill_sqe(void *msg, struct hisi_qm_queue_info *info, __u16 i)
+{
+ struct hisi_qm_msg *sqe = (struct hisi_qm_msg *)info->sq_base + i;
+
+ memcpy((void *)sqe, msg, sizeof(struct hisi_qm_msg));
+ assert(!info->req_cache[i]);
+ info->req_cache[i] = msg;
+
+ return 0;
+}
+
+static int hisi_qm_recv_sqe(struct hisi_qm_msg *sqe,
+ struct hisi_qm_queue_info *info, __u16 i)
+{
+ __u32 status = sqe->dw3 & 0xff;
+ __u32 type = sqe->dw9 & 0xff;
+
+ if (status != 0 && status != 0x0d) {
+ fprintf(stderr, "bad status (s=%d, t=%d)\n", status, type);
+ return -EIO;
+ }
+
+ assert(info->req_cache[i]);
+ memcpy((void *)info->req_cache[i], sqe, sizeof(struct hisi_qm_msg));
+ return 0;
+}
+
+int hisi_qm_set_queue_dio(struct wd_queue *q)
+{
+ struct hisi_qm_queue_info *info;
+ void *vaddr;
+ int ret;
+
+ alloc_obj(info);
+ if (!info)
+ return -1;
+
+ q->priv = info;
+
+ vaddr = wd_drv_mmap(q, QM_DUS_SIZE, QM_DUS_START);
+ if (vaddr == MAP_FAILED) {
+ ret = -errno;
+ goto err_with_info;
+ }
+ info->sq_base = vaddr;
+ info->cq_base = vaddr + QM_SQE_SIZE * QM_Q_DEPTH;
+
+ vaddr = wd_drv_mmap(q, QM_DOORBELL_SIZE, QM_DOORBELL_START);
+ if (vaddr == MAP_FAILED) {
+ ret = -errno;
+ goto err_with_dus;
+ }
+ info->doorbell_base = vaddr + QM_DOORBELL_OFFSET;
+ info->sq_tail_index = 0;
+ info->sq_head_index = 0;
+ info->cq_head_index = 0;
+ info->cqc_phase = 1;
+ info->is_sq_full = 0;
+
+ vaddr = wd_drv_mmap(q, QM_DKO_SIZE, QM_DKO_START);
+ if (vaddr == MAP_FAILED) {
+ ret = -errno;
+ goto err_with_db;
+ }
+ info->dko_base = vaddr;
+
+ return 0;
+
+err_with_db:
+ munmap(info->doorbell_base - QM_DOORBELL_OFFSET, QM_DOORBELL_SIZE);
+err_with_dus:
+ munmap(info->sq_base, QM_DUS_SIZE);
+err_with_info:
+ free(info);
+ return ret;
+}
+
+void hisi_qm_unset_queue_dio(struct wd_queue *q)
+{
+ struct hisi_qm_queue_info *info = (struct hisi_qm_queue_info *)q->priv;
+
+ munmap(info->dko_base, QM_DKO_SIZE);
+ munmap(info->doorbell_base - QM_DOORBELL_OFFSET, QM_DOORBELL_SIZE);
+ munmap(info->sq_base, QM_DUS_SIZE);
+ free(info);
+ q->priv = NULL;
+}
+
+int hisi_qm_add_to_dio_q(struct wd_queue *q, void *req)
+{
+ struct hisi_qm_queue_info *info = (struct hisi_qm_queue_info *)q->priv;
+ __u16 i;
+
+ if (info->is_sq_full)
+ return -EBUSY;
+
+ i = info->sq_tail_index;
+
+ hisi_qm_fill_sqe(req, q->priv, i);
+
+ mb(); /* make sure the request is all in memory before doorbell */
+ fprintf(stderr, "fill sqe\n");
+
+ if (i == (QM_Q_DEPTH - 1))
+ i = 0;
+ else
+ i++;
+
+ hacc_db(info, DOORBELL_CMD_SQ, i, 0);
+ fprintf(stderr, "db\n");
+
+ info->sq_tail_index = i;
+
+ if (i == info->sq_head_index)
+ info->is_sq_full = 1;
+
+ return 0;
+}
+
+int hisi_qm_get_from_dio_q(struct wd_queue *q, void **resp)
+{
+ struct hisi_qm_queue_info *info = (struct hisi_qm_queue_info *)q->priv;
+ __u16 i = info->cq_head_index;
+ struct cqe *cq_base = info->cq_base;
+ struct hisi_qm_msg *sq_base = info->sq_base;
+ struct cqe *cqe = cq_base + i;
+ struct hisi_qm_msg *sqe;
+ int ret;
+
+ if (info->cqc_phase == CQE_PHASE(cqe)) {
+ sqe = sq_base + CQE_SQ_HEAD_INDEX(cqe);
+ ret = hisi_qm_recv_sqe(sqe, info, i);
+ if (ret < 0)
+ return -EIO;
+
+ if (info->is_sq_full)
+ info->is_sq_full = 0;
+ } else {
+ return -EAGAIN;
+ }
+
+ *resp = info->req_cache[i];
+ info->req_cache[i] = NULL;
+
+ if (i == (QM_Q_DEPTH - 1)) {
+ info->cqc_phase = !(info->cqc_phase);
+ i = 0;
+ } else
+ i++;
+
+ hacc_db(info, DOORBELL_CMD_CQ, i, 0);
+
+ info->cq_head_index = i;
+ info->sq_head_index = i;
+
+
+ return ret;
+}
+
+void *hisi_qm_preserve_mem(struct wd_queue *q, size_t size)
+{
+ void *mem = wd_drv_mmap(q, size, QM_SS_START);
+
+ if (mem == MAP_FAILED)
+ return NULL;
+ else
+ return mem;
+}
diff --git a/samples/warpdrive/drv/hisi_qm_udrv.h b/samples/warpdrive/drv/hisi_qm_udrv.h
new file mode 100644
index 000000000000..694eb4dd65de
--- /dev/null
+++ b/samples/warpdrive/drv/hisi_qm_udrv.h
@@ -0,0 +1,57 @@
+// SPDX-License-Identifier: GPL-2.0
+#ifndef __HZIP_DRV_H__
+#define __HZIP_DRV_H__
+
+#include <linux/types.h>
+#include "../wd.h"
+#include "../../drivers/crypto/hisilicon/qm_usr_if.h"
+
+/* this is unnecessarily big, the hardware should optimize it */
+struct hisi_qm_msg {
+ __u32 consumed;
+ __u32 produced;
+ __u32 comp_date_length;
+ __u32 dw3;
+ __u32 input_date_length;
+ __u32 lba_l;
+ __u32 lba_h;
+ __u32 dw7;
+ __u32 dw8;
+ __u32 dw9;
+ __u32 dw10;
+ __u32 priv_info;
+ __u32 dw12;
+ __u32 tag;
+ __u32 dest_avail_out;
+ __u32 rsvd0;
+ __u32 comp_head_addr_l;
+ __u32 comp_head_addr_h;
+ __u32 source_addr_l;
+ __u32 source_addr_h;
+ __u32 dest_addr_l;
+ __u32 dest_addr_h;
+ __u32 stream_ctx_addr_l;
+ __u32 stream_ctx_addr_h;
+ __u32 cipher_key1_addr_l;
+ __u32 cipher_key1_addr_h;
+ __u32 cipher_key2_addr_l;
+ __u32 cipher_key2_addr_h;
+ __u32 rsvd1[4];
+};
+
+int hisi_qm_set_queue_dio(struct wd_queue *q);
+void hisi_qm_unset_queue_dio(struct wd_queue *q);
+int hisi_qm_add_to_dio_q(struct wd_queue *q, void *req);
+int hisi_qm_get_from_dio_q(struct wd_queue *q, void **resp);
+void *hisi_qm_preserve_mem(struct wd_queue *q, size_t size);
+
+#define QM_DOORBELL_SIZE (QM_DOORBELL_PAGE_NR * PAGE_SIZE)
+#define QM_DKO_SIZE (QM_DKO_PAGE_NR * PAGE_SIZE)
+#define QM_DUS_SIZE (QM_DUS_PAGE_NR * PAGE_SIZE)
+
+#define QM_DOORBELL_START 0
+#define QM_DKO_START (QM_DOORBELL_START + QM_DOORBELL_SIZE)
+#define QM_DUS_START (QM_DKO_START + QM_DKO_SIZE)
+#define QM_SS_START (QM_DUS_START + QM_DUS_SIZE)
+
+#endif
diff --git a/samples/warpdrive/drv/wd_drv.h b/samples/warpdrive/drv/wd_drv.h
new file mode 100644
index 000000000000..66fa8d889e70
--- /dev/null
+++ b/samples/warpdrive/drv/wd_drv.h
@@ -0,0 +1,19 @@
+#ifndef __WD_DRV_H
+#define __WD_DRV_H
+
+#include "wd.h"
+
+#ifndef PAGE_SHIFT
+#define PAGE_SHIFT 12
+#endif
+
+#ifndef PAGE_SIZE
+#define PAGE_SIZE (1 << PAGE_SHIFT)
+#endif
+
+static inline void *wd_drv_mmap(struct wd_queue *q, size_t size, size_t off)
+{
+ return mmap(0, size, PROT_READ | PROT_WRITE, MAP_SHARED, q->fd, off);
+}
+
+#endif
diff --git a/samples/warpdrive/test/Makefile.am b/samples/warpdrive/test/Makefile.am
new file mode 100644
index 000000000000..ad80e80a47d7
--- /dev/null
+++ b/samples/warpdrive/test/Makefile.am
@@ -0,0 +1,7 @@
+AM_CFLAGS=-Wall -O0 -fno-strict-aliasing
+
+bin_PROGRAMS=test_hisi_zip
+
+test_hisi_zip_SOURCES=test_hisi_zip.c
+
+test_hisi_zip_LDADD=../.libs/libwd.a
diff --git a/samples/warpdrive/test/test_hisi_zip.c b/samples/warpdrive/test/test_hisi_zip.c
new file mode 100644
index 000000000000..5e636482d318
--- /dev/null
+++ b/samples/warpdrive/test/test_hisi_zip.c
@@ -0,0 +1,150 @@
+// SPDX-License-Identifier: GPL-2.0+
+#include <stdio.h>
+#include <string.h>
+#include <sys/mman.h>
+#include <sys/time.h>
+#include <unistd.h>
+#include "../wd.h"
+#include "../drv/hisi_qm_udrv.h"
+
+#if defined(MSDOS) || defined(OS2) || defined(WIN32) || defined(__CYGWIN__)
+# include <fcntl.h>
+# include <io.h>
+# define SET_BINARY_MODE(file) setmode(fileno(file), O_BINARY)
+#else
+# define SET_BINARY_MODE(file)
+#endif
+
+#define SYS_ERR_COND(cond, msg) \
+do { \
+ if (cond) { \
+ perror(msg); \
+ exit(EXIT_FAILURE); \
+ } \
+} while (0)
+
+#define ZLIB 0
+#define GZIP 1
+
+int hizip_deflate(FILE *source, FILE *dest, int type)
+{
+ struct hisi_qm_msg *msg, *recv_msg;
+ struct wd_queue q;
+ __u64 in, out;
+ void *a;
+ char *src, *dst;
+ int ret, total_len, output_num, fd;
+ size_t sz;
+
+ q.dev_path = "/dev/ua1";
+ strncpy(q.hw_type, "hisi_qm_v1", PATH_STR_SIZE);
+ ret = wd_request_queue(&q);
+ SYS_ERR_COND(ret, "wd_request_queue");
+
+ fd = fileno(source);
+ struct stat s;
+
+ if (fstat(fd, &s) < 0)
+ SYS_ERR_COND(-1, "fstat");
+ total_len = s.st_size;
+
+ SYS_ERR_COND(!total_len, "input file length zero");
+
+ SYS_ERR_COND(total_len > 16 * 1024 * 1024,
+ "total_len > 16MB!");
+
+ a = wd_reserve_memory(&q, total_len * 2);
+ SYS_ERR_COND(!a, "memory reserved!");
+
+ fprintf(stderr, "a=%lx\n", (unsigned long)a);
+ memset(a, 0, total_len * 2);
+
+ src = (char *)a;
+ dst = (char *)a + total_len;
+
+ sz = fread(src, 1, total_len, source);
+ SYS_ERR_COND(sz != total_len, "read fail");
+
+ msg = malloc(sizeof(*msg));
+ SYS_ERR_COND(!msg, "alloc msg");
+ memset((void *)msg, 0, sizeof(*msg));
+ msg->input_date_length = total_len;
+ if (type == ZLIB)
+ msg->dw9 = 2;
+ else
+ msg->dw9 = 3;
+ msg->dest_avail_out = 0x800000;
+
+ in = (__u64)src;
+ out = (__u64)dst;
+
+ msg->source_addr_l = in & 0xffffffff;
+ msg->source_addr_h = in >> 32;
+ msg->dest_addr_l = out & 0xffffffff;
+ msg->dest_addr_h = out >> 32;
+
+ do { /* retry the send itself while the queue is full */
+ ret = wd_send(&q, msg);
+ if (ret == -EBUSY)
+ usleep(1);
+ } while (ret == -EBUSY);
+ SYS_ERR_COND(ret, "send");
+
+recv_again:
+ ret = wd_recv(&q, (void **)&recv_msg);
+ SYS_ERR_COND(ret == -EIO, "wd_recv");
+
+ if (ret == -EAGAIN)
+ goto recv_again;
+
+ output_num = recv_msg->produced;
+ /* add zlib compress head and write head + compressed data to a file */
+ char zip_head[2] = {0x78, 0x9c};
+
+ fwrite(zip_head, 1, 2, dest);
+ fwrite((char *)out, 1, output_num, dest);
+ fclose(dest);
+ free(msg);
+ wd_release_queue(&q);
+ return 0;
+}
+
+int main(int argc, char *argv[])
+{
+ int alg_type = 0;
+
+ /* avoid end-of-line conversions */
+ SET_BINARY_MODE(stdin);
+ SET_BINARY_MODE(stdout);
+
+ if (!argv[1]) {
+ fputs("<<use ./test_hisi_zip -h get more details>>\n", stderr);
+ goto EXIT;
+ }
+
+ if (!strcmp(argv[1], "-z"))
+ alg_type = ZLIB;
+ else if (!strcmp(argv[1], "-g")) {
+ alg_type = GZIP;
+ } else if (!strcmp(argv[1], "-h")) {
+ fputs("[version]:1.0.2\n", stderr);
+ fputs("[usage]: ./test_hisi_zip [type] <src_file> dest_file\n",
+ stderr);
+ fputs(" [type]:\n", stderr);
+ fputs(" -z = zlib\n", stderr);
+ fputs(" -g = gzip\n", stderr);
+ fputs(" -h = usage\n", stderr);
+ fputs("Example:\n", stderr);
+ fputs("./test_hisi_zip -z < test.data > out.data\n", stderr);
+ goto EXIT;
+ } else {
+ fputs("Unknown option\n", stderr);
+ fputs("<<use ./test_hisi_zip -h get more details>>\n",
+ stderr);
+ goto EXIT;
+ }
+
+ hizip_deflate(stdin, stdout, alg_type);
+EXIT:
+ return EXIT_SUCCESS;
+}
diff --git a/samples/warpdrive/wd.c b/samples/warpdrive/wd.c
new file mode 100644
index 000000000000..559314a13e38
--- /dev/null
+++ b/samples/warpdrive/wd.c
@@ -0,0 +1,96 @@
+// SPDX-License-Identifier: GPL-2.0
+#include "config.h"
+#include <stdlib.h>
+#include <unistd.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <sys/queue.h>
+#include <fcntl.h>
+#include <sys/ioctl.h>
+#include <errno.h>
+#include <sys/mman.h>
+#include <string.h>
+#include <assert.h>
+#include <dirent.h>
+#include <sys/poll.h>
+#include "wd.h"
+#include "wd_adapter.h"
+
+int wd_request_queue(struct wd_queue *q)
+{
+ int ret;
+
+ q->fd = open(q->dev_path, O_RDWR | O_CLOEXEC);
+ if (q->fd == -1)
+ return -ENODEV;
+
+ ret = drv_open(q);
+ if (ret)
+ goto err_with_fd;
+
+ return 0;
+
+err_with_fd:
+ close(q->fd);
+ return ret;
+}
+
+void wd_release_queue(struct wd_queue *q)
+{
+ drv_close(q);
+ close(q->fd);
+}
+
+int wd_send(struct wd_queue *q, void *req)
+{
+ return drv_send(q, req);
+}
+
+int wd_recv(struct wd_queue *q, void **resp)
+{
+ return drv_recv(q, resp);
+}
+
+static int wd_wait(struct wd_queue *q, __u16 ms)
+{
+ struct pollfd fds[1];
+ int ret;
+
+ fds[0].fd = q->fd;
+ fds[0].events = POLLIN;
+ ret = poll(fds, 1, ms);
+ if (ret == -1)
+ return -errno;
+
+ return 0;
+}
+
+int wd_recv_sync(struct wd_queue *q, void **resp, __u16 ms)
+{
+ int ret;
+
+ while (1) {
+ ret = wd_recv(q, resp);
+ if (ret == -EAGAIN) {
+ ret = wd_wait(q, ms);
+ if (ret)
+ return ret;
+ } else
+ return ret;
+ }
+}
+
+void wd_flush(struct wd_queue *q)
+{
+ drv_flush(q);
+}
+
+void *wd_reserve_memory(struct wd_queue *q, size_t size)
+{
+ return drv_reserve_mem(q, size);
+}
+
+int wd_share_reserved_memory(struct wd_queue *q, struct wd_queue *target_q)
+{
+ return ioctl(q->fd, UACCE_CMD_SHARE_SVAS, target_q->fd);
+}
diff --git a/samples/warpdrive/wd.h b/samples/warpdrive/wd.h
new file mode 100644
index 000000000000..4c0ecfebdf14
--- /dev/null
+++ b/samples/warpdrive/wd.h
@@ -0,0 +1,97 @@
+// SPDX-License-Identifier: GPL-2.0
+#ifndef __WD_H
+#define __WD_H
+#include <stdlib.h>
+#include <errno.h>
+#include <stdio.h>
+#include <string.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <fcntl.h>
+#include <stdint.h>
+#include <unistd.h>
+#include <limits.h>
+#include "../../include/uapi/linux/uacce.h"
+
+#define SYS_VAL_SIZE 16
+#define PATH_STR_SIZE 256
+#define WD_NAME_SIZE 64
+
+typedef int bool;
+
+#ifndef true
+#define true 1
+#endif
+
+#ifndef false
+#define false 0
+#endif
+
+#ifndef WD_ERR
+#define WD_ERR(format, args...) fprintf(stderr, format, ##args)
+#endif
+
+#if defined(__AARCH64_CMODEL_SMALL__) && __AARCH64_CMODEL_SMALL__
+
+#define dsb(opt) asm volatile("dsb " #opt : : : "memory")
+#define rmb() dsb(ld)
+#define wmb() dsb(st)
+#define mb() dsb(sy)
+
+#else
+
+#define rmb()
+#define wmb()
+#define mb()
+#error "no platform mb, define one before compiling"
+
+#endif
+
+static inline void wd_reg_write(void *reg_addr, uint32_t value)
+{
+ *((volatile uint32_t *)reg_addr) = value;
+ wmb();
+}
+
+static inline uint32_t wd_reg_read(void *reg_addr)
+{
+ uint32_t temp;
+
+ temp = *((volatile uint32_t *)reg_addr);
+ rmb();
+
+ return temp;
+}
+
+#define WD_CAPA_PRIV_DATA_SIZE 64
+
+#define alloc_obj(objp) do { \
+ /* calloc zeroes the object; objp stays NULL on failure */ \
+ objp = calloc(1, sizeof(*objp)); \
+} while (0)
+
+#define free_obj(objp) do { \
+ if (objp) \
+ free(objp); \
+} while (0)
+
+struct wd_queue {
+ char hw_type[PATH_STR_SIZE];
+ int hw_type_id;
+ void *priv; /* private data used by the drv layer */
+ int fd;
+ int iommu_type;
+ char *dev_path;
+};
+
+extern int wd_request_queue(struct wd_queue *q);
+extern void wd_release_queue(struct wd_queue *q);
+extern int wd_send(struct wd_queue *q, void *req);
+extern int wd_recv(struct wd_queue *q, void **resp);
+extern void wd_flush(struct wd_queue *q);
+extern int wd_recv_sync(struct wd_queue *q, void **resp, __u16 ms);
+extern void *wd_reserve_memory(struct wd_queue *q, size_t size);
+extern int wd_share_reserved_memory(struct wd_queue *q,
+ struct wd_queue *target_q);
+
+#endif
diff --git a/samples/warpdrive/wd_adapter.c b/samples/warpdrive/wd_adapter.c
new file mode 100644
index 000000000000..5af7254c37a4
--- /dev/null
+++ b/samples/warpdrive/wd_adapter.c
@@ -0,0 +1,71 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#include <stdio.h>
+#include <string.h>
+#include <dirent.h>
+
+
+#include "wd_adapter.h"
+#include "./drv/hisi_qm_udrv.h"
+#include "./drv/wd_drv.h"
+
+static struct wd_drv_dio_if hw_dio_tbl[] = { {
+ .hw_type = "hisi_qm_v1",
+ .ss_offset = QM_SS_START,
+ .open = hisi_qm_set_queue_dio,
+ .close = hisi_qm_unset_queue_dio,
+ .send = hisi_qm_add_to_dio_q,
+ .recv = hisi_qm_get_from_dio_q,
+ },
+ /* Add other drivers direct IO operations here */
+};
+
+/* todo: there should be some stable way to match the device and the driver */
+#define MAX_HW_TYPE (sizeof(hw_dio_tbl) / sizeof(hw_dio_tbl[0]))
+
+int drv_open(struct wd_queue *q)
+{
+ int i;
+
+ //todo: try to find another dev if the user driver is not available
+ for (i = 0; i < MAX_HW_TYPE; i++) {
+ if (!strcmp(q->hw_type,
+ hw_dio_tbl[i].hw_type)) {
+ q->hw_type_id = i;
+ return hw_dio_tbl[q->hw_type_id].open(q);
+ }
+ }
+ WD_ERR("No matching driver to use!\n");
+ errno = ENODEV;
+ return -ENODEV;
+}
+
+void drv_close(struct wd_queue *q)
+{
+ hw_dio_tbl[q->hw_type_id].close(q);
+}
+
+int drv_send(struct wd_queue *q, void *req)
+{
+ return hw_dio_tbl[q->hw_type_id].send(q, req);
+}
+
+int drv_recv(struct wd_queue *q, void **req)
+{
+ return hw_dio_tbl[q->hw_type_id].recv(q, req);
+}
+
+void drv_flush(struct wd_queue *q)
+{
+ if (hw_dio_tbl[q->hw_type_id].flush)
+ hw_dio_tbl[q->hw_type_id].flush(q);
+}
+
+void *drv_reserve_mem(struct wd_queue *q, size_t size)
+{
+ void *mem = wd_drv_mmap(q, size, hw_dio_tbl[q->hw_type_id].ss_offset);
+
+ if (mem == MAP_FAILED)
+ return NULL;
+
+ return mem;
+}
diff --git a/samples/warpdrive/wd_adapter.h b/samples/warpdrive/wd_adapter.h
new file mode 100644
index 000000000000..914cba86198c
--- /dev/null
+++ b/samples/warpdrive/wd_adapter.h
@@ -0,0 +1,36 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* the common drv header define the unified interface for wd */
+#ifndef __WD_ADAPTER_H__
+#define __WD_ADAPTER_H__
+
+#include <stdio.h>
+#include <string.h>
+#include <stdlib.h>
+#include <stdint.h>
+#include <unistd.h>
+#include <fcntl.h>
+#include <sys/stat.h>
+#include <sys/ioctl.h>
+#include <sys/mman.h>
+
+
+#include "wd.h"
+
+struct wd_drv_dio_if {
+ char *hw_type;
+ size_t ss_offset;
+ int (*open)(struct wd_queue *q);
+ void (*close)(struct wd_queue *q);
+ int (*send)(struct wd_queue *q, void *req);
+ int (*recv)(struct wd_queue *q, void **req);
+ void (*flush)(struct wd_queue *q);
+};
+
+extern int drv_open(struct wd_queue *q);
+extern void drv_close(struct wd_queue *q);
+extern int drv_send(struct wd_queue *q, void *req);
+extern int drv_recv(struct wd_queue *q, void **req);
+extern void drv_flush(struct wd_queue *q);
+extern void *drv_reserve_mem(struct wd_queue *q, size_t size);
+
+#endif
--
2.17.1
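
Note on the sample driver above: the queue handling in hisi_qm_udrv.c rests on
two small mechanisms, the 64-bit doorbell word built in hacc_db() and the phase
bit that hisi_qm_get_from_dio_q() uses to tell new completion entries from stale
ones. The following is a minimal standalone sketch of both, not code from the
patchset. The names pack_doorbell, struct phase_ring, ring_init, ring_poll and
RING_DEPTH are hypothetical, and the ring models only the phase bit of each
entry, with no real hardware behind it.

```c
#include <assert.h>
#include <stdint.h>

#define RING_DEPTH 4 /* hypothetical depth; the sample uses QM_Q_DEPTH */

/* Pack a doorbell word with the same field layout as hacc_db(). */
static uint64_t pack_doorbell(uint16_t sqn, uint8_t cmd, uint16_t index,
			      uint8_t priority)
{
	uint64_t db = (uint64_t)sqn | ((uint64_t)cmd << 16);

	db |= ((uint64_t)index | ((uint64_t)priority << 16)) << 32;
	return db;
}

/* Hypothetical stand-in for a completion queue: phase bits only. */
struct phase_ring {
	uint8_t phase[RING_DEPTH]; /* written by the producer ("device") */
	uint16_t head;             /* next entry the consumer inspects */
	uint8_t expect;            /* phase value that marks a new entry */
};

static void ring_init(struct phase_ring *r)
{
	for (int i = 0; i < RING_DEPTH; i++)
		r->phase[i] = 0;
	r->head = 0;
	r->expect = 1; /* the sample also starts with cqc_phase = 1 */
}

/* Return 1 and advance the head if the entry at head is new, else 0. */
static int ring_poll(struct phase_ring *r)
{
	assert(r->head < RING_DEPTH);

	if (r->phase[r->head] != r->expect)
		return 0;

	if (r->head == RING_DEPTH - 1) {
		/* wrap: entries of the next lap carry the flipped phase */
		r->head = 0;
		r->expect = !r->expect;
	} else {
		r->head++;
	}
	return 1;
}
```

After the consumer has taken RING_DEPTH entries, the head wraps to 0 and the
expected phase flips, so the old entry still sitting in slot 0 is not mistaken
for a new one until the producer rewrites it with the flipped phase.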

2018-11-27 13:47:25

by Kenneth Lee

[permalink] [raw]
Subject: Re: [RFCv3 PATCH 1/6] uacce: Add documents for WarpDrive/uacce

On Tue, Nov 20, 2018 at 10:30:55AM +0800, Kenneth Lee wrote:
>
> On Mon, Nov 19, 2018 at 12:48:01PM +0200, Leon Romanovsky wrote:
> >
> > On Mon, Nov 19, 2018 at 05:19:10PM +0800, Kenneth Lee wrote:
> > > On Mon, Nov 19, 2018 at 05:14:05PM +0800, Kenneth Lee wrote:
> > > >
> > > > On Thu, Nov 15, 2018 at 04:54:55PM +0200, Leon Romanovsky wrote:
> > > > >
> > > > > On Thu, Nov 15, 2018 at 04:51:09PM +0800, Kenneth Lee wrote:
> > > > > > On Wed, Nov 14, 2018 at 06:00:17PM +0200, Leon Romanovsky wrote:
> > > > > > >
> > > > > > > On Wed, Nov 14, 2018 at 10:58:09AM +0800, Kenneth Lee wrote:
> > > > > > > >
> > > > > > > > On 2018/11/13 8:23 AM, Leon Romanovsky wrote:
> > > > > > > > > On Mon, Nov 12, 2018 at 03:58:02PM +0800, Kenneth Lee wrote:
> > > > > > > > > > From: Kenneth Lee <[email protected]>
> > > > > > > > > >
> > > > > > > > > > WarpDrive is a general accelerator framework for the user application to
> > > > > > > > > > access the hardware without going through the kernel in data path.
> > > > > > > > > >
> > > > > > > > > > The kernel component that gives drivers the facility to expose the
> > > > > > > > > > user interface is called uacce. It is short for
> > > > > > > > > > "Unified/User-space-access-intended Accelerator Framework".
> > > > > > > > > >
> > > > > > > > > > This patch adds a document explaining how it works.
> > > > > > > > > + RDMA and netdev folks
> > > > > > > > >
> > > > > > > > > Sorry to be late to the game. I don't see the other patches, but from
> > > > > > > > > the description below it seems like you are reinventing the RDMA verbs
> > > > > > > > > model. I have a hard time seeing the differences between the proposed
> > > > > > > > > framework and what is already implemented in drivers/infiniband/* for
> > > > > > > > > the kernel space and in https://github.com/linux-rdma/rdma-core/ for
> > > > > > > > > the user space parts.
> > > > > > > >
> > > > > > > > Thanks Leon,
> > > > > > > >
> > > > > > > > Yes, we are trying to solve a problem similar to the one RDMA solves. We
> > > > > > > > also learned a lot from the existing RDMA code. But we have to make a new
> > > > > > > > one because we cannot register accelerators such as AI operation,
> > > > > > > > encryption or compression to the RDMA framework :)
> > > > > > >
> > > > > > > Assuming that you did everything right and still failed to use the RDMA
> > > > > > > framework, you were supposed to fix it rather than reinvent exactly the
> > > > > > > same thing. That is how we develop the kernel: by reusing existing code.
> > > > > >
> > > > > > Yes, but we don't force other systems such as NICs or GPUs into RDMA, do we?
> > > > >
> > > > > You are not introducing a new NIC or GPU; you are proposing another
> > > > > interface to directly access HW memory and bypass the kernel for the data
> > > > > path. This is the whole idea of RDMA and this is why it is already present
> > > > > in the kernel.
> > > > >
> > > > > The various hardware devices supported in our stack allow a ton of crazy
> > > > > stuff, including GPU interconnects and NIC functionality.
> > > >
> > > > Yes. We don't want to reinvent the wheel. That is why we did it behind VFIO
> > > > in RFC v1 and v2. But finally we were persuaded by Mr. Jerome Glisse that
> > > > VFIO was not a good place to solve the problem.
> >
> > I saw a couple of his responses, he constantly said to you that you are
> > reinventing the wheel.
> > https://lore.kernel.org/lkml/[email protected]/
> >
>
> No. I think he asked me not to create trouble in VFIO but just to use the
> common interfaces from dma_buf and the iommu itself. That is exactly what I am
> doing.
>
> > > >
> > > > And currently, as you see, IB is bound to devices doing RDMA. The
> > > > registration function, ib_register_device(), hints that the device is a
> > > > netdev (get_netdev() callback), and it knows about gid, pkey, and Memory
> > > > Window. IB is not simply an address space management framework. And verbs
> > > > are not transparent to IB. If we start adding compression/decompression, AI
> > > > (RNN, CNN stuff) operations, and encryption/decryption to the verbs set, it
> > > > will become very complex. Or maybe I misunderstand the IB idea? But I don't
> > > > see any compression hardware integrated in the mainline kernel. Could you
> > > > point me directly to one that I can use as a reference?
> > > >
> >
> > I strongly advise you to read the code; not all drivers implement
> > gids, pkeys and the get_netdev() callback.
> >
> > Yes, you are misunderstanding the drivers/infiniband subsystem. We have
> > plenty of options for exposing APIs to user space applications, starting
> > from the standard verbs API and ending with private objects which are
> > understood by a specific device/driver.
> >
> > The IB stack provides a secure FD to access the device by creating a context;
> > after that you can send direct commands to the FW (see mlx5 DEVX
> > or hfi1) in a sane way.
> >
> > So actually, you will need to register your device, declare your own
> > set of objects (similar to mlx5 include/uapi/rdma/mlx5_user_ioctl_*.h).
> >
> > As for a reference for compression hardware, I don't have one.
> > But there is an example of how T10-DIF can be implemented in the verbs
> > layer:
> > https://www.openfabrics.org/images/2018workshop/presentations/307_TOved_T10-DIFOffload.pdf
> > Or IPsec crypto:
> > https://www.spinics.net/lists/linux-rdma/msg48906.html
> >
>
> OK. I will spend some time on it first. But given the current discussion,
> don't you think I should avoid all these complexities and simply use SVM/SVA on
> the iommu, or let the user application use the kernel-allocated VMA and pages?
> It does not create anything new. It is just a new user of the IOMMU and its
> SVM/SVA capability.
>

Hi, Leon,

I have spent the last few days studying the architecture and code of the IB
solution. Now I understand why you said WarpDrive was another IB wheel. When I
first understood the verbs concept, I had the same feeling. But when I
considered merging them, I found that doing so would be a disaster for both of
them.

As I understand it, the idea of IB is to manage shared memory among "peers".
Verbs help the peers communicate with each other through this shared memory,
which is wrapped as MRs. The benefit of the IB framework itself is that it
provides the communication channel in the most efficient way. To do so, it lets
the user process send the communication data to the hardware directly.

The idea of WD, in contrast, is simply to provide a channel between the process
and the LOCAL devices and let them share an "address space", rather than a
"memory region".
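
To make "sharing an address space" concrete, here is a minimal sketch, assuming
a hypothetical descriptor type (struct demo_desc and its two helpers are
illustrative only, they are not part of the patchset). The process stores its
own virtual address in the request, split into two 32-bit halves the way the
sample does for source_addr_l/source_addr_h, and the device is expected to use
that address as is through the IOMMU. No memory region is registered first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical request descriptor carrying a process virtual address. */
struct demo_desc {
	uint32_t addr_l; /* low 32 bits of the buffer address */
	uint32_t addr_h; /* high 32 bits of the buffer address */
	uint32_t len;    /* buffer length in bytes */
};

/* Store the caller's own pointer in the descriptor, untranslated. */
static void demo_desc_set_buf(struct demo_desc *d, const void *buf,
			      uint32_t len)
{
	uint64_t va = (uint64_t)(uintptr_t)buf;

	assert(buf != NULL);
	d->addr_l = (uint32_t)(va & 0xffffffffu);
	d->addr_h = (uint32_t)(va >> 32);
	d->len = len;
}

/* Rebuild the pointer, as a consumer in the same address space would. */
static void *demo_desc_get_buf(const struct demo_desc *d)
{
	uint64_t va = ((uint64_t)d->addr_h << 32) | d->addr_l;

	return (void *)(uintptr_t)va;
}
```

The point of the sketch is only that the value written into the request is the
same pointer the process dereferences; there is no key or handle standing in
for the memory.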

We can treat a WD accelerator device as a "peer" in IB. But then most of the
semantics in verbs become worthless, e.g. IBV_WR_RDMA_READ/WRITE. On a local
system, you just read from or write to the memory; you don't need to "tell"
anyone that you are writing to it :). The semantics of verbs imply that MRs are
remote.

We could also add a new "local dma" semantic to the original MR semantic
space. But that is worthless, because it is already provided. Further, it
brings no benefit to the current IB users.

Conversely, WD gets very little benefit from being integrated into the IB
framework. The verbs interface provides a standard interface for memory
operations. But WD only needs a pure message channel between the process and
the device; it does not intercept anything between them. The ODP feature will
be provided by the IOMMU framework if Jean's patchset is upstreamed, so we
don't need to get it from IB. Moreover, ODP simply provides the "fault from
device" feature. WD can also benefit from the "share page table" feature, which
does no good to IB.

Please understand that I have no motivation to reinvent anything. As a software
architect for 10+ years and a coder for 20+ years, I fully understand how hard
it is to make a module mature. It is not simply a matter of effort. But I also
understand what is going to happen if we merge improper requirements into an
existing module. I don't think it is wise to merge WD into IB. It hurts both.

So would you change your previous conclusion?

Cheers
-Kenneth

> > > > >
> > > > > >
> > > > > > I assume you would not agree to register a zip accelerator to infiniband? :)
> > > > >
> > > > > "infiniband" name in the "drivers/infiniband/" is legacy one and the
> > > > > current code supports IB, RoCE, iWARP and OmniPath as a transport layers.
> > > > > For a long time, we wanted to rename that folder to be "drivers/rdma",
> > > > > but didn't find enough brave men/women to do it, due to backport mess
> > > > > for such move.
> > > > >
> > > > > The addition of zip accelerator to RDMA is possible and depends on how
> > > > > you will model such new functionality - new driver, or maybe new ULP.
> > > > >
> > > > > >
> > > > > > Further, I don't think it is wise to break an existing system (RDMA) to fulfill a
> > > > > > totally new scenario. The better choice is to let them run in parallel for some
> > > > > > time and try to merge them accordingly.
> > > > >
> > > > > Awesome, so please run your code out-of-tree for now and once you are ready
> > > > > for submission let's try to merge it.
> > > >
> > > > Yes, yes. We know trust takes time to gain. But the fact is that no
> > > > accelerator user driver can be added to the mainline kernel. We should raise the
> > > > topic from time to time to help the communication and close the gap, right?
> > > >
> > > > We are also open to cooperating with IB to do it within the IB framework. But
> > > > please let me know where to start. I feel it is quite weird to call
> > > > ib_register_device for a zip or RSA accelerator.
> >
> > Most of ib_ prefixes in drivers/infiniband/ are legacy names. You can
> > rename them to be rdma_register_device() if it helps.
> >
> > So from implementation point of view, as I wrote above.
> > Create minimal driver to register, expose MR to user space, add your own
> > objects and capabilities through our new KABI and implement user space part
> > in github.com/linux-rdma/rdma-core.
>
> I don't think it is just a name. But anyway, let me spend some time to try the
> possibility.
>
> >
> > > >
> > > > >
> > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > Another problem we tried to address is the way to pin the memory for dma
> > > > > > > > operation. The RDMA way to pin the memory cannot avoid the page lost due to
> > > > > > > > copy-on-write operation during the memory is used by the device. This may
> > > > > > > > not be important to RDMA library. But it is important to accelerator.
> > > > > > >
> > > > > > > Such support exists in drivers/infiniband/ from late 2014 and
> > > > > > > it is called ODP (on demand paging).
> > > > > >
> > > > > > I reviewed ODP and I think it is a solution bound to infiniband. It is part of
> > > > > > MR semantics and requires an infiniband-specific hook
> > > > > > (ucontext->invalidate_range()). And the hook requires the device to be able to
> > > > > > stop using the page for a while for the copying. It is ok for infiniband
> > > > > > (actually, only mlx5 uses it). I don't think most accelerators can support
> > > > > > this mode. But WarpDrive works fully on top of the IOMMU interface, so it does
> > > > > > not have this limitation.
> > > > >
> > > > > 1. It has nothing to do with infiniband.
> > > >
> > > > But it must be a ib_dev first.
> >
> > It is just a name.
> >
> > > >
> > > > > 2. MR and ucontext are verbs semantics and needed to ensure that host
> > > > > memory exposed to user is properly protected from security point of view.
> > > > > 3. "stop using the page for a while for the copying" - I don't fully
> > > > > understand this claim, maybe this article will help you to better
> > > > > describe : https://lwn.net/Articles/753027/
> > > >
> > > > This topic was discussed in RFCv2. The key problem here is that:
> > > >
> > > > The device needs to hold the memory for its own calculation, but the CPU/software
> > > > wants to stop it for a while to synchronize with disk or for COW.
> > > >
> > > > If the hardware supports SVM/SVA (Shared Virtual Memory/Address), it is easy: the
> > > > device shares the page table with the CPU, and the device will raise a page fault
> > > > when the CPU downgrades the PTE to read-only.
> > > >
> > > > If the hardware cannot share the page table with the CPU, we then need to have
> > > > some way to change the device page table. This is what happens in ODP. It
> > > > invalidates the page table in the device upon the mmu_notifier callback. But this
> > > > cannot solve the COW problem: suppose user process A shares a page P with the
> > > > device, A forks a new process B, and A continues to write to the page. By COW,
> > > > process B will keep page P, while A will get a new page P'. But you have
> > > > no way to let the device know it should use P' rather than P.
> >
> > I didn't hear about such issue and we supported fork for a long time.
> >
> > > >
> > > > This may be OK for an RDMA application, because RDMA is a big thing and we can ask
> > > > the programmer to avoid the situation. But for an accelerator, I don't think we
> > > > can ask a programmer to care about this when using zlib.
> > > >
> > > > In WarpDrive/uacce, we make this simple. If you support IOMMU and it supports
> > > > SVM/SVA, everything will be fine, just like ODP implicit mode, and you don't need
> > > > to write any code for that, because it has been done by the IOMMU framework. If it
> > > > does not, you have to use kernel-allocated memory which has the same IOVA as
> > > > the VA in user space. So we can still maintain a unified address space among the
> > > > devices and the application.
> > > >
> > > > > 4. mlx5 supports ODP not because of being partially IB device,
> > > > > but because HW performance oriented implementation is not an easy task.
> > > > >
> > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > Hope this can help the understanding.
> > > > > > >
> > > > > > > Yes, it helped me a lot.
> > > > > > > Now, I'm more than before convinced that this whole patchset shouldn't
> > > > > > > exist in the first place.
> > > > > >
> > > > > > Then maybe you can tell me how I can register my accelerator to the user space?
> > > > >
> > > > > Write kernel driver and write user space part of it.
> > > > > https://github.com/linux-rdma/rdma-core/
> > > > >
> > > > > I have no doubts that your colleagues who wrote and maintain
> > > > > drivers/infiniband/hw/hns driver know best how to do it.
> > > > > They did it very successfully.
> > > > >
> > > > > Thanks
> > > > >
> > > > > >
> > > > > > >
> > > > > > > To be clear, NAK.
> > > > > > >
> > > > > > > Thanks
> > > > > > >
> > > > > > > >
> > > > > > > > Cheers
> > > > > > > >
> > > > > > > > >
> > > > > > > > > Hard NAK from RDMA side.
> > > > > > > > >
> > > > > > > > > Thanks
> > > > > > > > >
[...]

2018-11-20 05:52:07

by Jason Gunthorpe

Subject: Re: [RFCv3 PATCH 1/6] uacce: Add documents for WarpDrive/uacce

On Mon, Nov 19, 2018 at 02:17:21PM -0500, Jerome Glisse wrote:
> On Mon, Nov 19, 2018 at 11:53:33AM -0700, Jason Gunthorpe wrote:
> > On Mon, Nov 19, 2018 at 01:42:16PM -0500, Jerome Glisse wrote:
> > > On Mon, Nov 19, 2018 at 11:27:52AM -0700, Jason Gunthorpe wrote:
> > > > On Mon, Nov 19, 2018 at 11:48:54AM -0500, Jerome Glisse wrote:
> > > >
> > > > > Just to comment on this, any infiniband driver which use umem and do
> > > > > not have ODP (here ODP for me means listening to mmu notifier so all
> > > > > infiniband driver except mlx5) will be affected by same issue AFAICT.
> > > > >
> > > > > AFAICT there is no special thing happening after fork() inside any of
> > > > > those driver. So if parent create a umem mr before fork() and program
> > > > > hardware with it then after fork() the parent might start using new
> > > > > page for the umem range while the old memory is use by the child. The
> > > > > reverse is also true (parent using old memory and child new memory)
> > > > > bottom line you can not predict which memory the child or the parent
> > > > > will use for the range after fork().
> > > > >
> > > > > So no matter what you consider the child or the parent, what the hw
> > > > > will use for the mr is unlikely to match what the CPU use for the
> > > > > same virtual address. In other word:
> > > > >
> > > > > Before fork:
> > > > > CPU parent: virtual addr ptr1 -> physical address = 0xCAFE
> > > > > HARDWARE: virtual addr ptr1 -> physical address = 0xCAFE
> > > > >
> > > > > Case 1:
> > > > > CPU parent: virtual addr ptr1 -> physical address = 0xCAFE
> > > > > CPU child: virtual addr ptr1 -> physical address = 0xDEAD
> > > > > HARDWARE: virtual addr ptr1 -> physical address = 0xCAFE
> > > > >
> > > > > Case 2:
> > > > > CPU parent: virtual addr ptr1 -> physical address = 0xBEEF
> > > > > CPU child: virtual addr ptr1 -> physical address = 0xCAFE
> > > > > HARDWARE: virtual addr ptr1 -> physical address = 0xCAFE
> > > >
> > > > IIRC this is solved in IB by automatically calling
> > > > madvise(MADV_DONTFORK) before creating the MR.
> > > >
> > > > MADV_DONTFORK
> > > > .. This is useful to prevent copy-on-write semantics from changing the
> > > > physical location of a page if the parent writes to it after a
> > > > fork(2) ..
> > >
> > > This would work around the issue but this is not transparent ie
> > > range marked with DONTFORK no longer behave as expected from the
> > > application point of view.
> >
> > Do you know what the difference is? The man page really gives no
> > hint..
> >
> > Does it sometimes unmap the pages during fork?
>
> It is handled in kernel/fork.c look for DONTCOPY, basically it just
> leave empty page table in the child process so child will have to
> fault in new page. This also means that child will get 0 as initial
> value for all memory address under DONTCOPY/DONTFORK which breaks
> application expectation of what fork() do.

Hum, I wonder why this API was selected then..
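
For anyone who wants to check the fork() behaviour on their own kernel, below
is a small user-space probe. It is only a sketch: probe_child_view() and the
enum are names invented for this example, and the code makes no claim about
which outcome a given kernel produces. It maps one private anonymous page,
optionally applies madvise(MADV_DONTFORK), forks, and reports whether the child
read the parent's bytes, read something else, or was killed by a signal when
it touched the range.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

enum child_view {
	CHILD_SAW_PARENT_DATA,	/* child read the parent's bytes */
	CHILD_SAW_OTHER_DATA,	/* child read something else (e.g. zeros) */
	CHILD_FAULTED,		/* child was killed by a signal on access */
	DEMO_ERROR		/* mmap/madvise/fork/waitpid failed */
};

/* Hypothetical helper: report how the child sees a page after fork(). */
enum child_view probe_child_view(int use_dontfork)
{
	enum child_view result = DEMO_ERROR;
	long pagesz = sysconf(_SC_PAGESIZE);
	unsigned char *p;
	size_t len;
	pid_t pid;

	if (pagesz <= 0)
		return DEMO_ERROR;
	len = (size_t)pagesz;

	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return DEMO_ERROR;
	memset(p, 0xA5, len);

	if (use_dontfork && madvise(p, len, MADV_DONTFORK) != 0) {
		munmap(p, len);
		return DEMO_ERROR;
	}

	pid = fork();
	if (pid == 0) {
		/* Child: touch the page; the exit code encodes what was read. */
		volatile unsigned char *vp = p;

		_exit(vp[0] == 0xA5 ? 0 : 1);
	}
	if (pid > 0) {
		int status;

		if (waitpid(pid, &status, 0) == pid) {
			if (WIFSIGNALED(status))
				result = CHILD_FAULTED;
			else if (WIFEXITED(status))
				result = WEXITSTATUS(status) == 0 ?
					CHILD_SAW_PARENT_DATA :
					CHILD_SAW_OTHER_DATA;
		}
	}

	/* The parent's own view of the page must be unchanged either way. */
	assert(p[0] == 0xA5);
	munmap(p, len);
	return result;
}
```

Calling probe_child_view(0) and probe_child_view(1) back to back shows the
difference the advice makes from the application's point of view.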

> > I actually wonder if the kernel is a bit broken here, we have the same
> > problem with O_DIRECT and other stuff, right?
>
> No it is not, O_DIRECT is fine. The only corner case i can think
> of with O_DIRECT is one thread launching an O_DIRECT that write
> to private anonymous memory (other O_DIRECT case do not matter)
> while another thread call fork() then what the child get can be
> undefined ie either it get the data before the O_DIRECT finish
> or it gets the result of the O_DIRECT. But this is really what
> you should expect when doing such thing without synchronization.
>
> So O_DIRECT is fine.

?? How can O_DIRECT be fine but RDMA not? They use exactly the same
get_user_pages flow, right? Can we do what O_DIRECT does in RDMA and
be fine too?

AFAIK the only difference is the length of the race window. You'd have
to fork and fault during the shorter time O_DIRECT has get_user_pages
open.

> > Really, if I have a get_user_pages FOLL_WRITE on a page and we fork,
> > then shouldn't the COW immediately be broken during the fork?
> >
> > The kernel can't guarantee that an ongoing DMA will not write to those
> > pages, and it breaks the fork semantic to write to both processes.
>
> Fixing that would incur a high cost: need to grow struct page, need
> to copy potentially gigabyte of memory during fork() ... this would be
> a serious performance regression for many folks just to work around an
> abuse of device driver. So i don't think anything on that front would
> be welcome.

Why? Keep track in each mm if there are any active get_user_pages
FOLL_WRITE pages in the mm, if yes then sweep the VMAs and fix the
issue for the FOLL_WRITE pages.

John is already working on being able to detect pages under GUP, so it
seems like a small step..

Since nearly all cases of fork don't have a GUP FOLL_WRITE active
there would be no performance hit.
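
To show the shape of what I mean, here is a toy user-space model. It is not
kernel code and not a proposed patch: toy_mm, toy_page, toy_pin() and
toy_fork() are all invented for this sketch, and a "page" is just a small
heap buffer. The model keeps a per-mm count of write-pinned pages; fork shares
every page as usual and only copies eagerly for pages that are pinned, so the
parent (and whatever device holds the pin) keeps the original page.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_NPAGES 4
#define TOY_PAGESZ 16

/* Toy "physical page" with a share count and a pin count. */
struct toy_page {
	unsigned char data[TOY_PAGESZ];
	int refs;	/* how many toy_mm map this page */
	int pins;	/* outstanding write pins (a device may write) */
};

struct toy_mm {
	struct toy_page *pte[TOY_NPAGES];
	int nr_pinned;	/* per-mm count: the cheap check made at fork time */
};

int toy_mm_init(struct toy_mm *mm)
{
	memset(mm, 0, sizeof(*mm));
	for (int i = 0; i < TOY_NPAGES; i++) {
		mm->pte[i] = calloc(1, sizeof(struct toy_page));
		if (!mm->pte[i])
			return -1;
		mm->pte[i]->refs = 1;
	}
	return 0;
}

void toy_mm_destroy(struct toy_mm *mm)
{
	for (int i = 0; i < TOY_NPAGES; i++) {
		if (mm->pte[i] && --mm->pte[i]->refs == 0)
			free(mm->pte[i]);
		mm->pte[i] = NULL;
	}
}

/* Model of a long-lived write pin taken for DMA. */
void toy_pin(struct toy_mm *mm, int idx)
{
	mm->pte[idx]->pins++;
	mm->nr_pinned++;
}

/*
 * Model of fork: pages are shared (as COW would do) unless the mm has pins
 * and this page is pinned, in which case the child gets its own copy right
 * away. Returns the number of pages copied eagerly, or -1 on allocation
 * failure.
 */
int toy_fork(struct toy_mm *parent, struct toy_mm *child)
{
	int sweep = parent->nr_pinned > 0;	/* fast path when nothing is pinned */
	int copied = 0;

	memset(child, 0, sizeof(*child));
	for (int i = 0; i < TOY_NPAGES; i++) {
		struct toy_page *pg = parent->pte[i];

		if (sweep && pg->pins > 0) {
			struct toy_page *copy = malloc(sizeof(*copy));

			if (!copy)
				return -1;
			memcpy(copy->data, pg->data, TOY_PAGESZ);
			copy->refs = 1;
			copy->pins = 0;
			child->pte[i] = copy;
			copied++;
		} else {
			pg->refs++;
			child->pte[i] = pg;
		}
	}
	return copied;
}
```

With no pins, toy_fork() copies nothing and costs one integer test, which is
the "no performance hit" case above. The real cost question is how cheaply the
kernel could know which pages are pinned, which this model simply assumes.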

> umem without proper ODP and VFIO are the only bad user i know of (for
> VFIO you can argue that it is part of the API contract and thus that
> it is not an abuse but it is not spell out loud in documentation). I
> have been trying to push back on any people trying to push thing that
> would make the same mistake or at least making sure they understand
> what is happening.

It is something we have to live with and support for the foreseeable
future.

> What really need to happen is people fixing their hardware and do the
> right thing (good software engineer versus evil hardware engineer ;))

Even ODP is no panacea, there are performance problems. What we really
need is CAPI like stuff, so you will tell Intel to redesign the CPU??
:)

Jason

2018-11-13 10:19:33

by Leon Romanovsky

Subject: Re: [RFCv3 PATCH 1/6] uacce: Add documents for WarpDrive/uacce

On Mon, Nov 12, 2018 at 03:58:02PM +0800, Kenneth Lee wrote:
> From: Kenneth Lee <[email protected]>
>
> WarpDrive is a general accelerator framework for the user application to
> access the hardware without going through the kernel in data path.
>
> The kernel component that provides the kernel facility for drivers to expose the
> user interface is called uacce. It is a short name for
> "Unified/User-space-access-intended Accelerator Framework".
>
> This patch adds a document to explain how it works.

+ RDMA and netdev folks

Sorry to be late to the game. I don't see the other patches, but from
the description below it seems like you are reinventing the RDMA verbs
model. I have a hard time seeing the differences between the proposed
framework and what is already implemented in drivers/infiniband/* for the
kernel space and in https://github.com/linux-rdma/rdma-core/ for the user
space parts.

Hard NAK from RDMA side.

Thanks

>
> Signed-off-by: Kenneth Lee <[email protected]>
> ---
> Documentation/warpdrive/warpdrive.rst | 260 +++++++
> Documentation/warpdrive/wd-arch.svg | 764 ++++++++++++++++++++
> Documentation/warpdrive/wd.svg | 526 ++++++++++++++
> Documentation/warpdrive/wd_q_addr_space.svg | 359 +++++++++
> 4 files changed, 1909 insertions(+)
> create mode 100644 Documentation/warpdrive/warpdrive.rst
> create mode 100644 Documentation/warpdrive/wd-arch.svg
> create mode 100644 Documentation/warpdrive/wd.svg
> create mode 100644 Documentation/warpdrive/wd_q_addr_space.svg
>
> diff --git a/Documentation/warpdrive/warpdrive.rst b/Documentation/warpdrive/warpdrive.rst
> new file mode 100644
> index 000000000000..ef84d3a2d462
> --- /dev/null
> +++ b/Documentation/warpdrive/warpdrive.rst
> @@ -0,0 +1,260 @@
> +Introduction of WarpDrive
> +=========================
> +
> +*WarpDrive* is a general accelerator framework for the user application to
> +access the hardware without going through the kernel in data path.
> +
> +It can be used as the quick channel for accelerators, network adaptors or
> +other hardware for application in user space.
> +
> +This may make some implementation simpler. E.g. you can reuse most of the
> +*netdev* driver in kernel and just share some ring buffer to the user space
> +driver for *DPDK* [4] or *ODP* [5]. Or you can combine the RSA accelerator with
> +the *netdev* in the user space as a https reversed proxy, etc.
> +
> +*WarpDrive* takes the hardware accelerator as a heterogeneous processor which
> +can share particular load from the CPU:
> +
> +.. image:: wd.svg
> + :alt: WarpDrive Concept
> +
> +The virtual concept, queue, is used to manage the requests sent to the
> +accelerator. The application sends requests to the queue by writing to some
> +particular address, while the hardware takes the requests directly from the
> +address and sends feedback accordingly.
> +
> +The format of the queue may differ from hardware to hardware. But the
> +application need not make any system call for the communication.
> +
> +*WarpDrive* tries to create a shared virtual address space for all involved
> +accelerators. Within this space, the requests sent to queue can refer to any
> +virtual address, which will be valid to the application and all involved
> +accelerators.
> +
> +The name *WarpDrive* is simply a cool and general name meaning the framework
> +makes the application faster. It includes general user library, kernel
> +management module and drivers for the hardware. In kernel, the management
> +module is called *uacce*, meaning "Unified/User-space-access-intended
> +Accelerator Framework".
> +
> +
> +How does it work
> +================
> +
> +*WarpDrive* uses *mmap* and *IOMMU* to play the trick.
> +
> +*Uacce* creates a chrdev for the device registered to it. A "queue" will be
> +created when the chrdev is opened. The application accesses the queue by
> +mmapping different address regions of the queue file.
> +
> +The following figure shows the queue file address space:
> +
> +.. image:: wd_q_addr_space.svg
> + :alt: WarpDrive Queue Address Space
> +
> +The first region of the space, device region, is used for the application to
> +write request or read answer to or from the hardware.
> +
> +Normally, there can be two types of device regions: mmio and memory regions.
> +It is recommended to use common memory for request/answer descriptors and use
> +the mmio space for device notification, such as doorbell. But of course, this
> +is all up to the interface designer.
> +
> +There can be two types of device memory regions, kernel-only and user-shared.
> +This will be explained in the "kernel APIs" section.
> +
> +The Static Share Virtual Memory region is necessary only when the device IOMMU
> +does not support "Share Virtual Memory". This will be explained after the
> +*IOMMU* idea.
> +
> +
> +Architecture
> +------------
> +
> +The full *WarpDrive* architecture is represented in the following class
> +diagram:
> +
> +.. image:: wd-arch.svg
> + :alt: WarpDrive Architecture
> +
> +
> +The user API
> +------------
> +
> +We adopt a polling style interface in the user space: ::
> +
> + int wd_request_queue(struct wd_queue *q);
> + void wd_release_queue(struct wd_queue *q);
> +
> + int wd_send(struct wd_queue *q, void *req);
> + int wd_recv(struct wd_queue *q, void **req);
> + int wd_recv_sync(struct wd_queue *q, void **req);
> + void wd_flush(struct wd_queue *q);
> +
> +wd_recv_sync() is a wrapper around its non-sync version. It traps into the
> +kernel and waits until the queue becomes available.
> +
> +If the queue does not support SVA/SVM, the following helper function
> +can be used to create Static Virtual Share Memory: ::
> +
> + void *wd_preserve_share_memory(struct wd_queue *q, size_t size);
> +
> +The user API is not mandatory. It is simply a suggestion and hint what the
> +kernel interface is supposed to support.
> +
> +
> +The user driver
> +---------------
> +
> +The queue file mmap space will need a user driver to wrap the communication
> +protocol. *UACCE* provides some attributes in sysfs for the user driver to
> +match the right accelerator accordingly.
> +
> +The *UACCE* device attribute is under the following directory:
> +
> +/sys/class/uacce/<dev-name>/params
> +
> +The following attributes are supported:
> +
> +nr_queue_remained (ro)
> + number of queue remained
> +
> +api_version (ro)
> + a string to identify the queue mmap space format and its version
> +
> +device_attr (ro)
> + attributes of the device, see UACCE_DEV_xxx flag defined in uacce.h
> +
> +numa_node (ro)
> + id of numa node
> +
> +priority (rw)
> + Priority of the device, bigger is higher
> +
> +(This is not yet implemented in RFC version)
> +
> +
> +The kernel API
> +--------------
> +
> +The *uacce* kernel API is defined in uacce.h. If the hardware supports SVM/SVA,
> +the driver needs only the following API functions: ::
> +
> + int uacce_register(uacce);
> + void uacce_unregister(uacce);
> + void uacce_wake_up(q);
> +
> +*uacce_wake_up* is used to notify the process that waits in epoll() on the queue file.
> +
> +According to the IOMMU capability, *uacce* categorizes the devices as follows:
> +
> +UACCE_DEV_NOIOMMU
> + The device has no IOMMU. The user process cannot use VA on the hardware
> + This mode is not recommended.
> +
> +UACCE_DEV_SVA (UACCE_DEV_PASID | UACCE_DEV_FAULT_FROM_DEV)
> + The device has IOMMU which can share the same page table with user
> + process
> +
> +UACCE_DEV_SHARE_DOMAIN
> + The device has IOMMU which has no multiple page table and device page
> + fault support
> +
> +If the device works in mode other than UACCE_DEV_NOIOMMU, *uacce* will set its
> +IOMMU to IOMMU_DOMAIN_UNMANAGED. So the driver must not use any kernel
> +DMA API but the following ones from *uacce* instead: ::
> +
> + uacce_dma_map(q, va, size, prot);
> + uacce_dma_unmap(q, va, size, prot);
> +
> +*uacce_dma_map/unmap* is valid only for UACCE_DEV_SVA device. It creates a
> +particular PASID and page table for the kernel in the IOMMU (Not yet
> +implemented in the RFC)
> +
> +For the UACCE_DEV_SHARE_DOMAIN device, uacce_dma_map/unmap is not valid.
> +*Uacce* calls back start_queue only when the DUS and DKO regions are mmapped. The
> +accelerator driver must use those dma buffers, via uacce_queue->qfrs[], in the
> +start_queue callback. The size of the queue file region is defined by
> +uacce->ops->qf_pg_start[].
> +
> +We have to do it this way because most current IOMMUs cannot support
> +kernel and user virtual addresses at the same time. So we have to let them both
> +share the same user virtual address space.
> +
> +If the device has to support kernel and user at the same time, both the kernel
> +and the user should use these DMA APIs. This is not convenient. A better
> +solution is to change the future DMA/IOMMU design to let them separate the
> +address space between the user and kernel space. But that is not going to happen
> +any time soon.
> +
> +
> +Multiple processes support
> +==========================
> +
> +In the latest mainline kernel (4.19) when this document is written, the IOMMU
> +subsystem does not support multiple process page tables yet.
> +
> +Most IOMMU hardware implementations support multi-process with the concept
> +of PASID. But they may use a different name, e.g. it is called sub-stream-id in
> +the SMMU of ARM. With PASID or a similar design, multiple page tables can be
> +added to the IOMMU and referred to by their PASID.
> +
> +*JPB* has a patchset to enable this[1]_. We have tested it with our hardware
> +(which is known as *D06*). It works well. *WarpDrive* relies on it to support
> +UACCE_DEV_SVA. If it is not enabled, *WarpDrive* can still work, but it
> +supports only one process: the device will be set to UACCE_DEV_SHARE_DOMAIN
> +even if it is set to UACCE_DEV_SVA initially.
> +
> +Static Share Virtual Memory is mainly used by UACCE_DEV_SHARE_DOMAIN device.
> +
> +
> +Legacy Mode Support
> +===================
> +For the hardware without IOMMU, WarpDrive can still work, the only problem is
> +VA cannot be used in the device. The driver should adopt another strategy for
> +the shared memory. It is only for testing, and not recommended.
> +
> +
> +The Fork Scenario
> +=================
> +For a process with allocated queues and shared memory, what happens if it forks
> +a child?
> +
> +The fd of the queue will be duplicated on fork, so the child can send requests
> +to the same queue as its parent. But requests sent from any process
> +other than the one that opened the queue will be blocked.
> +
> +It is recommended to add O_CLOEXEC to the queue file.
> +
> +The queue mmap space has VM_DONTCOPY set in its VMA, so the child will lose all
> +those VMAs.
> +
> +This is why *WarpDrive* does not adopt the mode used in *VFIO* and *InfiniBand*.
> +Both solutions can set any user pointer for hardware sharing. But they cannot
> +support fork while the DMA is in progress, or the "Copy-On-Write" procedure will
> +make the parent process lose its physical pages.
> +
> +
> +The Sample Code
> +===============
> +There is a sample user land implementation with a simple driver for Hisilicon
> +Hi1620 ZIP Accelerator.
> +
> +To test, do the following in samples/warpdrive (for the case of PC host): ::
> + ./autogen.sh
> + ./conf.sh # or simply ./configure if you build on target system
> + make
> +
> +Then you can get test_hisi_zip in the test subdirectory. Copy it to the target
> +system and make sure the hisi_zip driver is enabled (the major and minor of
> +the uacce chrdev can be gotten from the dmesg or sysfs), and run: ::
> + mknod /dev/ua1 c <major> <minor>
> + test/test_hisi_zip -z < data > data.zip
> + test/test_hisi_zip -g < data > data.gzip
> +
> +
> +References
> +==========
> +.. [1] https://patchwork.kernel.org/patch/10394851/
> +
> +.. vim: tw=78
> diff --git a/Documentation/warpdrive/wd-arch.svg b/Documentation/warpdrive/wd-arch.svg
> new file mode 100644
> index 000000000000..e59934188443
> --- /dev/null
> +++ b/Documentation/warpdrive/wd-arch.svg
> @@ -0,0 +1,764 @@
> +<?xml version="1.0" encoding="UTF-8" standalone="no"?>
> +<!-- Created with Inkscape (http://www.inkscape.org/) -->
> +
> +<svg
> + xmlns:dc="http://purl.org/dc/elements/1.1/"
> + xmlns:cc="http://creativecommons.org/ns#"
> + xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
> + xmlns:svg="http://www.w3.org/2000/svg"
> + xmlns="http://www.w3.org/2000/svg"
> + xmlns:xlink="http://www.w3.org/1999/xlink"
> + xmlns:sodipodi="http://sodipodi.sourceforge.net/DTD/sodipodi-0.dtd"
> + xmlns:inkscape="http://www.inkscape.org/namespaces/inkscape"
> + width="210mm"
> + height="193mm"
> + viewBox="0 0 744.09449 683.85823"
> + id="svg2"
> + version="1.1"
> + inkscape:version="0.92.3 (2405546, 2018-03-11)"
> + sodipodi:docname="wd-arch.svg">
> + <defs
> + id="defs4">
> + <linearGradient
> + inkscape:collect="always"
> + id="linearGradient6830">
> + <stop
> + style="stop-color:#000000;stop-opacity:1;"
> + offset="0"
> + id="stop6832" />
> + <stop
> + style="stop-color:#000000;stop-opacity:0;"
> + offset="1"
> + id="stop6834" />
> + </linearGradient>
> + <linearGradient
> + inkscape:collect="always"
> + xlink:href="#linearGradient5026"
> + id="linearGradient5032"
> + x1="353"
> + y1="211.3622"
> + x2="565.5"
> + y2="174.8622"
> + gradientUnits="userSpaceOnUse"
> + gradientTransform="translate(-89.949614,405.94594)" />
> + <linearGradient
> + inkscape:collect="always"
> + id="linearGradient5026">
> + <stop
> + style="stop-color:#f2f2f2;stop-opacity:1;"
> + offset="0"
> + id="stop5028" />
> + <stop
> + style="stop-color:#f2f2f2;stop-opacity:0;"
> + offset="1"
> + id="stop5030" />
> + </linearGradient>
> + <filter
> + inkscape:collect="always"
> + style="color-interpolation-filters:sRGB"
> + id="filter4169-3"
> + x="-0.031597666"
> + width="1.0631953"
> + y="-0.099812768"
> + height="1.1996255">
> + <feGaussianBlur
> + inkscape:collect="always"
> + stdDeviation="1.3307599"
> + id="feGaussianBlur4171-6" />
> + </filter>
> + <linearGradient
> + inkscape:collect="always"
> + xlink:href="#linearGradient5026"
> + id="linearGradient5032-1"
> + x1="353"
> + y1="211.3622"
> + x2="565.5"
> + y2="174.8622"
> + gradientUnits="userSpaceOnUse"
> + gradientTransform="translate(175.77842,400.29111)" />
> + <filter
> + inkscape:collect="always"
> + style="color-interpolation-filters:sRGB"
> + id="filter4169-3-0"
> + x="-0.031597666"
> + width="1.0631953"
> + y="-0.099812768"
> + height="1.1996255">
> + <feGaussianBlur
> + inkscape:collect="always"
> + stdDeviation="1.3307599"
> + id="feGaussianBlur4171-6-9" />
> + </filter>
> + <marker
> + markerWidth="18.960653"
> + markerHeight="11.194658"
> + refX="9.4803267"
> + refY="5.5973287"
> + orient="auto"
> + id="marker4613">
> + <rect
> + y="-5.1589785"
> + x="5.8504119"
> + height="10.317957"
> + width="10.317957"
> + id="rect4212"
> + style="fill:#ffffff;stroke:#000000;stroke-width:0.69143367;stroke-miterlimit:4;stroke-dasharray:none"
> + transform="matrix(0.86111274,0.50841405,-0.86111274,0.50841405,0,0)">
> + <title
> + id="title4262">generation</title>
> + </rect>
> + </marker>
> + <marker
> + markerWidth="11.227358"
> + markerHeight="12.355258"
> + refX="10"
> + refY="6.177629"
> + orient="auto"
> + id="marker4825">
> + <path
> + inkscape:connector-curvature="0"
> + id="path4757"
> + d="M 0.42024733,0.42806444 10.231357,6.3500844 0.24347733,11.918544"
> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
> + </marker>
> + <marker
> + markerWidth="11.227358"
> + markerHeight="12.355258"
> + refX="10"
> + refY="6.177629"
> + orient="auto"
> + id="marker4825-6">
> + <path
> + inkscape:connector-curvature="0"
> + id="path4757-1"
> + d="M 0.42024733,0.42806444 10.231357,6.3500844 0.24347733,11.918544"
> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
> + </marker>
> + <linearGradient
> + inkscape:collect="always"
> + xlink:href="#linearGradient5026"
> + id="linearGradient5032-3-9"
> + x1="353"
> + y1="211.3622"
> + x2="565.5"
> + y2="174.8622"
> + gradientUnits="userSpaceOnUse"
> + gradientTransform="matrix(1.2452511,0,0,0.98513016,-190.95632,540.33156)" />
> + <filter
> + inkscape:collect="always"
> + style="color-interpolation-filters:sRGB"
> + id="filter4169-3-5-8"
> + x="-0.031597666"
> + width="1.0631953"
> + y="-0.099812768"
> + height="1.1996255">
> + <feGaussianBlur
> + inkscape:collect="always"
> + stdDeviation="1.3307599"
> + id="feGaussianBlur4171-6-3-9" />
> + </filter>
> + <marker
> + markerWidth="11.227358"
> + markerHeight="12.355258"
> + refX="10"
> + refY="6.177629"
> + orient="auto"
> + id="marker4825-6-2">
> + <path
> + inkscape:connector-curvature="0"
> + id="path4757-1-9"
> + d="M 0.42024733,0.42806444 10.231357,6.3500844 0.24347733,11.918544"
> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
> + </marker>
> + <marker
> + markerWidth="11.227358"
> + markerHeight="12.355258"
> + refX="10"
> + refY="6.177629"
> + orient="auto"
> + id="marker4825-6-2-1">
> + <path
> + inkscape:connector-curvature="0"
> + id="path4757-1-9-9"
> + d="M 0.42024733,0.42806444 10.231357,6.3500844 0.24347733,11.918544"
> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
> + </marker>
> + <linearGradient
> + inkscape:collect="always"
> + xlink:href="#linearGradient5026"
> + id="linearGradient5032-3-9-7"
> + x1="353"
> + y1="211.3622"
> + x2="565.5"
> + y2="174.8622"
> + gradientUnits="userSpaceOnUse"
> + gradientTransform="matrix(1.3742742,0,0,0.97786398,-234.52617,654.63367)" />
> + <filter
> + inkscape:collect="always"
> + style="color-interpolation-filters:sRGB"
> + id="filter4169-3-5-8-5"
> + x="-0.031597666"
> + width="1.0631953"
> + y="-0.099812768"
> + height="1.1996255">
> + <feGaussianBlur
> + inkscape:collect="always"
> + stdDeviation="1.3307599"
> + id="feGaussianBlur4171-6-3-9-0" />
> + </filter>
> + <marker
> + markerWidth="11.227358"
> + markerHeight="12.355258"
> + refX="10"
> + refY="6.177629"
> + orient="auto"
> + id="marker4825-6-2-6">
> + <path
> + inkscape:connector-curvature="0"
> + id="path4757-1-9-1"
> + d="M 0.42024733,0.42806444 10.231357,6.3500844 0.24347733,11.918544"
> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
> + </marker>
> + <linearGradient
> + inkscape:collect="always"
> + xlink:href="#linearGradient5026"
> + id="linearGradient5032-3-9-4"
> + x1="353"
> + y1="211.3622"
> + x2="565.5"
> + y2="174.8622"
> + gradientUnits="userSpaceOnUse"
> + gradientTransform="matrix(1.3742912,0,0,2.0035845,-468.34428,342.56603)" />
> + <filter
> + inkscape:collect="always"
> + style="color-interpolation-filters:sRGB"
> + id="filter4169-3-5-8-54"
> + x="-0.031597666"
> + width="1.0631953"
> + y="-0.099812768"
> + height="1.1996255">
> + <feGaussianBlur
> + inkscape:collect="always"
> + stdDeviation="1.3307599"
> + id="feGaussianBlur4171-6-3-9-7" />
> + </filter>
> + <marker
> + markerWidth="11.227358"
> + markerHeight="12.355258"
> + refX="10"
> + refY="6.177629"
> + orient="auto"
> + id="marker4825-6-2-1-8">
> + <path
> + inkscape:connector-curvature="0"
> + id="path4757-1-9-9-6"
> + d="M 0.42024733,0.42806444 10.231357,6.3500844 0.24347733,11.918544"
> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
> + </marker>
> + <marker
> + markerWidth="11.227358"
> + markerHeight="12.355258"
> + refX="10"
> + refY="6.177629"
> + orient="auto"
> + id="marker4825-6-2-1-8-8">
> + <path
> + inkscape:connector-curvature="0"
> + id="path4757-1-9-9-6-9"
> + d="M 0.42024733,0.42806444 10.231357,6.3500844 0.24347733,11.918544"
> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
> + </marker>
> + <marker
> + markerWidth="11.227358"
> + markerHeight="12.355258"
> + refX="10"
> + refY="6.177629"
> + orient="auto"
> + id="marker4825-6-0">
> + <path
> + inkscape:connector-curvature="0"
> + id="path4757-1-93"
> + d="M 0.42024733,0.42806444 10.231357,6.3500844 0.24347733,11.918544"
> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
> + </marker>
> + <marker
> + markerWidth="11.227358"
> + markerHeight="12.355258"
> + refX="10"
> + refY="6.177629"
> + orient="auto"
> + id="marker4825-6-0-2">
> + <path
> + inkscape:connector-curvature="0"
> + id="path4757-1-93-6"
> + d="M 0.42024733,0.42806444 10.231357,6.3500844 0.24347733,11.918544"
> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
> + </marker>
> + <filter
> + inkscape:collect="always"
> + style="color-interpolation-filters:sRGB"
> + id="filter5382"
> + x="-0.089695387"
> + width="1.1793908"
> + y="-0.10052069"
> + height="1.2010413">
> + <feGaussianBlur
> + inkscape:collect="always"
> + stdDeviation="0.86758925"
> + id="feGaussianBlur5384" />
> + </filter>
> + <linearGradient
> + inkscape:collect="always"
> + xlink:href="#linearGradient6830"
> + id="linearGradient6836"
> + x1="362.73923"
> + y1="700.04059"
> + x2="340.4751"
> + y2="678.25488"
> + gradientUnits="userSpaceOnUse"
> + gradientTransform="translate(-23.771026,-135.76835)" />
> + <marker
> + markerWidth="11.227358"
> + markerHeight="12.355258"
> + refX="10"
> + refY="6.177629"
> + orient="auto"
> + id="marker4825-6-2-6-2">
> + <path
> + inkscape:connector-curvature="0"
> + id="path4757-1-9-1-9"
> + d="M 0.42024733,0.42806444 10.231357,6.3500844 0.24347733,11.918544"
> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
> + </marker>
> + <linearGradient
> + inkscape:collect="always"
> + xlink:href="#linearGradient5026"
> + id="linearGradient5032-3-9-7-3"
> + x1="353"
> + y1="211.3622"
> + x2="565.5"
> + y2="174.8622"
> + gradientUnits="userSpaceOnUse"
> + gradientTransform="matrix(1.3742742,0,0,0.97786395,-57.357186,649.55786)" />
> + <filter
> + inkscape:collect="always"
> + style="color-interpolation-filters:sRGB"
> + id="filter4169-3-5-8-5-0"
> + x="-0.031597666"
> + width="1.0631953"
> + y="-0.099812768"
> + height="1.1996255">
> + <feGaussianBlur
> + inkscape:collect="always"
> + stdDeviation="1.3307599"
> + id="feGaussianBlur4171-6-3-9-0-2" />
> + </filter>
> + <marker
> + markerWidth="11.227358"
> + markerHeight="12.355258"
> + refX="10"
> + refY="6.177629"
> + orient="auto"
> + id="marker4825-6-2-1-1">
> + <path
> + inkscape:connector-curvature="0"
> + id="path4757-1-9-9-0"
> + d="M 0.42024733,0.42806444 10.231357,6.3500844 0.24347733,11.918544"
> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
> + </marker>
> + </defs>
> + <sodipodi:namedview
> + id="base"
> + pagecolor="#ffffff"
> + bordercolor="#666666"
> + borderopacity="1.0"
> + inkscape:pageopacity="0.0"
> + inkscape:pageshadow="2"
> + inkscape:zoom="0.98994949"
> + inkscape:cx="222.32868"
> + inkscape:cy="370.44492"
> + inkscape:document-units="px"
> + inkscape:current-layer="layer1"
> + showgrid="false"
> + inkscape:window-width="1916"
> + inkscape:window-height="1033"
> + inkscape:window-x="0"
> + inkscape:window-y="22"
> + inkscape:window-maximized="0"
> + fit-margin-right="0.3"
> + inkscape:snap-global="false" />
> + <metadata
> + id="metadata7">
> + <rdf:RDF>
> + <cc:Work
> + rdf:about="">
> + <dc:format>image/svg+xml</dc:format>
> + <dc:type
> + rdf:resource="http://purl.org/dc/dcmitype/StillImage" />
> + <dc:title />
> + </cc:Work>
> + </rdf:RDF>
> + </metadata>
> + <g
> + inkscape:label="Layer 1"
> + inkscape:groupmode="layer"
> + id="layer1"
> + transform="translate(0,-368.50374)">
> + <rect
> + style="fill:#000000;stroke:#000000;stroke-width:0.6465112;filter:url(#filter4169-3)"
> + id="rect4136-3-6"
> + width="101.07784"
> + height="31.998148"
> + x="283.01144"
> + y="588.80896" />
> + <rect
> + style="fill:url(#linearGradient5032);fill-opacity:1;stroke:#000000;stroke-width:0.6465112"
> + id="rect4136-2"
> + width="101.07784"
> + height="31.998148"
> + x="281.63498"
> + y="586.75739" />
> + <text
> + xml:space="preserve"
> + style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
> + x="294.21747"
> + y="612.50073"
> + id="text4138-6"><tspan
> + sodipodi:role="line"
> + id="tspan4140-1"
> + x="294.21747"
> + y="612.50073"
> + style="font-size:15px;line-height:1.25">WarpDrive</tspan></text>
> + <rect
> + style="fill:#000000;stroke:#000000;stroke-width:0.6465112;filter:url(#filter4169-3-0)"
> + id="rect4136-3-6-3"
> + width="101.07784"
> + height="31.998148"
> + x="548.7395"
> + y="583.15417" />
> + <rect
> + style="fill:url(#linearGradient5032-1);fill-opacity:1;stroke:#000000;stroke-width:0.6465112"
> + id="rect4136-2-60"
> + width="101.07784"
> + height="31.998148"
> + x="547.36304"
> + y="581.1026" />
> + <text
> + xml:space="preserve"
> + style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
> + x="557.83484"
> + y="602.32745"
> + id="text4138-6-6"><tspan
> + sodipodi:role="line"
> + id="tspan4140-1-2"
> + x="557.83484"
> + y="602.32745"
> + style="font-size:15px;line-height:1.25">user_driver</tspan></text>
> + <path
> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;marker-end:url(#marker4613)"
> + d="m 547.36304,600.78954 -156.58203,0.0691"
> + id="path4855"
> + inkscape:connector-curvature="0"
> + sodipodi:nodetypes="cc" />
> + <rect
> + style="fill:#000000;stroke:#000000;stroke-width:0.6465112;filter:url(#filter4169-3-5-8)"
> + id="rect4136-3-6-5-7"
> + width="101.07784"
> + height="31.998148"
> + x="128.74678"
> + y="80.648842"
> + transform="matrix(1.2452511,0,0,0.98513016,113.15182,641.02594)" />
> + <rect
> + style="fill:url(#linearGradient5032-3-9);fill-opacity:1;stroke:#000000;stroke-width:0.71606314"
> + id="rect4136-2-6-3"
> + width="125.86729"
> + height="31.522341"
> + x="271.75983"
> + y="718.45435" />
> + <text
> + xml:space="preserve"
> + style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
> + x="309.13705"
> + y="745.55371"
> + id="text4138-6-2-6"><tspan
> + sodipodi:role="line"
> + id="tspan4140-1-9-1"
> + x="309.13705"
> + y="745.55371"
> + style="font-size:15px;line-height:1.25">uacce</tspan></text>
> + <path
> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;marker-end:url(#marker4825-6-2)"
> + d="m 329.57309,619.72453 5.0373,97.14447"
> + id="path4661-3"
> + inkscape:connector-curvature="0"
> + sodipodi:nodetypes="cc" />
> + <path
> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;marker-end:url(#marker4825-6-2-1)"
> + d="m 342.57219,830.63108 -5.67699,-79.2841"
> + id="path4661-3-4"
> + inkscape:connector-curvature="0"
> + sodipodi:nodetypes="cc" />
> + <rect
> + style="fill:#000000;stroke:#000000;stroke-width:0.6465112;filter:url(#filter4169-3-5-8-5)"
> + id="rect4136-3-6-5-7-3"
> + width="101.07784"
> + height="31.998148"
> + x="128.74678"
> + y="80.648842"
> + transform="matrix(1.3742742,0,0,0.97786398,101.09126,754.58534)" />
> + <rect
> + style="fill:url(#linearGradient5032-3-9-7);fill-opacity:1;stroke:#000000;stroke-width:0.74946606"
> + id="rect4136-2-6-3-6"
> + width="138.90866"
> + height="31.289837"
> + x="276.13297"
> + y="831.44263" />
> + <text
> + xml:space="preserve"
> + style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
> + x="295.67819"
> + y="852.98224"
> + id="text4138-6-2-6-1"><tspan
> + sodipodi:role="line"
> + id="tspan4140-1-9-1-0"
> + x="295.67819"
> + y="852.98224"
> + style="font-size:15px;line-height:1.25">Device Driver</tspan></text>
> + <path
> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;marker-end:url(#marker4825-6-2-6)"
> + d="m 623.05084,615.00104 0.51369,333.80219"
> + id="path4661-3-5"
> + inkscape:connector-curvature="0"
> + sodipodi:nodetypes="cc" />
> + <text
> + xml:space="preserve"
> + style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
> + x="392.63568"
> + y="660.83667"
> + id="text4138-6-2-6-1-6-2-5"><tspan
> + sodipodi:role="line"
> + x="392.63568"
> + y="660.83667"
> + id="tspan4305"
> + style="font-size:15px;line-height:1.25">&lt;&lt;anon_file&gt;&gt;</tspan><tspan
> + sodipodi:role="line"
> + x="392.63568"
> + y="679.58667"
> + style="font-size:15px;line-height:1.25"
> + id="tspan1139">Queue FD</tspan></text>
> + <text
> + xml:space="preserve"
> + style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
> + x="389.92969"
> + y="587.44836"
> + id="text4138-6-2-6-1-6-2-56"><tspan
> + sodipodi:role="line"
> + id="tspan4140-1-9-1-0-3-0-9"
> + x="389.92969"
> + y="587.44836"
> + style="font-size:15px;line-height:1.25">1</tspan></text>
> + <text
> + xml:space="preserve"
> + style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
> + x="528.64813"
> + y="600.08429"
> + id="text4138-6-2-6-1-6-3"><tspan
> + sodipodi:role="line"
> + id="tspan4140-1-9-1-0-3-7"
> + x="528.64813"
> + y="600.08429"
> + style="font-size:15px;line-height:1.25">*</tspan></text>
> + <rect
> + style="fill:#000000;stroke:#000000;stroke-width:0.6465112;filter:url(#filter4169-3-5-8-54)"
> + id="rect4136-3-6-5-7-4"
> + width="101.07784"
> + height="31.998148"
> + x="128.74678"
> + y="80.648842"
> + transform="matrix(1.3745874,0,0,1.8929066,-132.7754,556.04505)" />
> + <rect
> + style="fill:url(#linearGradient5032-3-9-4);fill-opacity:1;stroke:#000000;stroke-width:1.07280123"
> + id="rect4136-2-6-3-4"
> + width="138.91039"
> + height="64.111"
> + x="42.321312"
> + y="704.8371" />
> + <text
> + xml:space="preserve"
> + style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
> + x="110.30745"
> + y="722.94025"
> + id="text4138-6-2-6-3"><tspan
> + sodipodi:role="line"
> + x="111.99202"
> + y="722.94025"
> + id="tspan4366"
> + style="font-size:15px;line-height:1.25;text-align:center;text-anchor:middle">other standard </tspan><tspan
> + sodipodi:role="line"
> + x="110.30745"
> + y="741.69025"
> + id="tspan4368"
> + style="font-size:15px;line-height:1.25;text-align:center;text-anchor:middle">framework</tspan><tspan
> + sodipodi:role="line"
> + x="110.30745"
> + y="760.44025"
> + style="font-size:15px;line-height:1.25;text-align:center;text-anchor:middle"
> + id="tspan6840">(crypto/nic/others)</tspan></text>
> + <path
> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;marker-end:url(#marker4825-6-2-1-8)"
> + d="M 276.29661,849.04109 134.04449,771.90853"
> + id="path4661-3-4-8"
> + inkscape:connector-curvature="0"
> + sodipodi:nodetypes="cc" />
> + <text
> + xml:space="preserve"
> + style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
> + x="313.70813"
> + y="730.06366"
> + id="text4138-6-2-6-36"><tspan
> + sodipodi:role="line"
> + id="tspan4140-1-9-1-7"
> + x="313.70813"
> + y="730.06366"
> + style="font-size:10px;line-height:1.25">&lt;&lt;lkm&gt;&gt;</tspan></text>
> + <text
> + xml:space="preserve"
> + style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;text-align:start;letter-spacing:0px;word-spacing:0px;text-anchor:start;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
> + x="259.53165"
> + y="797.8056"
> + id="text4138-6-2-6-1-6-2-5-7-5"><tspan
> + sodipodi:role="line"
> + x="259.53165"
> + y="797.8056"
> + style="font-size:15px;line-height:1.25;text-align:start;text-anchor:start"
> + id="tspan2357">uacce register api</tspan></text>
> + <text
> + xml:space="preserve"
> + style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
> + x="29.145819"
> + y="833.44244"
> + id="text4138-6-2-6-1-6-2-5-7-5-2"><tspan
> + sodipodi:role="line"
> + x="29.145819"
> + y="833.44244"
> + id="tspan4301"
> + style="font-size:15px;line-height:1.25">register to other subsystems</tspan></text>
> + <text
> + xml:space="preserve"
> + style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
> + x="301.20813"
> + y="597.29437"
> + id="text4138-6-2-6-36-1"><tspan
> + sodipodi:role="line"
> + id="tspan4140-1-9-1-7-2"
> + x="301.20813"
> + y="597.29437"
> + style="font-size:10px;line-height:1.25">&lt;&lt;user_lib&gt;&gt;</tspan></text>
> + <text
> + xml:space="preserve"
> + style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
> + x="615.9505"
> + y="739.44012"
> + id="text4138-6-2-6-1-6-2-5-3"><tspan
> + sodipodi:role="line"
> + x="615.9505"
> + y="739.44012"
> + id="tspan4274-7"
> + style="font-size:15px;line-height:1.25">mmapped memory r/w interface</tspan></text>
> + <text
> + xml:space="preserve"
> + style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
> + x="371.01291"
> + y="529.23682"
> + id="text4138-6-2-6-1-6-2-5-36"><tspan
> + sodipodi:role="line"
> + x="371.01291"
> + y="529.23682"
> + id="tspan4305-3"
> + style="font-size:15px;line-height:1.25">wd user api</tspan></text>
> + <path
> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
> + d="m 328.19325,585.87943 0,-23.57142"
> + id="path4348"
> + inkscape:connector-curvature="0" />
> + <ellipse
> + style="opacity:1;fill:#ffffff;fill-opacity:1;fill-rule:evenodd;stroke:#000000;stroke-width:1;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0"
> + id="path4350"
> + cx="328.01468"
> + cy="551.95081"
> + rx="11.607142"
> + ry="10.357142" />
> + <path
> + style="opacity:0.444;fill:url(#linearGradient6836);fill-opacity:1;fill-rule:evenodd;stroke:none;stroke-width:1;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;filter:url(#filter5382)"
> + id="path4350-2"
> + sodipodi:type="arc"
> + sodipodi:cx="329.44327"
> + sodipodi:cy="553.37933"
> + sodipodi:rx="11.607142"
> + sodipodi:ry="10.357142"
> + sodipodi:start="0"
> + sodipodi:end="6.2509098"
> + d="m 341.05041,553.37933 a 11.607142,10.357142 0 0 1 -11.51349,10.35681 11.607142,10.357142 0 0 1 -11.69928,-10.18967 11.607142,10.357142 0 0 1 11.32469,-10.52124 11.607142,10.357142 0 0 1 11.88204,10.01988"
> + sodipodi:open="true" />
> + <text
> + xml:space="preserve"
> + style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
> + x="619.67596"
> + y="978.22363"
> + id="text4138-6-2-6-1-6-2-5-36-3"><tspan
> + sodipodi:role="line"
> + x="619.67596"
> + y="978.22363"
> + id="tspan4305-3-67"
> + style="font-size:15px;line-height:1.25">Device(Hardware)</tspan></text>
> + <path
> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;marker-end:url(#marker4825-6-2-6-2)"
> + d="m 347.51164,865.4527 193.91929,99.10053"
> + id="path4661-3-5-1"
> + inkscape:connector-curvature="0"
> + sodipodi:nodetypes="cc" />
> + <rect
> + style="fill:#000000;stroke:#000000;stroke-width:0.6465112;filter:url(#filter4169-3-5-8-5-0)"
> + id="rect4136-3-6-5-7-3-1"
> + width="101.07784"
> + height="31.998148"
> + x="128.74678"
> + y="80.648842"
> + transform="matrix(1.3742742,0,0,0.97786395,278.26025,749.50952)" />
> + <rect
> + style="fill:url(#linearGradient5032-3-9-7-3);fill-opacity:1;stroke:#000000;stroke-width:0.74946606"
> + id="rect4136-2-6-3-6-0"
> + width="138.90868"
> + height="31.289839"
> + x="453.30197"
> + y="826.36682" />
> + <text
> + xml:space="preserve"
> + style="font-style:normal;font-weight:normal;font-size:12px;line-height:0%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
> + x="493.68158"
> + y="847.90643"
> + id="text4138-6-2-6-1-5"><tspan
> + sodipodi:role="line"
> + id="tspan4140-1-9-1-0-1"
> + x="493.68158"
> + y="847.90643"
> + style="font-size:15px;line-height:1.25;stroke-width:1px">IOMMU</tspan></text>
> + <path
> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;marker-end:url(#marker4825-6-2-1-1)"
> + d="m 389.49372,755.46667 111.75324,68.4507"
> + id="path4661-3-4-85"
> + inkscape:connector-curvature="0"
> + sodipodi:nodetypes="cc" />
> + <text
> + xml:space="preserve"
> + style="font-style:normal;font-weight:normal;font-size:12px;line-height:0%;font-family:sans-serif;text-align:start;letter-spacing:0px;word-spacing:0px;text-anchor:start;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
> + x="427.70282"
> + y="776.91418"
> + id="text4138-6-2-6-1-6-2-5-7-5-0"><tspan
> + sodipodi:role="line"
> + x="427.70282"
> + y="776.91418"
> + style="font-size:15px;line-height:1.25;text-align:start;text-anchor:start;stroke-width:1px"
> + id="tspan2357-6">manage the driver IOMMU state</tspan></text>
> + </g>
> +</svg>
> diff --git a/Documentation/warpdrive/wd.svg b/Documentation/warpdrive/wd.svg
> new file mode 100644
> index 000000000000..87ab92ebfbc6
> --- /dev/null
> +++ b/Documentation/warpdrive/wd.svg
> @@ -0,0 +1,526 @@
> +<?xml version="1.0" encoding="UTF-8" standalone="no"?>
> +<!-- Created with Inkscape (http://www.inkscape.org/) -->
> +
> +<svg
> + xmlns:dc="http://purl.org/dc/elements/1.1/"
> + xmlns:cc="http://creativecommons.org/ns#"
> + xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
> + xmlns:svg="http://www.w3.org/2000/svg"
> + xmlns="http://www.w3.org/2000/svg"
> + xmlns:xlink="http://www.w3.org/1999/xlink"
> + xmlns:sodipodi="http://sodipodi.sourceforge.net/DTD/sodipodi-0.dtd"
> + xmlns:inkscape="http://www.inkscape.org/namespaces/inkscape"
> + width="210mm"
> + height="116mm"
> + viewBox="0 0 744.09449 411.02338"
> + id="svg2"
> + version="1.1"
> + inkscape:version="0.92.3 (2405546, 2018-03-11)"
> + sodipodi:docname="wd.svg">
> + <defs
> + id="defs4">
> + <linearGradient
> + inkscape:collect="always"
> + id="linearGradient5026">
> + <stop
> + style="stop-color:#f2f2f2;stop-opacity:1;"
> + offset="0"
> + id="stop5028" />
> + <stop
> + style="stop-color:#f2f2f2;stop-opacity:0;"
> + offset="1"
> + id="stop5030" />
> + </linearGradient>
> + <linearGradient
> + inkscape:collect="always"
> + xlink:href="#linearGradient5026"
> + id="linearGradient5032-3"
> + x1="353"
> + y1="211.3622"
> + x2="565.5"
> + y2="174.8622"
> + gradientUnits="userSpaceOnUse"
> + gradientTransform="matrix(2.7384117,0,0,0.91666329,-952.8283,571.10143)" />
> + <filter
> + inkscape:collect="always"
> + style="color-interpolation-filters:sRGB"
> + id="filter4169-3-5"
> + x="-0.031597666"
> + width="1.0631953"
> + y="-0.099812768"
> + height="1.1996255">
> + <feGaussianBlur
> + inkscape:collect="always"
> + stdDeviation="1.3307599"
> + id="feGaussianBlur4171-6-3" />
> + </filter>
> + <marker
> + markerWidth="18.960653"
> + markerHeight="11.194658"
> + refX="9.4803267"
> + refY="5.5973287"
> + orient="auto"
> + id="marker4613">
> + <rect
> + y="-5.1589785"
> + x="5.8504119"
> + height="10.317957"
> + width="10.317957"
> + id="rect4212"
> + style="fill:#ffffff;stroke:#000000;stroke-width:0.69143367;stroke-miterlimit:4;stroke-dasharray:none"
> + transform="matrix(0.86111274,0.50841405,-0.86111274,0.50841405,0,0)">
> + <title
> + id="title4262">generation</title>
> + </rect>
> + </marker>
> + <marker
> + markerWidth="11.227358"
> + markerHeight="12.355258"
> + refX="10"
> + refY="6.177629"
> + orient="auto"
> + id="marker4825">
> + <path
> + inkscape:connector-curvature="0"
> + id="path4757"
> + d="M 0.42024733,0.42806444 10.231357,6.3500844 0.24347733,11.918544"
> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
> + </marker>
> + <marker
> + markerWidth="11.227358"
> + markerHeight="12.355258"
> + refX="10"
> + refY="6.177629"
> + orient="auto"
> + id="marker4825-6">
> + <path
> + inkscape:connector-curvature="0"
> + id="path4757-1"
> + d="M 0.42024733,0.42806444 10.231357,6.3500844 0.24347733,11.918544"
> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
> + </marker>
> + <linearGradient
> + inkscape:collect="always"
> + xlink:href="#linearGradient5026"
> + id="linearGradient5032-3-9"
> + x1="353"
> + y1="211.3622"
> + x2="565.5"
> + y2="174.8622"
> + gradientUnits="userSpaceOnUse"
> + gradientTransform="matrix(1.2452511,0,0,0.98513016,-190.95632,540.33156)" />
> + <marker
> + markerWidth="11.227358"
> + markerHeight="12.355258"
> + refX="10"
> + refY="6.177629"
> + orient="auto"
> + id="marker4825-6-2">
> + <path
> + inkscape:connector-curvature="0"
> + id="path4757-1-9"
> + d="M 0.42024733,0.42806444 10.231357,6.3500844 0.24347733,11.918544"
> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
> + </marker>
> + <marker
> + markerWidth="11.227358"
> + markerHeight="12.355258"
> + refX="10"
> + refY="6.177629"
> + orient="auto"
> + id="marker4825-6-2-1">
> + <path
> + inkscape:connector-curvature="0"
> + id="path4757-1-9-9"
> + d="M 0.42024733,0.42806444 10.231357,6.3500844 0.24347733,11.918544"
> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
> + </marker>
> + <marker
> + markerWidth="11.227358"
> + markerHeight="12.355258"
> + refX="10"
> + refY="6.177629"
> + orient="auto"
> + id="marker4825-6-2-6">
> + <path
> + inkscape:connector-curvature="0"
> + id="path4757-1-9-1"
> + d="M 0.42024733,0.42806444 10.231357,6.3500844 0.24347733,11.918544"
> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
> + </marker>
> + <marker
> + markerWidth="11.227358"
> + markerHeight="12.355258"
> + refX="10"
> + refY="6.177629"
> + orient="auto"
> + id="marker4825-6-2-1-8">
> + <path
> + inkscape:connector-curvature="0"
> + id="path4757-1-9-9-6"
> + d="M 0.42024733,0.42806444 10.231357,6.3500844 0.24347733,11.918544"
> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
> + </marker>
> + <marker
> + markerWidth="11.227358"
> + markerHeight="12.355258"
> + refX="10"
> + refY="6.177629"
> + orient="auto"
> + id="marker4825-6-2-1-8-8">
> + <path
> + inkscape:connector-curvature="0"
> + id="path4757-1-9-9-6-9"
> + d="M 0.42024733,0.42806444 10.231357,6.3500844 0.24347733,11.918544"
> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
> + </marker>
> + <marker
> + markerWidth="11.227358"
> + markerHeight="12.355258"
> + refX="10"
> + refY="6.177629"
> + orient="auto"
> + id="marker4825-6-0">
> + <path
> + inkscape:connector-curvature="0"
> + id="path4757-1-93"
> + d="M 0.42024733,0.42806444 10.231357,6.3500844 0.24347733,11.918544"
> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
> + </marker>
> + <marker
> + markerWidth="11.227358"
> + markerHeight="12.355258"
> + refX="10"
> + refY="6.177629"
> + orient="auto"
> + id="marker4825-6-0-2">
> + <path
> + inkscape:connector-curvature="0"
> + id="path4757-1-93-6"
> + d="M 0.42024733,0.42806444 10.231357,6.3500844 0.24347733,11.918544"
> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
> + </marker>
> + <marker
> + markerWidth="11.227358"
> + markerHeight="12.355258"
> + refX="10"
> + refY="6.177629"
> + orient="auto"
> + id="marker4825-6-2-6-2">
> + <path
> + inkscape:connector-curvature="0"
> + id="path4757-1-9-1-9"
> + d="M 0.42024733,0.42806444 10.231357,6.3500844 0.24347733,11.918544"
> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
> + </marker>
> + <linearGradient
> + inkscape:collect="always"
> + xlink:href="#linearGradient5026"
> + id="linearGradient5032-3-8"
> + x1="353"
> + y1="211.3622"
> + x2="565.5"
> + y2="174.8622"
> + gradientUnits="userSpaceOnUse"
> + gradientTransform="matrix(1.0104674,0,0,1.0052679,-218.642,661.15448)" />
> + <filter
> + inkscape:collect="always"
> + style="color-interpolation-filters:sRGB"
> + id="filter4169-3-5-8"
> + x="-0.031597666"
> + width="1.0631953"
> + y="-0.099812768"
> + height="1.1996255">
> + <feGaussianBlur
> + inkscape:collect="always"
> + stdDeviation="1.3307599"
> + id="feGaussianBlur4171-6-3-9" />
> + </filter>
> + <linearGradient
> + inkscape:collect="always"
> + xlink:href="#linearGradient5026"
> + id="linearGradient5032-3-8-2"
> + x1="353"
> + y1="211.3622"
> + x2="565.5"
> + y2="174.8622"
> + gradientUnits="userSpaceOnUse"
> + gradientTransform="matrix(2.1450559,0,0,1.0052679,-521.97704,740.76422)" />
> + <filter
> + inkscape:collect="always"
> + style="color-interpolation-filters:sRGB"
> + id="filter4169-3-5-8-5"
> + x="-0.031597666"
> + width="1.0631953"
> + y="-0.099812768"
> + height="1.1996255">
> + <feGaussianBlur
> + inkscape:collect="always"
> + stdDeviation="1.3307599"
> + id="feGaussianBlur4171-6-3-9-1" />
> + </filter>
> + <linearGradient
> + inkscape:collect="always"
> + xlink:href="#linearGradient5026"
> + id="linearGradient5032-3-8-0"
> + x1="353"
> + y1="211.3622"
> + x2="565.5"
> + y2="174.8622"
> + gradientUnits="userSpaceOnUse"
> + gradientTransform="matrix(1.0104674,0,0,1.0052679,83.456748,660.20747)" />
> + <filter
> + inkscape:collect="always"
> + style="color-interpolation-filters:sRGB"
> + id="filter4169-3-5-8-6"
> + x="-0.031597666"
> + width="1.0631953"
> + y="-0.099812768"
> + height="1.1996255">
> + <feGaussianBlur
> + inkscape:collect="always"
> + stdDeviation="1.3307599"
> + id="feGaussianBlur4171-6-3-9-2" />
> + </filter>
> + <linearGradient
> + inkscape:collect="always"
> + xlink:href="#linearGradient5026"
> + id="linearGradient5032-3-84"
> + x1="353"
> + y1="211.3622"
> + x2="565.5"
> + y2="174.8622"
> + gradientUnits="userSpaceOnUse"
> + gradientTransform="matrix(1.9884948,0,0,0.94903536,-318.42665,564.37696)" />
> + <filter
> + inkscape:collect="always"
> + style="color-interpolation-filters:sRGB"
> + id="filter4169-3-5-4"
> + x="-0.031597666"
> + width="1.0631953"
> + y="-0.099812768"
> + height="1.1996255">
> + <feGaussianBlur
> + inkscape:collect="always"
> + stdDeviation="1.3307599"
> + id="feGaussianBlur4171-6-3-0" />
> + </filter>
> + <marker
> + markerWidth="11.227358"
> + markerHeight="12.355258"
> + refX="10"
> + refY="6.177629"
> + orient="auto"
> + id="marker4825-6-0-0">
> + <path
> + inkscape:connector-curvature="0"
> + id="path4757-1-93-8"
> + d="M 0.42024733,0.42806444 10.231357,6.3500844 0.24347733,11.918544"
> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
> + </marker>
> + <marker
> + markerWidth="11.227358"
> + markerHeight="12.355258"
> + refX="10"
> + refY="6.177629"
> + orient="auto"
> + id="marker4825-6-3">
> + <path
> + inkscape:connector-curvature="0"
> + id="path4757-1-1"
> + d="M 0.42024733,0.42806444 10.231357,6.3500844 0.24347733,11.918544"
> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1" />
> + </marker>
> + </defs>
> + <sodipodi:namedview
> + id="base"
> + pagecolor="#ffffff"
> + bordercolor="#666666"
> + borderopacity="1.0"
> + inkscape:pageopacity="0.0"
> + inkscape:pageshadow="2"
> + inkscape:zoom="0.98994949"
> + inkscape:cx="457.47339"
> + inkscape:cy="250.14781"
> + inkscape:document-units="px"
> + inkscape:current-layer="layer1"
> + showgrid="false"
> + inkscape:window-width="1916"
> + inkscape:window-height="1033"
> + inkscape:window-x="0"
> + inkscape:window-y="22"
> + inkscape:window-maximized="0"
> + fit-margin-right="0.3" />
> + <metadata
> + id="metadata7">
> + <rdf:RDF>
> + <cc:Work
> + rdf:about="">
> + <dc:format>image/svg+xml</dc:format>
> + <dc:type
> + rdf:resource="http://purl.org/dc/dcmitype/StillImage" />
> + <dc:title></dc:title>
> + </cc:Work>
> + </rdf:RDF>
> + </metadata>
> + <g
> + inkscape:label="Layer 1"
> + inkscape:groupmode="layer"
> + id="layer1"
> + transform="translate(0,-641.33861)">
> + <rect
> + style="fill:#000000;stroke:#000000;stroke-width:0.6465112;filter:url(#filter4169-3-5)"
> + id="rect4136-3-6-5"
> + width="101.07784"
> + height="31.998148"
> + x="128.74678"
> + y="80.648842"
> + transform="matrix(2.7384116,0,0,0.91666328,-284.06895,664.79751)" />
> + <rect
> + style="fill:url(#linearGradient5032-3);fill-opacity:1;stroke:#000000;stroke-width:1.02430749"
> + id="rect4136-2-6"
> + width="276.79272"
> + height="29.331528"
> + x="64.723419"
> + y="736.84473" />
> + <text
> + xml:space="preserve"
> + style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
> + x="78.223282"
> + y="756.79803"
> + id="text4138-6-2"><tspan
> + sodipodi:role="line"
> + id="tspan4140-1-9"
> + x="78.223282"
> + y="756.79803"
> + style="font-size:15px;line-height:1.25">user application (running on the CPU)</tspan></text>
> + <path
> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;marker-end:url(#marker4825-6)"
> + d="m 217.67507,876.6738 113.40331,45.0758"
> + id="path4661"
> + inkscape:connector-curvature="0"
> + sodipodi:nodetypes="cc" />
> + <path
> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;marker-end:url(#marker4825-6-0)"
> + d="m 208.10197,767.69811 0.29362,76.03656"
> + id="path4661-6"
> + inkscape:connector-curvature="0"
> + sodipodi:nodetypes="cc" />
> + <rect
> + style="fill:#000000;stroke:#000000;stroke-width:0.6465112;filter:url(#filter4169-3-5-8)"
> + id="rect4136-3-6-5-3"
> + width="101.07784"
> + height="31.998148"
> + x="128.74678"
> + y="80.648842"
> + transform="matrix(1.0104673,0,0,1.0052679,28.128628,763.90722)" />
> + <rect
> + style="fill:url(#linearGradient5032-3-8);fill-opacity:1;stroke:#000000;stroke-width:0.65159565"
> + id="rect4136-2-6-6"
> + width="102.13586"
> + height="32.16671"
> + x="156.83217"
> + y="842.91852" />
> + <text
> + xml:space="preserve"
> + style="font-style:normal;font-weight:normal;font-size:12px;line-height:0%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
> + x="188.58519"
> + y="864.47125"
> + id="text4138-6-2-8"><tspan
> + sodipodi:role="line"
> + id="tspan4140-1-9-0"
> + x="188.58519"
> + y="864.47125"
> + style="font-size:15px;line-height:1.25;stroke-width:1px">MMU</tspan></text>
> + <rect
> + style="fill:#000000;stroke:#000000;stroke-width:0.6465112;filter:url(#filter4169-3-5-8-5)"
> + id="rect4136-3-6-5-3-1"
> + width="101.07784"
> + height="31.998148"
> + x="128.74678"
> + y="80.648842"
> + transform="matrix(2.1450556,0,0,1.0052679,1.87637,843.51696)" />
> + <rect
> + style="fill:url(#linearGradient5032-3-8-2);fill-opacity:1;stroke:#000000;stroke-width:0.94937181"
> + id="rect4136-2-6-6-0"
> + width="216.8176"
> + height="32.16671"
> + x="275.09283"
> + y="922.5282" />
> + <text
> + xml:space="preserve"
> + style="font-style:normal;font-weight:normal;font-size:12px;line-height:0%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
> + x="347.81482"
> + y="943.23291"
> + id="text4138-6-2-8-8"><tspan
> + sodipodi:role="line"
> + id="tspan4140-1-9-0-5"
> + x="347.81482"
> + y="943.23291"
> + style="font-size:15px;line-height:1.25;stroke-width:1px">Memory</tspan></text>
> + <rect
> + style="fill:#000000;stroke:#000000;stroke-width:0.6465112;filter:url(#filter4169-3-5-8-6)"
> + id="rect4136-3-6-5-3-5"
> + width="101.07784"
> + height="31.998148"
> + x="128.74678"
> + y="80.648842"
> + transform="matrix(1.0104673,0,0,1.0052679,330.22737,762.9602)" />
> + <rect
> + style="fill:url(#linearGradient5032-3-8-0);fill-opacity:1;stroke:#000000;stroke-width:0.65159565"
> + id="rect4136-2-6-6-8"
> + width="102.13586"
> + height="32.16671"
> + x="458.93091"
> + y="841.9715" />
> + <text
> + xml:space="preserve"
> + style="font-style:normal;font-weight:normal;font-size:12px;line-height:0%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
> + x="490.68393"
> + y="863.52423"
> + id="text4138-6-2-8-6"><tspan
> + sodipodi:role="line"
> + id="tspan4140-1-9-0-2"
> + x="490.68393"
> + y="863.52423"
> + style="font-size:15px;line-height:1.25;stroke-width:1px">IOMMU</tspan></text>
> + <rect
> + style="fill:#000000;stroke:#000000;stroke-width:0.6465112;filter:url(#filter4169-3-5-4)"
> + id="rect4136-3-6-5-6"
> + width="101.07784"
> + height="31.998148"
> + x="128.74678"
> + y="80.648842"
> + transform="matrix(1.9884947,0,0,0.94903537,167.19229,661.38193)" />
> + <rect
> + style="fill:url(#linearGradient5032-3-84);fill-opacity:1;stroke:#000000;stroke-width:0.88813609"
> + id="rect4136-2-6-2"
> + width="200.99274"
> + height="30.367374"
> + x="420.4675"
> + y="735.97351" />
> + <text
> + xml:space="preserve"
> + style="font-style:normal;font-weight:normal;font-size:12px;line-height:0%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
> + x="441.95297"
> + y="755.9068"
> + id="text4138-6-2-9"><tspan
> + sodipodi:role="line"
> + id="tspan4140-1-9-9"
> + x="441.95297"
> + y="755.9068"
> + style="font-size:15px;line-height:1.25;stroke-width:1px">Hardware Accelerator</tspan></text>
> + <path
> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;marker-end:url(#marker4825-6-0-0)"
> + d="m 508.2914,766.55885 0.29362,76.03656"
> + id="path4661-6-1"
> + inkscape:connector-curvature="0"
> + sodipodi:nodetypes="cc" />
> + <path
> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;marker-end:url(#marker4825-6-3)"
> + d="M 499.70201,876.47297 361.38296,920.80258"
> + id="path4661-1"
> + inkscape:connector-curvature="0"
> + sodipodi:nodetypes="cc" />
> + </g>
> +</svg>
> diff --git a/Documentation/warpdrive/wd_q_addr_space.svg b/Documentation/warpdrive/wd_q_addr_space.svg
> new file mode 100644
> index 000000000000..5e6cf8e89908
> --- /dev/null
> +++ b/Documentation/warpdrive/wd_q_addr_space.svg
> @@ -0,0 +1,359 @@
> +<?xml version="1.0" encoding="UTF-8" standalone="no"?>
> +<!-- Created with Inkscape (http://www.inkscape.org/) -->
> +
> +<svg
> + xmlns:dc="http://purl.org/dc/elements/1.1/"
> + xmlns:cc="http://creativecommons.org/ns#"
> + xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
> + xmlns:svg="http://www.w3.org/2000/svg"
> + xmlns="http://www.w3.org/2000/svg"
> + xmlns:sodipodi="http://sodipodi.sourceforge.net/DTD/sodipodi-0.dtd"
> + xmlns:inkscape="http://www.inkscape.org/namespaces/inkscape"
> + width="210mm"
> + height="124mm"
> + viewBox="0 0 210 124"
> + version="1.1"
> + id="svg8"
> + inkscape:version="0.92.3 (2405546, 2018-03-11)"
> + sodipodi:docname="wd_q_addr_space.svg">
> + <defs
> + id="defs2">
> + <marker
> + inkscape:stockid="Arrow1Mend"
> + orient="auto"
> + refY="0"
> + refX="0"
> + id="marker5428"
> + style="overflow:visible"
> + inkscape:isstock="true">
> + <path
> + id="path5426"
> + d="M 0,0 5,-5 -12.5,0 5,5 Z"
> + style="fill:#000000;fill-opacity:1;fill-rule:evenodd;stroke:#000000;stroke-width:1.00000003pt;stroke-opacity:1"
> + transform="matrix(-0.4,0,0,-0.4,-4,0)"
> + inkscape:connector-curvature="0" />
> + </marker>
> + <marker
> + inkscape:isstock="true"
> + style="overflow:visible"
> + id="marker2922"
> + refX="0"
> + refY="0"
> + orient="auto"
> + inkscape:stockid="Arrow1Mend"
> + inkscape:collect="always">
> + <path
> + transform="matrix(-0.4,0,0,-0.4,-4,0)"
> + style="fill:#000000;fill-opacity:1;fill-rule:evenodd;stroke:#000000;stroke-width:1.00000003pt;stroke-opacity:1"
> + d="M 0,0 5,-5 -12.5,0 5,5 Z"
> + id="path2920"
> + inkscape:connector-curvature="0" />
> + </marker>
> + <marker
> + inkscape:stockid="Arrow1Mstart"
> + orient="auto"
> + refY="0"
> + refX="0"
> + id="Arrow1Mstart"
> + style="overflow:visible"
> + inkscape:isstock="true">
> + <path
> + id="path840"
> + d="M 0,0 5,-5 -12.5,0 5,5 Z"
> + style="fill:#000000;fill-opacity:1;fill-rule:evenodd;stroke:#000000;stroke-width:1.00000003pt;stroke-opacity:1"
> + transform="matrix(0.4,0,0,0.4,4,0)"
> + inkscape:connector-curvature="0" />
> + </marker>
> + <marker
> + inkscape:stockid="Arrow1Mend"
> + orient="auto"
> + refY="0"
> + refX="0"
> + id="Arrow1Mend"
> + style="overflow:visible"
> + inkscape:isstock="true"
> + inkscape:collect="always">
> + <path
> + id="path843"
> + d="M 0,0 5,-5 -12.5,0 5,5 Z"
> + style="fill:#000000;fill-opacity:1;fill-rule:evenodd;stroke:#000000;stroke-width:1.00000003pt;stroke-opacity:1"
> + transform="matrix(-0.4,0,0,-0.4,-4,0)"
> + inkscape:connector-curvature="0" />
> + </marker>
> + <marker
> + inkscape:stockid="Arrow1Mstart"
> + orient="auto"
> + refY="0"
> + refX="0"
> + id="Arrow1Mstart-5"
> + style="overflow:visible"
> + inkscape:isstock="true">
> + <path
> + inkscape:connector-curvature="0"
> + id="path840-1"
> + d="M 0,0 5,-5 -12.5,0 5,5 Z"
> + style="fill:#000000;fill-opacity:1;fill-rule:evenodd;stroke:#000000;stroke-width:1.00000003pt;stroke-opacity:1"
> + transform="matrix(0.4,0,0,0.4,4,0)" />
> + </marker>
> + <marker
> + inkscape:stockid="Arrow1Mend"
> + orient="auto"
> + refY="0"
> + refX="0"
> + id="Arrow1Mend-1"
> + style="overflow:visible"
> + inkscape:isstock="true">
> + <path
> + inkscape:connector-curvature="0"
> + id="path843-0"
> + d="M 0,0 5,-5 -12.5,0 5,5 Z"
> + style="fill:#000000;fill-opacity:1;fill-rule:evenodd;stroke:#000000;stroke-width:1.00000003pt;stroke-opacity:1"
> + transform="matrix(-0.4,0,0,-0.4,-4,0)" />
> + </marker>
> + <marker
> + inkscape:isstock="true"
> + style="overflow:visible"
> + id="marker2922-2"
> + refX="0"
> + refY="0"
> + orient="auto"
> + inkscape:stockid="Arrow1Mend"
> + inkscape:collect="always">
> + <path
> + transform="matrix(-0.4,0,0,-0.4,-4,0)"
> + style="fill:#000000;fill-opacity:1;fill-rule:evenodd;stroke:#000000;stroke-width:1.00000003pt;stroke-opacity:1"
> + d="M 0,0 5,-5 -12.5,0 5,5 Z"
> + id="path2920-9"
> + inkscape:connector-curvature="0" />
> + </marker>
> + <marker
> + inkscape:isstock="true"
> + style="overflow:visible"
> + id="marker2922-27"
> + refX="0"
> + refY="0"
> + orient="auto"
> + inkscape:stockid="Arrow1Mend"
> + inkscape:collect="always">
> + <path
> + transform="matrix(-0.4,0,0,-0.4,-4,0)"
> + style="fill:#000000;fill-opacity:1;fill-rule:evenodd;stroke:#000000;stroke-width:1.00000003pt;stroke-opacity:1"
> + d="M 0,0 5,-5 -12.5,0 5,5 Z"
> + id="path2920-0"
> + inkscape:connector-curvature="0" />
> + </marker>
> + <marker
> + inkscape:isstock="true"
> + style="overflow:visible"
> + id="marker2922-27-8"
> + refX="0"
> + refY="0"
> + orient="auto"
> + inkscape:stockid="Arrow1Mend"
> + inkscape:collect="always">
> + <path
> + transform="matrix(-0.4,0,0,-0.4,-4,0)"
> + style="fill:#000000;fill-opacity:1;fill-rule:evenodd;stroke:#000000;stroke-width:1.00000003pt;stroke-opacity:1"
> + d="M 0,0 5,-5 -12.5,0 5,5 Z"
> + id="path2920-0-0"
> + inkscape:connector-curvature="0" />
> + </marker>
> + </defs>
> + <sodipodi:namedview
> + id="base"
> + pagecolor="#ffffff"
> + bordercolor="#666666"
> + borderopacity="1.0"
> + inkscape:pageopacity="0.0"
> + inkscape:pageshadow="2"
> + inkscape:zoom="1.4"
> + inkscape:cx="401.66654"
> + inkscape:cy="218.12255"
> + inkscape:document-units="mm"
> + inkscape:current-layer="layer1"
> + showgrid="false"
> + inkscape:window-width="1916"
> + inkscape:window-height="1033"
> + inkscape:window-x="0"
> + inkscape:window-y="22"
> + inkscape:window-maximized="0" />
> + <metadata
> + id="metadata5">
> + <rdf:RDF>
> + <cc:Work
> + rdf:about="">
> + <dc:format>image/svg+xml</dc:format>
> + <dc:type
> + rdf:resource="http://purl.org/dc/dcmitype/StillImage" />
> + <dc:title />
> + </cc:Work>
> + </rdf:RDF>
> + </metadata>
> + <g
> + inkscape:label="Layer 1"
> + inkscape:groupmode="layer"
> + id="layer1"
> + transform="translate(0,-173)">
> + <rect
> + style="opacity:0.82999998;fill:none;fill-opacity:1;fill-rule:evenodd;stroke:#000000;stroke-width:0.4;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:0.82745098"
> + id="rect815"
> + width="21.262758"
> + height="40.350552"
> + x="55.509361"
> + y="195.00098"
> + ry="0" />
> + <rect
> + style="opacity:0.82999998;fill:none;fill-opacity:1;fill-rule:evenodd;stroke:#000000;stroke-width:0.4;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:0.82745098"
> + id="rect815-1"
> + width="21.24276"
> + height="43.732346"
> + x="55.519352"
> + y="235.26543"
> + ry="0" />
> + <text
> + xml:space="preserve"
> + style="font-style:normal;font-weight:normal;font-size:2.9765625px;line-height:125%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:0.26458332px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
> + x="50.549229"
> + y="190.6078"
> + id="text1118"><tspan
> + sodipodi:role="line"
> + id="tspan1116"
> + x="50.549229"
> + y="190.6078"
> + style="stroke-width:0.26458332px">queue file address space</tspan></text>
> + <path
> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:0.26458332px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
> + d="M 76.818568,194.95453 H 97.229281"
> + id="path1126"
> + inkscape:connector-curvature="0"
> + sodipodi:nodetypes="cc" />
> + <path
> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:0.26458332px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
> + d="M 76.818568,235.20899 H 96.095361"
> + id="path1126-8"
> + inkscape:connector-curvature="0" />
> + <path
> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:0.26458332px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
> + d="m 76.762111,278.99778 h 19.27678"
> + id="path1126-0"
> + inkscape:connector-curvature="0" />
> + <path
> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:0.26458332px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
> + d="m 55.519355,265.20165 v 19.27678"
> + id="path1126-2"
> + inkscape:connector-curvature="0" />
> + <path
> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:0.26458332px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
> + d="m 76.762111,265.20165 v 19.27678"
> + id="path1126-2-1"
> + inkscape:connector-curvature="0" />
> + <path
> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:0.26458332px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;marker-start:url(#Arrow1Mstart);marker-end:url(#Arrow1Mend)"
> + d="m 87.590896,194.76554 0,39.87648"
> + id="path1126-2-1-0"
> + inkscape:connector-curvature="0"
> + sodipodi:nodetypes="cc" />
> + <path
> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:0.26458332px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;marker-start:url(#Arrow1Mstart-5);marker-end:url(#Arrow1Mend-1)"
> + d="m 82.48822,235.77596 v 42.90029"
> + id="path1126-2-1-0-8"
> + inkscape:connector-curvature="0"
> + sodipodi:nodetypes="cc" />
> + <path
> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:0.26458332px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;marker-end:url(#marker2922)"
> + d="M 44.123633,195.3325 H 55.651907"
> + id="path2912"
> + inkscape:connector-curvature="0" />
> + <text
> + xml:space="preserve"
> + style="font-style:normal;font-weight:normal;font-size:2.9765625px;line-height:125%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:0.26458332px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
> + x="32.217381"
> + y="196.27745"
> + id="text2968"><tspan
> + sodipodi:role="line"
> + id="tspan2966"
> + x="32.217381"
> + y="196.27745"
> + style="stroke-width:0.26458332px">offset 0</tspan></text>
> + <text
> + xml:space="preserve"
> + style="font-style:normal;font-weight:normal;font-size:2.9765625px;line-height:125%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:0.26458332px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
> + x="91.199554"
> + y="216.03946"
> + id="text1118-5"><tspan
> + sodipodi:role="line"
> + id="tspan1116-0"
> + x="91.199554"
> + y="216.03946"
> + style="stroke-width:0.26458332px">device region (mapped to device mmio or shared kernel driver memory)</tspan></text>
> + <text
> + xml:space="preserve"
> + style="font-style:normal;font-weight:normal;font-size:2.9765625px;line-height:125%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:0.26458332px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
> + x="86.188072"
> + y="244.50081"
> + id="text1118-5-6"><tspan
> + sodipodi:role="line"
> + id="tspan1116-0-4"
> + x="86.188072"
> + y="244.50081"
> + style="stroke-width:0.26458332px">static share virtual memory region (for device without share virtual memory)</tspan></text>
> + <flowRoot
> + xml:space="preserve"
> + id="flowRoot5699"
> + style="font-style:normal;font-weight:normal;font-size:11.25px;line-height:125%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"><flowRegion
> + id="flowRegion5701"><rect
> + id="rect5703"
> + width="5182.8569"
> + height="385.71429"
> + x="34.285713"
> + y="71.09111" /></flowRegion><flowPara
> + id="flowPara5705" /></flowRoot> <path
> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:0.26458332px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;marker-end:url(#marker2922-2)"
> + d="M 43.679028,206.85268 H 55.207302"
> + id="path2912-1"
> + inkscape:connector-curvature="0" />
> + <path
> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:0.26458332px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;marker-end:url(#marker2922-27)"
> + d="M 44.057004,224.23959 H 55.585278"
> + id="path2912-9"
> + inkscape:connector-curvature="0" />
> + <text
> + xml:space="preserve"
> + style="font-style:normal;font-weight:normal;font-size:2.9765625px;line-height:125%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:0.26458332px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
> + x="24.139778"
> + y="202.40636"
> + id="text1118-5-3"><tspan
> + sodipodi:role="line"
> + id="tspan1116-0-6"
> + x="24.139778"
> + y="202.40636"
> + style="stroke-width:0.26458332px">device mmio region</tspan></text>
> + <text
> + xml:space="preserve"
> + style="font-style:normal;font-weight:normal;font-size:2.9765625px;line-height:125%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:0.26458332px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
> + x="17.010948"
> + y="216.73672"
> + id="text1118-5-3-3"><tspan
> + sodipodi:role="line"
> + id="tspan1116-0-6-6"
> + x="17.010948"
> + y="216.73672"
> + style="stroke-width:0.26458332px">device kernel only region</tspan></text>
> + <path
> + style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:0.26458332px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;marker-end:url(#marker2922-27-8)"
> + d="M 43.981087,235.35153 H 55.509361"
> + id="path2912-9-2"
> + inkscape:connector-curvature="0" />
> + <text
> + xml:space="preserve"
> + style="font-style:normal;font-weight:normal;font-size:2.9765625px;line-height:125%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:0.26458332px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
> + x="17.575975"
> + y="230.53285"
> + id="text1118-5-3-3-0"><tspan
> + sodipodi:role="line"
> + id="tspan1116-0-6-6-5"
> + x="17.575975"
> + y="230.53285"
> + style="stroke-width:0.26458332px">device user share region</tspan></text>
> + </g>
> +</svg>
> --
> 2.17.1
>



2018-11-20 05:18:27

by Jason Gunthorpe

Subject: Re: [RFCv3 PATCH 1/6] uacce: Add documents for WarpDrive/uacce

On Mon, Nov 19, 2018 at 01:42:16PM -0500, Jerome Glisse wrote:
> On Mon, Nov 19, 2018 at 11:27:52AM -0700, Jason Gunthorpe wrote:
> > On Mon, Nov 19, 2018 at 11:48:54AM -0500, Jerome Glisse wrote:
> >
> > > Just to comment on this, any infiniband driver which uses umem and does
> > > not have ODP (here ODP for me means listening to mmu notifier, so all
> > > infiniband drivers except mlx5) will be affected by the same issue AFAICT.
> > >
> > > AFAICT there is no special thing happening after fork() inside any of
> > > those drivers. So if the parent creates a umem mr before fork() and programs
> > > hardware with it, then after fork() the parent might start using a new
> > > page for the umem range while the old memory is used by the child. The
> > > reverse is also true (parent using old memory and child new memory);
> > > bottom line, you can not predict which memory the child or the parent
> > > will use for the range after fork().
> > >
> > > So no matter which you consider, the child or the parent, what the hw
> > > will use for the mr is unlikely to match what the CPU uses for the
> > > same virtual address. In other words:
> > >
> > > Before fork:
> > > CPU parent: virtual addr ptr1 -> physical address = 0xCAFE
> > > HARDWARE: virtual addr ptr1 -> physical address = 0xCAFE
> > >
> > > Case 1:
> > > CPU parent: virtual addr ptr1 -> physical address = 0xCAFE
> > > CPU child: virtual addr ptr1 -> physical address = 0xDEAD
> > > HARDWARE: virtual addr ptr1 -> physical address = 0xCAFE
> > >
> > > Case 2:
> > > CPU parent: virtual addr ptr1 -> physical address = 0xBEEF
> > > CPU child: virtual addr ptr1 -> physical address = 0xCAFE
> > > HARDWARE: virtual addr ptr1 -> physical address = 0xCAFE
> >
> > IIRC this is solved in IB by automatically calling
> > madvise(MADV_DONTFORK) before creating the MR.
> >
> > MADV_DONTFORK
> > .. This is useful to prevent copy-on-write semantics from changing the
> > physical location of a page if the parent writes to it after a
> > fork(2) ..
>
> This would work around the issue but it is not transparent, i.e. a
> range marked with DONTFORK no longer behaves as expected from the
> application's point of view.

Do you know what the difference is? The man page really gives no
hint..

Does it sometimes unmap the pages during fork?
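
One way to check this from userspace is a small test along these lines
(a sketch, assuming Linux; the helper names child_can_read() and
page_survives_fork() are made up for illustration and are not from any
patch in this thread). It maps one page, optionally marks it
MADV_DONTFORK, forks, and reports whether the child can still read the
page. Per madvise(2) the range is not made available to the child, so
the DONTFORK case is expected to fault there:

```c
#define _DEFAULT_SOURCE
#include <signal.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork and let the child read one byte at addr.
 * Returns 1 if the child read the expected value, 0 if the child was
 * killed by a memory fault, -1 on any other outcome.
 */
static int child_can_read(const char *addr)
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0) {
		volatile const char *p = addr;

		_exit(*p == 1 ? 0 : 2);
	}
	if (waitpid(pid, &status, 0) < 0)
		return -1;
	if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
		return 1;
	if (WIFSIGNALED(status) &&
	    (WTERMSIG(status) == SIGSEGV || WTERMSIG(status) == SIGBUS))
		return 0;
	return -1;
}

/*
 * Map one anonymous private page, optionally mark it MADV_DONTFORK,
 * and report whether a forked child can still read it (1/0/-1 as above).
 */
static int page_survives_fork(int dontfork)
{
	size_t len = (size_t)sysconf(_SC_PAGESIZE);
	char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	int ret;

	if (buf == MAP_FAILED)
		return -1;
	buf[0] = 1;	/* fault the page in before forking */
	if (dontfork && madvise(buf, len, MADV_DONTFORK) != 0) {
		munmap(buf, len);
		return -1;
	}
	ret = child_can_read(buf);
	munmap(buf, len);
	return ret;
}
```

Calling page_survives_fork(0) and page_survives_fork(1) back to back
shows the difference an application would see: a child that touches a
DONTFORK range gets a fault instead of a copy of the parent's data.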

I actually wonder if the kernel is a bit broken here, we have the same
problem with O_DIRECT and other stuff, right?

Really, if I have a get_user_pages FOLL_WRITE on a page and we fork,
then shouldn't the COW immediately be broken during the fork?

The kernel can't guarantee that an ongoing DMA will not write to those
pages, and it breaks the fork semantics to write to both processes.
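
To make the fork semantic in question concrete: with a MAP_PRIVATE
page, a write on one side of the fork must not show up on the other.
A minimal sketch, assuming Linux/POSIX; the function name is made up
for illustration. Here the CPU write in the child is isolated by COW,
which is exactly what a DMA write into the shared physical page would
bypass:

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Write 'P' to a private page, fork, let the child overwrite it with
 * 'C', and return what the parent reads back afterwards (-1 on error).
 */
static int parent_value_after_child_write(void)
{
	size_t len = (size_t)sysconf(_SC_PAGESIZE);
	char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	int status, value;
	pid_t pid;

	if (buf == MAP_FAILED)
		return -1;
	buf[0] = 'P';

	pid = fork();
	if (pid < 0) {
		munmap(buf, len);
		return -1;
	}
	if (pid == 0) {
		buf[0] = 'C';	/* the write goes to the child's private copy */
		_exit(buf[0] == 'C' ? 0 : 1);
	}
	if (waitpid(pid, &status, 0) < 0 ||
	    !WIFEXITED(status) || WEXITSTATUS(status) != 0) {
		munmap(buf, len);
		return -1;
	}
	value = buf[0];	/* the parent still sees its own data */
	munmap(buf, len);
	return value;
}
```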

> Also it relies on userspace doing the right thing (which is not
> something I usually trust :)).

Well, if they do it wrong they get to keep all the broken bits :)

Jason

2018-11-24 04:50:26

by Jason Gunthorpe

Subject: Re: [RFCv3 PATCH 1/6] uacce: Add documents for WarpDrive/uacce

On Fri, Nov 23, 2018 at 04:02:42PM +0800, Kenneth Lee wrote:

> It is already part of Jean's patchset. And that's why I built my solution on
> VFIO in the first place. But I think the concept of SVA and PASID is not
> compatible with the original VFIO concept space. You would not share your whole
> address space with a device at all in a virtual machine manager,
> would you?

Why not? That seems to fit VFIO's space just fine to me.. You might
need a new upcall to create a full MM registration, but that doesn't
seem unsuited.

Part of the point here is you should try to make sensible revisions to
existing subsystems before just inventing a new thing...

VFIO is deeply connected to the IOMMU, so enabling more general IOMMU
based approaches seems perfectly fine to me..

> > Once the VFIO driver knows about this as a generic capability then the
> > device it exposes to userspace would use CPU addresses instead of DMA
> > addresses.
> >
> > The question is if your driver needs much more than the device
> > agnostic generic services VFIO provides.
> >
> > I'm not sure what you have in mind with resource management.. It is
> > hard to revoke resources from userspace, unless you are doing
> > kernel syscalls, but then why do all this?
>
> Say, I have 1024 queues in my accelerator. I can get one by opening the device
> and attaching it to the fd. If the process exits by any means, the queue can be
> returned with the release of the fd. But if it is mdev, it will still be there
> and someone should tell the allocator it is available again. This is not easy
> to design in user space.

?? why wouldn't the mdev track the queues assigned using the existing
open/close/ioctl callbacks?

That is the basic flow I would expect:

open(/dev/vfio)
ioctl(unity map entire process MM to mdev with IOMMU)

// Create a HW queue and link the PASID in the HW to this HW queue
struct hw queue[..];
ioctl(create HW queue)

// Get BAR doorbell memory for the queue
bar = mmap()

// Submit work to the queue using CPU addresses
queue[0] = ...
writel(bar [..], &queue);

// Queue, SVA, etc. are cleaned up when the VFIO fd is closed
close()

Presumably the kernel has to handle the PASID and related for security
reasons, so they shouldn't go to userspace?

If there is something missing in vfio to do this, it looks pretty
small to me..
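
As a rough illustration of the queue tracking mentioned above, here is
a plain userspace C model of tying a queue's lifetime to an open file.
Everything in it is hypothetical (struct queue_pool, struct file_ctx,
ctx_open(), ctx_release(), pool_free_count()); it is not VFIO or mdev
code, only a sketch of the bookkeeping an open/release callback pair
could do. The 1024 figure is the one from the example quoted earlier:

```c
#include <stddef.h>

#define NR_QUEUES 1024	/* queue count from the example in this thread */

/* Hypothetical allocator state kept by the driver. */
struct queue_pool {
	unsigned char busy[NR_QUEUES];
};

/* Hypothetical per-open state, i.e. what would hang off the file. */
struct file_ctx {
	struct queue_pool *pool;
	int queue;
};

/* Model of an open() callback: claim the first free queue, or return -1. */
static int ctx_open(struct file_ctx *ctx, struct queue_pool *pool)
{
	int i;

	for (i = 0; i < NR_QUEUES; i++) {
		if (!pool->busy[i]) {
			pool->busy[i] = 1;
			ctx->pool = pool;
			ctx->queue = i;
			return i;
		}
	}
	ctx->pool = NULL;
	ctx->queue = -1;
	return -1;
}

/*
 * Model of a release() callback. In the kernel this runs when the last
 * reference to the file goes away, which includes process exit, so the
 * queue goes back to the pool without any help from userspace.
 */
static void ctx_release(struct file_ctx *ctx)
{
	if (ctx->pool && ctx->queue >= 0)
		ctx->pool->busy[ctx->queue] = 0;
	ctx->pool = NULL;
	ctx->queue = -1;
}

/* Count the queues that are currently unassigned. */
static int pool_free_count(const struct queue_pool *pool)
{
	int i, n = 0;

	for (i = 0; i < NR_QUEUES; i++)
		n += !pool->busy[i];
	return n;
}
```

The point of the model is only that the allocator never needs to be
told by userspace that a queue is free again; the release path already
knows which queue belonged to the fd.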

Jason