From: Kenneth Lee
To: Jonathan Corbet, Herbert Xu, "David S. Miller",
Miller" , Joerg Roedel , Alex Williamson , Kenneth Lee , Hao Fang , Zhou Wang , Zaibo Xu , Philippe Ombredanne , Greg Kroah-Hartman , Thomas Gleixner , linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-crypto@vger.kernel.org, iommu@lists.linux-foundation.org, kvm@vger.kernel.org, linux-accelerators@lists.ozlabs.org, Lu Baolu , Sanjay Kumar Cc: linuxarm@huawei.com Subject: [RFC PATCH 1/7] vfio/spimdev: Add documents for WarpDrive framework Date: Wed, 1 Aug 2018 18:22:15 +0800 Message-Id: <20180801102221.5308-2-nek.in.cn@gmail.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20180801102221.5308-1-nek.in.cn@gmail.com> References: <20180801102221.5308-1-nek.in.cn@gmail.com> Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: Kenneth Lee WarpDrive is a common user space accelerator framework. Its main component in Kernel is called spimdev, Share Parent IOMMU Mediated Device. It exposes the hardware capabilities to the user space via vfio-mdev. So processes in user land can obtain a "queue" by open the device and direct access the hardware MMIO space or do DMA operation via VFIO interface. WarpDrive is intended to be used with Jean Philippe Brucker's SVA patchset (it is still in RFC stage) to support multi-process. But This is not a must. Without the SVA patches, WarpDrive can still work for one process for every hardware device. This patch add detail documents for the framework. Signed-off-by: Kenneth Lee --- Documentation/00-INDEX | 2 + Documentation/warpdrive/warpdrive.rst | 153 ++++++ Documentation/warpdrive/wd-arch.svg | 732 ++++++++++++++++++++++++++ Documentation/warpdrive/wd.svg | 526 ++++++++++++++++++ 4 files changed, 1413 insertions(+) create mode 100644 Documentation/warpdrive/warpdrive.rst create mode 100644 Documentation/warpdrive/wd-arch.svg create mode 100644 Documentation/warpdrive/wd.svg diff --git a/Documentation/00-INDEX b/Documentation/00-INDEX index 2754fe83f0d4..9959affab599 100644 --- a/Documentation/00-INDEX +++ b/Documentation/00-INDEX @@ -410,6 +410,8 @@ vm/ - directory with info on the Linux vm code. w1/ - directory with documents regarding the 1-wire (w1) subsystem. +warpdrive/ + - directory with documents about WarpDrive accelerator framework. watchdog/ - how to auto-reboot Linux if it has "fallen and can't get up". ;-) wimax/ diff --git a/Documentation/warpdrive/warpdrive.rst b/Documentation/warpdrive/warpdrive.rst new file mode 100644 index 000000000000..3792b2780ea6 --- /dev/null +++ b/Documentation/warpdrive/warpdrive.rst @@ -0,0 +1,153 @@ +Introduction of WarpDrive +========================= + +*WarpDrive* is a general accelerator framework built on top of vfio. +It can be taken as a light weight virtual function, which you can use without +*SR-IOV* like facility and can be shared among multiple processes. + +It can be used as the quick channel for accelerators, network adaptors or +other hardware in user space. It can make some implementation simpler. E.g. +you can reuse most of the *netdev* driver and just share some ring buffer to +the user space driver for *DPDK* or *ODP*. Or you can combine the RSA +accelerator with the *netdev* in the user space as a Web reversed proxy, etc. + +The name *WarpDrive* is simply a cool and general name meaning the framework +makes the application faster. In kernel, the framework is called SPIMDEV, +namely "Share Parent IOMMU Mediated Device". 
+
+
+The user driver
+---------------
+
+*WarpDrive* exposes the hardware IO space to the user process (via
+*mmap*), so a user driver is required to implement the user API. The
+following API is suggested for a user driver: ::
+
+        int open(struct wd_queue *q);
+        int close(struct wd_queue *q);
+        int send(struct wd_queue *q, void *req);
+        int recv(struct wd_queue *q, void **req);
+
+These callbacks enable the communication between the user application and
+the device. You will still need a hardware-dependent algorithm driver to
+access the algorithm functionality of the accelerator itself.
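+
+For example (purely illustrative: the *dummy* driver and the dispatch
+table type below are assumptions of this document, not a defined
+interface), a user driver could wire its callbacks up like this: ::
+
+        #include "warpdrive.h" /* hypothetical header for the wd_* API */
+
+        /* hypothetical user driver for a "dummy" accelerator */
+        static int dummy_open(struct wd_queue *q)
+        {
+                /* typically: mmap the queue's MMIO region through the
+                 * VFIO fd and initialise the ring head/tail pointers */
+                return 0;
+        }
+
+        static int dummy_close(struct wd_queue *q)
+        {
+                /* unmap the MMIO region and release the queue state */
+                return 0;
+        }
+
+        static int dummy_send(struct wd_queue *q, void *req)
+        {
+                /* write a descriptor into the ring, then kick the
+                 * doorbell register in the mapped MMIO space; return
+                 * -EBUSY when the ring is full so the caller can retry */
+                return 0;
+        }
+
+        static int dummy_recv(struct wd_queue *q, void **req)
+        {
+                /* poll the completion status in the ring; return -EAGAIN
+                 * when no response is ready yet */
+                return 0;
+        }
+
+        /* struct wd_drv_ops is an assumed dispatch-table type */
+        static const struct wd_drv_ops dummy_ops = {
+                .open  = dummy_open,
+                .close = dummy_close,
+                .send  = dummy_send,
+                .recv  = dummy_recv,
+        };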
+
+
+Multiple processes support
+==========================
+
+As of the latest mainline kernel (4.18) at the time of writing,
+multi-process is not supported in VFIO yet.
+
+*JPB* has a patchset to enable this [2]_. We have tested it with our
+hardware (known as *D06*) and it works well. *WarpDrive* relies on it to
+support multiple processes. If it is not enabled, *WarpDrive* can still
+work, but it supports only one process, which shares the same IO map
+table with the kernel (the user application cannot access kernel
+addresses, so this is not a security problem).
+
+
+Legacy Mode Support
+===================
+
+For hardware on which the IOMMU is not supported, WarpDrive can run in
+*NOIOMMU* mode.
+
+
+References
+==========
+.. [1] According to the comment in mm/gup.c, *gup* is only safe within a
+       syscall, because it can only keep the physical memory in place
+       without making sure the VMA will always point to it. Maybe we
+       should raise the VM_PINNED patchset (see
+       https://lists.gt.net/linux/kernel/1931993) again to solve this
+       problem.
+.. [2] https://patchwork.kernel.org/patch/10394851/
+.. [3] https://zhuanlan.zhihu.com/p/35489035
+
+.. vim: tw=78
diff --git a/Documentation/warpdrive/wd-arch.svg b/Documentation/warpdrive/wd-arch.svg
new file mode 100644
index 000000000000..1b3d1817c4ba
--- /dev/null
+++ b/Documentation/warpdrive/wd-arch.svg
@@ -0,0 +1,732 @@
[SVG markup elided in this excerpt. The image is a class diagram of the
WarpDrive architecture: the user_driver (wd user api, <<vfio>> hardware
accessing) on top of spimdev (<<vfio>> resource management), the Device
Driver registering both to another standard framework (crypto/nic/others)
and as a mdev with the "share parent iommu" attribute, down to the
Device (Hardware).]
diff --git a/Documentation/warpdrive/wd.svg b/Documentation/warpdrive/wd.svg
new file mode 100644
index 000000000000..87ab92ebfbc6
--- /dev/null
+++ b/Documentation/warpdrive/wd.svg
@@ -0,0 +1,526 @@
[SVG markup elided in this excerpt. The image shows a user application
running on the CPU, reaching Memory through the MMU, while the Hardware
Accelerator reaches the same memory through the IOMMU.]
-- 
2.17.1