From: Daniel Vetter
Date: Thu, 17 Dec 2020 11:06:20 +0100
Subject: Re: [PATCH v3 0/9] Xilinx AI engine kernel driver
To: Jiaying Liang
Cc: Alex Deucher, tejas.patel@xilinx.com, ravi.patel@xilinx.com,
 rajan.vaja@xilinx.com, Arnd Bergmann, devicetree, Greg KH, Dragan Cvetic,
 Michal Simek, Mailing list - DRI developers, Rob Herring,
 manish.narani@xilinx.com, linux-arm-kernel, Derek Kiernan,
 Christian Koenig, LKML, linux-media
In-Reply-To: <83491d22-64b6-68bd-b7e2-787e0826712c@xilinx.com>
References: <1606722505-16194-1-git-send-email-wendy.liang@xilinx.com>
 <83491d22-64b6-68bd-b7e2-787e0826712c@xilinx.com>
Content-Type: text/plain; charset="UTF-8"
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Dec 17, 2020 at 9:40 AM Jiaying Liang wrote:
>
> On 12/15/20 7:23 AM, Alex Deucher wrote:
> > On Mon, Dec 14, 2020 at 7:24 PM Jiaying Liang wrote:
> >> On 12/11/20 11:39 AM, Daniel Vetter wrote:
> >>> Hi all
> >>>
> >>> On Fri, Dec 11, 2020 at 8:03 PM Alex Deucher wrote:
> >>>> On Mon, Nov 30, 2020 at 3:25 AM Wendy Liang wrote:
> >>>>> AI engine is the acceleration engine provided by Xilinx. These engines
> >>>>> provide high compute density for vector-based algorithms, and flexible
> >>>>> custom compute and data movement. It has core tiles for compute and
> >>>>> shim tiles to interface with the FPGA fabric.
> >>>>>
> >>>>> You can check the AI engine architecture document for more hardware details:
> >>>>> https://www.xilinx.com/support/documentation/architecture-manuals/am009-versal-ai-engine.pdf
> >>>>>
> >>>>> This patch series adds a Linux kernel driver to manage the Xilinx AI
> >>>>> engine array device and AI engine partitions (groups of AI engine tiles
> >>>>> dedicated to an application).
> >>>>
> >>>> Hi Wendy,
> >>>>
> >>>> I think it would be good to provide an overview of how your stack
> >>>> works in general. That would give reviewers a better handle on how
> >>>> all of this fits together. I'd suggest including an overview in the
> >>>> cover letter and also in the commit message and/or as a comment in the
> >>>> code in one of the patches. I'm not really an expert when it comes to
> >>>> FPGAs, but this basically looks like a pretty low-level interface to
> >>>> set up the data fabric for a kernel that will run on the soft logic or
> >>>> maybe the microcontroller on the board. It doesn't have to be super
> >>>> detailed, just a nice flow for how you might use this. E.g.,
> >>>>
> >>>> Userspace uses ioctls X, Y, Z to configure the data fabric for the
> >>>> FPGA kernel. The kernels can run on... . DMA access to system memory
> >>>> for data sets can be allocated using ioctl A. DMA access is limited
> >>>> by... . The user can then load the FPGA kernel on to one of the
> >>>> engines using ioctl B and finally they can kick off the whole thing
> >>>> using ioctl C. FPGA kernels are compiled using YYY toolchain and use
> >>>> the following runtime (link to runtime) to configure the data
> >>>> fabric using ioctls X, Y, Z.
> >>>
> >>> At least for drm drivers we ideally have that as a .rst file in
> >>> Documentation/. With that you can even do full svg graphs, or just dot
> >>> graphs, of the overall stack if you really want to go overboard :-)
> >>>
> >>>> It would also be good to go over the security implications of the
> >>>> design.
> >>>> E.g., can the FPGA kernel(s) access the DMA engine directly,
> >>>> or is it limited to just the DMA regions set up by the ioctls? Also,
> >>>> does the hardware and software design allow for multiple users? If
> >>>> so, how does that work?
> >>>
> >>> I've also seen indications that there's some on-chip or on-card
> >>> memory. How that's planned to be used and whether we want to manage
> >>> this (maybe even with something like ttm) would be good to understand.
> >>>
> >>> All excellent questions from Alex, just figured I'd add some more.
> >>>
> >>> Cheers, Daniel
> >>
> >> Hi Alex, Daniel,
> >>
> >> Below is an overview of the driver.
> >>
> >> The AI engine kernel driver manages the Xilinx AI engine device. An AI
> >> engine device contains core tiles and SHIM tiles. Core tiles are the
> >> computation tiles, and the SHIM tiles are the tiles interfacing to
> >> external components.
> >>
> >>   +--------+--------+--------+--------+
> >>   | Core   | Core   | Core   | Core   | ...
> >>   |        |        |        |        |
> >>   +--------+--------+--------+--------+
> >>   | Core   | Core   | Core   | Core   | ...
> >>   |        |        |        |        |
> >>   +--------+--------+--------+--------+
> >>     ...
> >>   +--------+--------+--------+--------+
> >>   | SHIM   | SHIM   | SHIM   | SHIM   |
> >>   | PL     | PL     | PL     | NOC    |
> >>   +---+----+---+----+---+----+---+----+
> >>       |        |        |        |
> >>   AXI Streams,                   | AXI MM
> >>   Event Signals                  |
> >>       |        |        |        |
> >>   +---+--------+--------+---+ +--+-------+
> >>   |          FPGA           | |   NOC    |
> >>   +-------------------------+ +--+-------+
> >>                                  |
> >>                             +----+-----+
> >>                             |   DDR    |
> >>                             +----------+
> >>
> >> Each core tile contains a computing module, local memory, and a DMA
> >> module. The local memory DMA module takes data from or to the AXI
> >> streams and writes it to or reads it from the local memory. The
> >> computing module can also directly get/put data from/to the AXI stream.
> >> The AIE SHIM enables AIE tiles to get/put data from/to AXI streams from
> >> the FPGA, and enables an external master to access the AI engine address
> >> space through AXI MM. The SHIM NoC module has a DMA engine, which can
> >> access external memory through AXI MM and push it to the internal AXI
> >> streams.
> >>
> >> At runtime, the AI engine tiles' interconnection needs to be configured
> >> so that the tiles can fetch data from external components or adjacent
> >> tiles, and the AI engine core program needs to be loaded. The user
> >> application can then push data to the AI engine array and start/stop the
> >> AI engine cores. AI engine device errors can be raised as events; the AI
> >> engine kernel driver listens to the events interrupt to monitor runtime
> >> async device errors.
> >>
> >> Instead of the application directly interacting with the AI engine
> >> kernel APIs, user applications/libraries interact with the AI engine
> >> userspace library:
> >> https://github.com/Xilinx/embeddedsw/tree/master/XilinxProcessorIPLib/drivers/aienginev2
> >> It provides a cross-OS low-level functional abstraction, such as how to
> >> connect one stream port to another stream port, or how to configure a
> >> core tile's local DMA.
> >>
> >> The AI engine library can be used by other runtime libraries such as the
> >> Xilinx runtime (XRT) library: https://xilinx.github.io/XRT/master/html/index.html,
> >> which provides an acceleration abstraction for Xilinx accelerators and
> >> has extensions to interface to other acceleration frameworks such as
> >> OpenCL. XRT provides buffer handling abstractions for user applications
> >> to share data between application and devices.
> >>
> >> Here is an example of an application runtime stack:
> >>
> >>   +----------------------------+
> >>   |        Application         |
> >>   +----------------------------+
> >>   |            XRT             |
> >>   +----------------------------+
> >>   |        AIE Library         |
> >>   +----------------------------+
> >>   +----------------------------------------+
> >> Kernel
> >>   +----------------------------+
> >>   |      AIE Partition(s)      |
> >>   +----------------------------+
> >>   +----------------------------+
> >>   |         AIE Device         |
> >>   +----------------------------+
> >>
> >> The AI engine kernel driver provides the following user interfaces:
> >> * The AIE device driver is the root device driver to manage the
> >>   partitions of the AI engine device array. The AI engine array can be
> >>   partitioned into column-wise isolated partitions. Each application can
> >>   only access its own partitions.
> >> * The AIE device driver monitors the interrupt from the AI engine
> >>   device. All AI engine tiles share the same interrupt for error events.
> >> * The AIE partition driver controls address mapping and access of the
> >>   registers/local memories of the tiles within a partition.
> >>   * It provides an mmap operation to enable the application to directly
> >>     access the tiles' local memories for small data updates, such as
> >>     parameter updates, for performance.
> >>   * It provides an mmap operation to map all the registers as read-only,
> >>     so the application can poll registers efficiently to check status.
> >>   * It provides an ioctl for userspace to pass I/O commands to write or
> >>     mask-write the registers. How to configure is defined by userspace.
> >>     Userspace passes the I/O command sequence to the kernel driver, and
> >>     the kernel driver validates the commands before it writes to the
> >>     registers.
> >>   * It provides an ioctl to import a dmabuf and an ioctl to configure
> >>     the DMA module in the SHIM tile, which can access memory outside the
> >>     AI engine array.
> >>
> >> The buffer management is out of scope for this driver. In the above
> >> example, the user application uses the Xilinx runtime (XRT), and XRT is
> >> the one to manage the buffers.
> >
> > So if I understand this correctly, this driver handles the resource
> > management for the AI engines, PLs (programmable logic), and DMA
> > streams. I think it's important to understand that there are multiple
> > address spaces here. Normally when we talk about DMA in the kernel we
> > are referring to devices accessing an external resource like system
> > memory on the host CPU or another device's MMIO space (e.g., another
> > PCIe device). It would be good to clarify which address spaces the
> > DMAs in your diagram refer to. I think the DMAs in the AI engines are
> > specifically for DMAs within the AI engine logic (e.g., between AIs in
> > a partition). How is DMA to system memory handled? What about
> > dedicated memory on the FPGA (e.g., HBM or DDR on the FPGA itself)?
> > Is that what you are exposing as DMA bufs? When you allocate a
> > DMA-buf for a partition, is that partition only allowed to access
> > memory that is part of that DMA buf? I presume there is some
> > scatter/gather table that sets up the DMA range that the partition can
> > access? Who loads the soft logic (is that the PL or some other IP)?
> > Is the soft logic partitioned as well? If I had some soft logic I
> > wanted to run on the FPGA, what would the kernel driver interaction
> > sequence look like? Maybe using the OpenCL soft logic would be a good
> > example. E.g.,
>
> The AI engine driver only manages the resources within the AI engine
> array. There are two types of DMAs in the AI engine device:
> one is the AI engine tile local memory DMA, which can only access the
> local memory. The other type of DMA is in the SHIM tile. This DMA can
> access external address space such as DDR.
> Although it can access the memory on the FPGA if the user configures the
> platform that way, it is preferred to use a PL data mover to move data
> between FPGA memory and the AI engine device. The PL data mover will not
> be managed by the AI engine driver. One SHIM DMA has up to 16 buffer
> descriptors to use.
>
> The Xilinx FPGA manager is the one used to program the FPGA soft logic.
> E.g., when XRT is used, if the AI engine is connected to FPGA logic, the
> XRT stack is the one to manage the configuration sequence.
>
> > 1. user has soft logic blob generated by their soft logic compiler (is
> > this compiler open source?)
>
> The soft logic blob is generated by Xilinx tools, which are not open
> source yet.
>
> > 2. user calls AI engine kernel driver to allocate the required
> > resources (AI engines, AI engine DMAs, doorbells of some sort? etc.)
>
> The user will call the AI engine kernel driver to allocate required
> resources within the AI engine array at runtime.
> However, the patches for it are not in this patch set.
>
> > 3. user calls AI engine kernel driver to allocate system memory and/or
> > FPGA memory that can be used by the soft logic blob
>
> The AI engine kernel driver doesn't allocate system memory. The user can
> use another kernel driver to allocate memory.
> E.g., when XRT is used, the user calls the XRT kernel driver (zocl) to
> allocate system memory.
> So far, the FPGA memory is usually assigned to a soft data mover when the
> platform is created. Are you considering having the FPGA memory in the
> DMA pool of the system? If it is dedicated to a device, can reserved
> memory solve this problem? The AI engine kernel driver doesn't consider
> this yet.
>
> > 4. user calls AI engine kernel driver to load soft logic
>
> I assume you are referring to the soft logic on the FPGA side, which is
> not part of the AI engine device. The FPGA manager is the one to load the
> soft logic on the FPGA.
>
> > 5. user interfaces with soft logic (how?
> > presumably via some memory
> > resource allocated in 2 and 3?)
>
> I assume you are referring to the soft logic on the FPGA side (not the
> AI engine device).
> The user interface with the soft logic is managed by the soft logic IP
> driver. Each soft logic block has some memory-mapped control registers.
> The user can access those registers through the soft logic IP driver.
>
> About memory allocation, I think it is better to manage the shared
> memory outside of a specific device driver. Are you looking for memory
> management which covers both the system memory and FPGA memory, where
> the device can specify which memory it prefers?

Ok, I think the picture is getting clearer. But now I'm wondering why you
have any interactions with dma-buf in this patch series here?
-Daniel

> Thanks,
>
> Wendy
>
> >
> > Thanks,
> >
> > Alex
> >
> >
> >> Best Regards,
> >>
> >> Wendy
> >>
> >>>> Thanks,
> >>>>
> >>>> Alex
> >>>>
> >>>>
> >>>>> v3:
> >>>>>  * unlock AIE dev mutex after failing to gain the partition lock in
> >>>>>    error handling
> >>>>>  * replace pointer with __u64 and enum with __u32 in ioctl
> >>>>>
> >>>>> v2:
> >>>>>  * Fix dtschema check errors
> >>>>>  * Fix test bot warning on interrupt implementation. Removed set but
> >>>>>    unused variable.
> >>>>>  * Fix compilation unused-function warning of firmware change in case
> >>>>>    ZynqMP firmware is not configured
> >>>>>  * There are other warnings on ZynqMP firmware reported from the test
> >>>>>    bot which are not introduced by this patch set.
> >>>>>    "[PATCH] firmware: xlnx-zynqmp: fix compilation warning" is
> >>>>>    submitted for those fixes.
> >>>>>
> >>>>>
> >>>>> Izhar Ameer Shaikh (1):
> >>>>>   firmware: xilinx: Add IOCTL support for AIE ISR Clear
> >>>>>
> >>>>> Nishad Saraf (2):
> >>>>>   misc: xilinx-ai-engine: Add support to request device management
> >>>>>     services
> >>>>>   misc: xilinx-ai-engine: Add support for servicing error interrupts
> >>>>>
> >>>>> Wendy Liang (6):
> >>>>>   dt-binding: soc: xilinx: ai-engine: Add AI engine binding
> >>>>>   misc: Add Xilinx AI engine device driver
> >>>>>   misc: xilinx-ai-engine: Implement AI engine cleanup sequence
> >>>>>   misc: xilinx-ai-engine: expose AI engine tile memories to userspace
> >>>>>   misc: xilinx-ai-engine: add setting shim dma bd operation
> >>>>>   misc: xilinx-ai-engine: add request and release tiles
> >>>>>
> >>>>>  .../bindings/soc/xilinx/xlnx,ai-engine.yaml        | 126 ++++
> >>>>>  MAINTAINERS                                        |   8 +
> >>>>>  drivers/firmware/xilinx/zynqmp.c                   |  14 +
> >>>>>  drivers/misc/Kconfig                               |  12 +
> >>>>>  drivers/misc/Makefile                              |   1 +
> >>>>>  drivers/misc/xilinx-ai-engine/Makefile             |  16 +
> >>>>>  drivers/misc/xilinx-ai-engine/ai-engine-aie.c      | 608 +++++++++++++++++++
> >>>>>  drivers/misc/xilinx-ai-engine/ai-engine-clock.c    | 245 ++++++++
> >>>>>  drivers/misc/xilinx-ai-engine/ai-engine-dev.c      | 496 ++++++++++++++++
> >>>>>  drivers/misc/xilinx-ai-engine/ai-engine-dma.c      | 481 +++++++++++++++
> >>>>>  drivers/misc/xilinx-ai-engine/ai-engine-internal.h | 519 ++++++++++++++++
> >>>>>  .../misc/xilinx-ai-engine/ai-engine-interrupt.c    | 659 +++++++++++++++++++++
> >>>>>  drivers/misc/xilinx-ai-engine/ai-engine-mem.c      | 275 +++++++++
> >>>>>  drivers/misc/xilinx-ai-engine/ai-engine-part.c     | 635 ++++++++++++++++++++
> >>>>>  drivers/misc/xilinx-ai-engine/ai-engine-res.c      | 219 +++++++
> >>>>>  drivers/misc/xilinx-ai-engine/ai-engine-reset.c    | 159 +++++
> >>>>>  include/linux/firmware/xlnx-zynqmp.h               |   8 +
> >>>>>  include/uapi/linux/xlnx-ai-engine.h                | 238 ++++++++
> >>>>>  18 files changed, 4719 insertions(+)
> >>>>>  create mode 100644
Documentation/devicetree/bindings/soc/xilinx/xlnx,ai-engine.yaml
> >>>>>  create mode 100644 drivers/misc/xilinx-ai-engine/Makefile
> >>>>>  create mode 100644 drivers/misc/xilinx-ai-engine/ai-engine-aie.c
> >>>>>  create mode 100644 drivers/misc/xilinx-ai-engine/ai-engine-clock.c
> >>>>>  create mode 100644 drivers/misc/xilinx-ai-engine/ai-engine-dev.c
> >>>>>  create mode 100644 drivers/misc/xilinx-ai-engine/ai-engine-dma.c
> >>>>>  create mode 100644 drivers/misc/xilinx-ai-engine/ai-engine-internal.h
> >>>>>  create mode 100644 drivers/misc/xilinx-ai-engine/ai-engine-interrupt.c
> >>>>>  create mode 100644 drivers/misc/xilinx-ai-engine/ai-engine-mem.c
> >>>>>  create mode 100644 drivers/misc/xilinx-ai-engine/ai-engine-part.c
> >>>>>  create mode 100644 drivers/misc/xilinx-ai-engine/ai-engine-res.c
> >>>>>  create mode 100644 drivers/misc/xilinx-ai-engine/ai-engine-reset.c
> >>>>>  create mode 100644 include/uapi/linux/xlnx-ai-engine.h
> >>>>>
> >>>>> --
> >>>>> 2.7.4
> >>>>>
> >>>>> _______________________________________________
> >>>>> dri-devel mailing list
> >>>>> dri-devel@lists.freedesktop.org
> >>>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel

--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch