Date: Tue, 20 Nov 2018 09:16:50 +0000
From: Jonathan Cameron
To: Jason Gunthorpe
CC: Kenneth Lee, Leon Romanovsky, Kenneth Lee, Tim Sell, Alexander Shishkin,
 Zaibo Xu, Christoph Lameter, Hao Fang, Gavin Schenk, RDMA mailing list,
 Zhou Wang, Doug Ledford, Uwe Kleine-König, David Kershner, Johan Hovold,
 Cyrille Pitchen, Sagar Dharia, Jens Axboe, linux-netdev, Randy Dunlap,
 Vinod Koul, Philippe Ombredanne, Sanyog Kale, David S. Miller,
 Jean-Philippe Brucker
Subject: Re: [RFCv3 PATCH 1/6] uacce: Add documents for WarpDrive/uacce
Message-ID: <20181120091650.0000419a@huawei.com>
In-Reply-To: <20181120032939.GR4890@ziepe.ca>
References: <20181112075807.9291-1-nek.in.cn@gmail.com>
 <20181112075807.9291-2-nek.in.cn@gmail.com>
 <20181113002354.GO3695@mtr-leonro.mtl.com>
 <95310df4-b32c-42f0-c750-3ad5eb89b3dd@gmail.com>
 <20181114160017.GI3759@mtr-leonro.mtl.com>
 <20181115085109.GD157308@Turing-Arch-b>
 <20181115145455.GN3759@mtr-leonro.mtl.com>
 <20181119091405.GE157308@Turing-Arch-b>
 <20181119184954.GB4890@ziepe.ca>
 <20181120030702.GH157308@Turing-Arch-b>
 <20181120032939.GR4890@ziepe.ca>
Organization: Huawei
X-Mailing-List: linux-kernel@vger.kernel.org

+CC Jean-Philippe and iommu list.

On Mon, 19 Nov 2018 20:29:39 -0700
Jason Gunthorpe wrote:

> On Tue, Nov 20, 2018 at 11:07:02AM +0800, Kenneth Lee wrote:
> > On Mon, Nov 19, 2018 at 11:49:54AM -0700, Jason Gunthorpe wrote:
> > > Date: Mon, 19 Nov 2018 11:49:54 -0700
> > > From: Jason Gunthorpe
> > > To: Kenneth Lee
> > > CC: Leon Romanovsky, Kenneth Lee, Tim Sell, linux-doc@vger.kernel.org,
> > >  Alexander Shishkin, Zaibo Xu, zhangfei.gao@foxmail.com,
> > >  linuxarm@huawei.com, haojian.zhuang@linaro.org, Christoph Lameter,
> > >  Hao Fang, Gavin Schenk, RDMA mailing list, Zhou Wang, Doug Ledford,
> > >  Uwe Kleine-König, David Kershner, Johan Hovold, Cyrille Pitchen,
> > >  Sagar Dharia, Jens Axboe, guodong.xu@linaro.org, linux-netdev,
> > >  Randy Dunlap, linux-kernel@vger.kernel.org, Vinod Koul,
> > >  linux-crypto@vger.kernel.org, Philippe Ombredanne, Sanyog Kale,
> > >  "David S. Miller", linux-accelerators@lists.ozlabs.org
> > > Subject: Re: [RFCv3 PATCH 1/6] uacce: Add documents for WarpDrive/uacce
> > > User-Agent: Mutt/1.9.4 (2018-02-28)
> > > Message-ID: <20181119184954.GB4890@ziepe.ca>
> > >
> > > On Mon, Nov 19, 2018 at 05:14:05PM +0800, Kenneth Lee wrote:
> > >
> > > > If the hardware cannot share page table with the CPU, we then need to have
> > > > some way to change the device page table. This is what happens in ODP. It
> > > > invalidates the page table in the device upon the mmu_notifier callback. But
> > > > this cannot solve the COW problem: if the user process A shares a page P with
> > > > the device, and A forks a new process B, and it continues to write to the
> > > > page. By COW, the process B will keep the page P, while A will get a new page
> > > > P'. But you have no way to let the device know it should use P' rather than P.
> > >
> > > Is this true? I thought mmu_notifiers covered all these cases.
> > >
> > > The mmu_notifier for A should fire if B causes the physical address of
> > > A's pages to change via COW.
> > >
> > > And this causes the device page tables to re-synchronize.
> >
> > I don't see such code. The current do_cow_fault() implementation has nothing to
> > do with mmu_notifier.
>
> Well, that sure sounds like it would be a bug in mmu_notifiers..
>
> But considering Jean's SVA stuff seems based on mmu notifiers, I have
> a hard time believing that it has any different behavior from RDMA's
> ODP, and if it does have different behavior, then it is probably just
> a bug in the ODP implementation.
>
> > > > In WarpDrive/uacce, we make this simple. If you support IOMMU and it supports
> > > > SVM/SVA, everything will be fine, just like ODP implicit mode. And you don't
> > > > need to write any code for that, because it has been done by the IOMMU
> > > > framework. If it
> > >
> > > Looks like the IOMMU code uses mmu_notifier, so it is identical to
> > > IB's ODP.
> > > The only difference is that IB tends to have the IOMMU page
> > > table in the device, not in the CPU.
> > >
> > > The only case I know of that is different is the new-fangled CAPI
> > > stuff where the IOMMU can directly use the CPU's page table and the
> > > IOMMU page table (in device or CPU) is eliminated.
> >
> > Yes. We are not focusing on the current implementation. As mentioned in the
> > cover letter, we are expecting Jean-Philippe's SVA patch:
> > git://linux-arm.org/linux-jpb.
>
> This SVA stuff does not look comparable to CAPI as it still requires
> maintaining separate IOMMU page tables.
>
> Also, those patches from Jean have a lot of references to
> mmu_notifiers (i.e. look at iommu_mmu_notifier).
>
> Are you really sure it is actually any different at all?
>
> > > Anyhow, I don't think a single instance of hardware should justify an
> > > entire new subsystem. Subsystems are hard to make and without multiple
> > > hardware examples there is no way to expect that it would cover any
> > > future use cases.
> >
> > Yes. That's our first expectation. We can keep it with our driver. But
> > there is no user driver support for any accelerator in the mainline kernel.
> > Even the well-known QuickAssist has to be maintained out of tree. So we try
> > to see if people are interested in working together to solve the problem.
>
> Well, you should come with patches ack'ed by these other groups.
>
> > > If all your driver needs is to mmap some PCI BAR space, route
> > > interrupts and do DMA mapping then mediated VFIO is probably a good
> > > choice.
> >
> > Yes. That is what is done in our RFCv1/v2. But we accepted Jerome's opinion and
> > try not to add complexity to the mm subsystem.
>
> Why would a mediated VFIO driver touch the mm subsystem? Sounds like
> you don't have a VFIO driver if it needs to do stuff like that...
>
> > > If it needs to do a bunch of other stuff, not related to PCI BAR
> > > space, interrupts and DMA mapping (i.e. special code for compression,
> > > crypto, AI, whatever) then you should probably do what Jerome said and
> > > make a drivers/char/hisillicon_foo_bar.c that exposes just what your
> > > hardware does.
> >
> > Yes. If no other accelerator driver writer is interested, that is the
> > expectation :)
>
> I don't think it matters what other drivers do.
>
> If your driver does not need any other kernel code then VFIO is
> sensible. In this kind of world you will probably have an RDMA-like
> userspace driver that can bring this to a common user space API, even
> if one driver uses VFIO and a different driver uses something else.
>
> > You create some connections (queues) to the NIC, RSA, and AI engine. Then you
> > get data directly from the NIC and pass the pointer to the RSA engine for
> > decryption. The CPU then finishes some data taking or operation and then
> > passes it through to the AI engine for CNN calculation... This will need a
> > place to maintain the same address space by some means.
>
> How is this any different from what we have today?
>
> SVA is not something even remotely new, IB has been doing various
> versions of it for 20 years.
>
> Jason
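
For readers following the COW argument in the quoted text, here is a minimal user-space sketch of what a process sees when a private page is written after fork(). It is only an illustration: the function name `cow_after_fork` and the "old"/"new" markers are hypothetical, and it assumes a Unix-like system where `os.fork()` and anonymous `MAP_PRIVATE` mappings are available. It shows the user-visible semantics only. It does not show which process keeps the original physical page, and it says nothing about whether mmu_notifier fires on that path, which is the actual point under dispute.

```python
import mmap
import os


def cow_after_fork():
    """Return (parent_view, child_view) of a private page written after fork()."""
    # Anonymous MAP_PRIVATE memory is shared copy-on-write between the
    # parent (process A) and the child (process B) after fork().
    page = mmap.mmap(-1, mmap.PAGESIZE,
                     flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    page[:3] = b"old"

    go_r, go_w = os.pipe()    # A -> B: "I have written to the page"
    res_r, res_w = os.pipe()  # B -> A: B's view of the page

    pid = os.fork()
    if pid == 0:
        # Process B: wait until A has written, then report what B still sees.
        os.read(go_r, 1)
        os.write(res_w, page[:3])
        os._exit(0)

    # Process A: this write breaks the sharing, so A and B now see
    # different contents at the same virtual address.
    page[:3] = b"new"
    os.write(go_w, b"x")
    child_view = os.read(res_r, 3)
    os.waitpid(pid, 0)

    parent_view = bytes(page[:3])
    page.close()
    for fd in (go_r, go_w, res_r, res_w):
        os.close(fd)
    return parent_view, child_view
```

Calling `cow_after_fork()` returns `(b"new", b"old")`: the same virtual address is backed by different contents in A and B after the write. A device that still translated A's address to the page holding the old contents would therefore read stale data unless its page table were updated, which is the situation the thread is arguing about.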