Date: Tue, 20 Nov 2018 11:07:02 +0800
From: Kenneth Lee
To: Jason Gunthorpe
CC: Leon Romanovsky, Kenneth Lee, Tim Sell, linux-doc@vger.kernel.org,
    Alexander Shishkin, Zaibo Xu, zhangfei.gao@foxmail.com,
    linuxarm@huawei.com, haojian.zhuang@linaro.org, Christoph Lameter,
    Hao Fang, Gavin Schenk, RDMA mailing list, Zhou Wang, Doug Ledford,
    Uwe Kleine-König, David Kershner, Johan Hovold, Cyrille Pitchen,
    Sagar Dharia, Jens Axboe, guodong.xu@linaro.org, linux-netdev,
    Randy Dunlap, linux-kernel@vger.kernel.org, Vinod Koul,
    linux-crypto@vger.kernel.org, Philippe Ombredanne, Sanyog Kale,
    David S. Miller, linux-accelerators@lists.ozlabs.org
Subject: Re: [RFCv3 PATCH 1/6] uacce: Add documents for WarpDrive/uacce
Message-ID: <20181120030702.GH157308@Turing-Arch-b>
In-Reply-To: <20181119184954.GB4890@ziepe.ca>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Nov 19, 2018 at 11:49:54AM -0700, Jason Gunthorpe wrote:
> On Mon, Nov 19, 2018 at 05:14:05PM +0800, Kenneth Lee wrote:
>
> > If the hardware cannot share page table with the CPU, we then need to have
> > some way to change the device page table. This is what happen in ODP. It
> > invalidates the page table in device upon mmu_notifier call back. But this cannot
> > solve the COW problem: if the user process A share a page P with device, and A
> > forks a new process B, and it continue to write to the page. By COW, the
> > process B will keep the page P, while A will get a new page P'. But you have
> > no way to let the device know it should use P' rather than P.
>
> Is this true? I thought mmu_notifiers covered all these cases.
>
> The mm_notifier for A should fire if B causes the physical address of
> A's pages to change via COW.
>
> And this causes the device page tables to re-synchronize.

I don't see such code. The current do_cow_fault() implementation has nothing
to do with mmu_notifier.

>
> > In WarpDrive/uacce, we make this simple. If you support IOMMU and it support
> > SVM/SVA. Everything will be fine just like ODP implicit mode. And you don't need
> > to write any code for that. Because it has been done by IOMMU framework. If it
>
> Looks like the IOMMU code uses mmu_notifier, so it is identical to
> IB's ODP. The only difference is that IB tends to have the IOMMU page
> table in the device, not in the CPU.
>
> The only case I know if that is different is the new-fangled CAPI
> stuff where the IOMMU can directly use the CPU's page table and the
> IOMMU page table (in device or CPU) is eliminated.
>

Yes. We are not focusing on the current implementation. As mentioned in the
cover letter, we are waiting for Jean-Philippe's SVA patches:
git://linux-arm.org/linux-jpb.
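
For reference, the COW scenario discussed above can be reproduced from user
space. The following is only a minimal sketch: the helper name
value_seen_by_child_after_parent_write() is hypothetical and not part of
WarpDrive/uacce. It shows that after fork() the two processes stop seeing the
same data at the same virtual address once A writes. It does not show which
physical page each process ends up with, nor what a device holding the old
mapping would observe.

```c
#define _DEFAULT_SOURCE		/* for MAP_ANONYMOUS under strict -std modes */
#include <assert.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper for illustration only.
 * Process A writes to a private page after fork() and returns the value
 * that process B still reads at the same virtual address, or -1 on error.
 */
int value_seen_by_child_after_parent_write(void)
{
	int go[2];		/* pipe: A tells B that the write has happened */
	int status, ok;
	char token = 'x';
	pid_t pid;
	int *page;
	long page_size = sysconf(_SC_PAGESIZE);

	if (page_size <= 0)
		return -1;

	page = mmap(NULL, (size_t)page_size, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (page == MAP_FAILED)
		return -1;
	*page = 1;		/* fault the page in before fork() */

	if (pipe(go) < 0) {
		munmap(page, (size_t)page_size);
		return -1;
	}

	pid = fork();
	if (pid < 0) {
		munmap(page, (size_t)page_size);
		return -1;
	}

	if (pid == 0) {		/* process B */
		close(go[1]);
		if (read(go[0], &token, 1) != 1)
			_exit(255);
		_exit(*page);	/* report what B sees via the exit status */
	}

	/* process A */
	close(go[0]);
	*page = 2;		/* write after fork(): handled by COW */
	assert(*page == 2);	/* A sees its own write */

	ok = (write(go[1], &token, 1) == 1);
	close(go[1]);
	if (waitpid(pid, &status, 0) < 0)
		ok = 0;
	munmap(page, (size_t)page_size);

	if (!ok || !WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status);
}
```

Calling the helper from main() and printing the result should show that B
still reads the value written before fork(), while A reads its new value.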

> Anyhow, I don't think a single instance of hardware should justify an
> entire new subsystem. Subsystems are hard to make and without multiple
> hardware examples there is no way to expect that it would cover any
> future use cases.

Yes. That was our first expectation: we can keep it with our driver. But
there is no user driver support for any accelerator in the mainline kernel.
Even the well-known QuickAssist has to be maintained out of tree. So we are
trying to see if people are interested in working together to solve the
problem.

>
> If all your driver needs is to mmap some PCI bar space, route
> interrupts and do DMA mapping then mediated VFIO is probably a good
> choice.

Yes. That is what was done in our RFCv1/v2. But we accepted Jerome's opinion
and tried not to add complexity to the mm subsystem.

>
> If it needs to do a bunch of other stuff, not related to PCI bar
> space, interrupts and DMA mapping (ie special code for compression,
> crypto, AI, whatever) then you should probably do what Jerome said and
> make a drivers/char/hisillicon_foo_bar.c that exposes just what your
> hardware does.

Yes. If no other accelerator driver writer is interested, that is the
expectation :)

But we would really like to have a public solution here. Consider this
scenario:

You create some connections (queues) to the NIC, the RSA engine, and the AI
engine. Then you get data directly from the NIC and pass the pointer to the
RSA engine for decryption. The CPU then finishes some data handling or other
operation and passes the result on to the AI engine for CNN calculation...
This needs a place to maintain the same address space by some means.

It is not complex, but it is helpful.

>
> If you have networking involved in here then consider RDMA,
> particularly if this functionality is already part of the same
> hardware that the hns infiniband driver is servicing.
>
> 'computational MRs' are a reasonable approach to a side-car offload of
> already existing RDMA support.

OK. Thanks. I will spend some time on it.
But personally, I really don't like RDMA's complexity. I cannot even try a
single function without... some expensive hardware and a complex connection
setup in the lab. This is not the open source way.

>
> Jason