Received: by 2002:ad5:474a:0:0:0:0:0 with SMTP id i10csp4927368imu; Tue, 29 Jan 2019 09:48:08 -0800 (PST) X-Google-Smtp-Source: ALg8bN4XnCtSkRqtAGwlEcCc8vacigcZPQYwpl9fBKfN7p8u5sK2D17u3/X/wi9g6VAdZk7j49yO X-Received: by 2002:a62:fc8a:: with SMTP id e132mr26692140pfh.176.1548784088030; Tue, 29 Jan 2019 09:48:08 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1548784088; cv=none; d=google.com; s=arc-20160816; b=N1mP3uMW9g3Q+QhJmXEGzZKhKZZTX5TpAp+RRU0Jkc/fCZ7dhJ7uJZa3sMNOFFcNdL hICFrwdzvg6JvFjdvz7bKMgKGHwJ8aNK2EXdFnEJdpHYcPQLd1jRGSGjxY2csLHIk3je O2agNDhKzD2vRHfdLE3zrRnKbSQ3sC2n2sJL0dAdy6mCzOnHkpDznMTvgt5L+WG0wYnT 01h84ROXMd7Lls/9GvaGXQAyTRFUj+NS0GzFkWxuUrmAfAfSiJkNQbhwBix82NkEM92Y LbZr6wakGsdQjvJlrB7ZdCChZWw3Yfv7z7b5lMGlMN02YlQBVBtu+sP2Wu++ZhWuVore lKVA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding:mime-version :message-id:date:subject:cc:to:from; bh=7t824AnphP/zqnjb4IosB6tDF0+Ohdb/AEh+UEV+FqI=; b=Jb1q9DbATay4ouZglB8gfjSSIbqlgtGYTFfcV4rAYLWxWURyrPI4g8jsOgkkeUuske RSe82QRuuKkKfra7W1YuiR8qdnH1qXLbWHg8NnAB435htggrHSStF9n3E3fEIFs7v4iq eeYZ5+xRkySwGN7QQxYup6rDeQYvX3p3asBAII18HfXTceqLia8L6DOVbda3CgULTvdY kkeHGwuE3cvoy7geFVCTYFS3kHX8PBWG4jaK4aH+xUrl+20FzqE0zVho3+KMIIFnde9k PlVdSBxJXzs1ZZl7m8nQsILIwAfci49UIDfEqdUX0c8Z95vfqrRE5Np4d5/YFSLdOvQi JAvQ== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=redhat.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id j10si28835597pgn.365.2019.01.29.09.47.52; Tue, 29 Jan 2019 09:48:07 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728741AbfA2Rrj (ORCPT + 99 others); Tue, 29 Jan 2019 12:47:39 -0500 Received: from mx1.redhat.com ([209.132.183.28]:39354 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728667AbfA2Rrj (ORCPT ); Tue, 29 Jan 2019 12:47:39 -0500 Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.phx2.redhat.com [10.5.11.14]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id ADC1D9B308; Tue, 29 Jan 2019 17:47:37 +0000 (UTC) Received: from localhost.localdomain.com (ovpn-122-2.rdu2.redhat.com [10.10.122.2]) by smtp.corp.redhat.com (Postfix) with ESMTP id 0B55B5D97E; Tue, 29 Jan 2019 17:47:32 +0000 (UTC) From: jglisse@redhat.com To: linux-mm@kvack.org Cc: linux-kernel@vger.kernel.org, =?UTF-8?q?J=C3=A9r=C3=B4me=20Glisse?= , Logan Gunthorpe , Greg Kroah-Hartman , "Rafael J . Wysocki" , Bjorn Helgaas , Christian Koenig , Felix Kuehling , Jason Gunthorpe , linux-pci@vger.kernel.org, dri-devel@lists.freedesktop.org, Christoph Hellwig , Marek Szyprowski , Robin Murphy , Joerg Roedel , iommu@lists.linux-foundation.org Subject: [RFC PATCH 0/5] Device peer to peer (p2p) through vma Date: Tue, 29 Jan 2019 12:47:23 -0500 Message-Id: <20190129174728.6430-1-jglisse@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-Scanned-By: MIMEDefang 2.79 on 10.5.11.14 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.39]); Tue, 29 Jan 2019 17:47:38 +0000 (UTC) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: Jérôme Glisse This patchset add support for peer to peer between device in two manner. First for device memory use through HMM in process regular address space (ie inside a regular vma that is not an mmap of device file or special file). Second for special vma ie mmap of a device file, in this case some device driver might want to allow other device to directly access memory use for those special vma (not that the memory might not even be map to CPU in this case). They are many use cases for this they mainly fall into 2 category: [A]-Allow device to directly map and control another device command queue. [B]-Allow device to access another device memory without disrupting the other device computation. Corresponding workloads: [1]-Network device directly access an control a block device command queue so that it can do storage access without involving the CPU. This fall into [A] [2]-Accelerator device doing heavy computation and network device is monitoring progress. Direct accelerator's memory access by the network device avoid the need to use much slower system memory. This fall into [B]. [3]-Accelerator device doing heavy computation and network device is streaming out the result. This avoid the need to first bounce the result through system memory (it saves both system memory and bandwidth). This fall into [B]. [4]-Chaining device computation. For instance a camera device take a picture, stream it to a color correction device that stream it to final memory. This fall into [A and B]. People have more ideas on how to use this than i can list here. The intention of this patchset is to provide the means to achieve those and much more. I have done a testing using nouveau and Mellanox mlx5 where the mlx5 device can directly access GPU memory [1]. I intend to use this inside nouveau and help porting AMD ROCm RDMA to use this [2]. I believe other people have express interest in working on using this with network device and block device. From implementation point of view this just add 2 new call back to vm_operations struct (for special device vma support) and 2 new call back to HMM device memory structure for HMM device memory support. For now it needs IOMMU off with ACS disabled and for both device to be on same PCIE sub-tree (can not cross root complex). However the intention here is different from some other peer to peer work in that we do want to support IOMMU and are fine with going through the root complex in that case. In other words, the bandwidth advantage of avoiding the root complex is of less importance than the programming model for the feature. We do actualy expect that this will be use mostly with IOMMU enabled and thus with having to go through the root bridge. Another difference from other p2p solution is that we do require that the importing device abide to mmu notifier invalidation so that the exporting device can always invalidate a mapping at any point in time. For this reasons we do not need a struct page for the device memory. Also in all the cases the policy and final decision on wether to map or not is solely under the control of the exporting device. Finaly the device memory might not even be map to the CPU and thus we have to go through the exporting device driver to get the physical address at which the memory is accessible. The core change are minimal (adding new call backs to some struct). IOMMU support will need little change too. Most of the code is in driver to implement export policy and BAR space management. Very gross playground with IOMMU support in [3] (top 3 patches). Cheers, Jérôme [1] https://cgit.freedesktop.org/~glisse/linux/log/?h=hmm-p2p [2] https://github.com/RadeonOpenCompute/ROCnRDMA [3] https://cgit.freedesktop.org/~glisse/linux/log/?h=wip-hmm-p2p Cc: Logan Gunthorpe Cc: Greg Kroah-Hartman Cc: Rafael J. Wysocki Cc: Bjorn Helgaas Cc: Christian Koenig Cc: Felix Kuehling Cc: Jason Gunthorpe Cc: linux-pci@vger.kernel.org Cc: dri-devel@lists.freedesktop.org Cc: Christoph Hellwig Cc: Marek Szyprowski Cc: Robin Murphy Cc: Joerg Roedel Cc: iommu@lists.linux-foundation.org Jérôme Glisse (5): pci/p2p: add a function to test peer to peer capability drivers/base: add a function to test peer to peer capability mm/vma: add support for peer to peer to device vma mm/hmm: add support for peer to peer to HMM device memory mm/hmm: add support for peer to peer to special device vma drivers/base/core.c | 20 ++++ drivers/pci/p2pdma.c | 27 +++++ include/linux/device.h | 1 + include/linux/hmm.h | 53 +++++++++ include/linux/mm.h | 38 +++++++ include/linux/pci-p2pdma.h | 6 + mm/hmm.c | 219 ++++++++++++++++++++++++++++++------- 7 files changed, 325 insertions(+), 39 deletions(-) -- 2.17.2