From: "Deucher, Alexander"
To: "'linux-kernel@vger.kernel.org'", "'linux-rdma@vger.kernel.org'", "'linux-nvdimm@lists.01.org'", "'Linux-media@vger.kernel.org'", "'dri-devel@lists.freedesktop.org'", "'linux-pci@vger.kernel.org'"
CC: "Koenig, Christian", "Sagalovitch, Serguei", "Blinzer, Paul", "Kuehling, Felix", "Sander, Ben", "Suthikulpanit, Suravee", "Bridgman, John"
Subject: Enabling peer to peer device transactions for PCIe devices
Date: Mon, 21 Nov 2016 20:36:25 +0000

This is certainly not the first time this has been brought up, but I'd like to try and get some consensus on the best way to move this forward.
Allowing devices to talk directly improves performance and reduces latency by avoiding the use of staging buffers in system memory. Also, in cases where both devices are behind a switch, it avoids the CPU entirely. Most current APIs (DirectGMA, PeerDirect, CUDA, HSA) that deal with this are pointer based. Ideally we'd be able to take a CPU virtual address and get to a physical address, taking into account IOMMUs, etc. Having struct pages for the memory would allow it to work more generally and wouldn't require as much explicit support in drivers that wanted to use it.

Some use cases:
1. Storage devices streaming directly to GPU device memory
2. GPU device memory to GPU device memory streaming
3. DVB/V4L/SDI devices streaming directly to GPU device memory
4. DVB/V4L/SDI devices streaming directly to storage devices

Here is a relatively simple example of how this could work for testing. This is obviously not a complete solution.
- Device memory will be registered with the Linux memory subsystem by creating corresponding struct page structures for device memory
- get_user_pages_fast() will return corresponding struct pages when the CPU address points to the device memory
- put_page() will deal with struct pages for device memory

Previously proposed solutions and related proposals:

1. P2P DMA
DMA-API/PCI map_peer_resource support for peer-to-peer
(http://www.spinics.net/lists/linux-pci/msg44560.html)
Pros: Low impact, already largely reviewed.
Cons: Requires explicit support in all drivers that want to support it, doesn't handle S/G in device memory.

2. ZONE_DEVICE IO
Direct I/O and DMA for persistent memory (https://lwn.net/Articles/672457/)
Add support for ZONE_DEVICE IO memory with struct pages.
(https://patchwork.kernel.org/patch/8583221/)
Pros: Doesn't waste system memory for ZONE metadata.
Cons: CPU access to ZONE metadata is slow, and the metadata may be lost or corrupted on device reset.

3. DMA-BUF
RDMA subsystem DMA-BUF support
(http://www.spinics.net/lists/linux-rdma/msg38748.html)
Pros: Uses existing dma-buf interface.
Cons: dma-buf is handle based, requires explicit dma-buf support in drivers.

4. iopmem
iopmem : A block device for PCIe memory (https://lwn.net/Articles/703895/)

5. HMM
Heterogeneous Memory Management
(http://lkml.iu.edu/hypermail/linux/kernel/1611.2/02473.html)

6. Some new mmap-like interface that takes a userptr and a length and returns a dma-buf and offset?

Alex
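
To make option 6 a bit more concrete, here is a rough userspace mock of what the lookup semantics could look like: given a userptr and a length, find the exported region that fully contains the range and hand back its dma-buf plus the offset into it. This is only a sketch. None of these names (p2p_region, p2p_lookup, p2p_lookup_range, p2p_example) exist in the kernel, the integer dmabuf_fd merely stands in for a dma-buf, and a real interface would also have to deal with IOMMUs, lifetime and ranges spanning several mappings.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: one exported device-memory mapping (userspace mock). */
struct p2p_region {
    uintptr_t start;     /* first CPU virtual address of the mapping */
    size_t    len;       /* length of the mapping in bytes */
    int       dmabuf_fd; /* stand-in for the dma-buf backing the mapping */
};

/* Hypothetical lookup result: which dma-buf, and where inside it. */
struct p2p_lookup {
    int    dmabuf_fd;
    size_t offset;
};

/*
 * Resolve [userptr, userptr + len) to a dma-buf and an offset.
 * Returns 0 on success, or -1 if the range is empty or is not fully
 * contained in a single region.
 */
static int p2p_lookup_range(const struct p2p_region *regions, size_t nregions,
                            uintptr_t userptr, size_t len,
                            struct p2p_lookup *out)
{
    for (size_t i = 0; i < nregions; i++) {
        const struct p2p_region *r = &regions[i];

        if (len == 0 || userptr < r->start)
            continue;

        size_t off = (size_t)(userptr - r->start);

        /* Compare against the remaining space so userptr + len cannot overflow. */
        if (off < r->len && len <= r->len - off) {
            out->dmabuf_fd = r->dmabuf_fd;
            out->offset = off;
            return 0;
        }
    }
    return -1;
}

/* Small usage example with two made-up regions. */
static int p2p_example(void)
{
    const struct p2p_region regions[] = {
        { 0x10000, 0x4000, 3 },
        { 0x80000, 0x1000, 4 },
    };
    struct p2p_lookup res;

    /* A range inside the first region resolves to fd 3 at offset 0x100. */
    assert(p2p_lookup_range(regions, 2, 0x10100, 0x200, &res) == 0);
    assert(res.dmabuf_fd == 3 && res.offset == 0x100);

    /* A range that runs past the end of a region is rejected. */
    assert(p2p_lookup_range(regions, 2, 0x80800, 0x1000, &res) == -1);
    return 0;
}
```

Rejecting ranges that straddle two regions keeps the mock simple; whether a real interface should instead return a list of (dma-buf, offset, length) pieces is an open question.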