From: Haggai Eran
To: "jgunthorpe@obsidianresearch.com"
CC: "linux-kernel@vger.kernel.org", "linux-rdma@vger.kernel.org",
    "linux-nvdimm@ml01.01.org", "christian.koenig@amd.com",
    "Suravee.Suthikulpanit@amd.com", "John.Bridgman@amd.com",
    "Alexander.Deucher@amd.com", "Linux-media@vger.kernel.org",
    "dan.j.williams@intel.com", "logang@deltatee.com",
    "dri-devel@lists.freedesktop.org", "Max Gurtovoy",
    "linux-pci@vger.kernel.org", "serguei.sagalovitch@amd.com",
    "Paul.Blinzer@amd.com", "Felix.Kuehling@amd.com", "ben.sander@amd.com"
Subject: Re: Enabling peer to peer device transactions for PCIe devices
Date: Mon, 28 Nov 2016 18:19:40 +0000
Message-ID: <1480357179.19407.13.camel@mellanox.com>
In-Reply-To: <20161128165751.GB28381@obsidianresearch.com>
References: <20161123193228.GC12146@obsidianresearch.com>
    <20161123203332.GA15062@obsidianresearch.com>
    <20161123215510.GA16311@obsidianresearch.com>
    <91d28749-bc64-622f-56a1-26c00e6b462a@deltatee.com>
    <20161124164249.GD20818@obsidianresearch.com>
    <3f2d2db3-fb75-2422-2a18-a8497fd5d70e@amd.com>
    <20161125193252.GC16504@obsidianresearch.com>
    <20161128165751.GB28381@obsidianresearch.com>

On Mon, 2016-11-28 at 09:57 -0700, Jason Gunthorpe wrote:
> On Sun, Nov 27, 2016 at 04:02:16PM +0200, Haggai Eran wrote:
> > I think blocking mmu notifiers against something that is basically
> > controlled by user-space can be problematic. This can block things
> > like memory reclaim. If you have user-space access to the device's
> > queues, user-space can block the mmu notifier forever.
> Right, I mentioned that..
Sorry, I must have missed it.

> > On PeerDirect, we have some kind of a middle-ground solution for
> > pinning GPU memory. We create a non-ODP MR pointing to VRAM but rely on
> > user-space and the GPU not to migrate it. If they do, the MR gets
> > destroyed immediately.
> That sounds horrible. How can that possibly work? What if the MR is
> being used when the GPU decides to migrate?
Naturally this doesn't support migration. The GPU is expected to pin
these pages as long as the MR lives. The MR invalidation is done only as
a last resort to keep system correctness.

I think it is similar to how non-ODP MRs rely on user-space today to keep
them correct.
If you do something like madvise(MADV_DONTNEED) on a non-ODP MR's pages,
you can still get yourself into a data corruption situation (HCA sees one
page and the process sees another for the same virtual address). The
pinning that we use only guarantees the HCA's page won't be reused.

> I would not support that
> upstream without a lot more explanation..
>
> I know people don't like requiring new hardware, but in this case we
> really do need ODP hardware to get all the semantics people want..
>
> >
> > Another thing I think is that while HMM is good for user-space
> > applications, for kernel p2p use there is no need for that. Using
> From what I understand we are not really talking about kernel p2p,
> everything proposed so far is being mediated by a userspace VMA, so
> I'd focus on making that work.
Fair enough, although we will need both eventually, and I hope the
infrastructure can be shared to some degree.
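
To make the madvise(MADV_DONTNEED) point above concrete, here is a minimal
sketch of the process side only. It assumes Linux and a private anonymous
mapping, and it does not involve any RDMA device or MR. The function name
dontneed_replaces_page() is hypothetical and exists only for this
illustration. The idea is that after MADV_DONTNEED the process faults in a
fresh zero page at the same virtual address, while a device that pinned the
old page at registration time would presumably keep using the old one.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* MAP_ANONYMOUS and madvise() on glibc */
#endif
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper, for illustration only.
 * Returns 1 if the process sees a zero-filled page after
 * madvise(MADV_DONTNEED), 0 if the old contents are still visible,
 * and -1 on error.
 */
int dontneed_replaces_page(void)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;

    unsigned char *buf = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return -1;

    /* Stand-in for data living in the page a non-ODP MR would have pinned. */
    memset(buf, 0xAB, (size_t)page);

    /* Drop the process's mapping of that page. */
    if (madvise(buf, (size_t)page, MADV_DONTNEED) != 0) {
        munmap(buf, (size_t)page);
        return -1;
    }

    /* The next access faults in a new page at the same virtual address. */
    volatile unsigned char *p = buf;
    int replaced = (p[0] == 0);

    munmap(buf, (size_t)page);
    return replaced;
}
```

Calling dontneed_replaces_page() from a small main() and checking for a
return value of 1 shows the process-visible half of the mismatch. The HCA
half (the pinned page still holding the old data) is not modelled here.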