From: Jason Gunthorpe
To: Christoph Hellwig
CC: Logan Gunthorpe, Jerome Glisse, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Greg Kroah-Hartman, Rafael J. Wysocki, Bjorn Helgaas, Christian Koenig, Felix Kuehling, linux-pci@vger.kernel.org, dri-devel@lists.freedesktop.org, Marek Szyprowski, Robin Murphy, Joerg Roedel, iommu@lists.linux-foundation.org
Subject: Re: [RFC PATCH 3/5] mm/vma: add support for peer to peer to device vma
Date: Thu, 31 Jan 2019 19:02:15 +0000
Message-ID: <20190131190202.GC7548@mellanox.com>
In-Reply-To: <20190131081355.GC26495@lst.de>

On Thu, Jan 31, 2019 at
09:13:55AM +0100, Christoph Hellwig wrote:
> On Wed, Jan 30, 2019 at 03:52:13PM -0700, Logan Gunthorpe wrote:
> > > *shrug* so what if the special GUP called a VMA op instead of
> > > traversing the VMA PTEs today? Why does it really matter? It could
> > > easily change to a struct page flow tomorrow..
> >
> > Well it's so that it's composable. We want the SGL->DMA side to work for
> > APIs from kernel space and not have to run a completely different flow
> > for kernel drivers than from userspace memory.
>
> Yes, I think that is the important point.
>
> All the other struct page discussion is not about anyone of us wanting
> struct page - heck it is a pain to deal with, but then again it is
> there for a reason.
>
> In the typical GUP flows we have three uses of a struct page:
>
>  (1) to carry a physical address.  This is mostly through
>      struct scatterlist and struct bio_vec.  We could just store
>      a magic PFN-like value that encodes the physical address
>      and allow looking up a page if it exists, and we had at least
>      two attempts at it.  In some way I think that would actually
>      make the interfaces cleaner, but Linus has NACKed it in the
>      past, so we'll have to convince him first that this is the
>      way forward

Something like this (and more) has always been the roadblock with
trying to mix BAR memory into SGL. I think it is such a big problem as
to be unsolvable in one step..

Struct page doesn't even really help anything beyond dma_map as we
still can't pretend that __iomem is normal memory for general SGL
users.

>  (2) to keep a reference to the memory so that it doesn't go away
>      under us due to swapping, process exit, unmapping, etc.
>      No idea how we want to solve this, but I guess you have
>      some smart ideas?

Jerome, how does this work anyhow? Did you do something to make the
VMA lifetime match the p2p_map/unmap? Or can we get into a situation
where the VMA is destroyed and the importing driver can't call the
unmap anymore?
I know in the case of notifiers the VMA lifetime should be strictly
longer than the map/unmap - but does this mean we can never support
non-notifier users via this scheme?

>  (3) to make the PTEs dirty after writing to them.  Again no sure
>      what our preferred interface here would be

This need doesn't really apply to BAR memory..

> If we solve all of the above problems I'd be more than happy to
> go with a non-struct page based interface for BAR P2P.  But we'll
> have to solve these issues in a generic way first.

I still think the right direction is to build on what Logan has done -
realize that he created a DMA-only SGL - make that a formal type of
the kernel and provide the right set of APIs to work with this type,
without being forced to expose struct page.

Basically invert the API flow - the DMA map would be done close to
GUP, not buried in the driver. This absolutely doesn't work for every
flow we have, but it does enable the ones that people seem to care
about when talking about P2P.

To get to where we are today we'd need a few new IB APIs, and some
nvme change to work with DMA-only SGLs and so forth, but that doesn't
seem so bad. The API also seems much more safe and understandable than
today's version that is trying to hope that the SGL is never touched by
the CPU.

It also does present a path to solve some cases of the O_DIRECT
problems if the block stack can develop some way to know if an IO will
go down a DMA-only IO path or not... This seems less challenging than
auditing every SGL user for iomem safety??

Yes we end up with a duality, but we already basically have that with
the p2p flow today..

Jason
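
For readers following the thread, the "DMA-only SGL" idea above can be
illustrated with a rough userspace sketch. This is not kernel code and
not anything posted in this series: every name below (sketch_dma_addr_t,
struct dma_only_seg, struct dma_only_sgl and the dma_only_sgl_* helpers)
is hypothetical and exists only to show the shape of a list that carries
already-mapped bus addresses and lengths, with no struct page and no
CPU-visible pointer for a consumer to touch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a bus/DMA address; not the kernel's dma_addr_t. */
typedef uint64_t sketch_dma_addr_t;

/* One already-mapped segment: a DMA address and a length, nothing else. */
struct dma_only_seg {
    sketch_dma_addr_t addr;
    uint32_t len;
};

/*
 * Hypothetical DMA-only scatter list. There is deliberately no page
 * pointer and no CPU mapping, so a consumer cannot dereference the memory.
 */
struct dma_only_sgl {
    struct dma_only_seg *segs;
    size_t nents;
    size_t capacity;
};

/* Attach caller-provided storage and start with an empty list. */
static inline void dma_only_sgl_init(struct dma_only_sgl *sgl,
                                     struct dma_only_seg *storage,
                                     size_t capacity)
{
    assert(sgl != NULL && storage != NULL);
    sgl->segs = storage;
    sgl->nents = 0;
    sgl->capacity = capacity;
}

/* Append a segment; returns 0 on success, -1 if full or len is zero. */
static inline int dma_only_sgl_add(struct dma_only_sgl *sgl,
                                   sketch_dma_addr_t addr, uint32_t len)
{
    if (len == 0 || sgl->nents == sgl->capacity)
        return -1;
    sgl->segs[sgl->nents].addr = addr;
    sgl->segs[sgl->nents].len = len;
    sgl->nents++;
    return 0;
}

/* Total number of bytes described by the list. */
static inline uint64_t dma_only_sgl_total_len(const struct dma_only_sgl *sgl)
{
    uint64_t total = 0;
    for (size_t i = 0; i < sgl->nents; i++)
        total += sgl->segs[i].len;
    return total;
}
```

In the inverted flow described above, something close to GUP would fill
such a list after doing the DMA mapping, and the driver would only ever
hand the address/length pairs to hardware. How lifetime and unmap would
be handled is exactly the open question in this thread and is not
modelled here.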