Date: Wed, 30 Jan 2019 13:50:27 -0500
From: Jerome Glisse
To: Logan Gunthorpe
Cc: Jason Gunthorpe, Christoph Hellwig, "linux-mm@kvack.org", "linux-kernel@vger.kernel.org", Greg Kroah-Hartman, "Rafael J . Wysocki", Bjorn Helgaas, Christian Koenig, Felix Kuehling, "linux-pci@vger.kernel.org", "dri-devel@lists.freedesktop.org", Marek Szyprowski, Robin Murphy, Joerg Roedel, "iommu@lists.linux-foundation.org"
Subject: Re: [RFC PATCH 3/5] mm/vma: add support for peer to peer to device vma
Message-ID: <20190130185027.GC5061@redhat.com>

On Wed, Jan 30, 2019 at 11:13:11AM -0700, Logan Gunthorpe wrote:
>
> On 2019-01-30 10:44 a.m., Jason Gunthorpe wrote:
> > I don't see why a special case with a VMA is really that different.
>
> Well one *really* big difference is the VMA changes necessarily expose
> specialized new functionality to userspace which has to be supported
> forever and may be difficult to change. The p2pdma code is largely
> in-kernel and we can rework and change the interfaces all we want as we
> improve our struct page infrastructure.

I do not see how VMA changes are any different from using struct page
with respect to userspace exposure. Those vma callbacks do not need to be
set by everyone; in fact, the expectation is that only a handful of
drivers will set them.

How can we do p2p between RDMA and a GPU, for instance, without any
exposure to userspace?
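The callback model argued for here (an optional hook that most drivers leave unset, with the exporting driver alone deciding whether a peer may map its VMA) can be illustrated with a small userspace toy model. This is a sketch under stated assumptions, not kernel code and not the interface from the RFC patch: `struct sketch_vma`, `struct sketch_vma_ops`, `p2p_map`, `gpu_p2p_map`, `importer_map_range`, `struct peer_device`, and the `can_reach_bar` / `exporter_allows_p2p` fields are all hypothetical names, and a 4 KiB page size is assumed.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: every name below is hypothetical, not a kernel API. */
#define SKETCH_PAGE_SIZE 4096UL /* assumed page size */

struct peer_device {
	const char *name;
	bool can_reach_bar; /* hypothetical: can this peer DMA to the exporter's PCIe BAR? */
};

struct sketch_vma;

/* Per-VMA operations; p2p_map is optional and most drivers leave it NULL. */
struct sketch_vma_ops {
	/* Fills bus_addrs with one bus address per page and returns the
	 * page count, or returns a negative errno. */
	long (*p2p_map)(struct sketch_vma *vma, const struct peer_device *peer,
			unsigned long start, unsigned long end,
			unsigned long long *bus_addrs);
};

struct sketch_vma {
	unsigned long vm_start, vm_end;   /* [vm_start, vm_end) in the CPU address space */
	const struct sketch_vma_ops *ops;
	unsigned long long bar_base;      /* bus address backing vm_start (exporter-private) */
	bool exporter_allows_p2p;         /* policy owned solely by the exporting driver */
};

/* Exporter side (e.g. a GPU driver): it alone decides whether to allow the
 * mapping, and it hands out bus addresses rather than struct pages. */
long gpu_p2p_map(struct sketch_vma *vma, const struct peer_device *peer,
		 unsigned long start, unsigned long end,
		 unsigned long long *bus_addrs)
{
	/* The importer has already validated bounds and page alignment. */
	assert(((start | end) & (SKETCH_PAGE_SIZE - 1)) == 0);

	if (!vma->exporter_allows_p2p || !peer->can_reach_bar)
		return -EPERM;

	long npages = 0;
	for (unsigned long addr = start; addr < end; addr += SKETCH_PAGE_SIZE)
		bus_addrs[npages++] = vma->bar_base + (addr - vma->vm_start);
	return npages;
}

/* Importer side (e.g. an RDMA driver): it does not need to know what backs
 * the VMA, it only calls the optional hook. bus_addrs must have room for
 * (end - start) / SKETCH_PAGE_SIZE entries. */
long importer_map_range(struct sketch_vma *vma, const struct peer_device *peer,
			unsigned long start, unsigned long end,
			unsigned long long *bus_addrs)
{
	if (start >= end || start < vma->vm_start || end > vma->vm_end)
		return -EINVAL;
	if (((start | end) & (SKETCH_PAGE_SIZE - 1)) != 0)
		return -EINVAL;
	if (vma->ops == NULL || vma->ops->p2p_map == NULL)
		return -EOPNOTSUPP; /* ordinary VMA: no p2p, take the regular path */
	return vma->ops->p2p_map(vma, peer, start, end, bus_addrs);
}
```

In this shape, a VMA whose ops leave `p2p_map` NULL makes `importer_map_range()` return `-EOPNOTSUPP`, so drivers that never opt in are untouched, while a peer the exporter refuses gets `-EPERM` from the exporter's own policy check. The importer only ever receives bus addresses, which reflects the concern raised in this thread about struct pages reaching code that may mistake them for regular memory.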
At some point you need to tell userspace "hey, this kernel does allow
you to do that" :) RDMA works on vmas, and a GPU driver can easily set up
a vma for an object, which is why the vma sounds like the logical place.
In fact, a vma (mmap of a device file) is a very common device driver
pattern.

In the model I am proposing, the exporting device is in control of
policy, i.e. whether or not to allow the peer-to-peer mapping. So each
device driver can define a proper device-specific API to enable and
expose that feature to userspace. If they do, the only thing we have to
preserve is the end result for the user. Userspace does not care one bit
whether we achieve this in the kernel with a set of new callbacks within
the vm_operations struct or in some other way. Only the end result
matters.

So the question is: do we want to allow RDMA to access GPU driver
objects? I believe we do. There are people using non-upstream solutions
with open source drivers to do just that, which shows that there are
users for this. More use cases have been proposed too.

>
> I'd also argue that p2pdma isn't nearly as specialized as this VMA thing
> and can be used pretty generically to do other things. Though, the other
> ideas we've talked about doing are pretty far off and may have other
> challenges.

I believe p2p is highly specialized on platforms with a non-cache-coherent
interconnect, like x86 with PCIe. So I do not think that using struct
page for this is a good idea: it is not warranted or needed, and it can
only be problematic if some random kernel code gets hold of those struct
pages without understanding that they are not regular memory.

I believe the vma callbacks are the simplest solution, with the minimum
burden for the device driver and for the kernel. If a better solution
emerges, nothing would stop us from removing this and replacing it with
the other solution.

Cheers,
Jérôme