Date: Tue, 29 Jan 2019 14:44:19 -0500
From: Jerome Glisse
To: Logan Gunthorpe
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Greg Kroah-Hartman,
	"Rafael J. Wysocki", Bjorn Helgaas, Christian Koenig, Felix Kuehling,
	Jason Gunthorpe, linux-pci@vger.kernel.org,
	dri-devel@lists.freedesktop.org, Christoph Hellwig, Marek Szyprowski,
	Robin Murphy, Joerg Roedel, iommu@lists.linux-foundation.org
Subject: Re: [RFC PATCH 3/5] mm/vma: add support for peer to peer to device vma
Message-ID: <20190129194418.GG3176@redhat.com>
References: <20190129174728.6430-1-jglisse@redhat.com>
	<20190129174728.6430-4-jglisse@redhat.com>
	<20190129191120.GE3176@redhat.com>

On Tue, Jan 29, 2019 at 12:24:04PM -0700, Logan Gunthorpe wrote:
> 
> 
> On 2019-01-29 12:11 p.m., Jerome Glisse wrote:
> > On Tue, Jan 29, 2019 at 11:36:29AM -0700, Logan Gunthorpe wrote:
> >>
> >>
> >> On 2019-01-29 10:47 a.m., jglisse@redhat.com wrote:
> >>
> >>> +	/*
> >>> +	 * Optional for device driver that want to allow peer to peer (p2p)
> >>> +	 * mapping of their vma (which can be back by some device memory) to
> >>> +	 * another device.
> >>> +	 *
> >>> +	 * Note that the exporting device driver might not have map anything
> >>> +	 * inside the vma for the CPU but might still want to allow a peer
> >>> +	 * device to access the range of memory corresponding to a range in
> >>> +	 * that vma.
> >>> +	 *
> >>> +	 * FOR PREDICTABILITY IF DRIVER SUCCESSFULY MAP A RANGE ONCE FOR A
> >>> +	 * DEVICE THEN FURTHER MAPPING OF THE SAME IF THE VMA IS STILL VALID
> >>> +	 * SHOULD ALSO BE SUCCESSFUL.
> >>> +	 * Following this rule allow the importing
> >>> +	 * device to map once during setup and report any failure at that time
> >>> +	 * to the userspace. Further mapping of the same range might happen
> >>> +	 * after mmu notifier invalidation over the range. The exporting device
> >>> +	 * can use this to move things around (defrag BAR space for instance)
> >>> +	 * or do other similar task.
> >>> +	 *
> >>> +	 * IMPORTER MUST OBEY mmu_notifier NOTIFICATION AND CALL p2p_unmap()
> >>> +	 * WHEN A NOTIFIER IS CALL FOR THE RANGE ! THIS CAN HAPPEN AT ANY
> >>> +	 * POINT IN TIME WITH NO LOCK HELD.
> >>> +	 *
> >>> +	 * In below function, the device argument is the importing device,
> >>> +	 * the exporting device is the device to which the vma belongs.
> >>> +	 */
> >>> +	long (*p2p_map)(struct vm_area_struct *vma,
> >>> +			struct device *device,
> >>> +			unsigned long start,
> >>> +			unsigned long end,
> >>> +			dma_addr_t *pa,
> >>> +			bool write);
> >>> +	long (*p2p_unmap)(struct vm_area_struct *vma,
> >>> +			  struct device *device,
> >>> +			  unsigned long start,
> >>> +			  unsigned long end,
> >>> +			  dma_addr_t *pa);
> >> 
> >> I don't understand why we need new p2p_[un]map function pointers for
> >> this. In subsequent patches, they never appear to be set anywhere and
> >> are only called by the HMM code. I'd have expected it to be called by
> >> some core VMA code and set by HMM as that's what vm_operations_struct is
> >> for.
> >> 
> >> But the code as all very confusing, hard to follow and seems to be
> >> missing significant chunks. So I'm not really sure what is going on.
> > 
> > It is set by device driver when userspace do mmap(fd) where fd comes
> > from open("/dev/somedevicefile"). So it is set by device driver. HMM
> > has nothing to do with this. It must be set by device driver mmap
> > call back (mmap callback of struct file_operations). For this patch
> > you can completely ignore all the HMM patches. Maybe posting this as
> > 2 separate patchset would make it clearer.
> > For instance see [1] for how a non HMM driver can export its memory
> > by just setting those callback. Note that a proper implementation of
> > this should also include some kind of driver policy on what to allow
> > to map and what to not allow ... All this is driver specific in any
> > way.
> 
> I'd suggest [1] should be a part of the patchset so we can actually see
> a user of the stuff you're adding.

I did not want to clutter the patchset with device-driver-specific usage
of this, as the API can be reasoned about in the abstract.

> But it still doesn't explain everything as without the HMM code nothing
> calls the new vm_ops. And there's still no callers for the p2p_test
> functions you added. And I still don't understand why we need the new
> vm_ops or who calls them and when. Why can't drivers use the existing
> 'fault' vm_op and call a new helper function to map p2p when appropriate
> or a different helper function to map a large range in its mmap
> operation? Just like regular mmap code...

HMM code is just one user: if you have a driver that uses HMM mirror
then your driver gets support for this for free. If you do not want to
use HMM then you can call this directly in your driver.
The flow is that a device driver wants to setup some mapping for a
range of virtual addresses [va_start, va_end]:
    1 - Lookup the vma for the range.
    2 - If the vma is a regular vma (not an mmap of a file) then use GUP.
        If the vma is an mmap of a file and it has the p2p_map callback
        then call p2p_map().
    3 - Use either the result of GUP or of p2p_map() to program the
        device.

The p2p test function is used by a device driver implementing the
callback, for instance see:

https://cgit.freedesktop.org/~glisse/linux/commit/?h=hmm-p2p&id=401a567696eafb1d4faf7054ab0d7c3a16a5ef06

The vm_fault callback is not suited because here we are mapping to
another device, so this needs special handling: someone must have both
device struct pointers in hand, and someone must be allowed to make the
decision on what to allow and what not to allow. Moreover an exporting
driver like a GPU driver might have a complex policy in place under
which it will only allow export of some memory to one peer device but
not to another.

In the end it means that it is easier and simpler to add a new callback
specifically for that, so the intention is clear to both the caller and
the callee. The exporting device can then do the proper check using the
core helper (ie checking that the devices can actually peer to each
other) and if that works it can then decide whether or not it wants to
allow this other device to access its memory or if it prefers to use
main memory for this.

To add this to the fault callback we would need to define a bunch of
new flags and set up fake page tables so that we can populate ptes, and
then have the importing device re-interpret everything specially
because it comes from another device. It would look ugly and it would
need to modify a bunch of core mm code.

Note that this callback solution also allows an exporting device to
permit peer access while the CPU can not access the memory, ie the pte
entries for the range are pte_none.

Cheers,
Jérôme