Date: Thu, 12 Oct 2017 11:37:22 -0400
From: Jerome Glisse <jglisse@redhat.com>
To: Bob Liu
Cc: Bob Liu, Dan Williams, linux-kernel@vger.kernel.org, Linux MM, John Hubbard, David Nellans, Balbir Singh, Michal Hocko, Andrew Morton
Subject: Re: [PATCH 0/6] Cache coherent device memory (CDM) with HMM v5
Message-ID: <20171012153721.GA2986@redhat.com>
References: <20170721014106.GB25991@redhat.com> <20170905193644.GD19397@redhat.com> <20170911233649.GA4892@redhat.com> <20170926161635.GA3216@redhat.com> <0d7273c3-181c-6d68-3c5f-fa518e782374@huawei.com> <20170930224927.GC6775@redhat.com>

On Wed, Oct 11, 2017 at 09:15:57PM +0800, Bob Liu wrote:
> On Sun, Oct 1, 2017 at 6:49 AM, Jerome Glisse wrote:
> > On Sat, Sep 30, 2017 at 10:57:38AM +0800, Bob Liu wrote:
> >> On 2017/9/27 0:16, Jerome Glisse wrote:
> >> > On Tue, Sep 26, 2017 at 05:56:26PM +0800, Bob Liu wrote:
> >> >> On Tue, Sep 12, 2017 at 7:36 AM, Jerome Glisse wrote:
> >> >>> On Sun, Sep 10, 2017 at 07:22:58AM +0800, Bob Liu wrote:
> >> >>>> On Wed, Sep 6, 2017 at 3:36 AM, Jerome Glisse wrote:
> >> >>>>> On Thu, Jul 20, 2017 at 08:48:20PM -0700, Dan Williams wrote:
> >> >>>>>> On Thu, Jul 20, 2017 at 6:41 PM, Jerome Glisse wrote:
> >> [...]
> >> >>>>> So I pushed a branch with WIP for nouveau to use HMM:
> >> >>>>>
> >> >>>>> https://cgit.freedesktop.org/~glisse/linux/log/?h=hmm-nouveau
> >> >>>>>
> >> >>>> Nice to see that.
> >> >>>> Btw, do you have any plan for a CDM-HMM driver? The CPU can write
> >> >>>> to device memory directly without an extra copy.
> >> >>>
> >> >>> Yes, nouveau CDM support on PPC (which is the only CDM platform
> >> >>> commercially available today) is on the TODO list. Note that the
> >> >>> driver changes for CDM are minimal (probably less than 100 lines of
> >> >>> code). From the driver's point of view this is memory, and it
> >> >>> doesn't matter whether it is CDM or not.
> >> >>
> >> >> It seems we still have to migrate/copy memory between system memory
> >> >> and device memory even in the HMM-CDM solution, because device memory
> >> >> is not added to the buddy system, so the page fault for a normal
> >> >> malloc() always allocates from system memory. If the device then
> >> >> accesses the same virtual address, the data is copied to device
> >> >> memory.
> >> >>
> >> >> Correct me if I misunderstand something.
> >> >> @Balbir, how do you plan to make zero-copy work if using HMM-CDM?
> >> >
> >> > The device can access system memory, so copying to the device is
> >> > _not_ mandatory. Copying data to the device is for performance only,
> >> > i.e. the device driver takes hints from userspace and monitors device
> >> > activity to decide which memory should be migrated to device memory
> >> > to maximize performance.
> >> >
> >> > Moreover, some previous versions of the HMM patchset had a helper that
> >>
> >> Could you point to which version? I'd like to have a look.
> >
> > I will need to dig in.
>
> Thank you.

I forgot about this, sorry, I was traveling and am still catching up. I will send you those patches once I unearth where I ended up backing them up.

> >> > allowed device memory to be allocated directly on a device page
> >> > fault. I intend to post this helper again. With that helper you can
> >> > have zero copy when the device is the first to access the memory.
> >> >
> >> > The plan is to get what we have today working properly with the open
> >> > source driver and make it perform well. Once we get some experience
> >> > with real workloads we might look into allowing CPU page faults to be
> >> > directed to device memory, but at this time I don't think we need that.
> >>
> >> For us, we need this feature: CPU page faults that can be directed to
> >> device memory, so that we don't need to copy data from system memory to
> >> device memory. Do you have any suggestion on the implementation? I'll
> >> try to make a prototype patch.
> >
> > Why do you need that? What is the device, and what are the requirements?
>
> You may think of it as a CCIX device or a CAPI device. The requirement is
> to eliminate any extra copy. A typical use case is malloc() and madvise()
> allocating from device memory, then the CPU writing data to device memory
> directly and triggering the device to read the data and do its
> calculation.

I suggest you rely on the device driver's userspace API to do a migration after malloc() then. Something like:

  ptr = malloc(size);
  my_device_migrate(ptr, size);

which would call an ioctl of the device driver, which would itself migrate memory, or allocate device memory for the range if the pointer returned by malloc() is not yet backed by any pages.

There have been several discussions already about madvise/mbind/set_mempolicy/move_pages, and at this time I don't think we want to add to or change any of them to understand device memory. My personal opinion is that we first need enough upstream users, and an understanding of how this is actually used, before it makes sense to try to formalize and define a syscall or change an existing one. User-facing APIs are set in stone, and I don't want to design them by making broad assumptions about how I think device memory will be used.

So for the time being I think it is better to use the existing device API to manage memory and give hints to the kernel on where memory should be (i.e. whether device memory should be used for some range). The first users of this are GPUs, and they already have a lot of ioctls to manage and propagate hints from user space. So at this time I suggest that you piggyback on an existing ioctl of your device, or add a new ioctl.

Hope this helps.

Jérôme
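[Editor's note: the malloc-then-migrate pattern suggested above could be sketched in userspace roughly as follows. Everything device-specific here is hypothetical: the ioctl command, the request struct, and the my_device_migrate() name are placeholders for whatever a real driver would expose; HMM itself defines no userspace ABI for this.]

```c
/* Hedged sketch of the malloc-then-migrate pattern described in the
 * mail. The ioctl command and the request struct are hypothetical
 * placeholders, not an existing kernel interface. */
#include <stddef.h>
#include <sys/ioctl.h>

/* Hypothetical request: the virtual address range to migrate. */
struct my_migrate_args {
    unsigned long addr;  /* start of the range */
    unsigned long size;  /* length in bytes */
};

/* Hypothetical ioctl command (type 'M', nr 0, userspace-to-kernel). */
#define MY_DEV_IOCTL_MIGRATE _IOW('M', 0, struct my_migrate_args)

/* Ask the driver (opened as fd) to migrate [ptr, ptr + size) to device
 * memory, or to back the range with device pages if it has no pages
 * yet. Returns the ioctl result: 0 on success, -1 on error. */
static int my_device_migrate(int fd, void *ptr, size_t size)
{
    struct my_migrate_args args = {
        .addr = (unsigned long)ptr,
        .size = size,
    };
    return ioctl(fd, MY_DEV_IOCTL_MIGRATE, &args);
}
```

Usage would then mirror the mail: ptr = malloc(size); fd = open() on the (hypothetical) device node; my_device_migrate(fd, ptr, size); with the driver free to treat the call as a pure performance hint.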