Date: Thu, 16 Nov 2017 16:29:04 -0500
From: Jerome Glisse
To: chetan L
Cc: Bob Liu, David Nellans, John Hubbard, Balbir Singh,
 "linux-kernel@vger.kernel.org", Michal Hocko, Linux MM, Dan Williams,
 Andrew Morton, linux-accelerators@lists.ozlabs.org
Subject: Re: [PATCH 0/6] Cache coherent device memory (CDM) with HMM v5
Message-ID: <20171116212904.GA4823@redhat.com>
References: <20170926161635.GA3216@redhat.com>
 <0d7273c3-181c-6d68-3c5f-fa518e782374@huawei.com>
 <20170930224927.GC6775@redhat.com>
 <20171012153721.GA2986@redhat.com>
 <20171116024425.GC2934@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain;
 charset=iso-8859-1
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
User-Agent: Mutt/1.9.1 (2017-09-22)
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Nov 15, 2017 at 07:29:10PM -0800, chetan L wrote:
> On Wed, Nov 15, 2017 at 7:23 PM, chetan L wrote:
> > On Wed, Nov 15, 2017 at 6:44 PM, Jerome Glisse wrote:
> >> On Wed, Nov 15, 2017 at 06:10:08PM -0800, chet l wrote:
> >>> >> You may think it as a CCIX device or CAPI device.
> >>> >> The requirement is eliminate any extra copy.
> >>> >> A typical usecase/requirement is malloc() and madvise() allocate from
> >>> >> device memory, then CPU write data to device memory directly and
> >>> >> trigger device to read the data/do calculation.
> >>> >
> >>> > I suggest you rely on the device driver userspace API to do a migration after malloc
> >>> > then. Something like:
> >>> >     ptr = malloc(size);
> >>> >     my_device_migrate(ptr, size);
> >>> >
> >>> > Which would call an ioctl of the device driver which itself would migrate memory or
> >>> > allocate device memory for the range if pointer return by malloc is not yet back by
> >>> > any pages.
> >>> >
> >>>
> >>> So for CCIX, I don't think there is going to be an inline device
> >>> driver that would allocate any memory for you. The expansion memory
> >>> will become part of the system memory as part of the boot process. So,
> >>> if the host DDR is 256GB and the CCIX expansion memory is 4GB, the
> >>> total system mem will be 260GB.
> >>>
> >>> Assume that the 'mm' is taught to mark/anoint the ZONE_DEVICE(or
> >>> ZONE_XXX) range from 256 to 260 GB. Then, for kmalloc it(mm) won't use
> >>> the ZONE_DEV range. But for a malloc, it will/can use that range.
> >>
> >> HMM zone device memory would work with that, you just need to teach the
> >> platform to identify this memory zone and not hotplug it. Again you
> >> should rely on specific device driver API to allocate this memory.
> >>
> >
> > @Jerome - a new linux-accelerator's list has just been created. I have
> > CC'd that list since we have overlapping interests w.r.t CCIX.
> >
> > I cannot comment on surprise add/remove as of now ... will cross the
> > bridge later.

Note that this is not hotplug strictly speaking. The design today is that
the device driver registers the memory. From the kernel's point of view this
is a hotplug, but for many of the target architectures there is no real
hotplug, i.e. the device and its memory were present at boot time.

Like I said, I think for now we are better off having each device manage
and register its memory. HMM provides a toolbox for that. If we see a common
trend across multiple devices then we can think about making something more
generic.

The NUMA discussion is related to CPU-less nodes, i.e. not wanting to add
any more CPU-less nodes (nodes with only memory), and there are other
aspects too. For instance, you do not necessarily have good information
from the device to know if a page is accessed a lot by the device (this kind
of information is often only accessible by the device driver). Thus the
automatic NUMA placement is useless here. Not to mention that for it to work
we would need to change how it currently works (iirc there is an issue when
you do not have a CPU id you can use).

Cheers,
Jérôme
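
For illustration, the malloc-then-migrate pattern quoted earlier in the thread
could be sketched from userspace roughly as below. This is only a sketch under
assumptions: the request struct, the ioctl number MY_DEVICE_IOC_MIGRATE, the
helper my_device_fill_req() and the extra fd argument are all hypothetical and
not taken from any real driver or from HMM itself. A real device driver would
define its own ioctl interface.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>

/* Hypothetical request layout; a real driver defines its own. */
struct my_device_migrate_req {
    uint64_t addr;  /* start of the user virtual address range */
    uint64_t size;  /* length of the range in bytes */
};

/* Hypothetical ioctl number, not taken from any real driver. */
#define MY_DEVICE_IOC_MIGRATE _IOW('M', 0x01, struct my_device_migrate_req)

/* Describe the range [ptr, ptr + size) in the request structure. */
void my_device_fill_req(struct my_device_migrate_req *req,
                        const void *ptr, size_t size)
{
    assert(req != NULL);
    memset(req, 0, sizeof(*req));
    req->addr = (uint64_t)(uintptr_t)ptr;
    req->size = (uint64_t)size;
}

/*
 * Ask the driver behind fd to migrate the range to device memory, or to
 * back it with device memory if no pages have been allocated yet.
 * Returns 0 on success, -1 with errno set by ioctl() on failure.
 */
int my_device_migrate(int fd, void *ptr, size_t size)
{
    struct my_device_migrate_req req;

    my_device_fill_req(&req, ptr, size);
    return ioctl(fd, MY_DEVICE_IOC_MIGRATE, &req) < 0 ? -1 : 0;
}
```

A caller would open the driver's device node once, then do
ptr = malloc(size) followed by my_device_migrate(fd, ptr, size). The fd
argument is added here only because an ioctl needs a file descriptor; the
two-argument form in the quoted mail presumably hides it inside a library.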