From: Dan Williams
Date: Fri, 26 Oct 2018 21:45:30 -0700
Subject: Re: [PATCH 0/9] Allow persistent memory to be used like normal RAM
References: <20181022201317.8558C1D8@viggo.jf.intel.com>
To: Dave Hansen
Cc: Linux Kernel Mailing List, Dave Jiang, zwisler@kernel.org, Vishal L Verma, Tom Lendacky, Andrew Morton, Michal Hocko, linux-nvdimm, Linux MM, "Huang, Ying", Fengguang Wu
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Oct 22, 2018 at 6:11 PM Dan Williams wrote:
>
> On Mon, Oct 22, 2018 at 6:05 PM Dan Williams wrote:
> >
> > On Mon, Oct 22, 2018 at 1:18 PM Dave Hansen wrote:
> > >
> > > Persistent memory is cool. But, currently, you have to rewrite
> > > your applications to use it. Wouldn't it be cool if you could
> > > just have it show up in your system like normal RAM and get to
> > > it like a slow blob of memory? Well... have I got the patch
> > > series for you!
> > >
> > > This series adds a new "driver" to which pmem devices can be
> > > attached. Once attached, the memory "owned" by the device is
> > > hot-added to the kernel and managed like any other memory.
> > > On systems with an HMAT (a new ACPI table), each socket (roughly)
> > > will have a separate NUMA node for its persistent memory so
> > > this newly-added memory can be selected by its unique NUMA
> > > node.
> > >
> > > This is highly RFC, and I really want the feedback from the
> > > nvdimm/pmem folks about whether this is a viable long-term
> > > perversion of their code and device mode. It's insufficiently
> > > documented and probably not bisectable either.
> > >
> > > Todo:
> > > 1. The device re-binding hacks are ham-fisted at best. We
> > >    need a better way of doing this, especially so the kmem
> > >    driver does not get in the way of normal pmem devices.
> > > 2. When the device has no proper node, we default it to
> > >    NUMA node 0. Is that OK?
> > > 3. We muck with the 'struct resource' code quite a bit. It
> > >    definitely needs a once-over from folks more familiar
> > >    with it than I.
> > > 4. Is there a better way to do this than starting with a
> > >    copy of pmem.c?
> >
> > So I don't think we want to do patch 2, 3, or 5. Just jump to patch 7
> > and remove all the devm_memremap_pages() infrastructure and dax_region
> > infrastructure.
> >
> > The driver should be a dead simple turn around to call add_memory()
> > for the passed in range. The hard part is, as you say, arranging for
> > the kmem driver to not stand in the way of typical range / device
> > claims by the dax_pmem device.
> >
> > To me this looks like teaching the nvdimm-bus and this dax_kmem driver
> > to require explicit matching based on 'id'. The attachment scheme
> > would look like this:
> >
> > modprobe dax_kmem
> > echo dax0.0 > /sys/bus/nd/drivers/dax_kmem/new_id
> > echo dax0.0 > /sys/bus/nd/drivers/dax_pmem/unbind
> > echo dax0.0 > /sys/bus/nd/drivers/dax_kmem/bind
> >
> > At step 1 the dax_kmem driver will match no devices and stay out of
> > the way of dax_pmem. It learns about devices it cares about by being
> > explicitly told about them.
> > Then unbind from the typical dax_pmem driver and attach to dax_kmem
> > to perform the one-way hotplug.
> >
> > I expect udev can automate this by setting up a rule to watch for
> > device-dax instances by UUID and call a script to do the detach /
> > reattach dance.
>
> The next question is how to support this for ranges that don't
> originate from the pmem sub-system. I expect we want dax_kmem to
> register a generic platform device representing the range and have a
> generic platform driver that turns around and does the add_memory().

I forgot I have some old patches that do something along these lines
and make device-dax its own bus. I'll dust those off so we can discern
what's left.
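For illustration, the "detach / reattach dance" that the quoted udev idea would hand off to a script might look roughly like the shell function below, which replays the new_id / unbind / bind writes from the proposed attachment scheme. This is a hypothetical sketch: the dax_kmem driver and its new_id file under /sys/bus/nd/drivers come from the proposal rather than an existing interface, the function name kmem_reattach is invented here, and SYSFS_ROOT is an illustrative override (defaulting to /sys) so the steps can be tried against a scratch directory. It assumes "modprobe dax_kmem" has already been run.

```sh
# Hypothetical helper (not an existing tool): move one device-dax instance
# from dax_pmem to the *proposed* dax_kmem driver using the explicit
# new_id / unbind / bind sequence from the quoted proposal.
# SYSFS_ROOT is an illustrative override so the writes can be tested
# against a scratch directory; it defaults to /sys.
# Assumes the dax_kmem module has already been loaded.
kmem_reattach() {
    dev=$1
    root=${SYSFS_ROOT:-/sys}
    drivers="$root/bus/nd/drivers"

    if [ -z "$dev" ]; then
        echo "usage: kmem_reattach <device, e.g. dax0.0>" >&2
        return 1
    fi

    # Step 1: tell dax_kmem about this one device; by default it matches nothing.
    echo "$dev" > "$drivers/dax_kmem/new_id" || return 1

    # Step 2: detach the device from the default dax_pmem driver.
    echo "$dev" > "$drivers/dax_pmem/unbind" || return 1

    # Step 3: bind to dax_kmem, which (per the proposal) would hot-add the
    # range with add_memory(); this is a one-way operation.
    echo "$dev" > "$drivers/dax_kmem/bind" || return 1
}
```

A script run from a udev rule matching the device's UUID could source this and call, for example, kmem_reattach dax0.0. Each write is checked, so if dax_kmem rejects the new id the device is never unbound from dax_pmem.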