From: Dan Williams
Date: Mon, 22 Oct 2018 18:11:17 -0700
Subject: Re: [PATCH 0/9] Allow persistent memory to be used like normal RAM
To: Dave Hansen
Cc: Linux Kernel Mailing List, Dave Jiang, zwisler@kernel.org,
    Vishal L Verma, Tom Lendacky, Andrew Morton, Michal Hocko,
    linux-nvdimm, Linux MM, "Huang, Ying", Fengguang Wu
References: <20181022201317.8558C1D8@viggo.jf.intel.com>

On Mon, Oct 22, 2018 at 6:05 PM Dan Williams wrote:
>
> On Mon, Oct 22, 2018 at 1:18 PM Dave Hansen wrote:
> >
> > Persistent memory is cool. But, currently, you have to rewrite
> > your applications to use it. Wouldn't it be cool if you could
> > just have it show up in your system like normal RAM and get to
> > it like a slow blob of memory? Well... have I got the patch
> > series for you!
> >
> > This series adds a new "driver" to which pmem devices can be
> > attached. Once attached, the memory "owned" by the device is
> > hot-added to the kernel and managed like any other memory. On
> > systems with an HMAT (a new ACPI table), each socket (roughly)
> > will have a separate NUMA node for its persistent memory so
> > this newly-added memory can be selected by its unique NUMA
> > node.
> >
> > This is highly RFC, and I really want the feedback from the
> > nvdimm/pmem folks about whether this is a viable long-term
> > perversion of their code and device mode. It's insufficiently
> > documented and probably not bisectable either.
> >
> > Todo:
> > 1. The device re-binding hacks are ham-fisted at best. We
> >    need a better way of doing this, especially so the kmem
> >    driver does not get in the way of normal pmem devices.
> > 2. When the device has no proper node, we default it to
> >    NUMA node 0. Is that OK?
> > 3. We muck with the 'struct resource' code quite a bit. It
> >    definitely needs a once-over from folks more familiar
> >    with it than I.
> > 4. Is there a better way to do this than starting with a
> >    copy of pmem.c?
>
> So I don't think we want to do patch 2, 3, or 5. Just jump to patch 7
> and remove all the devm_memremap_pages() infrastructure and dax_region
> infrastructure.
>
> The driver should be a dead simple turn around to call add_memory()
> for the passed in range. The hard part is, as you say, arranging for
> the kmem driver to not stand in the way of typical range / device
> claims by the dax_pmem device.
>
> To me this looks like teaching the nvdimm-bus and this dax_kmem driver
> to require explicit matching based on 'id'. The attachment scheme
> would look like this:
>
>     modprobe dax_kmem
>     echo dax0.0 > /sys/bus/nd/drivers/dax_kmem/new_id
>     echo dax0.0 > /sys/bus/nd/drivers/dax_pmem/unbind
>     echo dax0.0 > /sys/bus/nd/drivers/dax_kmem/bind
>
> At step1 the dax_kmem drivers will match no devices and stays out of
> the way of dax_pmem. It learns about devices it cares about by being
> explicitly told about them. Then unbind from the typical dax_pmem
> driver and attach to dax_kmem to perform the one way hotplug.
>
> I expect udev can automate this by setting up a rule to watch for
> device-dax instances by UUID and call a script to do the detach /
> reattach dance.
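
For illustration only, the detach / reattach dance quoted above could be
wrapped in a small helper along these lines. This is a rough sketch, not
a tested tool: the sysfs paths are simply the ones proposed in this
thread and may not exist on any given kernel, and both the function name
dax_kmem_attach and the SYSFS_ROOT override are hypothetical, added only
so the steps can be dry-run against a scratch directory.

```shell
#!/bin/sh
# Sketch of the proposed detach / reattach sequence for one device-dax
# instance. SYSFS_ROOT is a hypothetical override for dry runs; on a
# real system it would stay at /sys.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

dax_kmem_attach() {
    dev="$1"
    drivers="$SYSFS_ROOT/bus/nd/drivers"

    # Step 1: load dax_kmem. With explicit id matching it claims no
    # devices on its own. Skipped when dry-running on a scratch tree.
    if [ "$SYSFS_ROOT" = "/sys" ]; then
        modprobe dax_kmem || return 1
    fi

    # Step 2: tell dax_kmem which device it is allowed to claim.
    echo "$dev" > "$drivers/dax_kmem/new_id" || return 1

    # Step 3: release the device from the default dax_pmem driver.
    echo "$dev" > "$drivers/dax_pmem/unbind" || return 1

    # Step 4: bind to dax_kmem, which performs the one way hotplug.
    echo "$dev" > "$drivers/dax_kmem/bind" || return 1
}
```

Run as root, something like `dax_kmem_attach dax0.0` would then be the
script a udev rule calls. Stopping at the first failed write keeps a
device from being unbound from dax_pmem when dax_kmem never accepted its
id.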

The next question is how to support this for ranges that don't
originate from the pmem sub-system. I expect we want dax_kmem to
register a generic platform device representing the range and have a
generic platform driver that turns around and does the add_memory().