Subject: [PATCH 0/9] Allow persistent memory to be used like normal RAM
To: linux-kernel@vger.kernel.org
Cc: Dave Hansen, dan.j.williams@intel.com, dave.jiang@intel.com,
    zwisler@kernel.org, vishal.l.verma@intel.com,
    thomas.lendacky@amd.com, akpm@linux-foundation.org,
    mhocko@suse.com, linux-nvdimm@lists.01.org, linux-mm@kvack.org,
    ying.huang@intel.com, fengguang.wu@intel.com
From: Dave Hansen
Date: Mon, 22 Oct 2018 13:13:17 -0700
Message-Id: <20181022201317.8558C1D8@viggo.jf.intel.com>
Persistent memory is cool.  But, currently, you have to rewrite
your applications to use it.  Wouldn't it be cool if you could
just have it show up in your system like normal RAM and get to it
like a slow blob of memory?  Well... have I got the patch series
for you!

This series adds a new "driver" to which pmem devices can be
attached.  Once attached, the memory "owned" by the device is
hot-added to the kernel and managed like any other memory.  On
systems with an HMAT (a new ACPI table), each socket (roughly)
will have a separate NUMA node for its persistent memory, so this
newly-added memory can be selected by its unique NUMA node.
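For instance, an unmodified application could be steered onto the
pmem node with the stock numactl tool.  This is only a sketch: the
node number "1" and the binary name "./my_app" are made up for
illustration, and the actual node number is system-dependent:

	# Suppose the hot-added pmem became NUMA node 1 (check with
	# "numactl --hardware"; the number varies by system).
	numactl --membind=1 ./my_app	# allocate only from node 1
	numactl --preferred=1 ./my_app	# prefer node 1, fall back to DRAM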

This is highly RFC, and I really want feedback from the
nvdimm/pmem folks about whether this is a viable long-term
perversion of their code and device mode.  It's insufficiently
documented and probably not bisectable either.

Todo:
1. The device re-binding hacks are ham-fisted at best.  We need a
   better way of doing this, especially so the kmem driver does
   not get in the way of normal pmem devices (a rough sketch of
   one possible mechanism follows this list).
2. When the device has no proper node, we default it to NUMA node
   0.  Is that OK?
3. We muck with the 'struct resource' code quite a bit.  It
   definitely needs a once-over from folks more familiar with it
   than I.
4. Is there a better way to do this than starting with a copy of
   pmem.c?
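For reference, re-binding could lean on the generic sysfs
bind/unbind mechanism.  The paths below are assumptions for
illustration only: the bus name "dax", the driver names
"device_dax" and "kmem", and the device name "dax0.0" all depend
on how the re-binding ends up being implemented:

	# Hypothetical paths -- adjust the bus/driver/device names
	# to match whatever the final implementation exposes.
	echo -n dax0.0 > /sys/bus/dax/drivers/device_dax/unbind
	echo -n dax0.0 > /sys/bus/dax/drivers/kmem/bind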

Here's how I set up a system to test this thing:

1. Boot qemu with lots of memory: "-m 4096", for instance
2. Reserve 512MB of physical memory.  Reserving a spot at 2GB
   physical seems to work:
	memmap=512M!0x0000000080000000
   This will end up looking like a pmem device at boot (a quick
   check is sketched after this list).
3. When booted, convert the fsdax device to "device dax" (this
   too can be verified; see below):
	ndctl create-namespace -fe namespace0.0 -m dax
4. In the background, the kmem driver will probably bind to the
   new device.
5. Now, online the new memory sections.  Perhaps:

	# Note total memory before onlining anything:
	grep ^MemTotal /proc/meminfo
	# Online each not-yet-online memory section, printing
	# MemTotal after each one so the growth is visible:
	for f in `grep -vl online /sys/devices/system/memory/*/state`; do
		echo $f: `cat $f`
		echo online > $f
		grep ^MemTotal /proc/meminfo
	done
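To sanity-check step 2: ranges reserved with memmap=nn!ss show up
in /proc/iomem as legacy persistent memory.  The address below is
just the 2GB spot from the example memmap= line:

	# Run as root; non-root sees zeroed addresses.
	grep -i persistent /proc/iomem
	#   80000000-9fffffff : Persistent Memory (legacy)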
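Step 3 can be checked with ndctl as well.  The exact JSON fields
vary by ndctl version, but the namespace should now report a
device-dax mode:

	ndctl list
	# Expect something like "mode":"devdax" for namespace0.0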
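And once the sections are online, the new memory should be
visible as its own NUMA node (assuming numactl is installed;
plain sysfs works too):

	numactl --hardware		# a new node with the pmem capacity
	ls /sys/devices/system/node/	# or look for a new nodeN here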

Cc: Dan Williams
Cc: Dave Jiang
Cc: Ross Zwisler
Cc: Vishal Verma
Cc: Tom Lendacky
Cc: Andrew Morton
Cc: Michal Hocko
Cc: linux-nvdimm@lists.01.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: Huang Ying
Cc: Fengguang Wu