Date: Thu, 17 Jan 2019 12:34:03 -0700
From: Keith Busch
To: Jeff Moyer
Cc: Dave Hansen, thomas.lendacky@amd.com, fengguang.wu@intel.com,
    dave@sr71.net, linux-nvdimm@lists.01.org, tiwai@suse.de,
    zwisler@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
    mhocko@suse.com, baiyaowei@cmss.chinamobile.com, ying.huang@intel.com,
    bhelgaas@google.com, akpm@linux-foundation.org, bp@suse.de
Subject: Re: [PATCH 0/4] Allow persistent memory to be used like normal RAM
Message-ID: <20190117193403.GD31543@localhost.localdomain>
References: <20190116181859.D1504459@viggo.jf.intel.com>
 <20190117164736.GC31543@localhost.localdomain>
User-Agent: Mutt/1.9.1 (2017-09-22)

On Thu, Jan 17, 2019 at 12:20:06PM -0500, Jeff Moyer wrote:
> Keith Busch writes:
> > On Thu, Jan 17, 2019 at 11:29:10AM -0500, Jeff Moyer wrote:
> >> Dave Hansen writes:
> >> > Persistent memory is cool.  But, currently, you have to rewrite
> >> > your applications to use it.  Wouldn't it be cool if you could
> >> > just have it show up in your system like normal RAM and get to
> >> > it like a slow blob of memory?  Well... have I got the patch
> >> > series for you!
> >>
> >> So, isn't that what memory mode is for?
> >> https://itpeernetwork.intel.com/intel-optane-dc-persistent-memory-operating-modes/
> >>
> >> Why do we need this code in the kernel?
> >
> > I don't think those are the same thing.  The "memory mode" in the
> > link refers to platforms that sequester DRAM to side-cache memory
> > accesses, whereas this series doesn't have that platform dependency
> > and doesn't hide the faster DRAM.
>
> OK, so you are making two arguments here: 1) platforms may not support
> memory mode, and 2) this series allows for performance-differentiated
> memory (even though applications may not be modified to make use of
> that...).
>
> With this patch set, an unmodified application would either use:
>
> 1) whatever memory it happened to get
> 2) only the faster DRAM (via numactl --membind=)
> 3) only the slower pmem (again, via numactl --membind=)
> 4) preferentially one or the other (numactl --preferred=)

Yes, numactl and mbind are good ways for unmodified applications to use
these different memory types when they're available.  Tangentially
related, I have another series [1] that provides supplementary
information that can be used to help make these decisions on platforms
that provide an HMAT (heterogeneous memory attribute table).
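For concreteness, here is a minimal userspace sketch of the mbind(2)
path that numactl --membind= drives, assuming the pmem ends up as NUMA
node 1 once it's hotplugged as system RAM (the node id, sizes, and
program are only illustrative, not part of this series).  Build with
"cc -o bindtest bindtest.c -lnuma":

/* bindtest.c: force an allocation onto an assumed pmem node */
#include <numaif.h>		/* mbind(), MPOL_BIND (from libnuma) */
#include <sys/mman.h>
#include <stdio.h>
#include <string.h>

#define PMEM_NODE 1		/* assumed node id for the slower pmem */

int main(void)
{
	size_t len = 1UL << 30;			/* 1 GiB working set */
	unsigned long nodemask = 1UL << PMEM_NODE;
	void *buf;

	buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	/* pages for this range may only come from PMEM_NODE */
	if (mbind(buf, len, MPOL_BIND, &nodemask,
		  sizeof(nodemask) * 8, 0)) {
		perror("mbind");
		return 1;
	}

	memset(buf, 0, len);	/* faults now land on the pmem node */
	return 0;
}

An application wanting the numactl --preferred= behavior instead would
use MPOL_PREFERRED rather than MPOL_BIND.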
> The other options are:
> - as mentioned above, memory mode, which uses DRAM as a cache for the
>   slower persistent memory.  Note that it isn't all or nothing--you
>   can configure your system with both memory mode and appdirect.  The
>   limitation, of course, is that your platform has to support this.
>
>   This seems like the obvious solution if you want to make use of the
>   larger pmem capacity as regular volatile memory (and your platform
>   supports it).  But maybe there is some other limitation that
>   motivated this work?

The hardware-supported implementation is one way it may be used, and
its upside is that accessing the cached memory is transparent to the OS
and applications.  They can use the memory unaware that this is
happening, so it has a low barrier for applications to make use of the
large available address space.  There are some minimal things software
may do that improve this mode, as Dan mentioned in his reply [2], but
it is still usable even without such optimizations.

On the downside, a reboot is required if you want to change the memory
configuration at a later time, for example if you decide more or less
DRAM is needed as cache.  This series has runtime hotplug capabilities
instead.

It's also possible the customer knows better than the hardware which of
their applications' data is hot versus cold, but memory-mode caching
doesn't give them as much control since the faster memory is hidden
from them.

> - libmemkind or pmdk.  These options typically* require application
>   modifications, but allow those applications to actively decide which
>   data lives in fast versus slow media.
>
>   This seems like the obvious answer for applications that care about
>   access latency.
>
>   * you could override the system malloc, but some
>     libraries/application stacks already do that, so it isn't a
>     universal solution.
>
> Listing something like this in the headers of these patch series would
> considerably reduce the head-scratching for reviewers.
>
> Keith, you seem to be implying that there are platforms that won't
> support memory mode.  Do you also have some insight into how customers
> want to use this, beyond my speculation?  It's really frustrating to
> see patch sets like this go by without any real use cases provided.

Right, most NFIT-reporting platforms today don't have memory mode, and
the kernel currently only supports the persistent DAX mode with these.
This series adds another option for those platforms.

I think numactl, as you mentioned, is the first consideration for how
customers may make use of this.  Dave or Dan might have other use cases
in mind.  Just thinking out loud, if we wanted an in-kernel use case,
it may be interesting to make the slower memory a swap tier so the host
manages the cache rather than the hardware.

[1] https://lore.kernel.org/patchwork/cover/1032688/
[2] https://lore.kernel.org/lkml/154767945660.1983228.12167020940431682725.stgit@dwillia2-desk3.amr.corp.intel.com/
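On the libmemkind option above: purely as an illustration of what that
application modification looks like (the mount point, sizes, and names
here are made up, and this is not code from pmdk or from this series),
a modified application can split its hot and cold allocations along
these lines.  Build with "cc -o kindtest kindtest.c -lmemkind":

/* kindtest.c: hot data from DRAM, cold data from a DAX-mounted fs */
#include <memkind.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
	memkind_t pmem_kind;
	char *hot, *cold;
	int err;

	/* carve a 1 GiB pool out of an fsdax mount (App Direct mode) */
	err = memkind_create_pmem("/mnt/pmem", 1ULL << 30, &pmem_kind);
	if (err) {
		fprintf(stderr, "memkind_create_pmem: %d\n", err);
		return 1;
	}

	hot  = memkind_malloc(MEMKIND_DEFAULT, 4096);	/* regular DRAM */
	cold = memkind_malloc(pmem_kind, 64UL << 20);	/* slower pmem  */
	if (!hot || !cold)
		return 1;

	memset(hot, 0, 4096);
	memset(cold, 0, 64UL << 20);

	memkind_free(MEMKIND_DEFAULT, hot);
	memkind_free(pmem_kind, cold);
	memkind_destroy_kind(pmem_kind);
	return 0;
}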