Subject: Re: [PATCH 0/9] Allow persistent memory to be used like normal RAM
From: Xishi Qiu
Date: Fri, 26 Oct 2018 16:03:38 +0800
To: Dan Williams, Dave Hansen
Cc: Linux Kernel Mailing List, Dave Jiang, zwisler@kernel.org, Vishal L Verma, Tom Lendacky, Andrew Morton, Michal Hocko, linux-nvdimm, Linux MM, "Huang, Ying", Fengguang Wu, Xishi Qiu, zy107165@alibaba-inc.com
Message-ID: <352acc87-a6da-65e4-bbe6-0dbffdc72acc@gmail.com>
References: <20181022201317.8558C1D8@viggo.jf.intel.com>

Hi Dan,

How about letting the BIOS report a new type for kmem in the e820 table? e.g.

    #define E820_PMEM 7
    #define E820_KMEM 8

Then pmem and kmem are separate, and we can easily hot-add kmem to the
memory subsystem without disturbing the existing code (e.g. pmem, nvdimm,
dax...).

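To make the proposal above concrete, here is a minimal user-space sketch of how a firmware map could be filtered by type. It is not kernel code. E820_KMEM (8) is the hypothetical type suggested in this mail and is not defined by any specification; struct e820_entry, should_hotadd() and hotadd_bytes() are names made up for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Firmware memory-map types. RAM (1) and PMEM (7) follow the usual
 * e820 numbering; E820_KMEM (8) is the hypothetical type proposed in
 * this mail, not an existing one. */
enum e820_type {
    E820_RAM  = 1,
    E820_PMEM = 7,
    E820_KMEM = 8, /* hypothetical */
};

/* Illustrative map entry; the layout is an assumption for this sketch. */
struct e820_entry {
    uint64_t addr;
    uint64_t size;
    enum e820_type type;
};

/* Only the hypothetical KMEM ranges are candidates for hot-add.
 * PMEM ranges are left to the existing pmem/nvdimm/dax code. */
static bool should_hotadd(const struct e820_entry *e)
{
    return e->type == E820_KMEM;
}

/* Total number of bytes that would be handed to the memory subsystem. */
static uint64_t hotadd_bytes(const struct e820_entry *map, size_t n)
{
    uint64_t total = 0;

    for (size_t i = 0; i < n; i++)
        if (should_hotadd(&map[i]))
            total += map[i].size;
    return total;
}
```

The point of the separate type is that the decision becomes a plain comparison on the map entry, and ranges reported as PMEM are never touched.
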
I don't know whether Intel will change some hardware features in the
future for pmem that is used like volatile memory: perhaps faster than
pmem and cheaper, but volatile, with no need to care about atomicity,
consistency, L2/L3 cache...

Another question: why call it kmem? What does the "k" mean?

Thanks,
Xishi Qiu

On 2018/10/23 09:11, Dan Williams wrote:
> On Mon, Oct 22, 2018 at 6:05 PM Dan Williams wrote:
>>
>> On Mon, Oct 22, 2018 at 1:18 PM Dave Hansen wrote:
>>>
>>> Persistent memory is cool. But, currently, you have to rewrite
>>> your applications to use it. Wouldn't it be cool if you could
>>> just have it show up in your system like normal RAM and get to
>>> it like a slow blob of memory? Well... have I got the patch
>>> series for you!
>>>
>>> This series adds a new "driver" to which pmem devices can be
>>> attached. Once attached, the memory "owned" by the device is
>>> hot-added to the kernel and managed like any other memory. On
>>> systems with an HMAT (a new ACPI table), each socket (roughly)
>>> will have a separate NUMA node for its persistent memory so
>>> this newly-added memory can be selected by its unique NUMA
>>> node.
>>>
>>> This is highly RFC, and I really want the feedback from the
>>> nvdimm/pmem folks about whether this is a viable long-term
>>> perversion of their code and device mode. It's insufficiently
>>> documented and probably not bisectable either.
>>>
>>> Todo:
>>> 1. The device re-binding hacks are ham-fisted at best. We
>>>    need a better way of doing this, especially so the kmem
>>>    driver does not get in the way of normal pmem devices.
>>> 2. When the device has no proper node, we default it to
>>>    NUMA node 0. Is that OK?
>>> 3. We muck with the 'struct resource' code quite a bit. It
>>>    definitely needs a once-over from folks more familiar
>>>    with it than I.
>>> 4. Is there a better way to do this than starting with a
>>>    copy of pmem.c?
>>
>> So I don't think we want to do patch 2, 3, or 5. Just jump to patch 7
>> and remove all the devm_memremap_pages() infrastructure and dax_region
>> infrastructure.
>>
>> The driver should be a dead simple turn around to call add_memory()
>> for the passed in range. The hard part is, as you say, arranging for
>> the kmem driver to not stand in the way of typical range / device
>> claims by the dax_pmem device.
>>
>> To me this looks like teaching the nvdimm-bus and this dax_kmem driver
>> to require explicit matching based on 'id'. The attachment scheme
>> would look like this:
>>
>>     modprobe dax_kmem
>>     echo dax0.0 > /sys/bus/nd/drivers/dax_kmem/new_id
>>     echo dax0.0 > /sys/bus/nd/drivers/dax_pmem/unbind
>>     echo dax0.0 > /sys/bus/nd/drivers/dax_kmem/bind
>>
>> At step1 the dax_kmem drivers will match no devices and stays out of
>> the way of dax_pmem. It learns about devices it cares about by being
>> explicitly told about them. Then unbind from the typical dax_pmem
>> driver and attach to dax_kmem to perform the one way hotplug.
>>
>> I expect udev can automate this by setting up a rule to watch for
>> device-dax instances by UUID and call a script to do the detach /
>> reattach dance.
>
> The next question is how to support this for ranges that don't
> originate from the pmem sub-system. I expect we want dax_kmem to
> register a generic platform device representing the range and have a
> generic platform driver that turns around and does the add_memory().
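
The attachment scheme quoted above is, after the modprobe, a fixed sequence of three sysfs writes. Below is a minimal C sketch of that sequence, of the kind a udev helper could run. It assumes the new_id, unbind and bind attributes exist exactly as written in the quoted mail, which was only a proposal at that point. write_attr() and rebind_to_kmem() are names made up for illustration, and the drivers directory is a parameter so the code can be tried against a scratch directory instead of /sys/bus/nd/drivers.

```c
#include <stdio.h>

/* Write one string to a file, as "echo value > path" would.
 * Returns 0 on success, -1 on failure. */
static int write_attr(const char *path, const char *value)
{
    FILE *f = fopen(path, "w");
    int ok;

    if (!f)
        return -1;
    ok = fputs(value, f) >= 0;
    /* A rejected sysfs write is reported when the buffer is flushed. */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Move one device from dax_pmem to dax_kmem in the order given in the
 * quoted mail: new_id, then unbind, then bind. Stops at the first
 * failure. drivers_dir would normally be "/sys/bus/nd/drivers". */
static int rebind_to_kmem(const char *drivers_dir, const char *dev)
{
    static const char *const steps[] = {
        "dax_kmem/new_id",
        "dax_pmem/unbind",
        "dax_kmem/bind",
    };
    char path[512];

    for (size_t i = 0; i < sizeof(steps) / sizeof(steps[0]); i++) {
        int n = snprintf(path, sizeof(path), "%s/%s", drivers_dir, steps[i]);

        if (n < 0 || (size_t)n >= sizeof(path))
            return -1;
        if (write_attr(path, dev) != 0)
            return -1;
    }
    return 0;
}
```

A caller would run `modprobe dax_kmem` first and then call `rebind_to_kmem("/sys/bus/nd/drivers", "dax0.0")`. Stopping at the first failed write avoids unbinding the device from dax_pmem when dax_kmem has not accepted its id.
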