From: Dan Williams
Date: Fri, 25 Jan 2019 11:15:08 -0800
Subject: Re: [PATCH 5/5] dax: "Hotplug" persistent memory for use like normal RAM
To: Jane Chu
Cc: "Verma, Vishal L", "Du, Fan", "linux-kernel@vger.kernel.org",
    "bp@suse.de", "linux-mm@kvack.org", "dave.hansen@linux.intel.com",
    "tiwai@suse.de", "akpm@linux-foundation.org",
    "linux-nvdimm@lists.01.org", "jglisse@redhat.com", "zwisler@kernel.org",
    "mhocko@suse.com", "baiyaowei@cmss.chinamobile.com",
    "thomas.lendacky@amd.com", "Wu, Fengguang", "Huang, Ying",
    "bhelgaas@google.com"

On Fri, Jan 25, 2019 at 11:10 AM Jane Chu wrote:
>
>
> On 1/25/2019 10:20 AM, Verma, Vishal L wrote:
> >
> > On Fri, 2019-01-25 at 09:18 -0800, Dan Williams wrote:
> >> On Fri, Jan 25, 2019 at 12:20 AM Du, Fan wrote:
> >>> Dan
> >>>
> >>> Thanks for the insights!
> >>>
> >>> Can I say, the UCE is delivered from h/w to OS in a single way in
> >>> case of machine check, only PMEM/DAX stuff filter out UC address
> >>> and managed in its own way by badblocks, if PMEM/DAX doesn't do so,
> >>> then common RAS workflow will kick in, right?
> >>
> >> The common RAS workflow always kicks in, it's just the page state
> >> presented by a DAX mapping needs distinct handling. Once it is
> >> hot-plugged it no longer needs to be treated differently than "System
> >> RAM".
> >>
> >>> And how about when ARS is involved but no machine check fired for
> >>> the function of this patchset?
> >>
> >> The hotplug effectively disconnects this address range from the ARS
> >> results. They will still be reported in the libnvdimm "region" level
> >> badblocks instance, but there's no safe / coordinated way to go clear
> >> those errors without additional kernel enabling. There is no "clear
> >> error" semantic for "System RAM".
> >>
> > Perhaps as future enabling, the kernel can go perform "clear error" for
> > offlined pages, and make them usable again. But I'm not sure how
> > prepared mm is to re-accept pages previously offlined.
> >
>
> Offlining a DRAM backed page due to an UC makes sense because
> a. the physical DRAM cell might still have an error
> b. power cycle, scrubbing could potentially 'repair' the DRAM cell,
>    making the page usable again.
>
> But for a PMEM backed page, neither is true. If a poison bit is set in
> a page, that indicates the underlying hardware has completed the repair
> work, all that's left is for software to recover. Secondly, because
> poison is persistent, unless software explicitly clears the bit,
> the page is permanently unusable.

Not permanently... the system owner always has the option to use the
device-DAX and ARS mechanisms to clear errors at the next boot. There's
just no kernel enabling to do that automatically as part of this patch
set.

However, we should consider this along with the userspace enabling to
control which device-dax instances are set aside for hotplug. It would
make sense to have a "clear errors before hotplug" configuration
option.
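
To illustrate the check that a "clear errors before hotplug" step
would need first: a userspace tool has to know whether a range still
has errors recorded against it. The following is only a minimal sketch,
not part of this patch set. It assumes the region-level badblocks list
is available as text with one "start_sector sector_count" pair per
line, and that sectors are 512 bytes. The function names are
hypothetical.

```python
from typing import List, Tuple

# Assumption: badblocks entries are expressed in 512-byte sectors.
SECTOR_SIZE = 512


def parse_badblocks(text: str) -> List[Tuple[int, int]]:
    """Parse 'start count' lines into (start_sector, sector_count) tuples."""
    ranges = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        ranges.append((int(fields[0]), int(fields[1])))
    return ranges


def needs_clearing(ranges: List[Tuple[int, int]],
                   start_sector: int, sector_count: int) -> bool:
    """True if any bad range overlaps [start_sector, start_sector + sector_count)."""
    end = start_sector + sector_count
    return any(s < end and start_sector < s + c for s, c in ranges)


def bad_bytes(ranges: List[Tuple[int, int]]) -> int:
    """Total number of bytes covered by the recorded bad ranges."""
    return sum(count for _, count in ranges) * SECTOR_SIZE
```

A tool along these lines would read the badblocks text for the region
backing a device-dax instance and only hand the range over for hotplug
once needs_clearing() reports no overlap. How the errors themselves get
cleared is outside the scope of this sketch.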