From: Dan Williams
Date: Thu, 4 Nov 2021 12:00:12 -0700
Subject: Re: [dm-devel] [PATCH 0/6] dax poison recovery with RWF_RECOVERY_DATA flag
To: Jane Chu
Cc: Christoph Hellwig, "Darrick J. Wong", "david@fromorbit.com",
    "vishal.l.verma@intel.com", "dave.jiang@intel.com", "agk@redhat.com",
    "snitzer@redhat.com", "dm-devel@redhat.com", "ira.weiny@intel.com",
    "willy@infradead.org", "vgoyal@redhat.com",
    "linux-fsdevel@vger.kernel.org", "nvdimm@lists.linux.dev",
    "linux-kernel@vger.kernel.org", "linux-xfs@vger.kernel.org"
References: <2102a2e6-c543-2557-28a2-8b0bdc470855@oracle.com>
    <20211028002451.GB2237511@magnolia>
    <6d21ece1-0201-54f2-ec5a-ae2f873d46a3@oracle.com>
In-Reply-To: <6d21ece1-0201-54f2-ec5a-ae2f873d46a3@oracle.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Nov 4, 2021 at 11:34 AM Jane Chu wrote:
>
> Thanks for the enlightening discussion here, it's so helpful!
>
> Please allow me to recap what I've caught up so far -
>
> 1. recovery write at page boundary due to NP setting in poisoned
>    page to prevent undesirable prefetching
> 2. single interface to perform 3 tasks:
>      { clear-poison, update error-list, write }
>    such as an API in pmem driver.
>    For CPUs that support MOVDIR64B, the 'clear-poison' and 'write'
>    task can be combined (would need something different from the
>    existing _copy_mcsafe though) and 'update error-list' follows
>    closely behind;
>    For CPUs that rely on firmware call to clear poison, the existing
>    pmem_clear_poison() can be used, followed by the 'write' task.
> 3. if user isn't given RWF_RECOVERY_FLAG flag, then dax recovery
>    would be automatic for a write if range is page aligned;
>    otherwise, the write fails with EIO as usual.
>    Also, user mustn't have punched out the poisoned page in which
>    case poison repairing will be a lot more complicated.
> 4. desirable to fetch as much data as possible from a poisoned range.
>
> If this understanding is in the right direction, then I'd like to
> propose below changes to
>   dax_direct_access(), dax_copy_to/from_iter(), pmem_copy_to/from_iter()
>   and the dm layer copy_to/from_iter, dax_iomap_iter().
>
> 1. dax_iomap_iter() relies on dax_direct_access() to decide whether there
>    is likely media error: if the API without DAX_F_RECOVERY returns
>    -EIO, then switch to recovery-read/write code.  In recovery code,
>    supply DAX_F_RECOVERY to dax_direct_access() in order to obtain
>    'kaddr', and then call dax_copy_to/from_iter() with DAX_F_RECOVERY.

I like it. It allows for an atomic write+clear implementation on
capable platforms and coordinates with potentially unmapped pages. The
best of both worlds from the dax_clear_poison() proposal and my "take
a fault and do a slow-path copy".

> 2. the _copy_to/from_iter implementation would be largely the same
>    as in my recent patch, but some changes in Christoph's
>    'dax-devirtualize' may be kept, such as DAX_F_VIRTUAL, obviously
>    virtual devices don't have the ability to clear poison, so no need
>    to complicate them.  And this also means that not every endpoint
>    dax device has to provide dax_op.copy_to/from_iter, they may use the
>    default.

Did I miss this series or are you talking about this one?

https://lore.kernel.org/all/20211018044054.1779424-1-hch@lst.de/

> I'm not sure about nova and others, if they use different 'write' other
> than via iomap, does that mean there will be need for a new set of
> dax_op for their read/write?

No, they're out-of-tree; they'll adjust to the same interface that xfs
and ext4 are using when/if they go upstream.

> the 3-in-1 binding would always be
> required though. Maybe that'll be an ongoing discussion?

Yeah, let's cross that bridge when we come to it.

> Comments? Suggestions?

It sounds great to me!
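
For reference, the fallback flow from item 1 of the proposal can be
modeled in userspace roughly as below. This is only a sketch under
stated assumptions, not the kernel implementation: every model_* name
and the struct are hypothetical, the value given to DAX_F_RECOVERY is
arbitrary (only the flag name comes from the proposal), and the
page-aligned rule follows item 3 of the recap.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Userspace model only. All model_* names are hypothetical; only the
 * DAX_F_RECOVERY flag name is taken from the proposal, and its value
 * here is an arbitrary assumption. */
#define MODEL_PAGE_SIZE 4096
#define DAX_F_RECOVERY  0x1

struct model_pmem {
	unsigned char data[MODEL_PAGE_SIZE];
	bool poisoned;	/* stands in for an entry in the error list */
};

/* Stand-in for dax_direct_access(): a poisoned page is refused with
 * -EIO unless the caller explicitly asks for recovery. */
static int model_direct_access(struct model_pmem *p, int flags, void **kaddr)
{
	if (p->poisoned && !(flags & DAX_F_RECOVERY))
		return -EIO;
	*kaddr = p->data;
	return 0;
}

/* Stand-in for the 3-in-1 recovery write:
 * { clear-poison, update error-list, write }.
 * Only a whole-page write may repair a poisoned page. */
static int model_recovery_write(struct model_pmem *p, void *kaddr,
				const void *src, size_t off, size_t len)
{
	assert(off + len <= MODEL_PAGE_SIZE);
	if (p->poisoned) {
		if (off != 0 || len != MODEL_PAGE_SIZE)
			return -EIO;	/* unaligned: fail as usual */
		p->poisoned = false;	/* clear-poison + update error-list */
	}
	memcpy((unsigned char *)kaddr + off, src, len);	/* write */
	return 0;
}

/* Stand-in for the write side of dax_iomap_iter(): try the normal
 * path first and switch to the recovery path only on -EIO. */
static int model_dax_write(struct model_pmem *p, const void *src,
			   size_t off, size_t len)
{
	void *kaddr = NULL;
	int rc = model_direct_access(p, 0, &kaddr);

	if (rc == -EIO) {
		/* Likely media error: retry with the recovery flag. */
		rc = model_direct_access(p, DAX_F_RECOVERY, &kaddr);
		if (rc)
			return rc;
		return model_recovery_write(p, kaddr, src, off, len);
	}
	if (rc)
		return rc;

	assert(off + len <= MODEL_PAGE_SIZE);
	memcpy((unsigned char *)kaddr + off, src, len);
	return 0;
}
```

In this model a 512-byte write to a poisoned page fails with -EIO and
leaves the poison in place, while a full-page write clears the poison
and lands the data in one step. A real implementation would also have
to deal with unmapped pages and the MOVDIR64B vs. firmware-call split,
which the sketch ignores.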