Date: Mon, 1 Nov 2021 23:18:56 -0700
From: Christoph Hellwig
To: "Darrick J.
 Wong"
Cc: Christoph Hellwig, Jane Chu, "david@fromorbit.com",
 "dan.j.williams@intel.com", "vishal.l.verma@intel.com",
 "dave.jiang@intel.com", "agk@redhat.com", "snitzer@redhat.com",
 "dm-devel@redhat.com", "ira.weiny@intel.com", "willy@infradead.org",
 "vgoyal@redhat.com", "linux-fsdevel@vger.kernel.org",
 "nvdimm@lists.linux.dev", "linux-kernel@vger.kernel.org",
 "linux-xfs@vger.kernel.org"
Subject: Re: [dm-devel] [PATCH 0/6] dax poison recovery with RWF_RECOVERY_DATA flag
References: <20211021001059.438843-1-jane.chu@oracle.com>
 <2102a2e6-c543-2557-28a2-8b0bdc470855@oracle.com>
 <20211028002451.GB2237511@magnolia>
In-Reply-To: <20211028002451.GB2237511@magnolia>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Oct 27, 2021 at 05:24:51PM -0700, Darrick J. Wong wrote:
> ...so would you happen to know if anyone's working on solving this
> problem for us by putting the memory controller in charge of dealing
> with media errors?

The only one who could know is Intel..

> The trouble is, we really /do/ want to be able to (re)write the failed
> area, and we probably want to try to read whatever we can.  Those are
> reads and writes, not {pre,f}allocation activities.  This is where Dave
> and I arrived at a month ago.
>
> Unless you'd be ok with a second IO path for recovery where we're
> allowed to be slow?  That would probably have the same user interface
> flag, just a different path into the pmem driver.

Which is fine with me.  If you look at the API here we do have the
RWF_ API, which then maps to the IOMAP API, which maps to the DAX_
API, which then gets special casing over three methods.

And while Pavel pointed out that he and Jens are now optimizing for
single branches like this,
I think this actually is silly and it is not my point.

The point is that the DAX in-kernel API is a mess, and before we make
it even worse we need to sort it out first.  What is directly relevant
here is that the copy_from_iter and copy_to_iter APIs do not make
sense.  Most of the DAX API is based around getting a memory mapping
using ->direct_access; it is just the read/write path, which is a slow
path, that actually uses them.

I have a very WIP patch series to try to sort this out here:

http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/dax-devirtualize

But back to this series.  The basic DAX model is that the caller gets
a memory mapping and just works on that, maybe calling a sync after a
write in a few cases.  So any kind of recovery really needs to be able
to work with that model, as going forward the copy_to/from_iter path
will be used less and less, i.e. file systems can and should use
direct_access directly instead of using the block layer implementation
in the pmem driver.  As an example, the dm-writecache driver, the
pending bcache nvdimm support and the (horrible and out of tree) nova
file system won't even use this path.  We need to find a way to
support recovery for them.  And overloading it over the read/write
path, which is not the main path for DAX but the absolute fast path
for 99% of the kernel users, is a horrible idea.

So how can we work around the horrible nvdimm design for data recovery
in a way that:

   a) actually works with the intended direct memory map use case
   b) doesn't really affect the normal kernel too much

?
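
To make the "get a memory mapping and just work on that, maybe sync
after a write" model concrete, here is a minimal userspace sketch.  It
is only an analogy, not kernel code and not part of any patch in this
thread: the helper names mapped_store() and mapped_matches() are made
up for illustration, and on an ordinary file the mapping is backed by
the page cache rather than by pmem.  Only on a DAX-mounted file system
would the same loads and stores reach the media directly.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: store msg (including its NUL terminator) at the
 * start of the file through a shared mapping, then sync the mapping.
 * Returns 0 on success, -1 on any failure.
 */
int mapped_store(const char *path, const char *msg, size_t map_len)
{
	assert(path != NULL && msg != NULL);

	size_t n = strlen(msg) + 1;
	if (map_len == 0 || n > map_len)
		return -1;

	int fd = open(path, O_RDWR | O_CREAT, 0600);
	if (fd < 0)
		return -1;
	/* Size the file so every mapped byte is backed by the file. */
	if (ftruncate(fd, (off_t)map_len) != 0) {
		close(fd);
		return -1;
	}

	char *p = mmap(NULL, map_len, PROT_READ | PROT_WRITE, MAP_SHARED,
		       fd, 0);
	close(fd);	/* the mapping stays valid after close() */
	if (p == MAP_FAILED)
		return -1;

	memcpy(p, msg, n);			/* plain store, no write() */
	int rc = msync(p, map_len, MS_SYNC);	/* the "sync after a write" */
	munmap(p, map_len);
	return rc == 0 ? 0 : -1;
}

/*
 * Hypothetical helper: return 1 if the file starts with msg (including
 * the NUL), 0 if it does not, -1 on error.  Uses loads from a mapping
 * instead of read().
 */
int mapped_matches(const char *path, const char *msg, size_t map_len)
{
	assert(path != NULL && msg != NULL);

	size_t n = strlen(msg) + 1;
	if (map_len == 0 || n > map_len)
		return -1;

	int fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;

	/* Refuse to map past EOF; touching such pages would fault. */
	struct stat st;
	if (fstat(fd, &st) != 0 || st.st_size < (off_t)map_len) {
		close(fd);
		return -1;
	}

	const char *p = mmap(NULL, map_len, PROT_READ, MAP_SHARED, fd, 0);
	close(fd);
	if (p == MAP_FAILED)
		return -1;

	int same = memcmp(p, msg, n) == 0;	/* plain loads, no read() */
	munmap((void *)p, map_len);
	return same;
}
```

Note that neither helper has a per-call place to pass something like an
RWF_ flag once the mapping exists: the data moves with memcpy() and
memcmp(), not with a read or write call.  That is the gap the two
requirements above are about.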