From: Dan Williams
Date: Tue, 2 Nov 2021 09:12:48 -0700
Subject: Re: [dm-devel] [PATCH 0/6] dax poison recovery with RWF_RECOVERY_DATA flag
To: "Darrick J. Wong"
Cc: Christoph Hellwig, Jane Chu, david@fromorbit.com, vishal.l.verma@intel.com,
    dave.jiang@intel.com, agk@redhat.com, snitzer@redhat.com, dm-devel@redhat.com,
    ira.weiny@intel.com, willy@infradead.org, vgoyal@redhat.com,
    linux-fsdevel@vger.kernel.org, nvdimm@lists.linux.dev,
    linux-kernel@vger.kernel.org, linux-xfs@vger.kernel.org
In-Reply-To: <20211028002451.GB2237511@magnolia>
References: <20211021001059.438843-1-jane.chu@oracle.com> <2102a2e6-c543-2557-28a2-8b0bdc470855@oracle.com> <20211028002451.GB2237511@magnolia>

On Wed, Oct 27, 2021 at 5:25 PM Darrick J. Wong wrote:
>
> On Tue, Oct 26, 2021 at 11:49:59PM -0700, Christoph Hellwig wrote:
> > On Fri, Oct 22, 2021 at 08:52:55PM +0000, Jane Chu wrote:
> > > Thanks - I try to be honest. As far as I can tell, the argument
> > > about the flag is a philosophical argument between two views.
> > > One view assumes a design based on perfect hardware, where media
> > > errors belong to the category of brokenness. The other view sees
> > > media errors as a built-in property of the hardware and designs
> > > for dealing with such errors from the outset.
> >
> > No, I don't think so. Bit errors do happen in all media, which is
> > why devices are built to handle them. It is just the Intel-style
> > pmem interface for handling them which is completely broken.
>
> Yeah, I agree, this takes me back to learning how to use DISKEDIT to
> work around a hole punched in a file (with a pen!) in the 1980s...
>
> ...so would you happen to know if anyone's working on solving this
> problem for us by putting the memory controller in charge of dealing
> with media errors?

What are you guys going on about? ECC memory corrects single-bit
errors in the background; multi-bit errors cause the memory controller
to signal that the data is gone. This is how ECC memory has worked
since forever. Typically the kernel's memory-failure path just throws
away pages that signal data loss. Throwing away pmem pages is harder
because, unlike DRAM, the physical address of the page matters to the
upper layers.

> > > errors in mind from the start. I guess I'm trying to articulate why
> > > it is acceptable to add the RWF_DATA_RECOVERY flag to the
> > > existing RWF_ flags. This way pwritev2() remains fast on the fast
> > > path, and its slow path (with error clearing) is faster than the
> > > alternative. The alternative being one system call to clear the
> > > poison, and another system call to run the fast pwrite for
> > > recovery: what happens if something happens in between?
> >
> > Well, my point is that doing recovery from bit errors is by definition
> > not the fast path. Which is why I'd rather keep it away from the pmem
> > read/write fast path, which also happens to be the (much more important)
> > non-pmem read/write path.
>
> The trouble is, we really /do/ want to be able to (re)write the failed
> area, and we probably want to try to read whatever we can. Those are
> reads and writes, not {pre,f}allocation activities. This is where Dave
> and I arrived a month ago.
>
> Unless you'd be ok with a second IO path for recovery where we're
> allowed to be slow? That would probably have the same user interface
> flag, just a different path into the pmem driver.
>
> Ha, how about an int fd2 = recoveryfd(fd); call where you'd get whatever
> speshul options (retry raid mirrors! scrape the film off the disk if
> you have to!) you want that can take forever, leaving the fast paths
> alone?

I am still failing to see the technical argument for why
RWF_RECOVER_DATA significantly impacts the fast path, or why you think
this is somehow specific to pmem. In fact, the pmem effort is doing the
responsible thing and trying to plumb this path, while other storage
drivers just seem to be pretending that memory errors never happen.