Date: Wed, 27 Oct 2021 17:24:51 -0700
From: "Darrick J. Wong"
To: Christoph Hellwig
Cc: Jane Chu, david@fromorbit.com, dan.j.williams@intel.com,
    vishal.l.verma@intel.com, dave.jiang@intel.com, agk@redhat.com,
    snitzer@redhat.com, dm-devel@redhat.com, ira.weiny@intel.com,
    willy@infradead.org, vgoyal@redhat.com, linux-fsdevel@vger.kernel.org,
    nvdimm@lists.linux.dev, linux-kernel@vger.kernel.org,
    linux-xfs@vger.kernel.org
Subject: Re: [dm-devel] [PATCH 0/6] dax poison recovery with RWF_RECOVERY_DATA flag
Message-ID: <20211028002451.GB2237511@magnolia>
References: <20211021001059.438843-1-jane.chu@oracle.com> <2102a2e6-c543-2557-28a2-8b0bdc470855@oracle.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Oct 26, 2021 at 11:49:59PM -0700, Christoph Hellwig wrote:
> On Fri, Oct 22, 2021 at 08:52:55PM +0000, Jane Chu wrote:
> > Thanks - I try to be honest. As far as I can tell, the argument
> > about the flag is a philosophical argument between two views.
> > One view assumes design based on perfect hardware, and media error
> > belongs to the category of brokenness. Another view sees media
> > error as a build-in hardware component and make design to include
> > dealing with such errors.
>
> No, I don't think so. Bit errors do happen in all media, which is
> why devices are built to handle them. It is just the Intel-style
> pmem interface to handle them which is completely broken.

Yeah, I agree, this takes me back to learning how to use DISKEDIT to
work around a hole punched in a file (with a pen!) in the 1980s...

...so would you happen to know if anyone's working on solving this
problem for us by putting the memory controller in charge of dealing
with media errors?

> > errors in mind from start. I guess I'm trying to articulate why
> > it is acceptable to include the RWF_DATA_RECOVERY flag to the
> > existing RWF_ flags.
> > - this way, pwritev2 remain fast on fast path,
> > and its slow path (w/ error clearing) is faster than other alternative.
> > Other alternative being 1 system call to clear the poison, and
> > another system call to run the fast pwrite for recovery, what
> > happens if something happened in between?
>
> Well, my point is doing recovery from bit errors is by definition not
> the fast path. Which is why I'd rather keep it away from the pmem
> read/write fast path, which also happens to be the (much more important)
> non-pmem read/write path.

The trouble is, we really /do/ want to be able to (re)write the failed
area, and we probably want to try to read whatever we can. Those are
reads and writes, not {pre,f}allocation activities. This is where Dave
and I arrived a month ago.

Unless you'd be ok with a second IO path for recovery where we're
allowed to be slow? That would probably have the same user interface
flag, just a different path into the pmem driver.

Ha, how about a

	int fd2 = recoveryfd(fd);

call where you'd get whatever speshul options (retry raid mirrors!
scrape the film off the disk if you have to!) you want that can take
forever, leaving the fast paths alone?

(Ok, that wasn't entirely serious...)

--D