From: Dan Williams
Date: Mon, 1 Mar 2021 21:50:21 -0800
Subject: Re: Question about the "EXPERIMENTAL" tag for dax in XFS
To: Dave Chinner
Cc: "Darrick J. Wong", "ruansy.fnst@fujitsu.com",
    "linux-kernel@vger.kernel.org", "linux-xfs@vger.kernel.org",
    "linux-nvdimm@lists.01.org", "linux-fsdevel@vger.kernel.org",
    "darrick.wong@oracle.com", "willy@infradead.org", "jack@suse.cz",
    "viro@zeniv.linux.org.uk", "linux-btrfs@vger.kernel.org",
    "ocfs2-devel@oss.oracle.com", "hch@lst.de", "rgoldwyn@suse.de",
    "y-goto@fujitsu.com", "qi.fuli@fujitsu.com",
    "fnstml-iaas@cn.fujitsu.com"
In-Reply-To: <20210302053828.GI4662@dread.disaster.area>
References: <20210226212748.GY4662@dread.disaster.area>
    <20210227223611.GZ4662@dread.disaster.area>
    <20210228223846.GA4662@dread.disaster.area>
    <20210301224640.GG4662@dread.disaster.area>
    <20210302024227.GH4662@dread.disaster.area>
    <20210302053828.GI4662@dread.disaster.area>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Mar 1, 2021 at 9:38 PM Dave Chinner wrote:
>
> On Mon, Mar 01, 2021 at 07:33:28PM -0800, Dan Williams wrote:
> > On Mon, Mar 1, 2021 at 6:42 PM Dave Chinner wrote:
> > [..]
> > > We do not need a DAX specific mechanism to tell us "DAX device
> > > gone", we need a generic block device interface that tells us "range
> > > of block device is gone".
> >
> > This is the crux of the disagreement. The block_device is going away
> > *and* the dax_device is going away.
>
> No, that is not the disagreement I have with what you are saying.
> You still haven't understood that it's even more basic and generic
> than devices going away. In its simplest form, all the filesystem
> wants is to be notified when *unrecoverable media errors*
> occur in the persistent storage that underlies the filesystem.
>
> The filesystem does not care what that media is built from - PMEM,
> flash, corroded spinning disks, MRAM, or any other persistent media
> you can think of. It just doesn't matter.
>
> What we care about is that the contents of a *specific LBA range* no
> longer contain *valid data*. IOWs, the data in that range of the
> block device has been lost, cannot be retrieved and/or cannot be
> written to any more.
>
> PMEM taking a MCE because ECC tripped is a media error because data
> is lost and inaccessible until recovery actions are taken.
>
> MD RAID failing a scrub is a media error and data is lost and
> unrecoverable at that layer.
>
> A device disappearing is a media error because the storage media is
> now permanently inaccessible to the higher layers.
>
> This "media error" categorisation is a fundamental property of
> persistent storage and, as such, is a property of the block devices
> used to access said persistent storage.
>
> That's the disagreement here - that you and Christoph are saying
> ->corrupted_range is not a block device property because only a
> pmem/DAX device currently generates it.
>
> You both seem to be NACKing a generic interface because it's only
> implemented for the first subsystem that needs it.
> AFAICT, you either don't understand or are completely ignoring the
> architectural need for it to be provided across the rest of the
> storage stack that *block device based filesystems depend on*.

No, I'm NAKing it because it's the wrong interface. See my 'struct
badblocks' argument in the reply to Darrick. That 'struct badblocks'
infrastructure arose from MD and is shared with PMEM.

>
> Sure, there might be dax device based filesystems around the corner.
> They just require a different pmem device ->corrupted_range callout
> to implement the notification - one that directs to the dax device
> rather than the block device. That's simple and trivial to
> implement, but such functionality for DAX devices does not replace
> the need for the same generic functionality to be provided across a
> *range of different block devices* as required by *block device
> based filesystems*.
>
> And that's fundamentally the problem. XFS is block device based, not
> DAX device based. We require errors to be reported through block
> device mechanisms. fs-dax does not change this - it is based on pmem
> being presented primarily as a block device to the block device
> based filesystems and only secondarily as a dax device. Hence if it
> can be trivially implemented as a block device interface, that's
> where it should go, because then all the other block devices that
> the filesystem runs on can provide the same functionality for similar
> media error events....

Sure, use 'struct badblocks' not struct block_device and
block_device_operations.

>
> > The dax_device removal implies one
> > set of actions (direct accessed pfns invalid) the block device removal
> > implies another (block layer sector access offline).
>
> There you go again, saying DAX requires an action, while the block
> device notification is a -state change- (i.e. goes offline).

There you go reacting to the least generous interpretation of what I said.
s/pfns invalid/pfns offline/

>
> This is exactly what I said was wrong in my last email.
>
> > corrupted_range
> > is blurring the notification for 2 different failure domains. Look at
> > the nascent idea to mount a filesystem on dax sans a block device.
> > Look at the existing plumbing for DM to map dax_operations through a
> > device stack.
>
> Ummm, it just maps the direct_access call to the underlying device
> and calls its ->direct_access method. All it's doing is LBA
> mapping. That's all it needs to do for ->corrupted_range, too.
> I have no clue why you think this is a problem for error
> notification...
>
> > Look at the pushback Ruan got for adding a new
> > block_device operation for corrupted_range().
>
> One person said "no". That's hardly pushback. Especially as I think
> Christoph's objection about this being dax specific functionality
> is simply wrong, as per above.

It's not wrong when we have a perfectly suitable object for sector
based error notification and when we're trying to disentangle 'struct
block_device' from 'struct dax_device'.
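
To make the "sector based error notification" object a bit more
concrete, here is a rough userspace model of the idea: a shared table of
bad sector ranges that a driver fills in and that interested parties can
query or subscribe to. It is only a sketch, loosely inspired by the
concept behind 'struct badblocks'. It is not kernel code and not the
kernel API; BadRangeTable, mark_bad(), check() and subscribe() are names
made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# (first_sector, sector_count); hypothetical representation for this sketch
Range = Tuple[int, int]


@dataclass
class BadRangeTable:
    """Hypothetical model of a shared table of unreadable sector ranges."""

    ranges: List[Range] = field(default_factory=list)  # sorted, non-overlapping
    listeners: List[Callable[[int, int], None]] = field(default_factory=list)

    def subscribe(self, callback: Callable[[int, int], None]) -> None:
        """Register a consumer (e.g. a filesystem) for new media errors."""
        self.listeners.append(callback)

    def mark_bad(self, start: int, count: int) -> None:
        """Record a lost range, merging it with overlapping or adjacent ones."""
        new_start, new_end = start, start + count
        kept: List[Range] = []
        for s, c in self.ranges:
            e = s + c
            if e < new_start or s > new_end:
                kept.append((s, c))          # disjoint, keep as is
            else:
                new_start = min(new_start, s)  # overlaps or touches, absorb
                new_end = max(new_end, e)
        kept.append((new_start, new_end - new_start))
        self.ranges = sorted(kept)
        # Tell every subscriber which range was just reported lost.
        for callback in self.listeners:
            callback(start, count)

    def check(self, start: int, count: int) -> Optional[Range]:
        """Return the first recorded bad range overlapping the query, if any."""
        end = start + count
        for s, c in self.ranges:
            if s < end and start < s + c:
                return (s, c)
        return None
```

In this model a whole-device removal is just mark_bad(0, device_size),
and a consumer that only cares about sector ranges never needs to know
whether the reporter was PMEM, MD or anything else. Whether the
notification side belongs on such an object or on block_device_operations
is exactly the point under discussion above; the sketch does not settle
that.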