Date: Sat, 27 Feb 2021 08:27:48 +1100
From: Dave Chinner
To: Dan Williams
Cc: "Darrick J. Wong", "ruansy.fnst@fujitsu.com",
	"linux-kernel@vger.kernel.org", "linux-xfs@vger.kernel.org",
	"linux-nvdimm@lists.01.org", "linux-fsdevel@vger.kernel.org",
	"darrick.wong@oracle.com", "willy@infradead.org", "jack@suse.cz",
	"viro@zeniv.linux.org.uk", "linux-btrfs@vger.kernel.org",
	"ocfs2-devel@oss.oracle.com", "hch@lst.de", "rgoldwyn@suse.de",
	"y-goto@fujitsu.com", "qi.fuli@fujitsu.com",
	"fnstml-iaas@cn.fujitsu.com"
Subject: Re: Question about the "EXPERIMENTAL" tag for dax in XFS
Message-ID: <20210226212748.GY4662@dread.disaster.area>
References: <20210226002030.653855-1-ruansy.fnst@fujitsu.com>
	<20210226190454.GD7272@magnolia>
	<20210226205126.GX4662@dread.disaster.area>

On Fri, Feb 26, 2021 at 12:59:53PM -0800, Dan Williams wrote:
> On Fri, Feb 26, 2021 at 12:51 PM Dave Chinner wrote:
> >
> > On Fri, Feb 26, 2021 at 11:24:53AM -0800, Dan Williams wrote:
> > > On Fri, Feb 26, 2021 at 11:05 AM Darrick J. Wong wrote:
> > > >
> > > > On Fri, Feb 26, 2021 at 09:45:45AM +0000, ruansy.fnst@fujitsu.com wrote:
> > > > > Hi, guys
> > > > >
> > > > > Besides this patchset, I'd like to confirm something about the
> > > > > "EXPERIMENTAL" tag for dax in XFS.
> > > > >
> > > > > In XFS, the "EXPERIMENTAL" tag, which is reported in a warning message
> > > > > when we mount a pmem device with the dax option, has existed for a
> > > > > while. It's a bit annoying when using the fsdax feature. So, my initial
> > > > > intention was to remove this tag.
> > > > > And I started to find out and solve the problems which prevent it
> > > > > from being removed.
> > > > >
> > > > > As discussed before, there are 3 main problems. The first one is "dax
> > > > > semantics", which has been resolved. The remaining two are "RMAP for
> > > > > fsdax" and "support dax reflink for filesystem", which I have been
> > > > > working on.
> > > >
> > > > > So, what I want to confirm is: does it mean that we can remove the
> > > > > "EXPERIMENTAL" tag when the remaining two problems are solved?
> > > >
> > > > Yes. I'd keep the experimental tag for a cycle or two to make sure that
> > > > nothing new pops up, but otherwise the two patchsets you've sent close
> > > > those two big remaining gaps. Thank you for working on this!
> > > >
> > > > > Or maybe there are other important problems that need to be fixed
> > > > > before removing it? If there are, could you please show me that?
> > > >
> > > > That remains to be seen through QA/validation, but I think that's it.
> > > >
> > > > Granted, I still have to read through the two patchsets...
> > >
> > > I've been meaning to circle back here as well.
> > >
> > > My immediate concern is the issue Jason recently highlighted [1] with
> > > respect to invalidating all dax mappings when / if the device is
> > > ripped out from underneath the fs. I don't think that will collide
> > > with Ruan's implementation, but it does need new communication from
> > > driver to fs about removal events.
> > >
> > > [1]: http://lore.kernel.org/r/CAPcyv4i+PZhYZiePf2PaH0dT5jDfkmkDX-3usQy1fAhf6LPyfw@mail.gmail.com
> >
> > Oh, yay.
> >
> > The XFS shutdown code is centred around preventing new IO from being
> > issued - we don't actually do anything about DAX mappings because,
> > well, I don't think anyone on the filesystem side thought they had
> > to do anything special if pmem went away from under it.
> >
> > My understanding -was- that the pmem removal invalidates
> > all the ptes currently mapped into CPU page tables that point at
> > the dax device across the system. The vmas that manage these
> > mappings are not really something the filesystem manages,
> > but a function of the mm subsystem. What the filesystem cares about
> > is that it gets page faults triggered when a change of state occurs
> > so that it can remap the page to its backing store correctly.
> >
> > IOWs, all the mm subsystem needs to do when pmem goes away is clear the
> > CPU ptes, because then when userspace tries to access the
> > mapped DAX pages we get a new page fault. In processing the fault, the
> > filesystem will try to get direct access to the pmem from the block
> > device. This will get an ENODEV error from the block device
> > because the backing store (pmem) has been unplugged and is no longer
> > there...
> >
> > AFAICT, as long as pmem removal invalidates all the active ptes that
> > point at the pmem being removed, the filesystem doesn't need to
> > care about device removal at all, DAX or no DAX...
>
> How would the pmem removal do that without walking all the active
> inodes in the fs at the time of shutdown and call
> unmap_mapping_range(inode->i_mapping, 0, 0, 1)?

Which then immediately ends up back at the vmas that manage the ptes
to unmap them.

Isn't finding the vma(s) that map a specific memory range exactly
what the rmap code in the mm subsystem is supposed to address?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
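
The argument above can be illustrated with a toy user-space model. This is only a sketch and not kernel code: ToyMM, ToyPmem, access(), remove_device() and invalidate_pfn_range() are hypothetical names invented for illustration, and real page tables, vmas and the kernel rmap are far more involved. It assumes a reverse map from page frame to mappings exists, and shows that device removal could then clear every pte in the device's range without consulting any inode, after which the next access faults and the fault path sees ENODEV.

```python
import errno


class ToyMM:
    """Toy stand-in for the mm subsystem: page tables plus a reverse map."""

    def __init__(self):
        self.ptes = {}   # (pid, vaddr) -> pfn, the "CPU page tables"
        self.rmap = {}   # pfn -> set of (pid, vaddr) currently mapping it

    def map_page(self, pid, vaddr, pfn):
        self.ptes[(pid, vaddr)] = pfn
        self.rmap.setdefault(pfn, set()).add((pid, vaddr))

    def invalidate_pfn_range(self, first_pfn, last_pfn):
        """Clear every pte pointing into [first_pfn, last_pfn] via the rmap.

        No inode or filesystem state is consulted here.
        """
        cleared = 0
        for pfn in range(first_pfn, last_pfn + 1):
            for key in self.rmap.pop(pfn, set()):
                del self.ptes[key]
                cleared += 1
        return cleared


class ToyPmem:
    """Toy pmem device covering a contiguous pfn range."""

    def __init__(self, first_pfn, last_pfn):
        self.first_pfn, self.last_pfn = first_pfn, last_pfn
        self.present = True

    def direct_access(self, pfn):
        if not self.present:
            raise OSError(errno.ENODEV, "pmem has been removed")
        return pfn


def access(mm, dev, pid, vaddr, file_pfn):
    """Touch a mapped DAX page; a missing pte takes the fault path."""
    if (pid, vaddr) in mm.ptes:
        return "hit"
    # Fault path: the filesystem asks the device for direct access.
    pfn = dev.direct_access(file_pfn)  # raises ENODEV once the device is gone
    mm.map_page(pid, vaddr, pfn)
    return "faulted"


def remove_device(mm, dev):
    """Unplug the device and drop all ptes that point into its range."""
    dev.present = False
    return mm.invalidate_pfn_range(dev.first_pfn, dev.last_pfn)
```

In this model, two processes that have faulted in the same page both lose their mappings when remove_device() runs, and any later access() raises OSError with errno set to ENODEV. The filesystem side only ever sees the failed fault.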