From: Dan Williams
Date: Fri, 26 Feb 2021 12:59:53 -0800
Subject: Re: Question about the "EXPERIMENTAL" tag for dax in XFS
To: Dave Chinner
Cc: "Darrick J. Wong", ruansy.fnst@fujitsu.com, linux-kernel@vger.kernel.org, linux-xfs@vger.kernel.org, linux-nvdimm@lists.01.org, linux-fsdevel@vger.kernel.org, darrick.wong@oracle.com, willy@infradead.org, jack@suse.cz, viro@zeniv.linux.org.uk, linux-btrfs@vger.kernel.org, ocfs2-devel@oss.oracle.com, hch@lst.de, rgoldwyn@suse.de, y-goto@fujitsu.com, qi.fuli@fujitsu.com, fnstml-iaas@cn.fujitsu.com
References: <20210226002030.653855-1-ruansy.fnst@fujitsu.com> <20210226190454.GD7272@magnolia> <20210226205126.GX4662@dread.disaster.area>
In-Reply-To: <20210226205126.GX4662@dread.disaster.area>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Feb 26, 2021 at 12:51 PM Dave Chinner wrote:
>
> On Fri, Feb 26, 2021 at 11:24:53AM -0800, Dan Williams wrote:
> > On Fri, Feb 26, 2021 at 11:05 AM Darrick J. Wong wrote:
> > >
> > > On Fri, Feb 26, 2021 at 09:45:45AM +0000, ruansy.fnst@fujitsu.com wrote:
> > > > Hi, guys
> > > >
> > > > Besides this patchset, I'd like to confirm something about the
> > > > "EXPERIMENTAL" tag for dax in XFS.
> > > >
> > > > In XFS, the "EXPERIMENTAL" tag, which is reported in a warning message
> > > > when we mount a pmem device with the dax option, has existed for a
> > > > while. It's a bit annoying when using the fsdax feature. So, my initial
> > > > intention was to remove this tag, and I started to find and solve
> > > > the problems that prevent it from being removed.
> > > >
> > > > As discussed before, there are 3 main problems. The first one is "dax
> > > > semantics", which has been resolved. The remaining two are "RMAP for
> > > > fsdax" and "support dax reflink for filesystem", which I have been
> > > > working on.
> > > >
> > > > So, what I want to confirm is: does this mean that we can remove the
> > > > "EXPERIMENTAL" tag when the remaining two problems are solved?
> > >
> > > Yes. I'd keep the experimental tag for a cycle or two to make sure that
> > > nothing new pops up, but otherwise the two patchsets you've sent close
> > > those two big remaining gaps. Thank you for working on this!
> > >
> > > > Or maybe there are other important problems that need to be fixed before
> > > > removing it? If there are, could you please show me what they are?
> > >
> > > That remains to be seen through QA/validation, but I think that's it.
> > >
> > > Granted, I still have to read through the two patchsets...
> >
> > I've been meaning to circle back here as well.
> >
> > My immediate concern is the issue Jason recently highlighted [1] with
> > respect to invalidating all dax mappings when / if the device is
> > ripped out from underneath the fs. I don't think that will collide
> > with Ruan's implementation, but it does need new communication from
> > driver to fs about removal events.
> >
> > [1]: http://lore.kernel.org/r/CAPcyv4i+PZhYZiePf2PaH0dT5jDfkmkDX-3usQy1fAhf6LPyfw@mail.gmail.com
>
> Oh, yay.
>
> The XFS shutdown code is centred around preventing new IO from being
> issued - we don't actually do anything about DAX mappings because,
> well, I don't think anyone on the filesystem side thought they had
> to do anything special if pmem went away from under it.
>
> My understanding -was- that the pmem removal invalidates
> all the ptes currently mapped into CPU page tables that point at
> the dax device across the system. The vmas that manage these
> mappings are not really something the filesystem manages,
> but a function of the mm subsystem. What the filesystem cares about
> is that it gets page faults triggered when a change of state occurs
> so that it can remap the page to its backing store correctly.
>
> IOWs, all the mm subsystem needs to do when pmem goes away is clear the
> CPU ptes, because then when userspace tries to access the
> mapped DAX pages we get a new page fault. In processing the fault, the
> filesystem will try to get direct access to the pmem from the block
> device. This will get an ENODEV error from the block device
> because the backing store (pmem) has been unplugged and is no longer
> there...
>
> AFAICT, as long as pmem removal invalidates all the active ptes that
> point at the pmem being removed, the filesystem doesn't need to
> care about device removal at all, DAX or no DAX...

How would the pmem removal do that without walking all the active
inodes in the fs at the time of shutdown and calling
unmap_mapping_range(inode->i_mapping, 0, 0, 1)?

The core-mm does tear down the ptes in the direct map, but afaics
nothing in xfs_do_force_shutdown() tears down user mappings of pmem.
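Below is a toy userspace model of the two positions above. It is only a sketch and not kernel code. Every identifier in it (struct toy_fs, toy_dax_fault(), toy_remove_device(), toy_fs_unmap_all()) is hypothetical. A single boolean per inode stands in for "a user PTE points at pmem".

The model assumes, as argued above, that device removal by itself tears down only the direct map and leaves user mappings alone. Under that assumption it shows two things:

- Once the PTEs are cleared, the next access faults and fails with -ENODEV, as Dave describes.
- Something still has to clear them. Here that is a per-inode walk that mimics, but does not call, unmap_mapping_range(inode->i_mapping, 0, 0, 1).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Toy userspace model, NOT kernel code. All names are hypothetical.
 * One bool per inode stands in for "a user PTE currently maps pmem".
 */
#define NR_INODES 4

struct toy_dev {
	bool present;           /* false once the pmem is "unplugged" */
};

struct toy_inode {
	bool pte_mapped;        /* a user PTE currently points at pmem */
};

struct toy_fs {
	struct toy_dev *dev;
	struct toy_inode inodes[NR_INODES];
};

/* Stand-in for the fs DAX fault handler: ask the device for direct access. */
static int toy_dax_fault(struct toy_fs *fs, struct toy_inode *inode)
{
	if (!fs->dev->present)
		return -ENODEV; /* backing store is gone: the fault fails */
	inode->pte_mapped = true;
	return 0;
}

/* A user load: uses the existing PTE if there is one, otherwise faults. */
static int toy_user_access(struct toy_fs *fs, int ino)
{
	assert(ino >= 0 && ino < NR_INODES);
	struct toy_inode *inode = &fs->inodes[ino];

	if (inode->pte_mapped)
		return 0;       /* no fault: the stale mapping is used directly */
	return toy_dax_fault(fs, inode);
}

/*
 * Device removal under the assumption above: only the device goes away.
 * The kernel direct map is not modelled and user PTEs are left untouched.
 */
static void toy_remove_device(struct toy_dev *dev)
{
	dev->present = false;
}

/*
 * The per-inode walk asked about above: the analogue of calling
 * unmap_mapping_range(inode->i_mapping, 0, 0, 1) on every active inode.
 */
static void toy_fs_unmap_all(struct toy_fs *fs)
{
	for (int i = 0; i < NR_INODES; i++)
		fs->inodes[i].pte_mapped = false;
}
```

To drive it:

1. Fault in inode 0 while the device is present.
2. Call toy_remove_device(). The stale mapping on inode 0 still "succeeds", while never-mapped inode 1 gets -ENODEV.
3. Call toy_fs_unmap_all(). Only now does inode 0 also fault and return -ENODEV.

The gap between steps 2 and 3 is the question being raised: who performs the walk.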