Received: by 2002:a05:6359:c8b:b0:c7:702f:21d4 with SMTP id go11csp3333938rwb; Mon, 19 Sep 2022 20:04:54 -0700 (PDT) X-Google-Smtp-Source: AMsMyM6s/U1q/bTuB1Dmrvo0GSvx4kjEphKkdWCScEIWvGwgvjK1XltCux5twYoCaBwfusSqWMRA X-Received: by 2002:a62:6347:0:b0:531:c5a7:b209 with SMTP id x68-20020a626347000000b00531c5a7b209mr22003208pfb.60.1663643093942; Mon, 19 Sep 2022 20:04:53 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1663643093; cv=none; d=google.com; s=arc-20160816; b=OLWo8mwcZ9YtgFlNZDkBBmHws2I52/IJCrT+N1NUfFfG6XxkVaqI6vT4sDHmeycLjh bbJoIcxMCqI3XBfmymZNqDld6GdccR4TTtavk0fjIrO2omqtdVHi6d+/M0VQT0YKqqeV 84lUgdRtNo8Tzfs2csVnbhqpJ4jvMDRQTiMPYgKduOZATPpRRZZjK9sdwrf1d5dB5ICg 8vRdMW28ZPkqYjv5z9fHfSJWDt25e4yyt6pp/7HMypGRJomwkoFna6lpWi0oMTPtOBLY Qx1DKPUY8Jf91sFE2xMDIs23hB64GR4IYVw98FhKmUJFiG1eTkOmlbWL+sZrfQZ4MRgK DlvA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-disposition:mime-version :references:message-id:subject:cc:to:from:date; bh=QN4xU8j7kdx/pQ/DU++cj52c/2hyJuMEnipjZxIeVS0=; b=YcdEIk5P2XWJoifxZXNIcsHk6kreYjo3BihYw0i9M7Q+E+TqZ3cDavs+NwHTdJWPrN FNZnlf7V5jx3K6eQQboOdBwGe21vv0HUJOLaGMH9mBl75erQrQM9co/eu0uTMKea+CXg QoNdsMr0n5Mp8n+619GLOo8K4qQTYsaUhk8jj2yoKC99/nD5mzJs1BGuuHtuX/dH0o3B G3DDlPoUUhAkO3OKSh6TyqOchjpnn8XUkDmXxw2fkfV57Lt7o6vwa/4ujI2YnMh7Tu+u 259Vf7ObBQCCfWZI1lMmA1uee579JRcbQ+sXRki268Zd08l2ChETSRdrSBsIHYmI85TZ A4GQ== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id u18-20020a170903125200b00178a9b343f5si396865plh.16.2022.09.19.20.04.42; Mon, 19 Sep 2022 20:04:53 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230094AbiITCp3 (ORCPT + 99 others); Mon, 19 Sep 2022 22:45:29 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50514 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230055AbiITCpZ (ORCPT ); Mon, 19 Sep 2022 22:45:25 -0400 Received: from mail105.syd.optusnet.com.au (mail105.syd.optusnet.com.au [211.29.132.249]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id C56A8491E0; Mon, 19 Sep 2022 19:45:23 -0700 (PDT) Received: from dread.disaster.area (pa49-180-183-60.pa.nsw.optusnet.com.au [49.180.183.60]) by mail105.syd.optusnet.com.au (Postfix) with ESMTPS id A23661100975; Tue, 20 Sep 2022 12:45:21 +1000 (AEST) Received: from dave by dread.disaster.area with local (Exim 4.92.3) (envelope-from ) id 1oaTGB-009qB6-UQ; Tue, 20 Sep 2022 12:45:19 +1000 Date: Tue, 20 Sep 2022 12:45:19 +1000 From: Dave Chinner To: Shiyang Ruan Cc: linux-kernel@vger.kernel.org, linux-xfs@vger.kernel.org, nvdimm@lists.linux.dev, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, djwong@kernel.org, dan.j.williams@intel.com, hch@infradead.org, jane.chu@oracle.com Subject: Re: [PATCH 3/3] mm, pmem, xfs: Introduce MF_MEM_REMOVE for unbind Message-ID: <20220920024519.GQ3600936@dread.disaster.area> References: <9e9521a4-6e07-e226-2814-b78a2451656b@fujitsu.com> <1662114961-66-1-git-send-email-ruansy.fnst@fujitsu.com> <1662114961-66-4-git-send-email-ruansy.fnst@fujitsu.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1662114961-66-4-git-send-email-ruansy.fnst@fujitsu.com> X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.4 cv=e9dl9Yl/ c=1 sm=1 tr=0 ts=63292942 a=mj5ET7k2jFntY++HerHxfg==:117 a=mj5ET7k2jFntY++HerHxfg==:17 a=kj9zAlcOel0A:10 a=xOM3xZuef0cA:10 a=VwQbUJbxAAAA:8 a=QyXUC8HyAAAA:8 a=omOdbC7AAAAA:8 a=7-415B0cAAAA:8 a=_WoWtd2e78iMft2-j4wA:9 a=CjuIK1q_8ugA:10 a=AjGcO6oz07-iQ99wixmX:22 a=biEYGPWJfzWAr4FL6Ov7:22 X-Spam-Status: No, score=-2.6 required=5.0 tests=BAYES_00,RCVD_IN_DNSWL_LOW, RCVD_IN_MSPIKE_H2,SPF_HELO_PASS,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, Sep 02, 2022 at 10:36:01AM +0000, Shiyang Ruan wrote: > This patch is inspired by Dan's "mm, dax, pmem: Introduce > dev_pagemap_failure()"[1]. With the help of dax_holder and > ->notify_failure() mechanism, the pmem driver is able to ask filesystem > (or mapped device) on it to unmap all files in use and notify processes > who are using those files. > > Call trace: > trigger unbind > -> unbind_store() > -> ... (skip) > -> devres_release_all() # was pmem driver ->remove() in v1 > -> kill_dax() > -> dax_holder_notify_failure(dax_dev, 0, U64_MAX, MF_MEM_PRE_REMOVE) > -> xfs_dax_notify_failure() > > Introduce MF_MEM_PRE_REMOVE to let filesystem know this is a remove > event. So do not shutdown filesystem directly if something not > supported, or if failure range includes metadata area. Make sure all > files and processes are handled correctly. > > [1]: https://lore.kernel.org/linux-mm/161604050314.1463742.14151665140035795571.stgit@dwillia2-desk3.amr.corp.intel.com/ > > Signed-off-by: Shiyang Ruan > --- > drivers/dax/super.c | 3 ++- > fs/xfs/xfs_notify_failure.c | 23 +++++++++++++++++++++++ > include/linux/mm.h | 1 + > 3 files changed, 26 insertions(+), 1 deletion(-) > > diff --git a/drivers/dax/super.c b/drivers/dax/super.c > index 9b5e2a5eb0ae..cf9a64563fbe 100644 > --- a/drivers/dax/super.c > +++ b/drivers/dax/super.c > @@ -323,7 +323,8 @@ void kill_dax(struct dax_device *dax_dev) > return; > > if (dax_dev->holder_data != NULL) > - dax_holder_notify_failure(dax_dev, 0, U64_MAX, 0); > + dax_holder_notify_failure(dax_dev, 0, U64_MAX, > + MF_MEM_PRE_REMOVE); > > clear_bit(DAXDEV_ALIVE, &dax_dev->flags); > synchronize_srcu(&dax_srcu); > diff --git a/fs/xfs/xfs_notify_failure.c b/fs/xfs/xfs_notify_failure.c > index 3830f908e215..5e04ba7fa403 100644 > --- a/fs/xfs/xfs_notify_failure.c > +++ b/fs/xfs/xfs_notify_failure.c > @@ -22,6 +22,7 @@ > > #include > #include > +#include > > struct xfs_failure_info { > xfs_agblock_t startblock; > @@ -77,6 +78,9 @@ xfs_dax_failure_fn( > > if (XFS_RMAP_NON_INODE_OWNER(rec->rm_owner) || > (rec->rm_flags & (XFS_RMAP_ATTR_FORK | XFS_RMAP_BMBT_BLOCK))) { > + /* The device is about to be removed. Not a really failure. */ > + if (notify->mf_flags & MF_MEM_PRE_REMOVE) > + return 0; > notify->want_shutdown = true; > return 0; > } > @@ -182,12 +186,23 @@ xfs_dax_notify_failure( > struct xfs_mount *mp = dax_holder(dax_dev); > u64 ddev_start; > u64 ddev_end; > + int error; > > if (!(mp->m_super->s_flags & SB_BORN)) { > xfs_warn(mp, "filesystem is not ready for notify_failure()!"); > return -EIO; > } > > + if (mf_flags & MF_MEM_PRE_REMOVE) { > + xfs_info(mp, "device is about to be removed!"); > + down_write(&mp->m_super->s_umount); > + error = sync_filesystem(mp->m_super); > + drop_pagecache_sb(mp->m_super, NULL); > + up_write(&mp->m_super->s_umount); > + if (error) > + return error; If the device is about to go away unexpectedly, shouldn't this shut down the filesystem after syncing it here? If the filesystem has been shut down, then everything will fail before removal finally triggers, and the act of unmounting the filesystem post device removal will clean up the page cache and all the other caches. IOWs, I don't understand why the page cache is considered special here (as opposed to, say, the inode or dentry caches), nor why we aren't shutting down the filesystem directly after syncing it to disk to ensure that we don't end up with applications losing data as a result of racing with the removal.... Cheers, Dave. -- Dave Chinner david@fromorbit.com