Received: by 2002:a05:6a10:413:0:0:0:0 with SMTP id 19csp812207pxp; Fri, 11 Mar 2022 15:50:17 -0800 (PST) X-Google-Smtp-Source: ABdhPJzDJS8naBenVH7Z/Jpg34hKRlrP2qmQ1rl863PDFgRvvKlZ06mrpop5298eIJsyxfDTVo3x X-Received: by 2002:a17:902:e748:b0:14f:969b:f6b6 with SMTP id p8-20020a170902e74800b0014f969bf6b6mr12452008plf.15.1647042617533; Fri, 11 Mar 2022 15:50:17 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1647042617; cv=none; d=google.com; s=arc-20160816; b=yDbIJqWu/F0nqzQ3kYxZdlI/n9h+MZYv5w+bDA7j/h0dCvnXDcl8DgoJl2buqQ0wEt anWMzjrWfqAST0s0ziWnuYecNZL5ZlnFKO7sky8LXvDRGHwpp5ACMA9WBdNxknTOUFPT Nt4KtccmKQtwyNvxITFwsE3K0vHofXxCzZQf0EAOeTfk4bFUiQi0C6iFKhImd4IXCnyh vLcLVFNACpWS9TP3B5n9wrEBwm9vNFae8DJmB4Q0Y15LLrYKsSJJUzZdA5Qld8qr6ytG R9TliJllgRpmuQGR3RXqMkw1SmyCdkgMfChKj03dVjGNJpeVZRFTcveftO+mmpyr8VXh 0Zrw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:subject:message-id:date:from:in-reply-to :references:mime-version:dkim-signature; bh=13t7T+1H2gbtDRvuwDVr1bm3a+2J2xIzpet95fdBuh4=; b=ufMWJlYa0QcyBYMUTWlFSXH/WGMY/KEYbcyk9gur8n26ZCNk2vV4HKB2ruufe8ZLfX De4CqdHjm7YftL8aVuOZIADxQzxLYhaVmZkEkIZfv15B5axZzQ57F8YEvOrk2DASAYKu 8zIjbKIOKtlaYdlyjQIDYFvp3XnYBg/IPNxJ5WoxxKroaqTTNYeKya0bJg5ny53yYfUi Qc+caCTWi6mFWMJjV8fCNc6kRYy14zOc3aYs+YelEqLHVbEsEPgqdISCSGmwmJu3Yy2s rbJGRZKu4a9lGyVKEIRDLAmd2cpMGhQ0zfSNkD/0OVGXS4cltGAuPbXS+C80PWaZqFa/ LgjQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel-com.20210112.gappssmtp.com header.s=20210112 header.b=RIgXQdVf; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=intel.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id 143-20020a630095000000b0037ffe05b8b8si9297005pga.772.2022.03.11.15.50.04; Fri, 11 Mar 2022 15:50:17 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel-com.20210112.gappssmtp.com header.s=20210112 header.b=RIgXQdVf; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229685AbiCKXgb (ORCPT + 99 others); Fri, 11 Mar 2022 18:36:31 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:43680 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229509AbiCKXga (ORCPT ); Fri, 11 Mar 2022 18:36:30 -0500 Received: from mail-pg1-x52c.google.com (mail-pg1-x52c.google.com [IPv6:2607:f8b0:4864:20::52c]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 62F7AF1EA9 for ; Fri, 11 Mar 2022 15:35:25 -0800 (PST) Received: by mail-pg1-x52c.google.com with SMTP id z4so8665274pgh.12 for ; Fri, 11 Mar 2022 15:35:25 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=intel-com.20210112.gappssmtp.com; s=20210112; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=13t7T+1H2gbtDRvuwDVr1bm3a+2J2xIzpet95fdBuh4=; b=RIgXQdVfVb6lODawzvdTxr4kAtwLPsRjREsyVUC9K9FXvakpJNymy17e6FkbZx5Gvv LeDcUKOKw2VoVa83HdzrEPE+n4lq8xuCULunoupKYxugg5JDuOq5trFf3fm42gMytdUt ERkjzkjwpAU0zoJJj8eO/F2dpsyBXR6QjMjtDGcOsDfDr2CJipG41gyKlmh4HlNGNwWJ vwkcU6O57nxaJWLRTl+0Hya9nMI7GHfWxpWRkjw+wUNIj2S8tgExKh9huQ4lWfIYS5K0 A8C1eX/+5cBiho3K4ItispCSGtWUy9zYfhJ3aJCa4TY2pJ8nVxCb77Wvrz7OmBeIoOjA 1QIQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=13t7T+1H2gbtDRvuwDVr1bm3a+2J2xIzpet95fdBuh4=; b=8FTGG9gC7rKLF4aeFyWvE2v35GIDhVI1RfUXZ9v53LUbV4re47YY8V8VZlO/4GvgMR PseZ8L6xL0TY1Vk0NIbR6jsH0Mu0Qhu4r7wOR+YfYLNw6d49Mb3I+eL5HuP4//Xz3sQa I2S71UgzZN5Kj0OuydJWSBW4w+MMGKHdUVi3/bA70VSzNg1AEm3m4DXlQ6ALKpTFsd8I IvWMXFhlVXBWbdoubhsuT5cY3RgYdyb4Q7K2ccHhLrGiHfCkmTIuxYSd0yf6PfvorY4P 5kce93fN7bxqJvps9WAYL5CeUeUN2vDtnTIxmHemUchlDY/Vf4QbwRQflGkkzKf7AQOa 7MOg== X-Gm-Message-State: AOAM532LWltvtTPqBPjjmiVk/3YIqx7axtDqKB8RiIhqt9fdg5Ad5tM8 xDOXxWvawX9V1xavbHyvzhjwcEIfZCNPKkGLUIsJFw== X-Received: by 2002:a62:1481:0:b0:4f6:38c0:ed08 with SMTP id 123-20020a621481000000b004f638c0ed08mr12796868pfu.86.1647041724808; Fri, 11 Mar 2022 15:35:24 -0800 (PST) MIME-Version: 1.0 References: <20220227120747.711169-1-ruansy.fnst@fujitsu.com> <20220227120747.711169-2-ruansy.fnst@fujitsu.com> In-Reply-To: <20220227120747.711169-2-ruansy.fnst@fujitsu.com> From: Dan Williams Date: Fri, 11 Mar 2022 15:35:13 -0800 Message-ID: Subject: Re: [PATCH v11 1/8] dax: Introduce holder for dax_device To: Shiyang Ruan Cc: Linux Kernel Mailing List , linux-xfs , Linux NVDIMM , Linux MM , linux-fsdevel , "Darrick J. Wong" , david , Christoph Hellwig , Jane Chu Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_NONE, T_SCC_BODY_TEXT_LINE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Sun, Feb 27, 2022 at 4:08 AM Shiyang Ruan wrote: > > To easily track filesystem from a pmem device, we introduce a holder for > dax_device structure, and also its operation. This holder is used to > remember who is using this dax_device: > - When it is the backend of a filesystem, the holder will be the > instance of this filesystem. > - When this pmem device is one of the targets in a mapped device, the > holder will be this mapped device. In this case, the mapped device > has its own dax_device and it will follow the first rule. So that we > can finally track to the filesystem we needed. > > The holder and holder_ops will be set when filesystem is being mounted, > or an target device is being activated. > > Signed-off-by: Shiyang Ruan > --- > drivers/dax/super.c | 89 +++++++++++++++++++++++++++++++++++++++++++++ > include/linux/dax.h | 32 ++++++++++++++++ > 2 files changed, 121 insertions(+) > > diff --git a/drivers/dax/super.c b/drivers/dax/super.c > index e3029389d809..da5798e19d57 100644 > --- a/drivers/dax/super.c > +++ b/drivers/dax/super.c > @@ -21,6 +21,9 @@ > * @cdev: optional character interface for "device dax" > * @private: dax driver private data > * @flags: state and boolean properties > + * @ops: operations for dax_device > + * @holder_data: holder of a dax_device: could be filesystem or mapped device > + * @holder_ops: operations for the inner holder > */ > struct dax_device { > struct inode inode; > @@ -28,6 +31,8 @@ struct dax_device { > void *private; > unsigned long flags; > const struct dax_operations *ops; > + void *holder_data; > + const struct dax_holder_operations *holder_ops; > }; > > static dev_t dax_devt; > @@ -193,6 +198,29 @@ int dax_zero_page_range(struct dax_device *dax_dev, pgoff_t pgoff, > } > EXPORT_SYMBOL_GPL(dax_zero_page_range); > > +int dax_holder_notify_failure(struct dax_device *dax_dev, u64 off, > + u64 len, int mf_flags) > +{ > + int rc, id; > + > + id = dax_read_lock(); > + if (!dax_alive(dax_dev)) { > + rc = -ENXIO; > + goto out; > + } > + > + if (!dax_dev->holder_ops) { > + rc = -EOPNOTSUPP; I think it is ok to return success (0) for this case. All the caller of dax_holder_notify_failure() wants to know is if the notification was successfully delivered to the holder. If there is no holder present then there is nothing to report. This is minor enough for me to fix up locally if nothing else needs to be changed. > + goto out; > + } > + > + rc = dax_dev->holder_ops->notify_failure(dax_dev, off, len, mf_flags); > +out: > + dax_read_unlock(id); > + return rc; > +} > +EXPORT_SYMBOL_GPL(dax_holder_notify_failure); > + > #ifdef CONFIG_ARCH_HAS_PMEM_API > void arch_wb_cache_pmem(void *addr, size_t size); > void dax_flush(struct dax_device *dax_dev, void *addr, size_t size) > @@ -268,6 +296,10 @@ void kill_dax(struct dax_device *dax_dev) > > clear_bit(DAXDEV_ALIVE, &dax_dev->flags); > synchronize_srcu(&dax_srcu); > + > + /* clear holder data */ > + dax_dev->holder_ops = NULL; > + dax_dev->holder_data = NULL; Isn't this another failure scenario? If kill_dax() is called while a holder is still holding the dax_device that seems to be another ->notify_failure scenario to tell the holder that the device is going away and the holder has not released the device yet. > } > EXPORT_SYMBOL_GPL(kill_dax); > > @@ -409,6 +441,63 @@ void put_dax(struct dax_device *dax_dev) > } > EXPORT_SYMBOL_GPL(put_dax); > > +/** > + * dax_holder() - obtain the holder of a dax device > + * @dax_dev: a dax_device instance > + > + * Return: the holder's data which represents the holder if registered, > + * otherwize NULL. > + */ > +void *dax_holder(struct dax_device *dax_dev) > +{ > + if (!dax_alive(dax_dev)) > + return NULL; It's safe for the holder to assume that it can de-reference ->holder_data freely in its notify_handler callback because dax_holder_notify_failure() arranges for the callback to run in dax_read_lock() context. This is another minor detail that I can fixup locally. > + > + return dax_dev->holder_data; > +} > +EXPORT_SYMBOL_GPL(dax_holder); > + > +/** > + * dax_register_holder() - register a holder to a dax device > + * @dax_dev: a dax_device instance > + * @holder: a pointer to a holder's data which represents the holder > + * @ops: operations of this holder > + > + * Return: negative errno if an error occurs, otherwise 0. > + */ > +int dax_register_holder(struct dax_device *dax_dev, void *holder, > + const struct dax_holder_operations *ops) > +{ > + if (!dax_alive(dax_dev)) > + return -ENXIO; > + > + if (cmpxchg(&dax_dev->holder_data, NULL, holder)) > + return -EBUSY; > + > + dax_dev->holder_ops = ops; > + return 0; > +} > +EXPORT_SYMBOL_GPL(dax_register_holder); > + > +/** > + * dax_unregister_holder() - unregister the holder for a dax device > + * @dax_dev: a dax_device instance > + * @holder: the holder to be unregistered > + * > + * Return: negative errno if an error occurs, otherwise 0. > + */ > +int dax_unregister_holder(struct dax_device *dax_dev, void *holder) > +{ > + if (!dax_alive(dax_dev)) > + return -ENXIO; > + > + if (cmpxchg(&dax_dev->holder_data, holder, NULL) != holder) > + return -EBUSY; > + dax_dev->holder_ops = NULL; > + return 0; > +} > +EXPORT_SYMBOL_GPL(dax_unregister_holder); > + > /** > * inode_dax: convert a public inode into its dax_dev > * @inode: An inode with i_cdev pointing to a dax_dev > diff --git a/include/linux/dax.h b/include/linux/dax.h > index 9fc5f99a0ae2..262d7bad131a 100644 > --- a/include/linux/dax.h > +++ b/include/linux/dax.h > @@ -32,8 +32,24 @@ struct dax_operations { > int (*zero_page_range)(struct dax_device *, pgoff_t, size_t); > }; > > +struct dax_holder_operations { > + /* > + * notify_failure - notify memory failure into inner holder device > + * @dax_dev: the dax device which contains the holder > + * @offset: offset on this dax device where memory failure occurs > + * @len: length of this memory failure event Forgive me if this has been discussed before, but since dax_operations are in terms of pgoff and nr pages and memory_failure() is in terms of pfns what was the rationale for making the function signature byte based? I want to get this series merged into linux-next shortly after v5.18-rc1. Then we can start working on incremental fixups rather resending the full series with these long reply cycles.